Transcribe a recording

~207 credits / hour of audio (up to 5000)

Transcribes an audio or video recording into clean text with automatic language detection. Speaker labeling (who said what) is on by default; chapter summaries for long recordings are opt-in. Long recordings may take a few moments to process, and processing resumes automatically.

Use when

You have a recording and need its words as text, with speaker turns and optional chapter summaries.

Not for

Live or streaming captioning, translation, naming the actual speakers, or detecting sentiment, entities, or sensitive content.

Cost

206.25 credits per hour of audio; speaker labeling adds 27.5 per hour, chapter summaries add 41.25 per hour.

Estimated; the actual charge depends on your input and is shown in the response.
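The per-hour rates above can be turned into a rough estimate before you run the skill. The sketch below is illustrative only: `estimate_credits` is a hypothetical helper, and it assumes that the two add-ons stack on top of the 206.25 rate and that "up to 5000" is a cap on a single run. The page does not say whether the headline rate already includes the default speaker labeling, so treat the result as an upper-bound guess and rely on the charge shown in the response.

```shell
# Hypothetical helper (not part of the service): estimate credits for one run.
# Usage: estimate_credits HOURS LABEL_SPEAKERS CHAPTERS   (flags are true|false)
# Assumptions: add-ons are added to the 206.25/hour rate; 5000 is a per-run cap.
estimate_credits() {
  awk -v hours="$1" -v speakers="$2" -v chapters="$3" 'BEGIN {
    rate = 206.25                              # credits per hour of audio
    if (speakers == "true") rate += 27.5       # speaker labeling add-on
    if (chapters == "true") rate += 41.25      # chapter summaries add-on
    total = hours * rate
    if (total > 5000) total = 5000             # assumed cap ("up to 5000")
    printf "%.2f\n", total
  }'
}

# Two hours of audio with both options enabled.
estimate_credits 2 true true
```

Under these assumptions a two-hour recording with both options enabled comes to 2 × 275 = 550 credits.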

What it accepts

Set these inside the intent when you run it.

audio_url (optional)

A public link to the audio or video recording to transcribe.

label_speakers (optional)

Mark who said what across the recording. On by default.

chapters (optional)

Break the recording into chapters with summaries. Off by default.

language (optional)

The spoken language, if you already know it; otherwise it is detected automatically.
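As a rough illustration of how these inputs fit together, the sketch below assembles the intent JSON in shell. `build_intent` is a hypothetical helper, not something the service provides. It sends the two flags as the strings "true" and "false" only because the curl example on this page does so, and it leaves `language` out when no value is given so that detection applies.

```shell
# Hypothetical helper: print the request body for a transcribe run.
# Usage: build_intent AUDIO_URL [LABEL_SPEAKERS] [CHAPTERS] [LANGUAGE]
# Defaults follow the documented defaults: speakers on, chapters off.
build_intent() {
  audio_url=$1
  label_speakers=${2:-true}
  chapters=${3:-false}
  language=${4:-}
  # Escape backslashes and double quotes so the URL stays a valid JSON string.
  escaped_url=$(printf '%s' "$audio_url" | sed 's/\\/\\\\/g; s/"/\\"/g')
  json=$(printf '{"intent":{"operation":"transcribe","audio_url":"%s","label_speakers":"%s","chapters":"%s"' \
    "$escaped_url" "$label_speakers" "$chapters")
  # Only include language when the caller supplied one.
  if [ -n "$language" ]; then
    json=$(printf '%s,"language":"%s"' "$json" "$language")
  fi
  printf '%s}}\n' "$json"
}

# Defaults only: speaker labeling on, chapters off, language detected.
build_intent "https://files.example.com/uploads/meeting.mp3"
```

The printed body can be passed to curl with `-d "$(build_intent ...)"`. A JSON tool such as jq would be a more robust way to build the payload if it is available.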

What you get back

The transcript text, detected language, duration in seconds, speaker-labeled turns, and chapters with summaries when requested.

Run it

Run this sub-skill directly: pin it by setting operation to "transcribe" and pass its inputs in the intent. (Omit operation and the Audio Intelligence skill will route from your intent instead.)

curl -X POST "https://skill.askfaro.com/skills/audio-intelligence/run" \
  -H "Authorization: Bearer $FARO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intent":{"operation":"transcribe","audio_url":"https://files.example.com/uploads/meeting.mp3?sig=...","label_speakers":"true","chapters":"false","language":"en"}}'

Example requests

›Transcribe this audio file
›Transcribe this meeting recording and label each speaker
›Summarize this podcast episode and break it into chapters
