Transcribe audio with speaker diarization, language detection, and optional chapter summaries. While the job runs, it returns {status:"pending", continuation_token,...}; when that happens, you MUST immediately call transcribe again with only continuation_token set. Do not ask the user.
Converts a recording into a full transcript with rich structure: speaker-labeled turns (who said what), automatic language detection, and optional chapter segmentation in which each chapter has its own generated summary. Built for long-form audio such as interviews, meetings, podcasts, and calls.
It handles long-form audio with strong accuracy and returns speaker diarization and chapter summaries from the same call, so you do not need a separate pipeline to work out who spoke or to condense a long recording. Pricing is usage-based, with no prepaid balance or subscription.
Pass a public audio_url (use the faro-api presign flow if you only have the raw file bytes). Diarization (speaker_labels) and chapter summaries (auto_chapters) are on/off flags; each adds a small per-hour cost when enabled. The job is asynchronous: on a pending response, immediately call again with only continuation_token set. Long files may take several round-trips.
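The polling contract above can be sketched as a small client-side loop. This is a minimal sketch under stated assumptions, not the tool's own client: `call_transcribe` is a hypothetical stand-in for however your runtime invokes transcribe, `max_rounds` is an illustrative safety limit that the tool does not define, and only the `status` and `continuation_token` fields come from this description. The terminal status value is not documented here, so the loop only checks whether the response is still "pending".

```python
from typing import Any, Callable, Dict

# Hypothetical type: however your runtime invokes the transcribe tool.
TranscribeCall = Callable[..., Dict[str, Any]]


def transcribe_until_done(
    call_transcribe: TranscribeCall,
    audio_url: str,
    max_rounds: int = 100,  # illustrative safety limit, not a tool parameter
    **options: Any,
) -> Dict[str, Any]:
    """Start a transcription job and resume it until it is no longer pending."""
    # First call: audio_url plus any optional flags (e.g. speaker_labels=True).
    response = call_transcribe(audio_url=audio_url, **options)
    rounds = 1
    while response.get("status") == "pending":
        if rounds >= max_rounds:
            raise TimeoutError(f"still pending after {rounds} calls")
        # Follow-up calls pass ONLY continuation_token; the server ignores
        # any other parameters once the token is set.
        response = call_transcribe(continuation_token=response["continuation_token"])
        rounds += 1
    # The final response is returned unchanged; its exact shape
    # (transcript, utterances, chapters) is not specified here.
    return response
```

The loop never pauses to ask the user between round-trips, matching the instruction to re-call immediately; the round limit is only a client-side guard against a job that never leaves the pending state.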
Public URL of the audio or video file to transcribe. Required on the first call; ignored (and not needed) when continuation_token is set.
Insert punctuation.
Apply casing and formatting for readability.
Segment the audio into chapters, each with a generated summary.
Force transcription in a specific language, given as an ISO language code (e.g. "en"); ignored when language_detection is true.
Identify and label distinct speakers (diarization). When enabled, populates utterances in the result.
Token from a prior pending response. When set, all other parameters are ignored and the server resumes polling the existing job. On a pending response, you MUST immediately call transcribe again with only continuation_token set. Do not ask the user.
Automatically detect the spoken language.