Transcribe audio

~2 credits / minute (up to 45)

Convert the spoken words in an audio file to text. Accepts MP3, M4A, WAV, FLAC, OGG, and other common formats. Language is detected automatically, so there is no need to specify it. Returns the full transcript with punctuation and capitalization, along with the audio duration.

Use when

You have a recording, voice note, call, meeting, or any audio and want the words as text.

Not for

Text-to-speech, translation, or per-speaker diarization.

Cost

1.5 credits per minute of audio. Most inputs are clips of a few minutes.

Estimated; the actual charge depends on your input and is shown in the response.
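
As a rough illustration of the 1.5 credits per minute rate, the sketch below estimates the charge for a clip of a given length. It assumes the rate is applied linearly to the duration in minutes, which this page does not confirm, and `estimate_credits` is a hypothetical helper name. The charge shown in the response remains the authoritative figure.

```shell
# Hypothetical helper: estimate credits for a clip of the given length in minutes.
# Assumption: the documented 1.5 credits/minute rate scales linearly with duration.
estimate_credits() {
  awk -v minutes="$1" 'BEGIN { printf "%.1f\n", minutes * 1.5 }'
}

estimate_credits 4   # a 4-minute clip; prints 6.0
```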

What it accepts

Set these inside the intent when you run it.

file (required)

The audio file to transcribe. Provide a URL to the audio file.

What you get back

The full text transcript of the audio, with punctuation and capitalization, plus the audio duration.

Run it

To run this sub-skill directly, set operation to "transcribe" and pass its inputs in the intent. (If you omit operation, the Speech to Text skill routes the request from your intent instead.)

curl -X POST "https://skill.askfaro.com/skills/speech-to-text/run" \
  -H "Authorization: Bearer $FARO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intent":{"operation":"transcribe","file":"https://example.com/meeting-recording.mp3"}}'
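
When the audio URL comes from a variable, a small wrapper can keep the request body in one place. The following is a minimal sketch built on the curl call above. `build_payload` and `transcribe` are hypothetical helper names, not part of the skill. The sketch does not JSON-escape its input, so it assumes the URL contains no double quotes or backslashes.

```shell
# Hypothetical helper: build the JSON request body for a given audio URL.
# Assumption: the URL contains no double quotes or backslashes (no escaping is done).
build_payload() {
  printf '{"intent":{"operation":"transcribe","file":"%s"}}' "$1"
}

# Hypothetical helper: send the request shown above for the given audio URL.
# Requires FARO_TOKEN to be set in the environment.
transcribe() {
  curl -sS -X POST "https://skill.askfaro.com/skills/speech-to-text/run" \
    -H "Authorization: Bearer $FARO_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")"
}
```

With these defined, `transcribe "https://example.com/meeting-recording.mp3"` would send the same request as the curl command above.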

Example requests

  • Transcribe this voice note for me.
  • Convert this meeting recording to text.
  • What does this audio file say?