Transcribe a video or audio file with the OpenAI Whisper API, in one command.
ffmpeg strips the audio down to 16 kHz mono MP3 at 32 kbps before upload. Whisper resamples everything to 16 kHz mono internally anyway, so this doesn't cost quality — it just makes uploads small. A 1-hour recording ends up around 14 MB. If it's still over the API's 25 MB limit, the script splits into 10-minute chunks and concatenates the results.
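The conversion and size check described above can be sketched as a few bash functions. This is a minimal sketch, not the script's actual code: the function names are hypothetical, the 25 MB limit is assumed to be counted in decimal megabytes, and splitting with ffmpeg's segment muxer is one plausible way to get 10-minute chunks.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the transcript script.

# Assumption: the 25 MB upload limit is counted in decimal megabytes.
LIMIT_BYTES=$((25 * 1000 * 1000))

# Extract the audio as 16 kHz mono MP3 at 32 kbps, dropping any video stream.
to_small_mp3() {
  ffmpeg -nostdin -loglevel error -y -i "$1" -vn -ac 1 -ar 16000 -b:a 32k "$2"
}

# Estimate the encoded size in bytes for a duration in seconds at 32 kbps.
estimated_bytes() {
  echo $(( $1 * 32000 / 8 ))
}

# Succeed (exit 0) when the file is larger than the upload limit.
needs_chunking() {
  local size
  size=$(wc -c < "$1")
  [ "$((size))" -gt "$LIMIT_BYTES" ]
}

# Split an MP3 into 10-minute pieces without re-encoding (assumed approach).
split_10min() {
  ffmpeg -nostdin -loglevel error -i "$1" -f segment -segment_time 600 -c copy "${1%.mp3}_%03d.mp3"
}

estimated_bytes 3600   # one hour of audio: 14400000 bytes, about 14 MB
```

By the same estimate, a recording would need to run roughly 104 minutes before it crosses 25 MB, ignoring container overhead.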
git clone git@github.com:menzhik/transcript.git
chmod +x transcript/transcript
ln -s "$PWD/transcript/transcript" ~/.local/bin/transcript # or /usr/local/bin

Put your key somewhere your shell loads (~/.zshrc or ~/.bashrc):
export OPENAI_API_KEY=sk-...

Usage:
transcript <input> [-f txt|srt|vtt|json|verbose_json] [-l <lang>] [-o <out>] [-k]
Examples:
transcript talk.mp4 # -> talk.txt
transcript talk.mp4 -f srt # -> talk.srt (subtitles)
transcript talk.mp4 -l pl # language hint: faster and more accurate
transcript talk.mp4 -k # keep the intermediate .mp3

Whisper API is $0.006 per minute of audio. One hour ≈ $0.36.
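To estimate the cost of a file before uploading it, something like the following could work. It is a rough sketch: the function names are hypothetical, the price is the $0.006 per minute quoted above, and ffprobe (shipped with FFmpeg) is assumed to be installed.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of the transcript script.

# Print the duration of a media file in seconds (uses ffprobe, which ships with FFmpeg).
duration_seconds() {
  ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Print the estimated cost in USD for a duration in seconds at $0.006 per minute.
cost_usd() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 60 * 0.006 }'
}

cost_usd 3600   # one hour of audio: prints 0.36
```

For a real file, the two can be combined as `cost_usd "$(duration_seconds talk.mp4)"`.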
- Needs ffmpeg, curl, bash, and a valid OPENAI_API_KEY.
- The chunked path splits on a fixed 10-minute boundary, so a word spoken across a boundary can get clipped. Fine for notes, not ideal for legal transcripts.
- Error output from the API is dumped straight into the output file. If your transcript starts with {"error": that's what happened.
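Since an API error ends up in the output file, a quick check after each run can catch it. The helper below is a hypothetical sketch, not part of the script. It looks only at the start of the file and ignores whitespace, on the assumption that the error body may be pretty-printed JSON.

```shell
#!/usr/bin/env bash
# Sketch only: is_api_error is a hypothetical helper, not part of the transcript script.

# Succeed (exit 0) when the file looks like an API error body rather than a transcript.
is_api_error() {
  # Inspect only the first bytes, with whitespace removed so that
  # pretty-printed JSON such as '{\n  "error": ...' still matches.
  head -c 200 "$1" | tr -d ' \t\r\n' | grep -q '^{"error":'
}
```

Usage might look like `transcript talk.mp4 && is_api_error talk.txt && echo "API returned an error" >&2`.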
This is a thin wrapper around work done by other people:
- FFmpeg — does all the audio extraction and encoding.
- OpenAI Whisper — the speech recognition model, accessed via the OpenAI API.