transcript

Transcribe a video or audio file with the OpenAI Whisper API, in one command.

ffmpeg reduces the audio to a 16 kHz mono MP3 at 32 kbps before upload. Whisper resamples everything to 16 kHz mono internally anyway, so the downsampling doesn't cost quality; it just makes uploads small. A 1-hour recording ends up around 14 MB. If the file is still over the API's 25 MB limit, the script splits it into 10-minute chunks and concatenates the results.
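The size figures follow from the bitrate alone, and the two ffmpeg steps map onto standard flags. What follows is a minimal sketch, not the script's actual code: the function names are hypothetical, the exact ffmpeg invocation is an assumption, and the 25 MB limit is treated as decimal megabytes.

```shell
# Hypothetical helpers for illustration; the real script may differ.

MAX_UPLOAD_BYTES=$((25 * 1000 * 1000))   # 25 MB, read as decimal megabytes (assumption)

# Estimated size in bytes of a constant-bitrate stream: seconds * kbps * 1000 / 8.
estimate_mp3_bytes() {
  local seconds="$1" kbps="$2"
  echo $(( seconds * kbps * 1000 / 8 ))
}

# Succeeds (exit 0) when a file of the given size is over the upload limit.
needs_chunking() {
  [ "$1" -gt "$MAX_UPLOAD_BYTES" ]
}

# Drop the video stream, downmix to mono, resample to 16 kHz, encode MP3 at 32 kbps.
to_small_mp3() {
  local input="$1" output="$2"
  ffmpeg -nostdin -loglevel error -y -i "$input" \
    -vn -ac 1 -ar 16000 -b:a 32k "$output"
}

# Split into 10-minute (600 s) pieces without re-encoding.
split_mp3() {
  local input="$1" prefix="$2"
  ffmpeg -nostdin -loglevel error -y -i "$input" \
    -f segment -segment_time 600 -c copy "${prefix}_%03d.mp3"
}
```

With these definitions, `estimate_mp3_bytes 3600 32` prints 14400000, which is the "around 14 MB" figure for one hour. At 32 kbps the 25 MB limit is reached after roughly 104 minutes of audio, so only recordings longer than that would take the chunked path.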

Setup

git clone git@github.com:menzhik/transcript.git
chmod +x transcript/transcript
ln -s "$PWD/transcript/transcript" ~/.local/bin/transcript   # or /usr/local/bin

Put your key somewhere your shell loads, such as ~/.zshrc or ~/.bashrc:

export OPENAI_API_KEY=sk-...
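To confirm the setup before a first run, a small preflight check can help. This is a sketch under the assumption that the tools only need to be on PATH; `missing_tools` and `have_api_key` are hypothetical names, not part of the script.

```shell
# Print each named tool that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeeds only when OPENAI_API_KEY is set and non-empty.
have_api_key() {
  [ -n "${OPENAI_API_KEY:-}" ]
}
```

For example, `missing_tools ffmpeg curl bash` prints nothing when all three are installed, and `have_api_key || echo "OPENAI_API_KEY is not set" >&2` reports a key that the current shell has not loaded.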

Usage

transcript <input> [-f txt|srt|vtt|json|verbose_json] [-l <lang>] [-o <out>] [-k]

Examples:

transcript talk.mp4              # -> talk.txt
transcript talk.mp4 -f srt       # -> talk.srt (subtitles)
transcript talk.mp4 -l pl        # language hint: faster and more accurate
transcript talk.mp4 -k           # keep the intermediate .mp3
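The options above correspond closely to fields of the Whisper transcription endpoint. The sketch below shows one plausible mapping; it is not taken from the script. The function names are hypothetical, and writing `verbose_json` output to a `.json` file is an assumption. The endpoint URL, the `whisper-1` model name, and the `response_format` and `language` fields are the API's own.

```shell
# Map the -f value to the API's response_format; the API calls plain text "text".
api_response_format() {
  case "$1" in
    txt) echo text ;;
    srt|vtt|json|verbose_json) echo "$1" ;;
    *) echo "unsupported format: $1" >&2; return 1 ;;
  esac
}

# Default output name: the input with its extension replaced
# (talk.mp4 + srt -> talk.srt). Using .json for verbose_json is an assumption.
default_output_name() {
  local input="$1" format="$2" ext
  case "$format" in
    verbose_json) ext=json ;;
    *) ext="$format" ;;
  esac
  echo "${input%.*}.$ext"
}

# Upload one audio file. The language argument is optional (ISO 639-1, e.g. pl).
transcribe_one() {
  local audio="$1" format="$2" language="${3:-}" rf
  rf=$(api_response_format "$format") || return 1
  set -- -sS https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$audio" -F model=whisper-1 -F "response_format=$rf"
  if [ -n "$language" ]; then
    set -- "$@" -F "language=$language"
  fi
  curl "$@"
}
```

A call such as `transcribe_one talk.mp3 srt pl > "$(default_output_name talk.mp4 srt)"` would then write talk.srt.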

Cost

The Whisper API costs $0.006 per minute of audio, so one hour is about $0.36.
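To estimate the cost of a specific file before uploading it, the same arithmetic can be scripted. This is a sketch with hypothetical function names; it assumes the per-minute price quoted above and does not model how the API rounds billed durations.

```shell
# Cost in dollars for a duration given in seconds, at $0.006 per minute.
whisper_cost() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 60 * 0.006 }'
}

# Duration of a media file in seconds, via ffprobe (ships with ffmpeg).
media_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

`whisper_cost 3600` prints 0.36, and `whisper_cost "$(media_seconds talk.mp4)"` gives the estimate for a real file.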

Caveats

  • Needs ffmpeg, curl, bash, and a valid OPENAI_API_KEY.
  • The chunked path splits on a fixed 10-minute boundary, so a word spoken across a boundary can get clipped. Fine for notes, not ideal for legal transcripts.
  • Error output from the API is dumped straight into the output file. If your transcript starts with {"error":, that's what happened.
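Because API errors land in the output file, a quick check after each run can catch them. This is a minimal sketch; `looks_like_api_error` is a hypothetical helper, and it assumes the error arrives as a JSON object whose first key is "error", possibly pretty-printed.

```shell
# Succeeds when the start of the file, ignoring whitespace, looks like an
# API error object. Only the first 64 bytes are inspected.
looks_like_api_error() {
  case "$(head -c 64 "$1" | tr -d ' \n\t')" in
    '{"error":'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `looks_like_api_error talk.txt && echo "API error, see talk.txt" >&2` flags a failed run without opening the file.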

Credits

This is a thin wrapper around work done by other people:
