Add transcription to your app in 5 minutes
Drop the transcribe.so API into AI agents, video editors, meeting bots, call-center pipelines, and voice memo apps: anywhere you need accurate speech-to-text without managing models or workers. It runs the same engine as our dashboard, accessed with one Bearer token. New accounts get $0.50 in trial credit.
No credit card required · Pay only for what you use.
See it in action
Real output from a real transcription
Browse chapters, ask questions, and explore search results from an actual transcript.
The audio integration problem
- Whisper API hits 25 MB file caps and rate limits the moment you scale past hobby use
- AssemblyAI and similar managed APIs charge per minute and tax multilingual workloads heavily
- Self-hosted Whisper (especially the large models) eats GPU budget and breaks on long files or non-English audio
- Polling every few seconds blocks workers and burns rate-limit budget you'd rather spend on real traffic
What you get from the transcribe.so API
One Bearer token, three input shapes
POST a YouTube URL, an external audio link, or a file you upload with a presigned S3 PUT. The response shape is the same regardless of input. No SDK required to start; it's pure HTTP.
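A minimal sketch of a job-creation request using only Python's standard library. The endpoint `https://api.transcribe.so/v1/transcriptions`, the JSON keys `source_url` and `model`, and the `TRANSCRIBE_API_KEY` environment variable are hypothetical placeholders, not the documented API; check the API reference for the real path and field names. The function builds the request without sending it, so you can inspect it first.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical endpoint: substitute the path from the transcribe.so API reference.
API_URL = "https://api.transcribe.so/v1/transcriptions"


def build_transcription_request(source_url: str, api_key: str,
                                model: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that submits a YouTube or audio URL.

    The payload keys are assumptions for illustration only.
    """
    payload = {"source_url": source_url}
    if model is not None:
        payload["model"] = model  # hypothetical per-request model selector
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # the one Bearer token
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_transcription_request(
        "https://www.youtube.com/watch?v=VIDEO_ID",
        os.environ.get("TRANSCRIBE_API_KEY", "placeholder_key"),
    )
    print(req.get_method(), req.full_url)
    # To actually submit it: urllib.request.urlopen(req)
```

The presigned S3 upload path would add a PUT of the file bytes before the POST; that step isn't sketched here because its request shape isn't described on this page.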
Webhooks, not polling
Register a URL once and we POST to it when a transcription completes. Payloads are HMAC-signed (Stripe-style verification), and failed deliveries are retried automatically with exponential backoff. Works with Cloudflare Workers, Lambda, n8n, and anywhere else you can receive a POST.
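A sketch of receiver-side verification, assuming the signature header value follows Stripe's `t=<unix timestamp>,v1=<hex HMAC-SHA256>` format and that the signed message is `<timestamp>.<raw body>`. Those details come from Stripe's scheme, not from transcribe.so's docs, so confirm the header name and format before relying on this.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(payload: bytes, signature_header: str, secret: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Check an assumed Stripe-style header of the form 't=<ts>,v1=<hex>'."""
    timestamp = None
    signatures = []
    for part in signature_header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            signatures.append(value)
    if timestamp is None or not signatures:
        return False
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    # Reject stale deliveries to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - ts) > tolerance_s:
        return False
    # Sign the raw body bytes exactly as received, prefixed by the timestamp.
    signed = timestamp.encode("utf-8") + b"." + payload
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against each v1 signature present.
    return any(hmac.compare_digest(expected, sig) for sig in signatures)


if __name__ == "__main__":
    secret = "whsec_example"  # hypothetical signing secret
    body = b'{"id":"job_123","status":"completed"}'
    ts = int(time.time())
    sig = hmac.new(secret.encode(), f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    print(verify_webhook(body, f"t={ts},v1={sig}", secret))  # True
```

Verify against the raw request bytes, before any JSON parsing, since re-serializing the body can change it and break the signature.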
Multilingual without compromise
Qwen3-ASR-Flash covers 50+ languages plus 22 Chinese dialects. Use GPT-4o-transcribe-diarize when you need speaker labels and Voxtral for cost-sensitive batch jobs. Pick the model per request.
Per-minute pricing, no minimums
Wallet-funded, at the same per-minute rate as the dashboard. No commitments, no surprises. Per-key spend visibility means you can share a key with a teammate and see exactly what it cost.
Built-in chapters, topics, and Q&A with citations
/result returns segments, chapters, topics, and Q&A with citations, not just a wall of text. Skip the post-processing pipeline you'd otherwise build on top of raw ASR output.
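As an illustration of working with structured output instead of raw text, here is a sketch that turns the chapters of a result into podcast-style show notes. It assumes each chapter is an object with a `start` offset in seconds and a `title`; those key names are hypothetical, so map them to the fields the real /result response uses.

```python
from typing import Any, Dict


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS past the hour, as show notes usually do."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapters_to_show_notes(result: Dict[str, Any]) -> str:
    """Emit one 'timestamp title' line per chapter, ordered by start time.

    The 'chapters', 'start', and 'title' keys are assumptions.
    """
    chapters = sorted(result.get("chapters", []), key=lambda c: c["start"])
    return "\n".join(
        f"{format_timestamp(ch['start'])} {ch['title']}" for ch in chapters
    )


if __name__ == "__main__":
    sample = {"chapters": [{"start": 754.2, "title": "Pricing questions"},
                           {"start": 0, "title": "Introduction"}]}
    print(chapters_to_show_notes(sample))
    # 0:00 Introduction
    # 12:34 Pricing questions
```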
Idempotent POSTs, structured errors, request_id on every response
Send an Idempotency-Key header so a retried POST doesn't create a duplicate job. Errors use a Stripe-style envelope with code, message, request_id, and a doc_url pointing to the relevant docs section. Predictable retries; debuggable failures.
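A client-side sketch of both ideas: generate one `Idempotency-Key` per logical request and reuse it on every retry, and turn error responses into a structured exception. It assumes the envelope is nested under an `error` key holding the `code`, `message`, `request_id`, and `doc_url` fields named above, and that 429 and 5xx responses are safe to retry; both are assumptions. The injected `send` callable is a hypothetical stand-in for a real HTTP call, which also makes the retry logic testable offline.

```python
import time
import uuid
from typing import Any, Callable, Dict, Tuple

# send(headers) -> (status_code, parsed JSON body); wrap urllib or requests here.
Sender = Callable[[Dict[str, str]], Tuple[int, Dict[str, Any]]]

RETRYABLE = {429, 500, 502, 503, 504}  # assumption: transient statuses


class ApiError(Exception):
    """Structured error built from an assumed {'error': {...}} envelope."""

    def __init__(self, status: int, body: Dict[str, Any]) -> None:
        err = body.get("error", {})
        self.status = status
        self.code = err.get("code", "unknown")
        self.request_id = err.get("request_id", "")
        self.doc_url = err.get("doc_url", "")
        super().__init__(
            f"{self.code}: {err.get('message', '')} (request_id={self.request_id})"
        )


def post_with_retries(send: Sender, max_attempts: int = 4,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Retry with one fixed Idempotency-Key and exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # same key on every attempt
    for attempt in range(max_attempts):
        status, body = send(headers)
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise ApiError(status, body)
        sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("unreachable")
```

Reusing the key is the point: if a response is lost after the server created the job, the retry carries the same key, so the server can return the original job instead of starting a second one. Logging `request_id` from a caught `ApiError` gives support something concrete to look up.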
What people use this for
- AI agents that read audio — drop a transcript into your LLM context and let it reason over hours of recordings
- Meeting bots — transcribe Zoom/Twilio recordings into searchable notes the moment a call ends
- Voice memo apps on iPhone or Android — turn raw audio into journals with auto-generated chapters and topics
- Podcast pipelines — auto-process new episodes from RSS into show notes
- Video editors — generate burn-in captions with word-level timestamps, export SRT/VTT directly
- Language learning apps — accurate transcripts for shadowing and dictation drills
- Customer support — surface call topics and follow-ups automatically from recorded calls
- Journalist workflows — drop interview audio in, get back chapters, quotes, and a searchable archive
Frequently asked questions
Want a deeper comparison? Read the launch announcement →
Ship it today
Create a key, paste it into your script, and you're transcribing within a minute. The dashboard shows per-key spend, lets you configure webhooks, and lets you rotate keys when needed.