Add transcription to your app in 5 minutes

Drop the transcribe.so API into AI agents, video editors, meeting bots, call-center pipelines, voice memo apps — anywhere you need accurate speech-to-text without managing models or workers. Same engine as our dashboard, accessed with one Bearer token. Signing up grants $0.50 in trial credit; no card required.

No credit card required. Pay only for what you use.

See it in action

Real output from a real transcription

Browse chapters, ask questions, and explore search results from an actual transcript.

44 Harsh Truths About The Game Of Life - Naval Ravikant (4K)
Chris Williamson
Contents
8 chapters · 513 topics
1. Happiness Versus Success: Philosophical Reflections on Contentment, Desire, and Motivation
2. Optimizing Sleep: Smart Temperature Regulation and the Foundations of Self-Esteem
3. Decisive Action and Iterative Practice: Keys to Optimal Choices and Mastery
4. Wealth Management: From Materialism to Value Creation and Fair Compensation
5. Evaluating LLMs: Capabilities, Limitations, and Their Role in AI's Evolving Landscape
6. Pathogens, Evolution, and Knowledge: How Humans Adapt and Defend
7. Agency, Power, and the Individual: From Child Development to Cultural Conflict
8. Unseen Trends: Media Oversights, Medical Limitations, and the Primitive State of Modern Biology
Q&A preview
Answer
Naval explains two distinct paths to happiness using the story of Alexander and Diogenes. The first path is through success—conquering the world, satisfying material needs, and getting what you want. The second path, exemplified by Diogenes living in a barrel, is simply not wanting in the first place. As Socrates said when shown luxuries: 'How many things there are in this world that I do not want.' Naval suggests not wanting something is as good as having it—both paths lead to the same destination of contentment [00:38–01:10]. He's not sure which path is more valid, noting it depends on how you define success [01:10–01:25].


The audio integration problem

  • OpenAI's Whisper API caps files at 25 MB and rate-limits you the moment you scale past hobby use
  • AssemblyAI and similar managed APIs charge per minute and tax multilingual workloads heavily
  • Self-hosted Whisper / Whisper-large eats GPU budget and breaks on long files or non-English audio
  • Polling every few seconds blocks workers and burns rate-limit budget you'd rather spend on real traffic

What you get from the transcribe.so API

One Bearer token, three input shapes

POST a YouTube URL, an external audio link, or a file you upload via presigned S3 PUT. Same response shape regardless. No SDK required to start; pure HTTP.
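
A minimal sketch of that request in Python, standard library only. The base URL, field names (`url`, `model`), and model slug below are illustrative assumptions — check the API reference for the exact schema:

```python
import json

API_BASE = "https://transcribe.so/api/v1"  # assumed base URL


def build_transcription_request(api_key: str, source: str,
                                model: str = "qwen3-asr-flash") -> dict:
    """Build a POST /api/v1/transcriptions request.

    `source` can be a YouTube URL, an external audio link, or the
    object key from a presigned S3 PUT -- the response shape is the
    same either way. Field names here are illustrative guesses.
    """
    return {
        "method": "POST",
        "url": f"{API_BASE}/transcriptions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"url": source, "model": model}),
    }


req = build_transcription_request("sk_test_123", "https://example.com/episode.mp3")
```

Because it's pure HTTP, the same shape works from `curl`, `fetch`, or any HTTP client — the only required credential is the Bearer token.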

Webhooks, not polling

Register a URL once and we POST you when transcription completes. HMAC-signed (Stripe-style verification). Auto-retry with exponential backoff. Works with Cloudflare Workers, Lambda, n8n, anywhere.
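
A receiving handler can be as small as this sketch. The payload fields (`status`, `transcription_id`) are assumptions, not the documented event schema; the point is that returning a 2xx promptly is what stops the backoff retries:

```python
import json


def handle_webhook(raw_body: bytes) -> tuple[int, str]:
    """Minimal webhook handler: parse the event, ACK with a 200.

    Returning a 2xx quickly stops the exponential-backoff retries.
    Field names (`status`, `transcription_id`) are illustrative.
    """
    event = json.loads(raw_body)
    if event.get("status") != "completed":
        return 200, ""  # acknowledge but ignore non-terminal events
    return 200, event["transcription_id"]
```

The same function body drops into a Cloudflare Worker, Lambda, or n8n webhook node — anything that hands you the raw request body.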

Multilingual without compromise

Qwen3-ASR-Flash covers 50+ languages plus 22 Chinese dialects. GPT-4o-transcribe-diarize for speaker labels. Voxtral for cost-sensitive batch jobs. Pick per request.

Per-minute pricing, no minimums

Wallet-funded; same per-minute rate as the dashboard. No commits. No surprises. Per-key spend visibility — share a key with a teammate and see exactly what it cost.
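
As a quick sanity check on the arithmetic, using the $0.0362/min Qwen3-ASR-Flash rate quoted in the FAQ (rates can change):

```python
RATE_PER_MIN = 0.0362  # Qwen3-ASR-Flash rate from the FAQ; subject to change


def cost_usd(duration_minutes: float) -> float:
    """Wallet charge for one job: duration times the per-minute rate,
    no minimums and no monthly commit."""
    return round(duration_minutes * RATE_PER_MIN, 4)


cost_usd(240)         # a 4-hour podcast costs 8.688
0.50 / RATE_PER_MIN   # the trial credit covers ~13.8 minutes
```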

Built-in chapters, topics, and Q&A with citations

/result returns segments + chapters + topics + cited Q&A — not just a wall of text. Skip the post-processing pipeline you'd otherwise build on top of raw ASR output.

Idempotent POSTs, structured errors, request_id on every response

Idempotency-Key header. Stripe-style error envelope with code, message, request_id, and doc_url pointing to the relevant docs section. Predictable retries; debuggable failures.
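
Both halves sketched in Python: a fresh Idempotency-Key per logical job makes retries safe, and the error envelope flattens into one debuggable log line. The exact nesting (a top-level `error` object) is an assumption borrowed from Stripe's convention:

```python
import json
import uuid


def idempotency_headers(api_key: str) -> dict:
    """Headers for a safe-to-retry POST: resending with the same
    Idempotency-Key cannot create a duplicate transcription job."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }


def parse_error(raw_body: str) -> str:
    """Flatten a Stripe-style error envelope (assumed top-level
    `error` object) into a single log-friendly line."""
    err = json.loads(raw_body)["error"]
    return f"[{err['request_id']}] {err['code']}: {err['message']} (see {err['doc_url']})"
```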

What people use this for

  • AI agents that read audio — drop a transcript into your LLM context and let it reason over hours of recordings
  • Meeting bots — transcribe Zoom/Twilio recordings into searchable notes the moment a call ends
  • Voice memo apps on iPhone or Android — turn raw audio into journals with auto-generated chapters and topics
  • Podcast pipelines — auto-process new episodes from RSS into show notes
  • Video editors — generate burn-in captions with word-level timestamps, export SRT/VTT directly
  • Language learning apps — accurate transcripts for shadowing and dictation drills
  • Customer support — surface call topics and follow-ups automatically from recorded calls
  • Journalist workflows — drop interview audio in, get back chapters, quotes, and a searchable archive
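
The caption-export use case above can be sketched from /result segments. Assuming each segment carries `start`/`end` seconds and `text` (illustrative, not the documented schema), SRT output is a few lines of Python:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert /result segments (assumed keys: start, end, text)
    into an SRT subtitle document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(blocks)
```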

FAQ

Frequently asked questions

What does it cost?

Same per-minute rate as our dashboard ($0.0362/min on Qwen3-ASR-Flash today, lower than AssemblyAI Best for the same accuracy band). Wallet-funded — pay only for what you transcribe, no monthly commit. No 25 MB-style file caps: presigned uploads go up to 500 MB.

Is there a free trial?

Sign-up grants $0.50 in trial credit — enough to transcribe ~14 minutes on Qwen3-ASR-Flash. No card required to start. After that, top up your wallet from the billing page.

Can I call it from serverless, edge functions, or the browser?

Yes. Bearer auth means no cookies or CSRF. CORS is open on every /api/v1/* endpoint. Webhooks remove the need for long-running polling. Works from Cloudflare Workers, Lambda, Vercel, n8n, edge functions — anything that can make an HTTP call.

Which languages are supported?

Qwen3-ASR-Flash supports 50+ languages and 22 Chinese dialects out of the box. GPT-4o-transcribe-diarize supports 30+ with speaker identification. Voxtral covers 9 European languages at the lowest price tier. Set language: 'auto' to let us detect.

Can AI agents and coding tools use it?

Yes. Every modern agent framework can call HTTP APIs directly with a Bearer token. Cursor and Claude Code can invoke curl from their tool-use modes. ChatGPT Custom GPTs use OpenAPI Actions (spec coming soon). An MCP server for Claude Desktop is on the roadmap. Recipes are in the API reference.

What happens when my wallet runs out?

POST /api/v1/transcriptions returns 402 insufficient_funds with a doc_url pointing to the billing page. Existing in-flight jobs complete normally. Top up and retry — idempotency keys prevent duplicate submissions.

How do I verify webhook signatures?

The X-Transcribe-Signature header is t=<unix-seconds>,v1=<hmac_sha256(secret, '${t}.${rawBody}')>. Verify against the raw body, before any JSON parsing. Code examples in TypeScript and Python at /developers/docs#webhooks.
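
Under that scheme, verification looks like this in Python. The staleness window is an added defensive assumption (Stripe's verifier uses the same five-minute default), not something the header itself mandates:

```python
import hashlib
import hmac
import time


def verify_signature(secret: str, header: str, raw_body: bytes,
                     tolerance: int = 300) -> bool:
    """Verify X-Transcribe-Signature: t=<unix-seconds>,v1=<hex hmac>.

    The signed message is '<t>.<raw body>', so compare against the raw
    bytes before any JSON parsing. Rejecting stale timestamps blocks
    replay attacks; constant-time comparison blocks timing attacks.
    """
    parts = dict(p.split("=", 1) for p in header.split(","))
    t, received = parts["t"], parts["v1"]
    expected = hmac.new(
        secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256
    ).hexdigest()
    fresh = abs(time.time() - int(t)) <= tolerance
    return fresh and hmac.compare_digest(expected, received)
```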

How large can files be?

500 MB per file via the upload endpoint. No length cap — we chunk long audio internally. A 4-hour podcast transcribes in 4–8 minutes of wall time on Qwen3-ASR-Flash.

Want a deeper comparison? Read the launch announcement →

Ship it today

Create a key, paste it into your script, and you're transcribing within a minute. The dashboard shows per-key spend and lets you configure webhooks and rotate keys when needed.