Add transcription to your app in 5 minutes

Drop the transcribe.so API into AI agents, video editors, meeting bots, call-center pipelines, or voice memo apps: anywhere you need accurate speech-to-text without managing models or workers. It is the same engine as our dashboard, accessed with one Bearer token. Signing up gets you $0.50 in trial credit; no card required.

Pay only for what you use.

See it in action

Real output from a real transcription

Browse chapters, ask questions, and explore search results from an actual transcript.

44 Harsh Truths About The Game Of Life - Naval Ravikant (4K)

Chris Williamson

Contents

8 chapters · 513 topics

1. Happiness Versus Success: Philosophical Reflections on Contentment, Desire, and Motivation

2. Optimizing Sleep: Smart Temperature Regulation and the Foundations of Self-Esteem

3. Decisive Action and Iterative Practice: Keys to Optimal Choices and Mastery

4. Wealth Management: From Materialism to Value Creation and Fair Compensation

5. Evaluating LLMs: Capabilities, Limitations, and Their Role in AI's Evolving Landscape

6. Pathogens, Evolution, and Knowledge: How Humans Adapt and Defend

7. Agency, Power, and the Individual: From Child Development to Cultural Conflict

8. Unseen Trends: Media Oversights, Medical Limitations, and the Primitive State of Modern Biology

Q&A preview

Answer

Naval explains two distinct paths to happiness using the story of Alexander and Diogenes. The first path is through success—conquering the world, satisfying material needs, and getting what you want. The second path, exemplified by Diogenes living in a barrel, is simply not wanting in the first place. As Socrates said when shown luxuries: 'How many things there are in this world that I do not want.' Naval suggests not wanting something is as good as having it—both paths lead to the same destination of contentment [00:38–01:10]. He's not sure which path is more valid, noting it depends on how you define success [01:10–01:25].

The audio integration problem

Whisper API hits 25 MB file caps and rate limits the moment you scale past hobby use
AssemblyAI and similar managed APIs charge per minute and tax multilingual workloads heavily
Self-hosted Whisper / Whisper-large eats GPU budget and breaks on long files or non-English audio
Polling every few seconds blocks workers and burns rate-limit budget you'd rather spend on real traffic

What you get from the transcribe.so API

One Bearer token, three input shapes

POST a YouTube URL, an external audio link, or a file you upload via presigned S3 PUT. Same response shape regardless. No SDK required to start; pure HTTP.
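As a rough illustration of the three input shapes, the sketch below builds (but does not send) a request to POST /api/v1/transcriptions using only the Python standard library. The host `https://transcribe.so`, the body field names (`youtube_url`, `audio_url`, `upload_id`, `model`, `language`), and the model identifier string are assumptions made for this example; check the API reference for the real names.

```python
import json
import urllib.request

# Assumption: the API is served from this host. Only the path is given in the source.
API_BASE = "https://transcribe.so"


def build_transcription_request(api_key, source, model="qwen3-asr-flash", language="auto"):
    """Build a POST request for a new transcription job.

    `source` is a dict holding exactly one input shape. The key names used
    in the examples below are hypothetical.
    """
    payload = dict(source, model=model, language=language)
    return urllib.request.Request(
        url=API_BASE + "/api/v1/transcriptions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )


# One request per input shape (all field names are assumptions).
from_youtube = build_transcription_request("sk_test", {"youtube_url": "https://youtu.be/VIDEO_ID"})
from_link = build_transcription_request("sk_test", {"audio_url": "https://example.com/call.mp3"})
from_upload = build_transcription_request("sk_test", {"upload_id": "upl_123"})
```

Sending any of these is a single `urllib.request.urlopen(request)` call; the point is that only the `source` dict changes between the three shapes.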

Webhooks, not polling

Register a URL once and we POST you when transcription completes. HMAC-signed (Stripe-style verification). Auto-retry with exponential backoff. Works with Cloudflare Workers, Lambda, n8n, anywhere.

Multilingual without compromise

Qwen3-ASR-Flash covers 50+ languages plus 22 Chinese dialects. GPT-4o-transcribe-diarize for speaker labels. Voxtral for cost-sensitive batch jobs. Pick per request.

Per-minute pricing, no minimums

Wallet-funded; same per-minute rate as the dashboard. No commits. No surprises. Per-key spend visibility — share a key with a teammate and see exactly what it cost.

Built-in chapters, topics, and Q&A with citations

/result returns segments + chapters + topics + cited Q&A — not just a wall of text. Skip the post-processing pipeline you'd otherwise build on top of raw ASR output.
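If you want to render captions yourself from the returned segments, a minimal sketch might look like the following. It assumes each segment is a dict with `start` and `end` in seconds and a `text` string; that shape is a guess for illustration, not the documented response format.

```python
def srt_timestamp(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of segments into SRT text.

    Assumption: every segment has 'start', 'end' (seconds) and 'text' keys.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


example = segments_to_srt([
    {"start": 38.0, "end": 70.0, "text": "Two paths to happiness."},
    {"start": 70.0, "end": 85.0, "text": "It depends on how you define success."},
])
```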

Idempotent POSTs, structured errors, request_id on every response

Idempotency-Key header. Stripe-style error envelope with code, message, request_id, and doc_url pointing to the relevant docs section. Predictable retries; debuggable failures.
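A client-side sketch of both pieces: generating an Idempotency-Key header and parsing the error envelope into a typed object. The four field names come from the description above; whether they sit under a top-level `error` key (as Stripe does) or at the top level is an assumption, so the parser accepts either. Using a UUID as the key value is also just one reasonable choice.

```python
import json
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiError:
    code: str
    message: str
    request_id: str
    doc_url: str


def new_idempotency_headers():
    """Return a header dict with a fresh key; reuse the same dict when retrying."""
    return {"Idempotency-Key": str(uuid.uuid4())}


def parse_error(body):
    """Parse an error response body (bytes or str) into an ApiError.

    Assumption: the envelope is either {"error": {...}} or the flat object.
    """
    data = json.loads(body)
    err = data.get("error", data)
    return ApiError(
        code=err.get("code", ""),
        message=err.get("message", ""),
        request_id=err.get("request_id", ""),
        doc_url=err.get("doc_url", ""),
    )
```

Logging `request_id` alongside `code` on every failure keeps a support ticket short: the ID identifies the exact request on the server side.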

What people use this for

AI agents that read audio — drop a transcript into your LLM context and let it reason over hours of recordings
Meeting bots — transcribe Zoom/Twilio recordings into searchable notes the moment a call ends
Voice memo apps on iPhone or Android — turn raw audio into journals with auto-generated chapters and topics
Podcast pipelines — auto-process new episodes from RSS into show notes
Video editors — generate burn-in captions with word-level timestamps, export SRT/VTT directly
Language learning apps — accurate transcripts for shadowing and dictation drills
Customer support — surface call topics and follow-ups automatically from recorded calls
Journalist workflows — drop interview audio in, get back chapters, quotes, and a searchable archive

FAQ

Frequently asked questions

How is pricing different from Whisper API or AssemblyAI?

Same per-minute rate as our dashboard ($0.0362/min on Qwen3-ASR-Flash today, lower than AssemblyAI Best for the same accuracy band). Wallet-funded: pay only for what you transcribe, with no monthly commit. No 25 MB file cap; we presign uploads up to 500 MB.

Is there a free tier to test it?

Signing up grants $0.50 in trial credit, enough to transcribe about 13.8 minutes on Qwen3-ASR-Flash at $0.0362/min. No card required to start. After that, top up your wallet from the billing page.
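The arithmetic behind these figures is simple enough to check yourself. The sketch below uses the quoted Qwen3-ASR-Flash rate of $0.0362/min and the $0.50 trial credit; the function names are made up for this example, and the rate may change.

```python
from decimal import Decimal, ROUND_DOWN

RATE_PER_MINUTE = Decimal("0.0362")  # Qwen3-ASR-Flash rate quoted on this page
TRIAL_CREDIT = Decimal("0.50")


def cost_usd(minutes):
    """Cost in USD for a given number of audio minutes."""
    return (Decimal(str(minutes)) * RATE_PER_MINUTE).quantize(Decimal("0.0001"))


def minutes_covered(credit):
    """Audio minutes a wallet balance covers, rounded down to 0.1 min."""
    return (credit / RATE_PER_MINUTE).quantize(Decimal("0.1"), rounding=ROUND_DOWN)


trial_minutes = minutes_covered(TRIAL_CREDIT)  # Decimal("13.8")
four_hour_podcast = cost_usd(240)              # Decimal("8.6880")
```

So the trial covers a little under 14 minutes of audio, and a 4-hour recording costs about $8.69 at this rate.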

Can I use this from a serverless function?

Yes. Bearer auth means no cookies/CSRF. CORS is open on every /api/v1/* endpoint. Webhooks remove the need for long-running polling. Works from Cloudflare Workers, Lambda, Vercel, n8n, edge functions, and anything else that can make an HTTP call.

What languages do you support?

Qwen3-ASR-Flash supports 50+ languages and 22 Chinese dialects out of the box. GPT-4o-transcribe-diarize supports 30+ with speaker identification. Voxtral covers 9 European languages at the lowest price tier. Set language: 'auto' to let us detect.

How do I integrate with my AI agent — Claude, ChatGPT, Cursor?

Every modern agent framework can call HTTP APIs directly with a Bearer token. Cursor and Claude Code can invoke curl from their tool-use modes. ChatGPT Custom GPTs use OpenAPI Actions (spec coming soon). An MCP server for Claude Desktop is on the roadmap. Recipes are in the API reference.

What happens when my wallet runs out?

POST /api/v1/transcriptions returns 402 insufficient_funds with a doc_url pointing to the billing page. Existing in-flight jobs complete normally. Top up and retry; idempotency keys prevent duplicate submissions.
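One way to structure that retry is sketched below. The `send` and `wait_for_top_up` callables are hypothetical hooks you would supply (an HTTP call and whatever pauses until the wallet is funded), and the envelope layout (nested under `error` or flat) is an assumption. The key point is that every attempt reuses the same Idempotency-Key.

```python
def submit_once_funded(send, idempotency_key, wait_for_top_up, max_attempts=3):
    """Submit a job, retrying after 402 insufficient_funds with the same key.

    send(headers) -> (status_code, parsed_json_body)   # hypothetical hook
    wait_for_top_up(doc_url) -> None                   # hypothetical hook
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Same key on every attempt, so a retry cannot create a duplicate job.
    headers = {"Idempotency-Key": idempotency_key}
    for _ in range(max_attempts):
        status, body = send(headers)
        error = body.get("error", body) if status >= 400 else {}
        if status == 402 and error.get("code") == "insufficient_funds":
            wait_for_top_up(error.get("doc_url", ""))
            continue
        break
    # Returns the last response, whether it succeeded or attempts ran out.
    return status, body
```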

How do I verify webhook signatures?

X-Transcribe-Signature header is t=<unix-seconds>,v1=<hmac_sha256(secret, '${t}.${rawBody}')>. Verify on the raw body. Code examples in TypeScript and Python at /developers/docs#webhooks.
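A minimal verification sketch for that header format, using Python's standard `hmac` module. Two details are assumptions not stated above: that `v1` is a lowercase hex digest, and that rejecting timestamps older than five minutes (the Stripe convention) is appropriate. The official examples at /developers/docs#webhooks take precedence.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(secret: str, header: str, raw_body: bytes,
                     tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Check an X-Transcribe-Signature value against the raw request body."""
    # Split "t=...,v1=..." into a dict.
    parts = dict(item.strip().split("=", 1) for item in header.split(",") if "=" in item)
    t, v1 = parts.get("t"), parts.get("v1")
    if not t or not v1 or not t.isdigit():
        return False
    # Assumption: reject old timestamps to limit replay (5-minute window).
    current = time.time() if now is None else now
    if abs(current - int(t)) > tolerance_s:
        return False
    # Signed payload is "<t>.<rawBody>", computed over the unparsed bytes.
    signed = t.encode("ascii") + b"." + raw_body
    # Assumption: the signature is hex-encoded.
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("ascii"), v1.encode("utf-8"))
```

Pass the bytes exactly as received. Re-serializing parsed JSON can change whitespace or key order and will make a valid signature fail.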

What's the largest file or longest audio I can submit?

500 MB per file via the upload endpoint. There is no length cap because we chunk long audio internally. A 4-hour podcast transcribes in 4-8 minutes wall time on Qwen3-ASR-Flash.

Want a deeper comparison? Read the launch announcement →

Ship it today

Create a key, paste it into your script, and you're transcribing inside a minute. The dashboard shows per-key spend and lets you configure webhooks and rotate keys when needed.

