Introducing the transcribe.so API: speech-to-text as a Bearer token
You can now transcribe audio with transcribe.so without ever opening the dashboard.
```bash
curl -X POST https://transcribe.so/api/v1/transcriptions \
  -H "Authorization: Bearer tsk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "source": "external_url",
    "url": "https://example.com/podcast.mp3",
    "pipeline_code": "qwen3-asr-flash-filetrans"
  }'
```
That's the whole thing. One Bearer token, one POST, you're transcribing.
Why an API
We built transcribe.so for people who care about transcript quality — meeting notes, podcasts, courseware, voice memos. But the more dashboard features we shipped, the more we kept hearing the same request from the people who use it most:
"I love the output. I just don't want to drag a file into a browser tab forty times a day."
Fair. Some workflows want a UI. Most automation wants a curl.
The use cases we've heard so far:
- Pipelines that ingest podcasts — auto-transcribe new episodes the moment they hit your S3 bucket.
- Meeting bots — transcribe Twilio recordings or Zoom dumps as they come in, no human in the loop.
- Journalist workflows — drop interview audio in a folder, get back a searchable transcript with chapters.
- Voice-memo automations — your phone records, your laptop transcribes, your second brain stores it.
So here it is.
What's the same as the dashboard
Almost everything.
- Same models. Qwen3-ASR-Flash, GPT-4o-transcribe-diarize, Voxtral. Pick the one that fits the request — pass `pipeline_code` and you're set.
- Same per-minute pricing. No "API tier" markup. The $X/min you see on the pricing page is exactly what an API call costs.
- Same wallet. Monthly credit drains first; top-up balance covers the rest. If you've got a Pro membership, that 30% discount applies to API calls just like it does to web jobs.
- Same downstream pipeline. Topics, chapters, summaries, semantic search, Q&A with citations — all of it lives behind `GET /transcriptions/:id/result`. You're not getting a stripped-down API output. You're getting the full pipeline.
The whole point of the API is that we didn't fork it. The Bearer token swaps in for the session cookie, and you call the same code path the web UI does.
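Once a job completes, everything downstream hangs off that one result payload. As a sketch of what consuming it looks like, here's a helper that flattens chapters into a plain-text outline. The field names (`chapters`, `title`, `summary`) are illustrative assumptions, not a documented schema — check the JSON your account actually returns and adjust.

```python
def chapter_outline(result: dict) -> str:
    """Flatten a /result payload into a plain-text outline.

    Field names here ('chapters', 'title', 'summary') are illustrative
    assumptions; adapt them to the response shape you actually get back.
    """
    lines = []
    for ch in result.get("chapters", []):
        lines.append(f"## {ch.get('title', 'Untitled')}")
        if ch.get("summary"):
            lines.append(ch["summary"])
    return "\n".join(lines)
```

The same pattern works for topics or summaries: one GET, then pick out the pieces your workflow needs.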
What's different
The handful of things you'd expect.
Bearer auth. Every request carries `Authorization: Bearer tsk_live_...`. No cookies, no CSRF, no SDK, no setup beyond pasting a key into your env. Keys live at transcribe.so/settings/api-keys, and we show the plaintext exactly once.
Async with polling. Submit, get back a transcription ID and `status: "processing"`, then poll `GET /transcriptions/:id` until you see `"completed"` or `"failed"`. We tried to make this honest: the same statuses you see on the dashboard are the statuses the API returns. (Webhooks ship next — you won't have to poll forever.)
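If you're polling from a script, it's worth capping how long you'll wait. A minimal sketch: the terminal statuses come from the API described above, while the helper itself and its parameters are our own invention.

```python
import time

def wait_for(fetch_status, interval: float = 3.0, timeout: float = 900.0) -> dict:
    """Call fetch_status() until the job reaches a terminal status.

    fetch_status should return the parsed JSON from GET /transcriptions/:id.
    Raises TimeoutError if the job is still processing at the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        s = fetch_status()
        if s.get("status") in ("completed", "failed"):
            return s
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription still processing at deadline")
        time.sleep(interval)
```

Wire it up with `lambda: requests.get(f"{API}/transcriptions/{job_id}", headers=HEADERS, timeout=30).json()` and the same loop works for any job.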
Idempotency-Key header. Send a UUID and we'll cache the response for 24 hours. Retrying on a network blip won't double-charge you or queue the job twice. Standard Stripe pattern.
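The practical upshot: generate the key once per logical job, then reuse it on every HTTP attempt. A sketch of that pattern, assuming the endpoint and header described above; the helper names and retry policy are ours.

```python
import os
import uuid

import requests

API = "https://transcribe.so/api/v1"

def idempotent_headers(token: str) -> dict:
    """One fresh Idempotency-Key per *logical* request; reuse it on retries."""
    return {
        "Authorization": f"Bearer {token}",
        "Idempotency-Key": str(uuid.uuid4()),
    }

def submit(url: str, attempts: int = 3) -> dict:
    headers = idempotent_headers(os.environ["TRANSCRIBE_API_KEY"])
    body = {
        "source": "external_url",
        "url": url,
        "pipeline_code": "qwen3-asr-flash-filetrans",
    }
    for attempt in range(attempts):
        try:
            r = requests.post(f"{API}/transcriptions", headers=headers,
                              json=body, timeout=30)
            r.raise_for_status()
            return r.json()
        except requests.ConnectionError:
            # Same headers (same key) on the retry, so a blip that hits
            # after the server queued the job can't queue or bill it twice.
            if attempt == attempts - 1:
                raise
```

The common bug is the inverse: minting a new UUID inside the retry loop, which turns every retry into a brand-new billable job.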
Per-key visibility. Each key shows you this month: $X.XX and all time: $X.XX on the dashboard. If you hand a key to a teammate or paste one into a script you're not sure about, you can see exactly what it spent. Revoke one and the others keep working.
We deliberately didn't ship per-key spending caps in v1. The wallet itself is the cap — you can't spend money you don't have, and an account-level monthly hard cap is the cleaner pattern when we get there. (If your team needs per-key caps before then, tell us.)
A worked example
Here's a Python script that submits an audio file by URL, polls until the job finishes, and saves the JSON result to disk. It's small enough to read in one breath:
```python
import os, time, requests, json, pathlib

API = "https://transcribe.so/api/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TRANSCRIBE_API_KEY']}"}

def transcribe(url: str) -> dict:
    r = requests.post(f"{API}/transcriptions", headers=HEADERS, json={
        "source": "external_url",
        "url": url,
        "pipeline_code": "qwen3-asr-flash-filetrans",
    }, timeout=30)
    r.raise_for_status()
    job_id = r.json()["id"]

    # Poll until the job reaches a terminal status.
    while True:
        s = requests.get(f"{API}/transcriptions/{job_id}",
                         headers=HEADERS, timeout=30).json()
        if s["status"] in ("completed", "failed"):
            break
        time.sleep(3)

    if s["status"] == "failed":
        raise RuntimeError(f"transcription failed: {s.get('error')}")
    return requests.get(f"{API}/transcriptions/{job_id}/result",
                        headers=HEADERS, timeout=30).json()

if __name__ == "__main__":
    out = transcribe("https://example.com/podcast.mp3")
    pathlib.Path("transcript.json").write_text(json.dumps(out, indent=2))
    print(f"saved {len(out['segments'])} segments")
```
A screenful of glue, and a full transcript with chapters and topics lands on disk. The same shape works in a Cloudflare Worker (Twilio webhook → API call → write to KV), a GitHub Action (new podcast episode in a release → transcribe → comment on the PR), or a long-running n8n flow.
What's next
The roadmap, in priority order:
- File uploads. v1 ships with external-URL transcription. Direct uploads via a presigned S3 PUT land next — useful when you don't want to host the audio yourself.
- Webhooks. `transcription.completed` and `transcription.failed`, signed with HMAC, with exponential-backoff retries. Polling works; webhooks are nicer.
- OpenAPI spec + SDKs. Once the surface stops moving, we'll publish a proper OpenAPI 3.1 spec and generate first-party Python and TypeScript SDKs.
- Account-level monthly cap. A single "don't let the whole account spend more than $X this month" hard limit. Applies equally to web UI and API.
If you have a use case that doesn't fit any of the above, we want to hear it. The API is going to be shaped by what people actually build with it.
Until then — grab a key, paste it into your script, and let us know what you ship.