Introducing Qwen3-ASR-Flash: Top-Tier AI Transcription with Leaderboard-Leading Accuracy

Transcribe.so (Updated May 19, 2026)
qwen3 transcription, best asr model, chinese dialect transcription, speech to text, ai transcription, qwen3-asr-flash, alibaba qwen, open asr leaderboard

We've added a second transcription pipeline: Qwen3-ASR-Flash from Alibaba, ranked #1 on the Hugging Face Open ASR Leaderboard with a 4.25% average Word Error Rate (WER). It runs alongside our existing GPT-4o Transcribe Diarize pipeline.
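
For readers unfamiliar with the metric, Word Error Rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code; real benchmarks usually apply more text normalization than the simple lowercasing assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level edit distance, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

By this definition, a 4.25% average WER corresponds to roughly four word errors per 100 reference words.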

Both pipelines include the same downstream AI analysis — sections, chapters, summaries, semantic search, and Q&A with citations. The difference is in the transcription step itself.

Why Two Pipelines?

Not every transcription job needs the same model. A 3-hour lecture by a single speaker has different needs than a 20-minute podcast with 4 guests.

  • GPT-4o Transcribe Diarize is the best choice when you need speaker identification (diarization).
  • Qwen3-ASR-Flash is the best choice for top-tier accuracy, long single-speaker audio, word-level timestamps, or Chinese dialect support.

Head-to-Head Comparison

Feature | GPT-4o Diarize | Qwen3-ASR-Flash
Provider | OpenAI | Alibaba (Qwen)
Speaker diarization | Yes | No
Timestamp type | Segment-level | Sentence + word-level (10 languages)
Languages | 57 | 33 + 22 Chinese dialects
Max audio duration | Unlimited (chunked) | 12 hours native
Emotion detection | No | Yes
Price (Free tier) | $3.88/hr | $1.71/hr
Price (Pro tier) | ~$3.18/hr | ~$1.40/hr
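
One way to read the comparison is as a capability lookup: list the features a job requires and see which pipeline covers all of them. The sketch below assumes hypothetical pipeline identifiers and feature keys chosen for illustration; they are not Transcribe.so API values.

```python
# Capabilities taken from the comparison above.
# Pipeline ids and feature keys are illustrative assumptions.
CAPABILITIES = {
    "gpt-4o-diarize": {"speaker_diarization"},
    "qwen3-asr-flash": {"word_timestamps", "chinese_dialects", "emotion_detection"},
}


def pipelines_covering(required: set[str]) -> list[str]:
    """Return the pipelines whose capabilities include every required feature."""
    return [name for name, caps in CAPABILITIES.items() if required <= caps]


print(pipelines_covering({"speaker_diarization"}))
print(pipelines_covering({"word_timestamps", "emotion_detection"}))
# No single pipeline offers both diarization and word-level timestamps.
print(pipelines_covering({"speaker_diarization", "word_timestamps"}))
```

An empty result signals a trade-off: the job asks for features that no single pipeline provides, so one requirement has to give.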

Where the Cost Difference Comes From

The Qwen3 transcription API is priced lower than GPT-4o's ($0.13/hr vs. $1.80/hr). The rest of the pipeline (LLM processing, semantic search embeddings, and infrastructure) costs the same for both, so the entire $1.67/hr gap in provider cost comes from the transcription step.

Component | GPT-4o Pipeline | Qwen3 Pipeline
Transcription | $1.80/hr | $0.13/hr
LLM analysis | $0.48/hr | $0.48/hr
Embeddings | $0.06/hr | $0.06/hr
Infrastructure | $1.00/hr | $1.00/hr
Provider total | $3.34/hr | $1.67/hr
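
The totals can be checked by adding up the components. This is a small worked example using only the per-hour figures from the breakdown above; the dictionary keys are illustrative names, and `Decimal` is used to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

# Per-hour provider costs in USD, copied from the breakdown above.
COSTS = {
    "GPT-4o": {
        "transcription": Decimal("1.80"),
        "llm_analysis": Decimal("0.48"),
        "embeddings": Decimal("0.06"),
        "infrastructure": Decimal("1.00"),
    },
    "Qwen3": {
        "transcription": Decimal("0.13"),
        "llm_analysis": Decimal("0.48"),
        "embeddings": Decimal("0.06"),
        "infrastructure": Decimal("1.00"),
    },
}

totals = {name: sum(parts.values()) for name, parts in COSTS.items()}
gap = totals["GPT-4o"] - totals["Qwen3"]

print(totals["GPT-4o"], totals["Qwen3"])  # 3.34 and 1.67, matching the Provider total row
print(gap)  # 1.67, identical to the difference in transcription cost alone
```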

When to Use Each Pipeline

Choose GPT-4o Diarize when:

  • You have multiple speakers (meetings, interviews, podcasts). For podcast transcription best practices, see the podcast guide.
  • You need speaker labels in your transcript
  • Knowing who said what matters more than leaderboard-leading word accuracy
  • You're working in languages where GPT-4o excels

Choose Qwen3-ASR-Flash when:

  • You want leaderboard-leading accuracy (#1 on the Hugging Face Open ASR Leaderboard)
  • You have long recordings (lectures, webinars, audiobooks) — try chapters for long recordings
  • The audio has a single speaker or you don't need speaker labels
  • You need word-level timestamps for precise subtitle generation
  • You need Chinese dialect support (Cantonese, Sichuanese, and 20 more)
  • You want emotion detection in the transcript
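
To illustrate why word-level timestamps matter for subtitles, here is a minimal sketch that groups timed words into SRT cues. It assumes a hypothetical input of `(word, start_seconds, end_seconds)` tuples; this post does not describe the actual export format, so treat the data shape and function names as placeholders.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start, end) tuples into SRT cues of at most max_words words."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        # Joining with spaces assumes a space-delimited language.
        text = " ".join(word for word, _, _ in chunk)
        begin = format_timestamp(chunk[0][1])   # start of the first word
        end = format_timestamp(chunk[-1][2])    # end of the last word
        cues.append(f"{index}\n{begin} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


sample = [("Hello", 0.0, 0.4), ("world", 0.5, 1.25)]
print(words_to_srt(sample))
```

With only segment-level timestamps, each cue could start and end only at segment boundaries; word timings let a cue begin exactly when its first word is spoken.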

Chinese Dialect Support

Qwen3-ASR-Flash stands out with support for 22 Chinese dialects including Cantonese, Sichuanese, Fujian, Henan, Hubei, and more — far beyond what most transcription services offer. If your content includes regional Chinese speech, this pipeline is a significant upgrade.

Same AI Analysis, Different Starting Point

Regardless of which pipeline you choose, every transcription gets the same enrichment:

  • Section detection using advanced NLP
  • Chapter generation with titles and summaries
  • Semantic search across your transcript
  • Q&A with citations that link to exact timestamps
  • AI summary with takeaways, quotes, and speaker profiles

How to Use It

  1. Go to Transcribe
  2. In the pipeline selector, choose Qwen3-ASR-Flash
  3. Upload your file or paste a YouTube URL
  4. Review the quote — you'll see the lower rate automatically

You can switch between pipelines for each transcription. Use GPT-4o for your podcast interviews and Qwen3 for your lecture recordings.

Pricing with Subscriptions

Subscription tiers apply the same discount structure to both pipelines. Pro subscribers get the lowest rates on both:

Tier | GPT-4o Diarize | Qwen3-ASR-Flash
Free | $3.88/hr | $1.71/hr
Basic ($12/mo) | ~$3.61/hr | ~$1.59/hr
Plus ($39/mo) | ~$3.45/hr | ~$1.52/hr
Pro ($99/mo) | ~$3.18/hr | ~$1.40/hr
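
The rates above can be used to estimate a job's cost and to see the discount each tier implies relative to the Free rate. This is a rough sketch built only from the listed hourly rates, several of which are approximate; the keys and function names are illustrative, and the actual quote shown at checkout is authoritative.

```python
# Hourly rates in USD from the pricing tiers above (paid-tier values are approximate).
RATES = {
    "Free":  {"gpt4o": 3.88, "qwen3": 1.71},
    "Basic": {"gpt4o": 3.61, "qwen3": 1.59},
    "Plus":  {"gpt4o": 3.45, "qwen3": 1.52},
    "Pro":   {"gpt4o": 3.18, "qwen3": 1.40},
}


def implied_discount(tier: str, pipeline: str) -> float:
    """Fractional discount of a tier's rate relative to the Free rate."""
    return 1 - RATES[tier][pipeline] / RATES["Free"][pipeline]


def estimate_cost(hours: float, tier: str, pipeline: str) -> float:
    """Estimated cost in USD for a recording of the given length."""
    return round(hours * RATES[tier][pipeline], 2)


for tier in ("Basic", "Plus", "Pro"):
    gpt = round(implied_discount(tier, "gpt4o") * 100)
    qwen = round(implied_discount(tier, "qwen3") * 100)
    print(f"{tier}: about {gpt}% (GPT-4o), about {qwen}% (Qwen3)")

# A 3-hour lecture on the Free tier.
print(estimate_cost(3, "Free", "qwen3"), estimate_cost(3, "Free", "gpt4o"))
```

Rounded to whole percentages, the implied discounts come out the same for both pipelines (about 7%, 11%, and 18%), which is consistent with the shared discount structure described above.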

All plans are prepaid — no surprise bills. Your monthly credit and wallet balance work with both pipelines.

Try Qwen3-ASR-Flash today at transcribe.so/transcribe. Choose the pipeline that fits your job — speaker diarization with GPT-4o, or leaderboard-leading accuracy with Qwen3.

Ready to transcribe your own content?

No credit card required. Pay only for what you use.

See it in action

Real output from a real transcription

Browse chapters, ask questions, and explore search results from an actual transcript.

44 Harsh Truths About The Game Of Life - Naval Ravikant (4K)
Chris Williamson
Contents
8 chapters · 513 sections
  1. Happiness Versus Success: Philosophical Reflections on Contentment, Desire, and Motivation
  2. Optimizing Sleep: Smart Temperature Regulation and the Foundations of Self-Esteem
  3. Decisive Action and Iterative Practice: Keys to Optimal Choices and Mastery
  4. Wealth Management: From Materialism to Value Creation and Fair Compensation
  5. Evaluating LLMs: Capabilities, Limitations, and Their Role in AI's Evolving Landscape
  6. Pathogens, Evolution, and Knowledge: How Humans Adapt and Defend
  7. Agency, Power, and the Individual: From Child Development to Cultural Conflict
  8. Unseen Trends: Media Oversights, Medical Limitations, and the Primitive State of Modern Biology
Q&A preview
Answer
Naval explains two distinct paths to happiness using the story of Alexander and Diogenes. The first path is through success—conquering the world, satisfying material needs, and getting what you want. The second path, exemplified by Diogenes living in a barrel, is simply not wanting in the first place. As Socrates said when shown luxuries: 'How many things there are in this world that I do not want.' Naval suggests not wanting something is as good as having it—both paths lead to the same destination of contentment [00:38–01:10]. He's not sure which path is more valid, noting it depends on how you define success [01:10–01:25].
