Introducing Qwen3-ASR-Flash: Top-Tier AI Transcription with Leaderboard-Leading Accuracy
We've added a second transcription pipeline: Qwen3-ASR-Flash from Alibaba, ranked #1 on the Hugging Face Open ASR Leaderboard with an average word error rate (WER) of 4.25%. It runs alongside our existing GPT-4o Transcribe Diarize pipeline.
Both pipelines include the same downstream AI analysis — topics, chapters, summaries, semantic search, and Q&A with citations. The difference is in the transcription step itself.
Why Two Pipelines?
Not every transcription job needs the same model. A 3-hour lecture by a single speaker has different needs than a 20-minute podcast with 4 guests.
- GPT-4o Transcribe Diarize is the best choice when you need speaker identification (diarization).
- Qwen3-ASR-Flash is the best choice for top-tier accuracy, long single-speaker audio, word-level timestamps, or Chinese dialect support.
Head-to-Head Comparison
| Feature | GPT-4o Diarize | Qwen3-ASR-Flash |
|---|---|---|
| Provider | OpenAI | Alibaba (Qwen) |
| Speaker diarization | Yes | No |
| Timestamp type | Segment-level | Sentence + word-level (10 langs) |
| Languages | 50+ | 52 + 22 Chinese dialects |
| Max audio duration | Unlimited (chunked) | 12 hours native |
| Emotion detection | No | Yes |
| Price (Free tier) | $3.88/hr | $1.71/hr |
| Price (Pro tier) | ~$3.18/hr | ~$1.40/hr |
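Word-level timestamps are what make precise subtitles possible. Each cue can start on the exact word that opens it, instead of being stretched across a whole segment. The sketch below shows one way to group timed words into SRT cues. It assumes a hypothetical `Word(text, start, end)` record with times in seconds. The real Qwen3-ASR-Flash output format is not shown in this post, so map its fields onto this shape yourself. The cue limits (`max_words`, `max_duration`) are illustrative defaults.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape for one word-level timestamp; field names are assumptions.
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7, max_duration: float = 4.0) -> str:
    """Group words into numbered SRT cues, starting a new cue once the current one
    reaches max_words words or adding the next word would exceed max_duration seconds."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for w in words:
        if current and (len(current) >= max_words or w.end - current[0].start > max_duration):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        # Each cue is timed from its first word's start to its last word's end.
        blocks.append(f"{i}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n{text}\n")
    # A blank line separates SRT cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("this", 1.0, 1.2),
            Word("is", 1.25, 1.4), Word("a", 1.45, 1.5), Word("test", 1.6, 2.0)]
    print(words_to_srt(demo, max_words=3))
```

With segment-level timestamps only, the best you can do is split a segment's time span evenly across its words. Word-level timing removes that guesswork.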
Where the Cost Difference Comes From
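The entire difference comes from the transcription API: Qwen3 costs $0.13/hr versus $1.80/hr for GPT-4o. The rest of the pipeline (GPT-4.1 for LLM processing, text-embedding-3-large for semantic search, and infrastructure) is shared between both pipelines, so the Qwen3 pipeline's provider cost comes to half that of the GPT-4o pipeline ($1.67/hr vs $3.34/hr).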
| Component | GPT-4o Pipeline | Qwen3 Pipeline |
|---|---|---|
| Transcription | $1.80/hr | $0.13/hr |
| LLM (GPT-4.1) | $0.48/hr | $0.48/hr |
| Embeddings | $0.06/hr | $0.06/hr |
| Infrastructure | $1.00/hr | $1.00/hr |
| Provider total | $3.34/hr | $1.67/hr |
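To check the table's arithmetic, the short sketch below sums each pipeline's components. It also lists the components whose costs differ. The dictionary keys are illustrative labels, and the figures are copied from the table.

```python
# Provider cost per hour of audio (USD), copied from the component table above.
PROVIDER_COSTS = {
    "gpt-4o-diarize": {"transcription": 1.80, "llm_gpt_4_1": 0.48, "embeddings": 0.06, "infrastructure": 1.00},
    "qwen3-asr-flash": {"transcription": 0.13, "llm_gpt_4_1": 0.48, "embeddings": 0.06, "infrastructure": 1.00},
}


def provider_total(pipeline: str) -> float:
    """Sum all components for one pipeline, rounded to cents."""
    return round(sum(PROVIDER_COSTS[pipeline].values()), 2)


def differing_components(a: str, b: str) -> dict[str, float]:
    """Return components whose per-hour cost differs between pipelines a and b."""
    diffs = {}
    for component, cost in PROVIDER_COSTS[a].items():
        delta = round(cost - PROVIDER_COSTS[b][component], 2)
        if delta != 0:
            diffs[component] = delta
    return diffs


if __name__ == "__main__":
    print(provider_total("gpt-4o-diarize"), provider_total("qwen3-asr-flash"))
    print(differing_components("gpt-4o-diarize", "qwen3-asr-flash"))
```

Only the transcription line differs ($1.67/hr). That gap equals the difference between the two provider totals.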
When to Use Each Pipeline
Choose GPT-4o Diarize when:
- You have multiple speakers (meetings, interviews, podcasts). For podcast transcription best practices, see the podcast guide.
- You need speaker labels in your transcript
- Attributing each line to the right speaker matters more than raw word accuracy
- You're working in languages where GPT-4o excels
Choose Qwen3-ASR-Flash when:
- You want leaderboard-leading accuracy (#1 on the Hugging Face Open ASR Leaderboard)
- You have long recordings (lectures, webinars, audiobooks) — try chapters for long recordings
- The audio has a single speaker or you don't need speaker labels
- You need word-level timestamps for precise subtitle generation
- You need Chinese dialect support (Cantonese, Sichuanese, and 20 more)
- You want emotion detection in the transcript
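The two checklists above boil down to one hard constraint. Only GPT-4o Transcribe Diarize produces speaker labels, and everything else points toward Qwen3-ASR-Flash. The sketch below encodes that rule as a hypothetical helper. `JobNeeds`, `choose_pipeline`, and the pipeline name strings are made up for illustration and are not part of the product. When a job needs speaker labels *and* a Qwen3-only feature, no single pipeline covers both, so the helper reports which needs go unmet.

```python
from dataclasses import dataclass

# Features that, per the comparison table, only Qwen3-ASR-Flash offers.
QWEN_ONLY_FEATURES = ("word_timestamps", "chinese_dialect", "emotion_detection")


@dataclass
class JobNeeds:
    # Hypothetical description of a transcription job's requirements.
    speaker_labels: bool = False
    word_timestamps: bool = False
    chinese_dialect: bool = False
    emotion_detection: bool = False


def choose_pipeline(needs: JobNeeds) -> tuple[str, list[str]]:
    """Return (pipeline, unmet_needs) following the checklists above."""
    if needs.speaker_labels:
        # Diarization is the deciding factor: only the GPT-4o pipeline provides it.
        unmet = [f for f in QWEN_ONLY_FEATURES if getattr(needs, f)]
        return "gpt-4o-diarize", unmet
    # Without a need for speaker labels, Qwen3 covers every other item at a lower rate.
    return "qwen3-asr-flash", []


if __name__ == "__main__":
    print(choose_pipeline(JobNeeds(speaker_labels=True)))   # podcast with guests
    print(choose_pipeline(JobNeeds(word_timestamps=True)))  # lecture to subtitle
```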
Chinese Dialect Support
Qwen3-ASR-Flash stands out with support for 22 Chinese dialects including Cantonese, Sichuanese, Fujian, Henan, Hubei, and more — far beyond what most transcription services offer. If your content includes regional Chinese speech, this pipeline is a significant upgrade.
Same AI Analysis, Different Starting Point
Regardless of which pipeline you choose, every transcription gets the same enrichment:
- Topic detection using advanced NLP
- Chapter generation with titles and summaries
- Semantic search across your transcript (text-embedding-3-large)
- Q&A with citations that link to exact timestamps
- AI summary with takeaways, quotes, and speaker profiles
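Semantic search and timestamped citations both rely on the same basic idea. Embed each transcript segment, embed the query, and rank segments by vector similarity. The sketch below shows that generic technique using cosine similarity. It is not our production implementation. The toy 3-dimensional vectors stand in for real embeddings, which would come from a model such as text-embedding-3-large; that API call is omitted here. The `Segment` shape is a hypothetical example.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical transcript segment with a precomputed embedding.
    start: float            # seconds into the recording, used for the citation link
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_embedding: list[float], segments: list[Segment], top_k: int = 3) -> list[tuple[float, str]]:
    """Return (start_time, text) for the top_k segments most similar to the query."""
    ranked = sorted(segments, key=lambda s: cosine(query_embedding, s.embedding), reverse=True)
    return [(s.start, s.text) for s in ranked[:top_k]]


if __name__ == "__main__":
    segments = [
        Segment(0.0, "Welcome to the lecture.", [1.0, 0.0, 0.0]),
        Segment(60.0, "Now let's talk about pricing.", [0.0, 1.0, 0.0]),
        Segment(120.0, "Pricing depends on the course outline.", [0.7, 0.7, 0.0]),
    ]
    print(search([0.0, 1.0, 0.0], segments, top_k=2))
```

Because each hit carries its `start` time, an answer built from the top segments can link straight to the moment in the recording. That is the same property the Q&A citations depend on.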
How to Use It
- Go to Transcribe
- In the pipeline selector, choose Qwen3-ASR-Flash
- Upload your file or paste a YouTube URL
- Review the quote — you'll see the lower rate automatically
You can switch between pipelines for each transcription. Use GPT-4o for your podcast interviews and Qwen3 for your lecture recordings.
Pricing with Subscriptions
Subscription tiers apply the same discount structure to both pipelines. Pro subscribers get the lowest rates on both:
| Tier | GPT-4o Diarize | Qwen3-ASR-Flash |
|---|---|---|
| Free | $3.88/hr | $1.71/hr |
| Basic ($12/mo) | ~$3.61/hr | ~$1.59/hr |
| Plus ($39/mo) | ~$3.45/hr | ~$1.52/hr |
| Pro ($99/mo) | ~$3.18/hr | ~$1.40/hr |
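The claim that both pipelines share the same discount structure can be checked against the table. Each paid tier cuts the Free rate by roughly the same percentage on both pipelines: about 7% (Basic), 11% (Plus), and 18% (Pro). The sketch below computes those discounts. It also gives a rough per-job estimate. It assumes the quote scales linearly with duration at the listed per-hour rate, which is an approximation: the paid-tier rates are marked "~", and the exact quoting rules may differ.

```python
# Per-hour rates (USD per hour of audio) from the subscription table above.
RATES = {
    "free":  {"gpt-4o-diarize": 3.88, "qwen3-asr-flash": 1.71},
    "basic": {"gpt-4o-diarize": 3.61, "qwen3-asr-flash": 1.59},
    "plus":  {"gpt-4o-diarize": 3.45, "qwen3-asr-flash": 1.52},
    "pro":   {"gpt-4o-diarize": 3.18, "qwen3-asr-flash": 1.40},
}


def discount_vs_free(tier: str, pipeline: str) -> float:
    """Fractional discount of a tier's rate relative to the Free rate."""
    return 1 - RATES[tier][pipeline] / RATES["free"][pipeline]


def estimate_quote(hours: float, tier: str, pipeline: str) -> float:
    """Rough estimate assuming linear per-hour pricing (an assumption, not the billing rule)."""
    return round(hours * RATES[tier][pipeline], 2)


if __name__ == "__main__":
    for tier in ("basic", "plus", "pro"):
        gpt = discount_vs_free(tier, "gpt-4o-diarize")
        qwen = discount_vs_free(tier, "qwen3-asr-flash")
        print(f"{tier:>5}: GPT-4o {gpt:.1%}, Qwen3 {qwen:.1%}")
    # A 3-hour lecture on the Free tier:
    print(estimate_quote(3, "free", "gpt-4o-diarize"), estimate_quote(3, "free", "qwen3-asr-flash"))
```

For the 3-hour single-speaker lecture from the start of this post, that works out to about $11.64 with GPT-4o versus $5.13 with Qwen3 on the Free tier.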
All plans are prepaid — no surprise bills. Your monthly credit and wallet balance work with both pipelines.
Try Qwen3-ASR-Flash today at transcribe.so/transcribe. Choose the pipeline that fits your job — speaker diarization with GPT-4o, or leaderboard-leading accuracy with Qwen3.