How Well Does AI Transcribe Arabic? Qwen Flash on a Real MSA Episode
Arabic is one of the harder languages for speech recognition. It is diglossic: the written, formal register (Modern Standard Arabic, or MSA) is used for news, lectures, and most published content, while everyday speech splits into Gulf, Egyptian, Levantine, and Maghrebi dialects that differ enough to trip up most models. So the honest question is not "can AI transcribe Arabic" but "how well, and on which Arabic."
We measured it instead of guessing.
The number: 14.78% WER on Modern Standard Arabic
On the FLEURS benchmark for Modern Standard Arabic, our Qwen3-ASR-Flash pipeline reaches a 14.78% word error rate (WER). WER is the number of word-level errors (substitutions, deletions, and insertions) divided by the number of words in the reference transcript, so lower is better. For comparison, Voxtral Mini lands at 14.64% on the same split. These are published, reproducible figures, not marketing claims. The full per-language table and sources live on our benchmarks page.
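To make the metric concrete, here is a minimal sketch of how WER can be computed from a reference transcript and a model output. It is an illustration, not the scoring code behind the figures above: the function name `word_error_rate` is hypothetical, and it splits on whitespace with no text normalization, whereas benchmark scoring usually normalizes punctuation and spelling variants first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are whitespace-separated and compared exactly,
    with no normalization of diacritics, punctuation, or hamza forms.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One wrong word out of four reference words gives a WER of 0.25.
reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"  # "إلى" written without the hamza
print(word_error_rate(reference, hypothesis))
```

The example also shows why normalization choices matter for Arabic: a missing hamza counts as a full word error under exact matching, even though a human reader would barely notice it.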
A few things worth being clear about:
- This is MSA. News reads, lectures, formal interviews, and audiobook-style narration sit in this register, and that is where the 14.78% applies.
- Dialect is harder. Heavy Gulf or Maghrebi speech drops accuracy for every model on the market, ours included. We do not claim dialect parity with MSA, and you should be skeptical of anyone who does.
- Audio quality matters. Background noise, overlapping speakers, and low-bitrate audio all push WER up regardless of language.
What the pipeline actually does
Transcription is the first step, not the whole product. Every Arabic recording you run through transcribe.so gets the same downstream analysis as any other language:
- Full transcription of the audio or video with Qwen Flash, rendered right-to-left so the Arabic text reads correctly.
- Automatically generated chapters and sections, so a 26-minute episode becomes a navigable outline instead of a wall of text.
- Searchable playback: find any Arabic phrase and jump to the exact second it was said.
- Subtitle and SRT export, ready for CapCut, Premiere Pro, and DaVinci Resolve.
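For readers unfamiliar with the SRT format mentioned in the last item, the sketch below shows how timestamped segments map onto it. It assumes a hypothetical `Segment` structure with start and end times in seconds; it is not the transcribe.so exporter, only an illustration of the file layout that subtitle editors expect.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: times in seconds plus its text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Build SRT text: a 1-based index, a time range, then the cue text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


segments = [
    Segment(0.0, 2.5, "مرحبا بكم"),
    Segment(2.5, 6.0, "في هذه الحلقة"),
]
print(to_srt(segments))
```

The Arabic text is stored as ordinary Unicode in logical order; right-to-left display is handled by the player or editor, so saving the file as UTF-8 is the main thing to get right.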
See it on a real Arabic episode
We ran a real 26-minute Modern Standard Arabic episode from the channel The Immigrant المهاجر through the exact pipeline described above. It produced 6 chapters and 28 sections, all fully searchable. Nothing was hand-cleaned.
Click through the live, interactive result here: Arabic transcription example. Every chapter and section is clickable and seeks the video to that moment.
If you want the full feature breakdown and FAQ, see the product page: Arabic transcription.
Try it on your own Arabic audio
Every new account gets free credit on signup, enough to transcribe a few hours, with no credit card required. Paste a YouTube link or upload a file, pick Arabic (or let auto-detect handle it), and see the result. Start at transcribe.so/transcribe.