Google FLEURS (Marathi)

MH Specific

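FLEURS (Few-shot Learning Evaluation of Universal Representations of Speech) is a read-speech benchmark derived from the FLoRes-101 machine translation benchmark. It covers 102 languages, including Marathi (config name mr_in).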

Use the FLEURS test split to benchmark Marathi ASR models on a widely used multilingual evaluation set, so results can be compared with those reported for the other FLEURS languages.

Quick Start

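from datasets import load_dataset

# Load the Marathi (mr_in) test split of FLEURS
ds = load_dataset('google/fleurs', 'mr_in', split='test')
print(f"Test utterances: {len(ds)}")

# ds[:5] returns a dict of columns, not rows, so select rows before iterating
for ex in ds.select(range(5)):
    print(f"Text: {ex['transcription'][:60]}...")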
Modality: Speech + Text
Size: ~10 hours (train); 2,009 sentences total
License:
Format: WAV
Language: mr
Update Frequency: static
Organization: Google Research

Schema

Field          Type    Description
audio          audio   Speech audio recording
transcription  string  Transcription in Marathi
id             int     Utterance ID aligned across 102 languages
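
Because the id field is described as aligned across languages, rows from two language subsets can in principle be paired on it. The following is a minimal sketch of that join, not an official recipe. It runs on toy records that stand in for dataset rows; the helper name align_by_id is hypothetical, and the sketch assumes that several recordings may share one id, so it keeps only the first transcription seen for each.

```python
def align_by_id(records_a, records_b):
    """Pair transcriptions from two subsets whose rows share the same `id`."""
    def first_text_per_id(records):
        texts = {}
        for row in records:
            # Assumption: one id may appear in several rows (different speakers),
            # so keep only the first transcription seen for each id.
            texts.setdefault(row["id"], row["transcription"])
        return texts

    text_a = first_text_per_id(records_a)
    text_b = first_text_per_id(records_b)
    shared_ids = sorted(text_a.keys() & text_b.keys())
    return [(i, text_a[i], text_b[i]) for i in shared_ids]


# Toy rows standing in for two language subsets (not real FLEURS data)
marathi_rows = [
    {"id": 7, "transcription": "नमस्कार"},
    {"id": 7, "transcription": "नमस्कार"},
    {"id": 9, "transcription": "धन्यवाद"},
]
hindi_rows = [
    {"id": 7, "transcription": "नमस्ते"},
    {"id": 12, "transcription": "शुक्रिया"},
]

pairs = align_by_id(marathi_rows, hindi_rows)
print(pairs)
```

With real data, the same function could be fed rows from two load_dataset calls, provided both subsets come from the same split.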

Build With This

Create a cross-lingual speech retrieval system that finds equivalent utterances across Marathi and other Indian languages
Develop a Marathi speech recognition error analysis tool that identifies specific phoneme-level weaknesses
Build a multilingual speech corpus by combining FLEURS Marathi with other Indic language subsets for joint training
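
As a starting point for the error analysis idea above, the sketch below tallies one-to-one character substitutions between reference and hypothesis transcripts using Python's difflib. This is only a rough proxy: it works on Unicode code points rather than phonemes or grapheme clusters, the helper name substitution_counts is hypothetical, and the sample pairs are toy strings, not model output.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_counts(pairs):
    """Count (reference_char, hypothesis_char) substitutions over transcript pairs."""
    counts = Counter()
    for reference, hypothesis in pairs:
        matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Count only replacements of equal length, where characters pair up 1:1
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                counts.update(zip(reference[i1:i2], hypothesis[j1:j2]))
    return counts


# Toy (reference, hypothesis) pairs; a real run would use ASR output
toy_pairs = [("कमल", "कमळ"), ("बाल", "बाळ"), ("घर", "घर")]
confusions = substitution_counts(toy_pairs)
print(confusions.most_common(3))
```

A phoneme-level analysis would first need a grapheme-to-phoneme step for Marathi, which is outside the scope of this sketch.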

AI Use Cases

ASR Evaluation
Speech LangID
Multilingual Benchmarking
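
For the ASR evaluation use case, the usual metric is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration under simple assumptions (whitespace tokenization, no text normalization); the function name word_error_rate and the sample sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substituted word out of three
wer = word_error_rate("मी शाळेत जातो", "मी शाळा जातो")
print(f"WER: {wer:.2f}")
```

In practice, reference and hypothesis text are usually normalized the same way (punctuation, digits, Unicode form) before scoring, since those choices can change the reported WER.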
Last verified: 2026-03-07