Google FLEURS (Marathi)

Maharashtra (MH) specific

Few-shot speech benchmark built on the FLoRes-101 machine translation benchmark; read speech in 102 languages, including Marathi (config name mr_in)

Benchmark Marathi ASR models on the FLEURS evaluation set, a widely used public benchmark, so results can be compared with published multilingual ASR numbers.

Quick Start

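from datasets import load_dataset
# Load the Marathi (mr_in) test split of FLEURS
ds = load_dataset('google/fleurs', 'mr_in', split='test')
print(f"Test utterances: {len(ds)}")
# ds[:5] returns a dict of columns, not rows, so iterate over a row subset with select()
for ex in ds.select(range(5)):
    print(f"Text: {ex['transcription'][:60]}...")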
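ASR results on FLEURS are usually reported as word error rate (WER) or character error rate (CER). The sketch below computes corpus-level rates in pure Python, assuming refs are transcription strings from the test split and hyps are outputs from your own model. The names edit_distance and corpus_error_rate are illustrative, not part of any library. It applies no text normalization beyond whitespace splitting, and CER counts Unicode code points, so a Devanagari vowel sign (matra) counts as one unit.

```python
# Minimal corpus-level WER/CER sketch (pure Python, no external libraries).
# Assumption: refs are FLEURS transcription strings, hyps come from your own ASR model.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 cost on a match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Total edits divided by total reference units (WER for words, CER for characters)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be the same length")
    errors, total = 0, 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r_toks, h_toks = ref.split(), hyp.split()
        elif unit == "char":
            # CER over Unicode code points, ignoring spaces
            r_toks, h_toks = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        else:
            raise ValueError("unit must be 'word' or 'char'")
        errors += edit_distance(r_toks, h_toks)
        total += len(r_toks)
    return errors / total if total else 0.0
```

Usage: corpus_error_rate(refs, hyps) gives WER and corpus_error_rate(refs, hyps, unit="char") gives CER. Published results usually apply a shared text normalizer to both sides first, so scores from this sketch may not match reported numbers exactly.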

Modality

Speech + Text

Size

~10 hrs of training audio; 2,009 source sentences in total (n-way parallel across all FLEURS languages)

License

CC-BY 4.0

Format

WAV

Language

Marathi

Update Frequency

static

Organization

Google Research

Schema

Field	Type	Description
audio	audio	Speech audio recording
transcription	string	Normalized transcription in Marathi (Devanagari script)
id	int	Sentence ID, shared by the parallel recordings of the same sentence across all 102 languages
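Because the id field is shared across languages, parallel Marathi/Hindi (or other Indic) pairs can be built with a plain join on id. This is a minimal sketch, assuming each split has already been turned into a list of row dicts. A sentence may have more than one recording, so the sketch keeps only the first recording per id, which is an arbitrary choice. index_by_id and parallel_pairs are hypothetical helper names.

```python
from collections import defaultdict


def index_by_id(rows):
    """Group rows by FLEURS sentence id (an id may have several recordings)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["id"]].append(row)
    return groups


def parallel_pairs(rows_a, rows_b):
    """Yield (row_a, row_b) pairs sharing a sentence id, in ascending id order,
    using the first recording per id in each language."""
    a, b = index_by_id(rows_a), index_by_id(rows_b)
    for sid in sorted(a.keys() & b.keys()):
        yield a[sid][0], b[sid][0]
```

For example, parallel_pairs(list(mr_ds), list(hi_ds)) with splits loaded from the mr_in and hi_in configs. Calling ds.remove_columns("audio") before converting to rows avoids decoding audio that the join does not need.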

Build With This

Create a cross-lingual speech retrieval system that finds equivalent utterances across Marathi and other Indian languages
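The same id alignment gives a simple way to score such a retrieval system: embed each Marathi utterance and each candidate utterance in the target language, then check whether the nearest neighbour shares the query's id (recall@1). The sketch assumes the embeddings already come from a speech encoder of your choice (not specified here) and uses brute-force cosine similarity. That is fine for a test split, but a larger corpus would call for an approximate nearest-neighbour index.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recall_at_1(queries, candidates):
    """queries, candidates: lists of (sentence_id, embedding).
    A query is a hit when its most similar candidate has the same sentence id."""
    if not queries:
        return 0.0
    hits = 0
    for qid, qvec in queries:
        best_id, _ = max(candidates, key=lambda c: cosine(qvec, c[1]))
        hits += best_id == qid
    return hits / len(queries)
```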

Develop a Marathi speech recognition error analysis tool that identifies specific phoneme-level weaknesses
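FLEURS ships text transcriptions, not phoneme labels, so phoneme-level analysis needs either a grapheme-to-phoneme step or a proxy. Devanagari spelling is largely phonemic, so one rough proxy is to align reference and hypothesis character by character and count which code points get substituted, deleted or inserted. This is a minimal sketch under that assumption. align_chars and confusion_counts are illustrative names.

```python
from collections import Counter


def align_chars(ref, hyp):
    """Levenshtein alignment over Unicode code points.
    Returns a list of (op, ref_char, hyp_char) with op in {'eq', 'sub', 'del', 'ins'}."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    # Backtrace from the bottom-right cell to recover one optimal alignment
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "eq" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", hyp[j - 1]))
            j -= 1
    return ops[::-1]


def confusion_counts(pairs):
    """Count non-matching aligned (ref_char, hyp_char) pairs over (ref, hyp) transcripts."""
    counts = Counter()
    for ref, hyp in pairs:
        for op, r, h in align_chars(ref, hyp):
            if op != "eq":
                counts[(r, h)] += 1
    return counts
```

confusion_counts(zip(refs, hyps)).most_common(20) lists the most frequent confusions. A pair such as ('ो', 'े') would point to a vowel-sign confusion. Deletions show up with an empty hypothesis character and insertions with an empty reference character.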

Build a multilingual speech corpus by combining FLEURS Marathi with other Indic language subsets for joint training
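When FLEURS Marathi is mixed with larger subsets, one common way to keep the smaller language from being swamped is temperature-based sampling. Language l is drawn with probability proportional to (n_l / N)^(1/T). This is one standard choice, not something FLEURS prescribes. The sizes and temperature below are placeholders, and sampling_probs is a hypothetical helper name.

```python
def sampling_probs(sizes, temperature=3.0):
    """Temperature-based sampling weights: p_l proportional to (n_l / N) ** (1 / T).
    T = 1 reproduces the natural proportions; larger T flattens toward uniform."""
    total = sum(sizes.values())
    scaled = {lang: (n / total) ** (1.0 / temperature) for lang, n in sizes.items()}
    z = sum(scaled.values())
    return {lang: s / z for lang, s in scaled.items()}
```

The resulting weights can be passed, in the same order as the datasets, as the probabilities argument of datasets.interleave_datasets when combining the loaded configs.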

AI Use Cases

ASR Evaluation, Speech LangID, Multilingual Benchmarking

Related Datasets

AI4Bharat BhasaAnuvaad (Marathi)

Speech + Text (Translation)

AI4Bharat IndicVoices

Speech + Text

AI4Bharat IndicVoices-R

Speech + Text (TTS-ready)

AI4Bharat Kathbath

Speech + Text

Last verified: 2026-03-07