Few-shot speech benchmark derived from the FLoRes machine translation benchmark; read speech in 102 languages, including Marathi (`mr_in`).
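
    from datasets import load_dataset

    # Load the Marathi test split of FLEURS
    ds = load_dataset('google/fleurs', 'mr_in', split='test')
    print(f"Test utterances: {len(ds)}")

    # ds[:5] returns a dict of columns, so select the first five rows instead
    for ex in ds.select(range(5)):
        print(f"Text: {ex['transcription'][:60]}...")

| Field | Type | Description |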
|---|---|---|
| audio | audio | Speech audio recording |
| transcription | string | Transcription in Marathi |
| id | int | Utterance ID aligned across 102 languages |
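
Because `id` is aligned across languages, rows from two language configs can be paired on it. The following is a minimal sketch of that idea, not part of the dataset's own tooling: `index_by_id` and `align_by_id` are hypothetical helper names, the rows are toy stand-ins rather than real FLEURS data, and the sketch assumes each row behaves like a dict with `id` and `transcription` keys. It also assumes an `id` may appear more than once within a language (for example, several recordings of one sentence) and simply keeps the first occurrence.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def index_by_id(rows: Iterable[Mapping]) -> Dict[int, str]:
    """Map each utterance id to its transcription (first occurrence wins)."""
    index: Dict[int, str] = {}
    for row in rows:
        # setdefault keeps the existing entry if this id was already seen
        index.setdefault(row["id"], row["transcription"])
    return index


def align_by_id(
    rows_a: Iterable[Mapping], rows_b: Iterable[Mapping]
) -> List[Tuple[int, str, str]]:
    """Return (id, transcription_a, transcription_b) for ids present in both."""
    a = index_by_id(rows_a)
    b = index_by_id(rows_b)
    shared = sorted(a.keys() & b.keys())
    return [(i, a[i], b[i]) for i in shared]


# Toy rows standing in for two language configs (placeholder text, not real data)
marathi_rows = [
    {"id": 1, "transcription": "mr sentence 1 (recording A)"},
    {"id": 1, "transcription": "mr sentence 1 (recording B)"},
    {"id": 3, "transcription": "mr sentence 3"},
]
english_rows = [
    {"id": 1, "transcription": "en sentence 1"},
    {"id": 2, "transcription": "en sentence 2"},
    {"id": 3, "transcription": "en sentence 3"},
]

for utt_id, mr_text, en_text in align_by_id(marathi_rows, english_rows):
    print(utt_id, mr_text, "|", en_text)
```

With real data, the same helpers should work on loaded splits. Dropping the audio column first with `ds.remove_columns("audio")` is likely worthwhile, since it avoids decoding audio while iterating.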