AI4Bharat BhasaAnuvaad (Marathi)
MH Specific
Largest Indic speech translation dataset, with curated, web-mined, and synthetic speech-text pairs for 13 Indian languages
Build a Marathi speech-to-speech translation system for real-time interpretation in multilingual Maharashtra settings.
Quick Start
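from datasets import load_dataset

# Stream the Marathi-English configuration instead of downloading it in full
ds = load_dataset('ai4bharat/BhasaAnuvaad', 'mr-en', split='train', streaming=True)

# Print the first five source/target text pairs, truncated to 60 characters
for i, ex in enumerate(ds):
    print(f"Source: {ex['source_text'][:60]}...")
    print(f"Target: {ex['target_text'][:60]}...\n")
    if i >= 4:
        break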
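The loop above prints whatever comes first in the stream. A small helper can skip records whose text is empty before taking a sample. This is a minimal sketch, not part of the dataset's tooling: `iter_text_pairs` and `take_text_pairs` are hypothetical names, the only fields assumed are `source_text` and `target_text`, and the sample records are made up for illustration.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Mapping, Tuple


def iter_text_pairs(records: Iterable[Mapping[str, Any]]) -> Iterator[Tuple[str, str]]:
    """Yield (source_text, target_text) pairs, skipping records with empty text."""
    for rec in records:
        # Treat a missing or None field the same as an empty string
        src = (rec.get("source_text") or "").strip()
        tgt = (rec.get("target_text") or "").strip()
        if src and tgt:
            yield src, tgt


def take_text_pairs(records: Iterable[Mapping[str, Any]], n: int = 5) -> List[Tuple[str, str]]:
    """Collect the first n usable pairs without reading the whole stream."""
    return list(islice(iter_text_pairs(records), n))


# Made-up records standing in for streamed examples (not taken from the dataset)
sample = [
    {"source_text": "namaskar", "target_text": "Hello"},
    {"source_text": "", "target_text": "skipped because the source is empty"},
    {"source_text": "dhanyavad", "target_text": "Thank you"},
]
pairs = take_text_pairs(sample, n=2)
for src, tgt in pairs:
    print(f"{src} -> {tgt}")
```

Because the helper accepts any iterable of dict-like records, the streamed `ds` object from the Quick Start could be passed in place of `sample`, assuming its records expose the same two fields.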
Modality
Speech + Text (Translation)
Size
44,400 hours total (13 Indian languages + English)
Organization
AI4Bharat, IIT Madras
Schema
| Field | Type | Description |
|---|---|---|
| audio | audio | Source audio speech in one language |
| source_text | string | Transcription of source audio |
| target_text | string | Translation in target language |
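To make the schema concrete, the sketch below summarizes one record: clip length in seconds plus the character counts of the two text fields. It rests on an assumption that may not hold for every library version: that the `audio` field decodes to a dict with a one-dimensional `array` of samples and a `sampling_rate`, which is the layout the Hugging Face `datasets` Audio feature has conventionally produced. `audio_duration_seconds` and `describe_record` are hypothetical helper names, and the record used here is synthetic.

```python
from typing import Any, Dict, Mapping, Optional


def audio_duration_seconds(audio: Any) -> Optional[float]:
    """Return clip length in seconds, or None if the audio is not in the assumed layout."""
    # Assumed layout: {"array": <mono samples>, "sampling_rate": <int>}
    if not isinstance(audio, Mapping):
        return None
    samples = audio.get("array")
    rate = audio.get("sampling_rate")
    if samples is None or not rate:
        return None
    return len(samples) / float(rate)


def describe_record(rec: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarize one record using the three fields from the schema table."""
    return {
        "duration_s": audio_duration_seconds(rec.get("audio")),
        "source_chars": len(rec.get("source_text") or ""),
        "target_chars": len(rec.get("target_text") or ""),
    }


# Synthetic record: 32,000 silent samples at 16 kHz
fake_record = {
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "source_text": "namaskar",
    "target_text": "Hello",
}
info = describe_record(fake_record)
print(info)
```

If the audio arrives in a different form (for example a decoder object or a multi-channel array), `audio_duration_seconds` returns `None` or needs adjusting rather than guessing at a duration.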
Build With This
Create a Marathi-Hindi real-time interpreter for cross-state government meetings and conferences
Develop an audio translation service for Marathi farmers to access Hindi/English agricultural advisories
Build a multilingual tourism guide that provides spoken Marathi translations of Hindi/English tourist information
AI Use Cases
Speech Translation, Multilingual ASR
Last verified: 2026-03-07