AI4Bharat BhasaAnuvaad (Marathi)

MH Specific

Largest Indic speech translation dataset with curated, web-mined, and synthetic speech-text pairs for 13 Indian languages

Build a Marathi speech-to-speech translation system for real-time interpretation in multilingual Maharashtra settings.

Quick Start

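from datasets import load_dataset

# Stream the 'mr-en' configuration so the full dataset is not downloaded up front.
ds = load_dataset('ai4bharat/BhasaAnuvaad', 'mr-en', split='train', streaming=True)

# Print the first five source/target text pairs, truncated to 60 characters.
for i, ex in enumerate(ds):
    print(f"Source: {ex['source_text'][:60]}...")
    print(f"Target: {ex['target_text'][:60]}...\n")
    if i >= 4:
        break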
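
The manual counter in the loop above can also be written with itertools.islice, which works on any iterable and is assumed here to work on a streaming dataset too. This is a minimal sketch, assuming each record is a dict with the source_text and target_text fields; preview_pairs is a hypothetical helper, and the sample records are placeholders rather than real dataset rows.

```python
from itertools import islice


def preview_pairs(records, n=5, width=60):
    """Return up to n (source, target) text pairs, each truncated to width characters.

    `records` can be any iterable of dicts; a streaming dataset is assumed to
    behave this way. This helper is hypothetical, not part of any library.
    """
    pairs = []
    for ex in islice(records, n):  # stops after n items without reading further
        pairs.append((ex["source_text"][:width], ex["target_text"][:width]))
    return pairs


# Placeholder records standing in for streamed examples (not real dataset rows).
sample = ({"source_text": f"source {i}", "target_text": f"target {i}"} for i in range(100))
for src, tgt in preview_pairs(sample, n=3):
    print(f"Source: {src}")
    print(f"Target: {tgt}\n")
```

Because islice stops pulling items once n records have been read, the rest of the stream is left untouched, which matters when each record carries audio.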

Modality: Speech + Text (Translation)
Size: 44,400 hrs total (13 langs + English)
License: (not specified)
Format: WAV
Language: mr
Update Frequency: static
Organization: AI4Bharat, IIT Madras

Schema

Field | Type | Description
audio | audio | Source speech audio in one language
source_text | string | Transcription of the source audio
target_text | string | Translation in the target language
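
To show how the three schema fields might be used together, here is a minimal sketch that checks a record and computes its audio duration. It assumes the audio field decodes to a dict with "array" and "sampling_rate" keys, a layout the Hugging Face datasets Audio feature has commonly used; other library versions may return a different object, so treat that layout as an assumption. The summarize_example helper and the sample record are hypothetical.

```python
REQUIRED_FIELDS = ("audio", "source_text", "target_text")


def summarize_example(ex):
    """Check that a record has the schema fields and report basic statistics."""
    missing = [name for name in REQUIRED_FIELDS if name not in ex]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")

    audio = ex["audio"]  # assumed layout: {"array": samples, "sampling_rate": Hz}
    duration_s = len(audio["array"]) / audio["sampling_rate"]
    return {
        "duration_s": duration_s,
        "source_chars": len(ex["source_text"]),
        "target_chars": len(ex["target_text"]),
    }


# Synthetic record: one second of silence at 16 kHz with placeholder text.
record = {
    "audio": {"array": [0.0] * 16000, "sampling_rate": 16000},
    "source_text": "placeholder source",
    "target_text": "placeholder target",
}
print(summarize_example(record))
```

A check like this can be run on the first few streamed records to confirm the field names and audio layout before building a larger pipeline.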

Build With This

Create a Marathi-Hindi real-time interpreter for cross-state government meetings and conferences
Develop an audio translation service for Marathi farmers to access Hindi/English agricultural advisories
Build a multilingual tourism guide that provides spoken Marathi translations of Hindi/English tourist information

AI Use Cases

Speech Translation, Multilingual ASR
Last verified: 2026-03-07