AI4Bharat IndicVoices-R
MH Specific
ASR-enhanced high-quality TTS corpus for 22 Indian languages; a subset of IndicVoices optimized for speech synthesis
Build a Marathi read-speech ASR model optimized for formal reading scenarios like news broadcasting and audiobook narration.
Quick Start
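from datasets import load_dataset

# Stream the Marathi ('mr') subset instead of downloading it in full
ds = load_dataset('ai4bharat/IndicVoices-R', 'mr', split='train', streaming=True)
# Print the first 80 characters of the first five transcripts
for i, ex in enumerate(ds):
    print(f"Transcript: {ex['transcript'][:80]}...")
    if i >= 4:
        break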
Modality
Speech + Text (TTS-ready)
Size
1,704 hrs total (22 langs); 9-175 hrs/lang
Organization
AI4Bharat, IIT Madras
Schema
| Field | Type | Description |
|---|---|---|
| audio | audio | Read speech audio recording |
| transcript | string | Reference text that was read aloud |
| language | string | Language code |
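The schema can be sanity-checked on a single record before building anything on top of it. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `missing_fields` and `duration_seconds` are hypothetical, the record is a synthetic stand-in so the snippet runs offline, and it assumes the `audio` field decodes to a mapping with `array` and `sampling_rate` keys, which may differ depending on the installed `datasets` version.

```python
from typing import Any, Mapping

# Field names taken from the schema table; the checks are illustrative only.
EXPECTED_FIELDS = ("audio", "transcript", "language")


def missing_fields(example: Mapping[str, Any]) -> list[str]:
    """Return the schema fields that are absent from one record."""
    return [name for name in EXPECTED_FIELDS if name not in example]


def duration_seconds(audio: Mapping[str, Any]) -> float:
    """Clip length in seconds.

    Assumption: decoded audio is a mapping with "array" (samples)
    and "sampling_rate" (Hz).
    """
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-in for one record (2 seconds of silence at 16 kHz).
fake_example = {
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "transcript": "नमस्कार",
    "language": "mr",
}

print(missing_fields(fake_example))             # []
print(duration_seconds(fake_example["audio"]))  # 2.0
```

With a real record, `fake_example` would be replaced by an item from the streamed dataset in the Quick Start.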
Build With This
Create an audiobook generation quality checker that compares TTS output against human read-speech patterns
Develop a Marathi reading fluency assessment tool for schools that evaluates student oral reading performance
Build a pronunciation training app for Marathi language learners using native read-speech as reference
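Project ideas such as the reading fluency assessment tool need some way to compare what a speaker actually said with the reference text. One common measure is word error rate (WER). The sketch below is an illustration under stated assumptions: an ASR system (not shown here) has already produced a hypothesis transcript, words are split on whitespace with no Marathi-specific normalization, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Reference text from the corpus vs. a (made-up) recognized reading.
print(word_error_rate("a b c d", "a x c"))  # 0.5: one substitution, one deletion
```

A lower score means the reading is closer to the reference. A real tool would likely normalize punctuation and spelling variants before scoring.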
AI Use Cases
TTS, Voice Cloning, Expressive Speech Synthesis
Last verified: 2026-03-07