AI4Bharat IndicVoices-R

MH Specific

ASR-enhanced high-quality TTS corpus for 22 Indian languages; subset of IndicVoices optimized for speech synthesis

Build a Marathi read-speech ASR model optimized for formal reading scenarios like news broadcasting and audiobook narration.
Homepage | HuggingFace

Quick Start

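from datasets import load_dataset

# Stream the Marathi ('mr') training split instead of downloading it in full
ds = load_dataset('ai4bharat/IndicVoices-R', 'mr', split='train', streaming=True)
# Print the first five transcripts, truncated to 80 characters
for i, ex in enumerate(ds):
    print(f"Transcript: {ex['transcript'][:80]}...")
    if i >= 4:
        break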
Modality: Speech + Text (TTS-ready)
Size: 1,704 hrs total (22 langs); 9-175 hrs/lang
License:
Format: WAV
Language: mr
Update Frequency: static
Organization: AI4Bharat, IIT Madras

Schema

Field | Type | Description
audio | audio | Read speech audio recording
transcript | string | Reference text that was read aloud
language | string | Language code
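
As a rough illustration of working with the three schema fields, the sketch below summarizes a single record. It assumes the audio field decodes to a dict with 'array' and 'sampling_rate' keys, which is a common layout for decoded audio but is not confirmed for this dataset; the helper name describe_example and the synthetic record are hypothetical and used only for demonstration.

```python
from typing import Any, Dict


def describe_example(ex: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one record with the fields audio, transcript, language.

    Assumption: ex["audio"] is a dict holding the raw samples under
    "array" and the sample rate under "sampling_rate".
    """
    audio = ex["audio"]
    duration_s = len(audio["array"]) / audio["sampling_rate"]
    return {
        "language": ex["language"],
        "duration_s": round(duration_s, 2),
        "n_words": len(ex["transcript"].split()),
    }


# Synthetic record (not real data): 2 seconds of silence at 16 kHz
fake_example = {
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "transcript": "नमस्कार हे एक उदाहरण आहे",
    "language": "mr",
}
summary = describe_example(fake_example)
print(summary)
```

If the decoded audio turns out to use a different structure, only the two lines that read from the audio dict would need to change.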

Build With This

Create an audiobook generation quality checker that compares TTS output against human read-speech patterns
Develop a Marathi reading fluency assessment tool for schools that evaluates student oral reading performance
Build a pronunciation training app for Marathi language learners using native read-speech as reference
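
The reading fluency idea above could start from two simple measures: reading speed in words per minute, and word error rate (WER) between the reference transcript and what a student actually read. The sketch below is a minimal, standard-library-only version. It assumes the student's words are already available as text (for example from some ASR system, which is not shown), and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def words_per_minute(transcript: str, duration_s: float) -> float:
    """Reading speed computed from the word count and the audio duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(transcript.split()) * 60 / duration_s


# Toy strings for illustration only (not dataset content)
wer = word_error_rate("a b c d", "a x c")
wpm = words_per_minute("a b c d", 2.0)
print(wer, wpm)
```

Whitespace tokenization is a simplification; a real tool would likely normalize punctuation and spelling variants before scoring.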

AI Use Cases

TTS, Voice Cloning, Expressive Speech Synthesis
Last verified: 2026-03-07