AI4Bharat IndicVoices-R

MH Specific

High-quality TTS corpus for 22 Indian languages, built by enhancing recordings from the IndicVoices ASR corpus for speech synthesis

Build a Marathi read-speech ASR model optimized for formal reading scenarios like news broadcasting and audiobook narration.

Homepage HuggingFace

Quick Start

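from datasets import load_dataset

# Stream the Marathi ('mr') subset so nothing is downloaded up front
ds = load_dataset('ai4bharat/IndicVoices-R', 'mr', split='train', streaming=True)
# Print the first 80 characters of the first five transcripts
for i, ex in enumerate(ds):
    print(f"Transcript: {ex['transcript'][:80]}...")
    if i >= 4:
        break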

Modality

Speech + Text (TTS-ready)

Size

1,704 hrs total (22 langs); 9-175 hrs/lang

License

CC-BY 4.0

Format

WAV

Language

Marathi (mr); the full corpus covers 22 Indian languages

Update Frequency

Static

Organization

AI4Bharat, IIT Madras

Schema

Field	Type	Description
audio	audio	Read speech audio recording
transcript	string	Reference text that was read aloud
language	string	Language code
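
The schema above can be checked against a streamed record before any training code depends on it. The following is a minimal sketch, not part of the dataset's tooling: the helper names `missing_fields` and `clip_duration_seconds` are hypothetical, and it assumes the `audio` field decodes to a dict with `array` and `sampling_rate` keys, which may differ across `datasets` versions. The record used here is synthetic, not real corpus data.

```python
from typing import Any, Mapping

# Field names taken from the Schema table above.
EXPECTED_FIELDS = ("audio", "transcript", "language")


def missing_fields(example: Mapping[str, Any]) -> list[str]:
    """Return the schema fields that are absent from one record."""
    return [name for name in EXPECTED_FIELDS if name not in example]


def clip_duration_seconds(example: Mapping[str, Any]) -> float:
    """Duration of one clip in seconds.

    Assumption: example["audio"] is a dict holding "array" (1-D samples)
    and "sampling_rate" (Hz); other decoder versions may differ.
    """
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic record standing in for one streamed example (not real data).
fake = {
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "transcript": "placeholder text",
    "language": "mr",
}
print(missing_fields(fake))         # []
print(clip_duration_seconds(fake))  # 2.0
```

Summing `clip_duration_seconds` over a sample of streamed records gives a rough check against the per-language hours listed under Size.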

Build With This

Create an audiobook generation quality checker that compares TTS output against human read-speech patterns

Develop a Marathi reading fluency assessment tool for schools that evaluates student oral reading performance

Build a pronunciation training app for Marathi language learners using native read-speech as reference
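
Several of the ideas above reduce to comparing what was actually spoken against the reference `transcript`. One common way to score that is word error rate (WER), sketched below under stated assumptions: the hypothesis text would come from an ASR system run on the learner's or TTS audio (not shown here), tokens are split on whitespace only, and `word_error_rate` is a hypothetical helper rather than anything shipped with the dataset. Real Marathi text would likely need punctuation and Unicode normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word not spoken
                curr[j - 1] + 1,     # insertion: extra spoken word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not dataset content: one substitution, one deletion.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

A score of 0.0 means the spoken words match the reference exactly; values can exceed 1.0 when the hypothesis contains many extra words.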

AI Use Cases

TTS, Voice Cloning, Expressive Speech Synthesis

Related Datasets

AI4Bharat BhasaAnuvaad (Marathi)

Speech + Text (Translation)

AI4Bharat IndicVoices

Speech + Text

AI4Bharat Kathbath

Speech + Text

AI4Bharat Kathbath (Marathi)

Speech + Text

Last verified: 2026-03-07