OpenSLR-64 (Marathi)

Maharashtra (MH) Specific

Crowdsourced, high-quality, multi-speaker Marathi speech corpus for text-to-speech (TTS); female speakers only.

Build a baseline Marathi ASR model using the OpenSLR-64 corpus for comparison with larger training sets.
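
Comparing a baseline against models trained on larger sets usually comes down to an error metric on a shared test set. The sketch below computes word error rate (WER) with a standard edit-distance recurrence over whitespace-separated tokens. The function name `word_error_rate` and the example sentences are illustrative assumptions, not part of the corpus or any toolkit.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up example pair: one substituted word out of four.
reference = "मी आज शाळेत जातो"
hypothesis = "मी आज घरी जातो"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}")
```

Scoring every model on the same held-out utterances keeps the comparison fair. Text normalization (punctuation, numerals) should be applied identically to references and hypotheses before scoring.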

Quick Start

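import os
import torchaudio
path = 'openslr64/mr/audio_sample.wav'  # placeholder path; point it at a WAV extracted from the archive
if os.path.exists(path):
    audio, sr = torchaudio.load(path)  # waveform tensor (channels, samples) and sample rate
    print(audio.shape, sr)
else:
    print("Download OpenSLR-64 Marathi: https://openslr.org/64/")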

Modality

Speech + Text (TTS)

Size

~3 hours of audio; 712 MB archive

License

CC-BY-SA 4.0

Format

WAV

Language

Marathi

Update Frequency

static

Organization

OpenSLR / Google

Schema

Field	Type	Description
audio	audio	Marathi speech audio file (WAV)
text	string	Transcription of the utterance
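
The two schema fields can be assembled from a transcript index. The sketch below assumes a tab-separated index whose rows hold an utterance ID followed by its transcription, and that each recording is stored as `<utterance_id>.wav`. The row layout, the directory name, and the helper `read_manifest` are assumptions for illustration, so check them against the extracted archive before relying on them.

```python
import csv
import io
from pathlib import Path


def read_manifest(tsv_file, audio_dir):
    """Yield {'audio': Path, 'text': str} records from a two-column TSV.

    Assumed row layout: <utterance_id> TAB <transcription>.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        utt_id, text = row[0].strip(), row[1].strip()
        yield {"audio": Path(audio_dir) / f"{utt_id}.wav", "text": text}


# In-memory stand-in for an index file; the IDs and text are made up.
sample_index = io.StringIO("utt_0001\tनमस्कार\nutt_0002\tधन्यवाद\n")
records = list(read_manifest(sample_index, "openslr64/mr"))
print(len(records), records[0]["audio"].name)
```

With a real index on disk, the same function can be given a file opened with `encoding="utf-8"` and `newline=""` so that Devanagari text is read correctly.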

Build With This

Create a data augmentation pipeline that expands OpenSLR-64 with speed, pitch, and noise perturbations for robust ASR

Develop a Marathi phoneme-level acoustic model from the OpenSLR recordings for pronunciation research

Build a transfer learning study comparing ASR models pre-trained on OpenSLR-64 then fine-tuned on domain-specific data
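
As a starting point for the augmentation idea above, here is a minimal sketch of two of the perturbations, speed and additive noise, written in dependency-free Python on a list of float samples. The function names, the perturbation factors, and the synthetic sine wave standing in for a recording are all assumptions for illustration. Plain resampling changes pitch together with speed; shifting pitch while keeping duration needs an extra time-stretch step (for example `librosa.effects.pitch_shift`), which is not shown here.

```python
import math
import random


def speed_perturb(samples, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the clip."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio (dB)."""
    power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


# Synthetic one-second 440 Hz tone at 16 kHz, standing in for a real utterance.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

rng = random.Random(0)  # fixed seed so the augmented copies are reproducible
fast = speed_perturb(clean, 1.1)
slow = speed_perturb(clean, 0.9)
noisy = add_noise(clean, snr_db=20.0, rng=rng)
print(len(clean), len(fast), len(slow), len(noisy))
```

For a corpus of roughly three hours, generating a few perturbed copies per utterance multiplies the training material at low cost. In practice the same operations would normally be run with NumPy or torchaudio for speed.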

AI Use Cases

TTS

Multi-Speaker Voice Synthesis

Related Datasets

AI4Bharat BhasaAnuvaad (Marathi)

Speech + Text (Translation)

AI4Bharat IndicVoices

Speech + Text

AI4Bharat IndicVoices-R

Speech + Text (TTS-ready)

AI4Bharat Kathbath

Speech + Text

Last verified: 2026-03-07