Microsoft-IITB Marathi Speech Corpus

Maharashtra (MH) Specific

Crowdsourced conversational Marathi speech from three user demographics (rural, urban, student)

Build a robust Marathi ASR model using the Microsoft-IITB corpus for enterprise voice applications.

Quick Start

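# The corpus must be downloaded manually from OpenSLR or Microsoft Research.
import torchaudio  # used to read the WAV files once they are downloaded

print("Download the Microsoft-IITB Marathi speech corpus")
print("Check OpenSLR (openslr.org) for download links")
# After downloading: waveform, sample_rate = torchaudio.load("<path to a WAV file>")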
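
Once the archive is downloaded, a quick sanity check is to total the audio duration and compare it with the hours listed for the corpus. The sketch below uses only the standard library and assumes the files are plain PCM WAV. The helper names `wav_duration_seconds` and `make_silent_wav` are hypothetical, and the 16 kHz rate is used only to build a demo clip; the corpus's actual sample rate is not stated here.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


# Demo on a synthetic clip; with real data, pass the path of a downloaded WAV file.
demo = make_silent_wav(1.5)
print(f"{wav_duration_seconds(demo):.2f} s")
```

Summing this value over every file in the train and test folders gives a figure that can be checked against the sizes listed for the corpus.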

Modality: Speech + Text
Size: 109 hrs total (including 93.9 hrs train and 5 hrs test); 36 speakers
License:
Format: WAV
Language: mr
Update Frequency: static
Organization: Microsoft Research India / IIT Bombay

Schema

| Field | Type   | Description               |
|-------|--------|---------------------------|
| audio | audio  | Marathi speech audio file |
| text  | string | Transcription text        |
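
The two schema fields map naturally onto a small record type. The sketch below pairs each audio file with its transcription. It assumes a hypothetical transcript file with one `utt_id<TAB>text` entry per line and a hypothetical `wav` directory; the corpus's real layout may differ, so adjust the parsing to whatever the download contains. The Marathi strings are illustrative, not taken from the corpus.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Utterance:
    audio: str  # path to the WAV file (the `audio` field)
    text: str   # transcription (the `text` field)


def parse_transcripts(lines: Iterable[str], audio_dir: str = "wav") -> List[Utterance]:
    """Parse hypothetical `utt_id<TAB>text` lines into Utterance records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        utt_id, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got: {line!r}")
        records.append(Utterance(audio=f"{audio_dir}/{utt_id}.wav", text=text.strip()))
    return records


# Illustrative input; a real run would iterate over the lines of the transcript file.
sample = ["utt_0001\tनमस्कार\n", "\n", "utt_0002\tतुम्ही कसे आहात\n"]
for utt in parse_transcripts(sample):
    print(utt.audio, utt.text)
```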

Build With This

Create a Marathi speech recognition benchmark suite comparing models trained on different Marathi speech corpora
Develop an on-device Marathi ASR model optimized for mobile deployment in areas with limited connectivity
Build a Marathi voice typing system for smartphones that works offline in rural Maharashtra
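
Any of the projects above needs a common metric for comparing models, and word error rate (WER) is the usual choice for ASR. The sketch below is a minimal, dependency-free version that splits transcripts on whitespace. The function names are hypothetical, and a real benchmark would probably also normalize punctuation and Unicode forms before scoring. The example sentences are illustrative, not drawn from the corpus.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


refs = ["तुम्ही कसे आहात", "नमस्कार"]
hyps = ["तुम्ही कसा आहात", "नमस्कार"]
print(f"WER = {word_error_rate(refs, hyps):.2f}")  # one substitution over four words: 0.25
```

Computing the rate at corpus level (summing edits and words before dividing) keeps short utterances from dominating the score, which matters when comparing models across corpora with different utterance lengths.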

AI Use Cases

ASR
Demographic-Aware Speech Modeling
Last verified: 2026-03-07