RESPIN Marathi Dialect-Rich Speech Corpus

Part of the largest publicly available dialect-rich read-speech corpus for Indian languages, comprising 10,000+ hours of validated audio across 9 languages. The Marathi subset covers the agriculture and finance domains and includes dialect-aware phonetic lexicons and speaker metadata. It captures rural speech patterns that urban-centric datasets miss.

Build a dialect-robust Marathi ASR model that performs well across regional speech varieties in Maharashtra.

Quick Start

# RESPIN Marathi dialect speech corpus
# Access from https://respin.iisc.ac.in/
print("RESPIN Marathi dialect speech corpus")
print("Access from: https://respin.iisc.ac.in/")

Modality

audio

Size

1,000+ hours of Marathi speech, with dialect tags and speaker metadata

License

Research (IISc / Gates Foundation)

Format

WAV + text transcripts + metadata
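
Since the audio is distributed as WAV, clip durations can be checked with Python's standard `wave` module before any heavier tooling is installed. The sketch below is only illustrative: it writes a one-second silent clip to memory so that it runs without the corpus, and the 16 kHz, mono, 16-bit settings are assumptions rather than documented properties of RESPIN.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Build a one-second silent clip in memory so the sketch runs without corpus files.
# Sample rate, channel count, and sample width here are assumed values.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(16000)    # 16 kHz
    wav.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

duration = wav_duration_seconds(buffer)
print(f"{duration:.2f} s")
```

For real data, pass the path of a downloaded WAV file to `wav_duration_seconds` instead of the in-memory buffer.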

Language

Marathi

Update Frequency

static

Organization

IISc Bangalore / SPIRE Lab

Schema

Field	Type	Description
audio	audio	Dialectal Marathi speech recording
text	string	Transcription in standard Marathi
dialect	string	Dialect or regional variety identifier
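
The three schema fields map naturally onto a small record type. The following is a minimal sketch, assuming the metadata can be exported as a tab-separated manifest with `audio`, `text`, and `dialect` columns; the manifest layout, file paths, transcripts, and dialect labels shown are hypothetical placeholders, not the corpus's actual distribution format.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    audio: str    # assumed here to be a path to the WAV file
    text: str     # transcription
    dialect: str  # dialect or regional variety identifier


# Hypothetical manifest: every value below is a placeholder for illustration.
MANIFEST = (
    "audio\ttext\tdialect\n"
    "clips/0001.wav\t<transcript 1>\tdialect_a\n"
    "clips/0002.wav\t<transcript 2>\tdialect_b\n"
    "clips/0003.wav\t<transcript 3>\tdialect_a\n"
)


def load_manifest(handle):
    """Parse a tab-separated manifest into a list of Utterance records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [Utterance(row["audio"], row["text"], row["dialect"]) for row in reader]


utterances = load_manifest(io.StringIO(MANIFEST))
dialect_counts = Counter(u.dialect for u in utterances)
print(dialect_counts)
```

Counting utterances per dialect in this way is a quick check on how balanced a training or evaluation split is across regional varieties.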

Build With This

Create a Marathi dialect identification system from speech for sociolinguistic research across Maharashtra regions

Develop a dialect-to-standard Marathi speech normalizer that converts dialectal speech to standard form

Build a crowd-sourcing platform for collecting more dialectal Marathi speech data from underrepresented regions

AI Use Cases

Dialect-aware Marathi speech recognition

Rural voice interface development

Agricultural domain ASR

Speaker diarization with dialect identification
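
For dialect-aware recognition, a single corpus-wide word error rate (WER) can hide weak performance on individual varieties, so it helps to break WER down by the `dialect` field. The sketch below assumes model outputs are available as (dialect, reference, hypothesis) triples; the sample strings and dialect labels are placeholders, and the edit-distance routine is a plain word-level Levenshtein implementation, not a method prescribed by the RESPIN authors.

```python
from collections import defaultdict


def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer_by_dialect(samples):
    """Compute WER per dialect from (dialect, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for dialect, reference, hypothesis in samples:
        ref_words = reference.split()
        errors[dialect] += word_edit_distance(ref_words, hypothesis.split())
        words[dialect] += len(ref_words)
    # Skip dialects with no reference words to avoid division by zero.
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Placeholder data: tokens stand in for real transcripts and model outputs.
samples = [
    ("dialect_a", "w1 w2 w3 w4", "w1 w2 w3 w4"),
    ("dialect_a", "w5 w6", "w5 w7"),
    ("dialect_b", "w1 w2 w3 w4", "w1 w3 w4"),
]
scores = wer_by_dialect(samples)
for dialect, score in sorted(scores.items()):
    print(f"{dialect}: WER = {score:.3f}")
```

Errors and reference words are pooled per dialect before dividing, so longer utterances carry proportionally more weight than they would if per-utterance WERs were averaged.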

Related Datasets

AI4Bharat BhasaAnuvaad (Marathi)

Speech + Text (Translation)

AI4Bharat IndicVoices

Speech + Text

AI4Bharat IndicVoices-R

Speech + Text (TTS-ready)

AI4Bharat Kathbath

Speech + Text

Last verified: 2026-03-09