RESPIN Marathi Dialect-Rich Speech Corpus

Part of RESPIN, the largest publicly available dialect-rich read-speech corpus for Indian languages, which comprises more than 10,000 hours of validated audio across 9 languages. The Marathi subset covers the agriculture and finance domains and includes dialect-aware phonetic lexicons and speaker metadata. It captures rural speech patterns that urban-centric datasets miss.

Build a dialect-robust Marathi ASR model that performs well across regional speech varieties in Maharashtra.

Quick Start

# RESPIN Marathi dialect speech corpus
# Access from https://respin.iisc.ac.in/
print("RESPIN Marathi dialect speech corpus")
print("Access from: https://respin.iisc.ac.in/")
Modality: audio
Size: 1,000+ hours Marathi speech; dialect tags; speaker metadata
License: (not specified)
Format: WAV + text transcripts + metadata
Language: mr
Update Frequency: static
Organization: IISc Bangalore / SPIRE Lab

Schema

Field    Type    Description
audio    audio   Dialectal Marathi speech recording
text     string  Transcription in standard Marathi
dialect  string  Dialect or regional variety identifier
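
The three schema fields map naturally onto a small record type. The sketch below is only an illustration: it assumes a hypothetical JSON-lines manifest with the keys `audio`, `text`, and `dialect`, which this card does not confirm as the actual distribution layout. The sample rows and the dialect tags `D1` and `D2` are made-up placeholders, not real corpus labels.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    audio: str    # path to a WAV file (hypothetical layout)
    text: str     # transcription
    dialect: str  # dialect or regional variety identifier


def parse_manifest(lines):
    """Parse JSON-lines records into Utterance objects, skipping blank lines."""
    utterances = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        utterances.append(
            Utterance(record["audio"], record["text"], record["dialect"])
        )
    return utterances


def dialect_counts(utterances):
    """Count how many utterances carry each dialect tag."""
    return Counter(u.dialect for u in utterances)


# Made-up sample rows; "D1" and "D2" are placeholder tags, not corpus labels.
sample = [
    '{"audio": "a.wav", "text": "sample text one", "dialect": "D1"}',
    '{"audio": "b.wav", "text": "sample text two", "dialect": "D2"}',
    "",
    '{"audio": "c.wav", "text": "sample text three", "dialect": "D1"}',
]

utts = parse_manifest(sample)
counts = dialect_counts(utts)
print(counts)
```

Counting utterances per dialect tag before training is a quick way to spot varieties that are underrepresented in a given split.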

Build With This

Create a Marathi dialect identification system from speech for sociolinguistic research across Maharashtra regions
Develop a dialect-to-standard Marathi speech normalizer that converts dialectal speech to standard form
Build a crowd-sourcing platform for collecting more dialectal Marathi speech data from underrepresented regions

AI Use Cases

Dialect-aware Marathi speech recognition
Rural voice interface development
Agricultural domain ASR
Speaker diarization with dialect identification
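
For the dialect-aware recognition use case, one common way to check robustness is to report word error rate (WER) separately for each dialect tag rather than as a single average. The sketch below is a minimal pure-Python version under stated assumptions: the function names are illustrative, tokenization is plain whitespace splitting, and the sample rows use placeholder tags and text rather than corpus data.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_dialect(rows):
    """rows: iterable of (dialect, reference_text, hypothesis_text).

    Returns a dict mapping each dialect to total errors / total reference words.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for dialect, ref, hyp in rows:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[dialect] += edit_distance(ref_tokens, hyp_tokens)
        words[dialect] += len(ref_tokens)
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Placeholder rows: (dialect tag, reference transcript, model output).
rows = [
    ("D1", "a b c d", "a b c d"),
    ("D2", "a b c d", "a x c"),
]

scores = wer_by_dialect(rows)
print(scores)  # {'D1': 0.0, 'D2': 0.5}
```

Pooling errors and reference words per dialect, instead of averaging per-utterance WER, keeps short utterances from dominating the score. A large gap between dialects suggests the model is leaning on the better-represented varieties.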
Last verified: 2026-03-09