Mozilla Common Voice (Marathi)

MH Specific

Crowd-sourced read-speech recordings in Marathi with validated transcriptions: approximately 30 hours in total, of which about 21 hours are validated. Part of Mozilla's Common Voice open voice dataset initiative.

Build a Marathi voice assistant for farmers

Quick Start

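from datasets import load_dataset

# "mr" selects the Marathi configuration of Common Voice 17.0
ds = load_dataset("mozilla-foundation/common_voice_17_0", "mr", split="train")
sample = ds[0]  # first clip in the training split
print(sample["sentence"], sample["path"])  # sentence text and audio file path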

Modality

speech+text

Size

~30 hrs total, ~21 hrs validated

License

CC0-1.0

Format

MP3

Language

Marathi

Update Frequency

monthly

Organization

Mozilla Foundation

Schema

Field	Type	Description
client_id	string	Unique hashed speaker identifier
path	string	Relative path to the audio clip (MP3/WAV)
sentence	string	Marathi sentence read aloud in the audio clip (serves as the transcript)
up_votes	integer	Number of listener validations confirming the clip matches the sentence
down_votes	integer	Number of listener validations marking the clip as incorrect
age	string	Self-reported age bracket of the speaker
gender	string	Self-reported gender of the speaker
accent	string	Self-reported accent or dialect
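
The vote and speaker fields in the schema above can be used to select clips before training. The following is a minimal sketch, not an official Common Voice procedure: the rows are hypothetical stand-ins that mimic the schema, the helper name is_trusted is invented for illustration, and the threshold of two up-votes is an assumption that should be tuned for the task.

```python
from collections import Counter

# Hypothetical rows mimicking the schema; real rows come from load_dataset.
rows = [
    {"client_id": "spk_a", "sentence": "नमस्कार", "up_votes": 2, "down_votes": 0},
    {"client_id": "spk_a", "sentence": "धन्यवाद", "up_votes": 1, "down_votes": 2},
    {"client_id": "spk_b", "sentence": "शुभ सकाळ", "up_votes": 3, "down_votes": 1},
]


def is_trusted(row, min_up_votes=2):
    """Keep a clip with enough up-votes and more up-votes than down-votes.

    The default threshold is an assumption, not a documented rule.
    """
    return row["up_votes"] >= min_up_votes and row["up_votes"] > row["down_votes"]


# Filter the clips, then count how many trusted clips each speaker contributed.
trusted = [row for row in rows if is_trusted(row)]
clips_per_speaker = Counter(row["client_id"] for row in trusted)

print(len(trusted))
print(dict(clips_per_speaker))
```

With a dataset loaded as in Quick Start, the same predicate can be passed to ds.filter(is_trusted), and the per-speaker counts help spot speakers who dominate the data.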

Build With This

Voice-controlled Marathi assistant for agricultural market prices

Marathi speech-to-text for court proceeding transcription

Accessibility app that reads Marathi web content aloud for visually impaired users

AI Use Cases

ASR training

Speaker identification

Pronunciation modeling

Voice interface development

Related Datasets

AI4Bharat BhasaAnuvaad (Marathi)

Speech + Text (Translation)

AI4Bharat IndicVoices

speech+text

AI4Bharat IndicVoices-R

Speech + Text (TTS-ready)

AI4Bharat Kathbath

Speech + Text

Last verified: 2026-03-07