Mozilla Common Voice (Marathi)

MH Specific

Crowd-sourced read-speech recordings in Marathi with validated transcriptions. The set contains approximately 30 hours of audio, of which about 21 hours are validated. It is part of Mozilla's open voice dataset initiative.

Build a Marathi voice assistant for farmers

Quick Start

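# The dataset is gated on the Hugging Face Hub; you may need to accept its terms and authenticate first.
from datasets import load_dataset
ds = load_dataset("mozilla-foundation/common_voice_17_0", "mr", split="train")
sample = ds[0]  # first training example, a dict keyed by the schema fields
print(sample["sentence"], sample["path"])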

Modality: speech+text
Size: ~30 hrs total, ~21 hrs validated
License:
Format: WAV
Language: mr
Update Frequency: monthly
Organization: Mozilla Foundation

Schema

Field        Type      Description
client_id    string    Unique hashed speaker identifier
path         string    Relative path to the audio clip (MP3/WAV)
sentence     string    Transcribed Marathi text for the audio clip
up_votes     integer   Number of listener validations confirming correctness
down_votes   integer   Number of listener validations marking as incorrect
age          string    Self-reported age bracket of the speaker
gender       string    Self-reported gender of the speaker
accent       string    Self-reported accent or dialect
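
The vote and speaker fields in the schema can be used to pick cleaner clips and to check speaker balance. The following is a minimal sketch that works on plain dictionaries shaped like the schema rows. The helper names (`vote_margin`, `filter_by_votes`, `speaker_clip_counts`), the `min_margin` threshold, and the toy rows are all illustrative assumptions, not part of the dataset or of any library.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def vote_margin(row: Row) -> int:
    """Up-votes minus down-votes for one clip."""
    return int(row["up_votes"]) - int(row["down_votes"])


def filter_by_votes(rows: Iterable[Row], min_margin: int = 2) -> List[Row]:
    """Keep clips whose vote margin is at least min_margin.

    min_margin=2 is an arbitrary illustrative threshold, not an official rule.
    """
    return [row for row in rows if vote_margin(row) >= min_margin]


def speaker_clip_counts(rows: Iterable[Row]) -> Counter:
    """Count clips per hashed speaker id to spot over-represented speakers."""
    return Counter(row["client_id"] for row in rows)


# Toy rows that mimic the schema; the values are made up for the example.
toy_rows = [
    {"client_id": "spk_a", "path": "clip_1.mp3", "sentence": "नमस्ते",
     "up_votes": 3, "down_votes": 0},
    {"client_id": "spk_a", "path": "clip_2.mp3", "sentence": "धन्यवाद",
     "up_votes": 2, "down_votes": 1},
    {"client_id": "spk_b", "path": "clip_3.mp3", "sentence": "शुभ सकाळ",
     "up_votes": 2, "down_votes": 0},
]

kept = filter_by_votes(toy_rows)
counts = speaker_clip_counts(kept)
print(len(kept), dict(counts))
```

The same functions should work on rows taken from a loaded split, since each example there is also a mapping keyed by field name.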

Build With This

Voice-controlled Marathi assistant for agricultural market prices
Marathi speech-to-text for court proceeding transcription
Accessibility app that reads Marathi web content aloud for visually impaired users

AI Use Cases

ASR training
Speaker identification
Pronunciation modeling
Voice interface development
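
For the ASR training use case, one common early step is building a character vocabulary from the `sentence` field. The sketch below shows one possible way to do that with the standard library only. The function names, the special tokens (`<pad>`, `<unk>`), and the use of `|` as a word delimiter are assumptions chosen for illustration; a real training setup may use a different tokenization scheme.

```python
import unicodedata
from typing import Dict, Iterable


def normalize_sentence(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def build_char_vocab(sentences: Iterable[str]) -> Dict[str, int]:
    """Map each distinct character to an integer id.

    Ids 0-2 are reserved for assumed special tokens; spaces are expected
    to be replaced by the "|" word delimiter at encoding time.
    """
    chars = set()
    for sentence in sentences:
        chars.update(normalize_sentence(sentence))
    chars.discard(" ")

    vocab = {"<pad>": 0, "<unk>": 1, "|": 2}
    for ch in sorted(chars):
        if ch not in vocab:  # avoid re-assigning a reserved token
            vocab[ch] = len(vocab)
    return vocab


# Toy sentences; in practice these would come from the `sentence` column.
vocab = build_char_vocab(["नमस्ते  जग", "जग"])
print(len(vocab))
```

Devanagari combining signs such as the virama are kept as separate vocabulary entries here, which is the simplest choice but not the only one.
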
Last verified: 2026-03-07