5-second MP3 audio samples across 10 Indian languages including Marathi, sourced from YouTube regional videos. Designed for spoken language identification and audio classification tasks rather than ASR transcription.
# Indian Languages Audio Dataset
```python
import torchaudio

# Filter for Marathi subset
print("Access the Indian Languages Audio Dataset")
print("Filter for Marathi (mr) language code")
```

| Field | Type | Description |
|---|---|---|
| audio | audio | Audio recording in Indian language |
| text | string | Transcription text |
| language | string | Language identifier |
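
The `language` field in the table above is what a Marathi subset would be selected on. The following is a minimal sketch of that filter in plain Python, not the dataset's own loading code. It assumes each record is a dict with the three fields listed above and that Marathi is stored as the code `mr`. The helper `filter_by_language`, the file names, and the sample rows are hypothetical placeholders.

```python
from typing import Dict, Iterable, Iterator


def filter_by_language(records: Iterable[Dict[str, str]], code: str = "mr") -> Iterator[Dict[str, str]]:
    """Yield only the records whose `language` field equals `code`."""
    for record in records:
        if record.get("language") == code:
            yield record


# Hypothetical placeholder rows mirroring the schema (audio, text, language).
# The real dataset's paths and language values may differ.
sample_rows = [
    {"audio": "clip_0001.mp3", "text": "", "language": "mr"},
    {"audio": "clip_0002.mp3", "text": "", "language": "hi"},
    {"audio": "clip_0003.mp3", "text": "", "language": "mr"},
]

# Keep only the rows tagged with the assumed Marathi code.
marathi_rows = list(filter_by_language(sample_rows, "mr"))
print(len(marathi_rows))
```

The same predicate (`record["language"] == "mr"`) could be passed to whatever filtering mechanism the chosen data loader provides. Check the actual stored language value first, since it may be a full name rather than a two-letter code.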