IndicDialogue is a movie and TV subtitle dataset covering 10 Indic languages, sourced from OpenSubtitles.org. It contains pre-processed dialogues in JSONL format. It is described as the only publicly available Marathi conversational/dialogue dataset, which makes it a key resource for training chatbots and conversational AI in Marathi.
# IndicDialogue Marathi subtitles
import json
print('IndicDialogue: Marathi movie subtitle corpus')
print('Access from research repositories')

| Field | Type | Description |
|---|---|---|
| dialogue | string | Marathi dialogue text from subtitles |
| movie_id | string | Source movie identifier |
| timestamp | string | Subtitle timestamp |
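
As a minimal sketch of how records with the fields above might be read from JSONL, the example below parses one JSON object per line and keeps the three documented fields. The helper name `read_dialogues`, the sample records, and the timestamp format are assumptions made for illustration; they are not taken from the dataset itself.

```python
import io
import json


def read_dialogues(lines):
    """Yield one dict per non-empty JSONL line, keeping the documented fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        yield {
            "dialogue": record.get("dialogue", ""),
            "movie_id": record.get("movie_id", ""),
            "timestamp": record.get("timestamp", ""),
        }


# Made-up sample records (hypothetical values, not real dataset content).
sample = io.StringIO(
    '{"dialogue": "नमस्कार", "movie_id": "m001", "timestamp": "00:00:01,000"}\n'
    "\n"
    '{"dialogue": "तुम्ही कसे आहात?", "movie_id": "m001", "timestamp": "00:00:03,500"}\n'
)

records = list(read_dialogues(sample))
for rec in records:
    print(rec["movie_id"], rec["timestamp"], rec["dialogue"])
```

To read an actual file, the same function can be given an open file handle, for example `open(path, encoding="utf-8")`, where `path` points to wherever the JSONL file was downloaded.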