IndicDialogue Marathi (Movie/TV Subtitles)

IndicDialogue is a movie and TV subtitle dataset covering 10 Indic languages, sourced from OpenSubtitles.org; this entry covers the Marathi subset. It contains pre-processed dialogues in JSONL format. It is described as the only publicly available Marathi conversational/dialogue dataset, which makes it a natural starting point for training chatbots and conversational AI in Marathi.

Build a conversational Marathi language model trained on natural movie dialogue for chatbot development.

Quick Start

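# IndicDialogue Marathi subtitles
import json

# Download the Marathi JSONL file from the hosting research repository first.
# 'marathi.jsonl' is a placeholder path, not the dataset's actual filename.
with open('marathi.jsonl', encoding='utf-8') as f:
    records = [json.loads(line) for line in f if line.strip()]
print(f'IndicDialogue: loaded {len(records)} Marathi subtitle records')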
Modality: text
Size: Marathi subset of 6.85M total dialogues across 10 languages
License: (not listed)
Format: SRT / JSONL
Language: mr
Update Frequency: static
Organization: Mendeley Data

Schema

Field      Type    Description
dialogue   string  Marathi dialogue text from subtitles
movie_id   string  Source movie identifier
timestamp  string  Subtitle timestamp
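The schema above can be read into typed records. The following is a minimal sketch, assuming each JSONL line is a JSON object carrying the three listed fields. The class name DialogueRecord, the function parse_records, and the sample lines (including the SRT-style timestamp format) are illustrative assumptions, not part of the dataset.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class DialogueRecord:
    """One subtitle line, mirroring the three schema fields (hypothetical class)."""
    dialogue: str
    movie_id: str
    timestamp: str


def parse_records(lines: Iterable[str]) -> Iterator[DialogueRecord]:
    """Yield a record per well-formed JSONL line; skip blank or malformed rows."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON
        if not isinstance(row, dict):
            continue  # valid JSON, but not an object
        try:
            yield DialogueRecord(
                dialogue=str(row["dialogue"]),
                movie_id=str(row["movie_id"]),
                timestamp=str(row["timestamp"]),
            )
        except KeyError:
            continue  # a schema field is missing


# Made-up sample lines for illustration only; not taken from the dataset.
SAMPLE_LINES = [
    '{"dialogue": "नमस्कार, कसे आहात?", "movie_id": "m001", "timestamp": "00:01:15,000"}',
    "",
    "not valid json",
    '{"dialogue": "मी ठीक आहे.", "movie_id": "m001"}',
]

records = list(parse_records(SAMPLE_LINES))
print(f"Parsed {len(records)} of {len(SAMPLE_LINES)} lines")
```

Skipping bad rows silently keeps the sketch short; for real preprocessing it may be preferable to count or log the rejected lines so that data loss is visible.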

Build With This

Create a Marathi dialogue generation model for video game and interactive fiction applications
Develop a Marathi speech style analyzer comparing formal vs colloquial language patterns in movie subtitles
Build a Marathi vocabulary frequency analyzer from movie subtitles for language learning curriculum design
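As a starting point for the vocabulary frequency idea above, here is a rough sketch that counts Devanagari word tokens across dialogue strings. The regular expression, the function name word_frequencies, and the sample sentences are assumptions made for illustration; a real curriculum tool would likely need proper Marathi tokenization and handling of joiner characters, which this ignores.

```python
import re
from collections import Counter
from typing import Iterable

# Runs of Devanagari letters and vowel signs. The danda (U+0964), double
# danda (U+0965), digits, and the abbreviation sign are deliberately excluded.
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0971-\u097F]+")


def word_frequencies(dialogues: Iterable[str]) -> Counter:
    """Count how often each Devanagari token occurs across all dialogues."""
    counts: Counter = Counter()
    for text in dialogues:
        counts.update(TOKEN_RE.findall(text))
    return counts


# Made-up sample sentences for illustration only; not taken from the dataset.
SAMPLE_DIALOGUES = ["मी ठीक आहे.", "तू कसा आहेस?", "मी घरी आहे।"]

freq = word_frequencies(SAMPLE_DIALOGUES)
for word, count in freq.most_common(3):
    print(word, count)
```

With the parsed subtitle records in hand, the same function can be fed the dialogue field of each record to rank the most common spoken words.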

AI Use Cases

Marathi chatbot training
Conversational AI development
Dialogue generation
Informal Marathi language modeling
Last verified: 2026-03-09