Culture, Media & Heritage

Marathi literature, media archives, tourism statistics, and heritage site datasets.

9 datasets

Maharashtra: list of 285 centrally protected and 244 state-protected archaeological monuments and heritage sites in the state, maintained by the Archaeological Survey of India.

Build a heritage tourism route optimizer for Maharashtra that creates itineraries connecting ASI-protected monuments.
Tabular (PDF, web)
Archaeological Survey of India
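
One simple way to start on the route optimizer idea is a nearest-neighbour heuristic over great-circle distances. The sketch below is illustrative only: the three sites and their coordinates are rough approximations typed in by hand, not values taken from the ASI list, and the function names `haversine_km` and `nearest_neighbour_route` are hypothetical. A real itinerary planner would also need road distances and opening hours.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_neighbour_route(sites, start):
    """Greedy route: from each stop, go to the closest unvisited site."""
    route = [start]
    remaining = set(sites) - {start}
    while remaining:
        here = sites[route[-1]]
        nxt = min(remaining, key=lambda name: haversine_km(here, sites[name]))
        route.append(nxt)
        remaining.remove(nxt)
    return route

# Approximate coordinates for illustration; replace with values from the dataset.
SITES = {
    "Elephanta Caves": (18.96, 72.93),
    "Ellora Caves": (20.03, 75.18),
    "Ajanta Caves": (20.55, 75.70),
}

route = nearest_neighbour_route(SITES, "Elephanta Caves")
print(" -> ".join(route))
```

The greedy heuristic is not optimal in general, but it is easy to verify and gives a baseline to compare against before trying a proper travelling-salesman solver.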

Movie and TV subtitle dataset in 10 Indic languages, sourced from OpenSubtitles.org, with pre-processed dialogues in JSONL format. Described as the only publicly available Marathi conversational/dialogue dataset, which makes it valuable for training chatbots and conversational AI in Marathi.

Build a conversational Marathi language model trained on natural movie dialogue for chatbot development.
Text (JSONL)
Mendeley Data
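
Because the dialogues are distributed as JSONL (one JSON object per line), the Marathi subset can be streamed without loading the whole file. The sketch below assumes a hypothetical record layout with a `lang` code and a `dialogue` list of turns; the actual field names in the dataset may differ and should be checked against its documentation. An in-memory sample stands in for the real file.

```python
import io
import json

def iter_dialogues(lines, lang_code):
    """Yield the dialogue turns of every record in the given language."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # "lang" and "dialogue" are assumed field names, not the confirmed schema.
        if record.get("lang") == lang_code:
            yield record["dialogue"]

# Tiny stand-in for the real JSONL file.
SAMPLE = io.StringIO(
    '{"lang": "mr", "dialogue": ["नमस्कार", "नमस्कार, कसे आहात?"]}\n'
    '{"lang": "hi", "dialogue": ["नमस्ते"]}\n'
    '\n'
    '{"lang": "mr", "dialogue": ["धन्यवाद"]}\n'
)

marathi = list(iter_dialogues(SAMPLE, "mr"))
print(len(marathi), "Marathi dialogues")
```

With a real file, the same function can be called on `open(path, encoding="utf-8")` in place of `SAMPLE`.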

Maharashtra: Indian National Trust for Art and Cultural Heritage (INTACH) listings of unprotected heritage sites, buildings, and cultural landscapes in the state.

Build a Maharashtra heritage discovery app that helps tourists explore INTACH-listed heritage sites by location and interest.
Text, Images (web, PDF)
Indian National Trust for Art and Cultural Heritage (INTACH)

Official tourism data from the Maharashtra Tourism Development Corporation covering domestic and international visitor numbers, tourist destinations, and accommodation statistics.

Build a tourism analytics dashboard for Maharashtra showing visitor patterns, seasonal trends, and revenue estimates.
Tabular (PDF, web)
Maharashtra Tourism Development Corporation

Historical and contemporary Marathi newspaper collections from major publications (Loksatta, Sakal, Maharashtra Times), available through digital archives.

Build a Marathi news analysis platform with topic modeling, sentiment tracking, and trend detection.
Text (Marathi)
Various Marathi Newspapers / Digital Library of India

Full dump of Marathi Wikipedia articles, providing encyclopaedic coverage of diverse topics in the Marathi language.

Build a Marathi text classification model trained on Wikipedia article categories for document categorization.
Text (Marathi)
Wikimedia Foundation
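
Wikipedia dumps are large MediaWiki XML exports, so a streaming parser is usually a better fit than loading the file at once. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` on a two-page inline sample; the sample text and the helper names `local_name` and `iter_pages` are made up for illustration, and the namespace string is only an example, which is why the code strips namespaces instead of hard-coding one.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(stream):
    """Yield (title, text) for each <page>, freeing memory as we go."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's subtree

# Illustrative stand-in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page><title>पुणे</title><revision><text>पुणे हे महाराष्ट्रातील एक शहर आहे.</text></revision></page>
  <page><title>मुंबई</title><revision><text>मुंबई ही महाराष्ट्राची राजधानी आहे.</text></revision></page>
</mediawiki>""".encode("utf-8")

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, len(page_text))
```

Category labels for the classification idea would still have to be extracted from the wikitext of each page, which this sketch does not attempt.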

Marathi Collection: digital library providing access to Marathi books, manuscripts, lecture videos, and research articles across multiple disciplines, with a Marathi-language interface.

Build a Marathi digital content discovery platform that uses the National Digital Library of India (NDLI) collection to improve access to educational resources.
Text, Multimedia
Ministry of Education / IIT Kharagpur

Maharashtra: detailed documentation for the state's UNESCO World Heritage Sites, including the Ajanta Caves, Ellora Caves, Elephanta Caves, Chhatrapati Shivaji Terminus, and the Victorian Gothic and Art Deco Ensembles of Mumbai.

Build a Maharashtra World Heritage tourism platform highlighting Ajanta, Ellora, Chhatrapati Shivaji Terminus, and other sites.
Text, Images (web)
UNESCO

Public-domain Marathi literature, including 1,000+ books from Maharashtra Granthottejak Sanstha, classical texts (Dnyaneshwari, Dasbodh, Haripath), and historical documents.

Build a Marathi literary text analyzer that studies writing styles and linguistic features across historical Marathi literature.
Text (Marathi)
Wikimedia Foundation