Awesome Marathi Datasets

Discover Marathi & Maharashtra Datasets

248 datasets across 15 categories. Find data, spark ideas, and start building.

Browse by Category

Language & NLP

40 datasets

Foundation datasets for Marathi language models, NER, sentiment analysis, machine translation, and text processing.

Speech & Audio

18 datasets

Speech recognition, text-to-speech, and audio datasets covering Marathi and other Indian languages.

Vision, OCR & Multimodal

13 datasets

Computer vision, optical character recognition, and multimodal datasets for Devanagari and Marathi content.

Geospatial & GIS

28 datasets

Satellite imagery, maps, boundaries, and geographic information systems data for Maharashtra.

Agriculture & Rural

30 datasets

Crop production, soil health, market prices, weather, and rural development datasets for Maharashtra.

Health & Nutrition

14 datasets

Public health surveys, hospital data, disease surveillance, and nutrition datasets for Maharashtra.

Education & Skills

8 datasets

School enrollment, learning outcomes, skill development, and higher education datasets.

Economy, Labour & Finance

12 datasets

Economic indicators, employment surveys, MSME data, and financial statistics for Maharashtra.

Environment, Climate & Disaster

21 datasets

Air quality, climate data, flood monitoring, and disaster management datasets for Maharashtra.

Transport & Urban Infrastructure

11 datasets

Public transit, road networks, smart city, and urban development datasets for Maharashtra.

Governance, Census & Legal

17 datasets

Census data, election results, government resolutions, legislation, and demographic datasets.

Culture, Media & Heritage

9 datasets

Marathi literature, media archives, tourism statistics, and heritage site datasets.

Real-Time Streams & APIs

8 datasets

Live data feeds, REST APIs, and streaming endpoints for Maharashtra-relevant data.

Agentic, Instruction & RAG

8 datasets

Instruction-tuning datasets, RAG knowledge bases, and QA corpora for building Marathi AI agents.

Benchmarks, Tools & Dialects

11 datasets

Evaluation benchmarks, NLP toolkits, dialect resources, and fairness datasets for Marathi.

Biggest Opportunities

Conversational / Dialogue datasets

Critical for building Marathi chatbots, virtual assistants, and customer service AI.

language-nlp

Commonsense reasoning

Needed for Marathi LLMs to understand cultural context and implicit knowledge.

language-nlp

Emotional speech

Needed for call center sentiment analysis, mental health monitoring, and accessibility.

speech-audio