Vision, OCR & Multimodal

Computer vision, optical character recognition, and multimodal datasets for Devanagari and Marathi content.

13 datasets

Collection of 16 Indic-language datasets from IIT Bombay, hosted on IndiaAI's AIKosh platform as part of the BharatGen initiative. Includes handwritten and printed Devanagari script images, scanned table recognition data, 78+ hours of multilingual audio, QA pairs, and math word problems. Covers Marathi plus 9 other Indian languages.

Build a multimodal Marathi document understanding system using AIKosh vision-language datasets.
Multimodal
IIT Bombay / IndiaAI Mission

Large-scale scene text dataset for 11 Indian languages plus English, sourced from Wikimedia images of Indian signboards and street scenes. Includes 5,113 Marathi word annotations with polygon bounding boxes.

Build a Devanagari scene text recognition system for reading Marathi shop signs and street nameplates in urban Maharashtra.
Image (scene text)
IIIT Hyderabad
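
Polygon annotations usually need converting to axis-aligned boxes before word images can be cropped for a recognizer. A minimal sketch, assuming each polygon arrives as a list of (x, y) vertex tuples; the function names and coordinates are hypothetical and not part of the dataset's own tooling.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing every vertex of the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def polygon_area(polygon: List[Point]) -> float:
    """Shoelace formula; valid for any simple (non-self-intersecting) polygon."""
    n = len(polygon)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# A slightly skewed quadrilateral around one signboard word (hypothetical coordinates).
word = [(10, 20), (110, 25), (108, 60), (8, 55)]
print(polygon_to_bbox(word))  # (8, 20, 110, 60)
print(polygon_area(word))     # 3510.0
```

The axis-aligned box is convenient for cropping, while the polygon area can help filter out degenerate or tiny annotations.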

Marathi translations of MS-COCO image captions, verified by native Marathi speakers for linguistic accuracy and contextual integrity. Useful for training image captioning and cross-lingual retrieval models.

Build a Marathi image captioning model that generates natural Marathi descriptions of photographs.
Text (caption pairs)
Microsoft COCO / AI4Bharat
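
MS-COCO caption annotations are stored as JSON with an `annotations` list whose entries carry `image_id` and `caption`. Assuming the Marathi release keeps that layout (not confirmed here), grouping captions by image yields the image-to-captions pairs that a captioning or retrieval model trains on. The sample records below are illustrative, not taken from the dataset.

```python
import json
from collections import defaultdict
from typing import Dict, List

def group_captions(coco_json: str) -> Dict[int, List[str]]:
    """Map each image_id to all of its captions (COCO has about five per image)."""
    data = json.loads(coco_json)
    captions: Dict[int, List[str]] = defaultdict(list)
    for ann in data["annotations"]:
        captions[ann["image_id"]].append(ann["caption"].strip())
    return dict(captions)

# Tiny inline sample in COCO caption layout; the Marathi sentences are illustrative only.
sample = json.dumps({
    "annotations": [
        {"id": 1, "image_id": 42, "caption": "एक मुलगा सायकल चालवत आहे."},
        {"id": 2, "image_id": 42, "caption": "रस्त्यावर सायकल चालवणारा मुलगा."},
        {"id": 3, "image_id": 7, "caption": "टेबलावर फळांची टोपली आहे."},
    ]
}, ensure_ascii=False)

pairs = group_captions(sample)
print(len(pairs[42]))  # 2
```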
CVQA

Culturally diverse multilingual Visual Question Answering benchmark with questions from 30 countries in 31 languages, including Marathi. Images and questions are annotated by native speakers familiar with the local culture.

Build a culturally-aware Marathi visual question answering system that understands Indian visual contexts.
Image + Text
CVQA Benchmark Authors
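
Because the benchmark spans many languages, results are usually more informative per language than as one pooled score. A minimal sketch of per-language accuracy, assuming each prediction record is a (language, predicted_option, gold_option) triple; this record layout is hypothetical, not CVQA's official schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (language, predicted option, gold option)

def accuracy_by_language(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of correct answers, computed separately for each language."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, pred, gold in records:
        total[lang] += 1
        correct[lang] += int(pred == gold)
    return {lang: correct[lang] / total[lang] for lang in total}

preds = [
    ("Marathi", "B", "B"),
    ("Marathi", "A", "C"),
    ("Marathi", "D", "D"),
    ("Hindi", "A", "A"),
]
print(accuracy_by_language(preds))
```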

Database of handwritten Devanagari character images in 46 classes (36 characters and 10 digits). Each grayscale image is 32×32 pixels. Applicable to Marathi character recognition, since Marathi uses the Devanagari script.

Build a Devanagari handwriting recognition model for digitizing handwritten Marathi documents and forms.
Image (handwritten characters)
UCI Machine Learning Repository
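
Each sample is a 32×32 grayscale grid, so a classifier sees 1,024 pixel values and needs a 46-way output layer. A minimal preprocessing sketch, assuming 8-bit pixels (0 to 255) and a hypothetical label order of the 36 characters followed by the 10 digits; the dataset's own folder naming and class order may differ.

```python
from typing import List

IMAGE_SIZE = 32
NUM_CHARACTERS = 36
NUM_DIGITS = 10
NUM_CLASSES = NUM_CHARACTERS + NUM_DIGITS  # 46

def flatten_and_scale(image: List[List[int]]) -> List[float]:
    """Turn a 32x32 grid of 0-255 pixels into 1,024 floats in [0, 1], row by row."""
    if len(image) != IMAGE_SIZE or any(len(row) != IMAGE_SIZE for row in image):
        raise ValueError("expected a 32x32 image")
    return [pixel / 255.0 for row in image for pixel in row]

def class_index(kind: str, position: int) -> int:
    """Hypothetical label scheme: characters map to 0-35, digits to 36-45."""
    if kind == "character" and 0 <= position < NUM_CHARACTERS:
        return position
    if kind == "digit" and 0 <= position < NUM_DIGITS:
        return NUM_CHARACTERS + position
    raise ValueError(f"unknown class: {kind} {position}")

image = [[0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)]
image[16][16] = 255  # a single white pixel in the middle
vec = flatten_and_scale(image)
print(len(vec), max(vec))       # 1024 1.0
print(class_index("digit", 9))  # 45
```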

Sentinel-2 satellite image dataset for land use and land cover classification. Contains geo-referenced RGB and multispectral images across 10 classes. Applicable to Maharashtra remote sensing and agricultural monitoring.

Fine-tune a land use classifier pre-trained on EuroSAT for Maharashtra-specific land cover classification.
Satellite imagery (RGB + multispectral)
German Research Center for Artificial Intelligence (DFKI)
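
The multispectral version carries bands that RGB lacks, which matters for agricultural monitoring. For example, the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), combines Sentinel-2's near-infrared band (B08) with its red band (B04). A minimal per-pixel sketch on plain lists, assuming the two bands have already been read from the image files; the reflectance values below are made up.

```python
from typing import List

def ndvi(red: List[float], nir: List[float]) -> List[float]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); returns 0.0 where both bands are 0."""
    if len(red) != len(nir):
        raise ValueError("bands must have the same number of pixels")
    out = []
    for r, n in zip(red, nir):
        denom = n + r
        out.append((n - r) / denom if denom != 0 else 0.0)
    return out

# Made-up reflectance values: dense crop, bare soil, water.
red = [0.05, 0.20, 0.04]
nir = [0.45, 0.25, 0.02]
print([round(v, 2) for v in ndvi(red, nir)])  # [0.8, 0.11, -0.33]
```

Healthy vegetation reflects strongly in near-infrared, so crops score high, bare soil near zero, and water negative; an NDVI layer can be fed to a classifier as an extra input channel.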

ICDAR 2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition. Covers 10 languages across 7 scripts, including Devanagari, and is therefore applicable to Marathi scene text.

Build a multilingual scene text detector that handles Devanagari alongside other scripts in Indian street scenes.
Image (scene text)
ICDAR MLT Organizers
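
To train a Devanagari-aware detector, the ground truth has to be parsed and filtered by script. A minimal sketch, assuming the comma-separated layout used by the MLT challenges (four corner points, then a script label, then the transcription); the script label string for Devanagari ("Hindi" below) is an assumption to verify against the downloaded files.

```python
from typing import List, NamedTuple, Tuple

class WordBox(NamedTuple):
    points: List[Tuple[int, int]]  # four corner points
    script: str
    text: str

def parse_gt_line(line: str) -> WordBox:
    # maxsplit=9 keeps any commas inside the transcription intact.
    fields = line.rstrip("\n").split(",", 9)
    coords = [int(v) for v in fields[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return WordBox(points, fields[8], fields[9])

def filter_script(lines: List[str], script: str) -> List[WordBox]:
    """Keep only words whose script label matches (the label string is an assumption)."""
    boxes = [parse_gt_line(ln) for ln in lines if ln.strip()]
    return [b for b in boxes if b.script == script]

gt = [
    "10,10,90,10,90,40,10,40,Hindi,पुणे",
    "100,10,180,10,180,40,100,40,Latin,PUNE",
    "5,50,95,50,95,80,5,80,Latin,Hello, world",
]
dev = filter_script(gt, "Hindi")
print(dev[0].text, dev[0].points[2])  # पुणे (90, 40)
```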

Large-scale Devanagari handwritten word dataset from CVIT, IIIT Hyderabad. Contains word-level images with corrected segmentation. Applicable to Marathi handwriting recognition, as Marathi uses the Devanagari script.

Build a handwritten Marathi word recognition system for digitizing handwritten government documents and records.
Image (handwritten words)
IIIT Hyderabad
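
Word-level handwriting recognizers are often paired with a lexicon, replacing the raw prediction with the closest valid word. A minimal sketch of that post-processing step, using Levenshtein distance over Unicode code points and a tiny hypothetical lexicon; this is a generic technique, not something the dataset prescribes.

```python
from typing import List

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]

def closest_word(prediction: str, lexicon: List[str]) -> str:
    """Return the lexicon entry with the smallest edit distance to the prediction."""
    return min(lexicon, key=lambda word: levenshtein(prediction, word))

lexicon = ["अर्ज", "दाखला", "प्रमाणपत्र"]
print(closest_word("दखला", lexicon))  # दाखला
```

Distances here count code points, so a missing vowel sign such as ा costs one edit.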

Indic Scene Text Recognition dataset covering 12 major Indian languages, including Marathi. Word images are collected from natural scenes such as signboards, shop nameplates, railway stations, advertisements, and banners.

Build a robust Devanagari scene text recognition system for reading Marathi text in natural images.
Image (scene text)
AI4Bharat, IIT Madras
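
Scene text recognizers are commonly scored by word accuracy, but Devanagari allows the same visible word to be encoded with different code-point sequences. A minimal sketch that NFC-normalizes with Python's standard `unicodedata` module before comparing; the prediction and label lists are hypothetical.

```python
import unicodedata
from typing import List

def normalize(text: str) -> str:
    """NFC-normalize and trim so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text).strip()

def word_accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of word images whose prediction exactly matches the label after NFC."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must align")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, labels))
    return hits / len(labels)

# "दुसऱ्या" typed two ways: precomposed RRA (U+0931) vs RA (U+0930) + NUKTA (U+093C).
precomposed = "दुस\u0931\u094D\u092F\u093E"
decomposed = "दुस\u0930\u093C\u094D\u092F\u093E"
print(precomposed == decomposed)                                    # False
print(word_accuracy([decomposed, "मुंबई"], [precomposed, "पुणे"]))  # 0.5
```

Without normalization the first pair would count as an error even though both strings render identically.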

2,500+ images of full handwritten Marathi text (sentences and paragraphs, not isolated characters). Native speakers wrote pre-designed text covering nearly all Marathi characters, words, and diacritical marks. Fills the gap between character-level datasets (like Devanagari HWR) and real-world handwritten text recognition (HTR).

Build an end-to-end handwritten Marathi text recognition system for digitizing handwritten documents and forms.
Image (handwritten text)
Independent researcher (Kaggle)
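
The claim that the text covers nearly all Marathi characters and diacritics can be checked against the transcriptions. A minimal sketch that collects which code points from the Unicode Devanagari block (U+0900 to U+097F) occur in a corpus and lists wanted characters that never appear; treating that block as the target set is an assumption, since it also contains characters Marathi rarely uses. The sample sentences are illustrative.

```python
import unicodedata
from typing import Iterable, List, Set

DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block

def devanagari_coverage(lines: Iterable[str]) -> Set[str]:
    """Return the set of Devanagari code points used anywhere in the text."""
    seen: Set[str] = set()
    for line in lines:
        for ch in unicodedata.normalize("NFC", line):
            if ord(ch) in DEVANAGARI:
                seen.add(ch)
    return seen

def missing(seen: Set[str], wanted: Iterable[str]) -> List[str]:
    """Characters from `wanted` that never occur, with their Unicode names."""
    return [f"{ch} {unicodedata.name(ch)}" for ch in wanted if ch not in seen]

transcriptions = ["आम्ही शाळेत जातो.", "पाऊस पडत आहे."]
seen = devanagari_coverage(transcriptions)
print(missing(seen, "ळऋ"))  # ['ऋ DEVANAGARI LETTER VOCALIC R']
```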

Collection of ~12K Marathi word images with corresponding UTF-8 text labels, sourced from 12 Marathi books across various genres. Images are binarized, thresholded, and resized to 96 dpi for direct neural network input.

Build a production-grade Marathi OCR engine for digitizing printed government documents and books.
Image (printed text)
IIT Bombay / IIIT Hyderabad
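
Because the labels are UTF-8 strings, a recognizer trained with a CTC loss needs a vocabulary that maps each symbol to an integer, with index 0 conventionally reserved for the CTC blank. A minimal sketch, assuming code-point-level tokens (grapheme clusters are a common alternative) and greedy best-path decoding; the sample labels are illustrative.

```python
from typing import Dict, List

BLANK = 0  # reserved for the CTC blank symbol

def build_vocab(labels: List[str]) -> Dict[str, int]:
    """Assign indices 1..N to every code point seen, sorted for reproducibility."""
    chars = sorted({ch for label in labels for ch in label})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(label: str, vocab: Dict[str, int]) -> List[int]:
    """Map a label string to its sequence of target indices."""
    return [vocab[ch] for ch in label]

def greedy_ctc_decode(frame_ids: List[int], vocab: Dict[str, int]) -> str:
    """Collapse repeated frame predictions, then drop blanks (best-path CTC decoding)."""
    inverse = {i: ch for ch, i in vocab.items()}
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != BLANK:
            out.append(inverse[i])
        prev = i
    return "".join(out)

labels = ["क्षेत्र", "महाराष्ट्र"]
vocab = build_vocab(labels)
ids = encode("क्षेत्र", vocab)
print(len("क्षेत्र"), len(ids))  # 7 7: one word, seven code points
```

A conjunct such as क्ष is three code points (क, virama, ष), so code-point tokens keep the vocabulary small at the cost of longer target sequences.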

Marathi lip reading dataset containing video recordings of speakers pronouncing Marathi words and phrases, designed for visual speech recognition and lip-reading AI systems. One of the few lip-reading datasets for any Indian language.

Build a Marathi lip reading model for silent speech recognition in noisy environments or for hearing-impaired users.
Video
Independent researcher (Kaggle)
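
Lip-reading models usually take a fixed number of frames per clip, while recordings of different words vary in length. A minimal sketch of uniform temporal sampling that picks evenly spaced frame indices; the 29-frame default is a hypothetical choice, and decoding the video itself is not shown.

```python
from typing import List

def sample_frame_indices(num_frames: int, target: int = 29) -> List[int]:
    """Evenly spaced indices into a clip; short clips repeat frames, long clips skip some."""
    if num_frames <= 0 or target <= 0:
        raise ValueError("num_frames and target must be positive")
    if target == 1:
        return [0]
    step = (num_frames - 1) / (target - 1)
    return [round(i * step) for i in range(target)]

print(sample_frame_indices(60, target=5))  # [0, 15, 30, 44, 59]
```

The first and last frames are always kept, so the sampled window spans the whole utterance.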

Large-scale remote sensing image scene classification benchmark with 45 scene classes extracted from Google Earth, covering 100+ countries. Applicable to Maharashtra urban planning, land use mapping, and infrastructure analysis.

Fine-tune a remote sensing image classifier pre-trained on NWPU-RESISC45 for Maharashtra land use classification.
Satellite imagery (RGB)
Northwestern Polytechnical University, China
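
Results on this benchmark are commonly reported with small per-class training ratios (such as 10% or 20% of each class), so the split should be stratified by class. A minimal sketch, assuming samples are (path, class_name) pairs and 700 images per class; the file names and class names are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image path, class name)

def stratified_split(samples: List[Sample], train_ratio: float,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split each class separately so every class keeps the same train fraction."""
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        by_class[s[1]].append(s)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Sample] = []
    test: List[Sample] = []
    for cls in sorted(by_class):
        items = by_class[cls][:]
        rng.shuffle(items)
        k = round(len(items) * train_ratio)
        train.extend(items[:k])
        test.extend(items[k:])
    return train, test

samples = [(f"{c}/{c}_{i:03d}.jpg", c)
           for c in ("airport", "rectangular_farmland") for i in range(700)]
train, test = stratified_split(samples, train_ratio=0.2)
print(len(train), len(test))  # 280 1120
```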