IndicSTR12

MH Specific

Indic scene text recognition dataset covering 12 major Indian languages, including Marathi. The word images were collected from natural scenes such as signboards, shop nameplates, railway stations, advertisements, and banners.

Build a robust Devanagari scene text recognition system for reading Marathi text in natural images.

Quick Start

# IndicSTR12 dataset: placeholder snippet that only prints a short description
print('IndicSTR12: Scene text recognition for 12 Indian languages')
print('Includes Devanagari for Marathi text recognition')
Modality
Image (scene text)
Size
27K+ word images (1K+ per language)
License
Format
PNG/JPEG
Language
mr
Update Frequency
static
Organization
CVIT, IIIT Hyderabad

Schema

Field   Type    Description
image   image   Scene text image in Indian script
text    string  Ground truth text transcription
script  string  Script type (Devanagari, etc.)
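
The three schema fields can be mirrored in a small record type. The sketch below is illustrative only: it assumes the annotations are available as JSON Lines with the keys image, text, and script, which this card does not confirm, and the names SceneTextSample, parse_samples, and filter_by_script are hypothetical helpers, not part of any dataset tooling. The demo records are made up for the example.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class SceneTextSample:
    image_path: str  # assumed: the image field holds a path to a PNG/JPEG file
    text: str        # ground truth transcription
    script: str      # script name, e.g. "Devanagari"


def parse_samples(lines: Iterable[str]) -> Iterator[SceneTextSample]:
    """Parse JSON Lines records with the keys image, text, and script."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield SceneTextSample(record["image"], record["text"], record["script"])


def filter_by_script(samples: Iterable[SceneTextSample],
                     script: str = "Devanagari") -> List[SceneTextSample]:
    """Keep samples whose script matches, ignoring letter case."""
    wanted = script.casefold()
    return [s for s in samples if s.script.casefold() == wanted]


if __name__ == "__main__":
    # Made-up demo records; a real annotation file may look different.
    demo_lines = [
        json.dumps({"image": "img_0001.png", "text": "पुणे", "script": "Devanagari"},
                   ensure_ascii=False),
        json.dumps({"image": "img_0002.jpg", "text": "சென்னை", "script": "Tamil"},
                   ensure_ascii=False),
    ]
    for sample in filter_by_script(parse_samples(demo_lines)):
        print(sample.image_path, sample.text)
```

Filtering on script alone does not isolate Marathi, because Hindi is also written in Devanagari; a language label would be needed for that, if the data provides one.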

Build With This

Create a Marathi street sign reader for autonomous vehicle navigation in Maharashtra cities
Develop a product label reader for Marathi-labeled consumer goods in retail automation
Build a Marathi document image search engine indexing text recognized from scanned documents and photos
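
Any of the projects above needs a way to compare recognizer output with the ground truth text field. The following is a minimal sketch of two commonly reported measures, character error rate and exact-match word accuracy. It is not an official evaluation protocol for this dataset; the function names are hypothetical, and applying Unicode NFC normalization before comparison is an assumption made here so that equivalent encodings of the same text compare equal.

```python
import unicodedata
from typing import Dict, Sequence


def normalize(text: str) -> str:
    """Trim whitespace and apply Unicode NFC normalization (an assumption)."""
    return unicodedata.normalize("NFC", text.strip())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(predictions: Sequence[str], references: Sequence[str]) -> Dict[str, float]:
    """Return character error rate and exact-match word accuracy."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_edits = 0
    total_chars = 0
    exact = 0
    for pred, ref in zip(predictions, references):
        p, r = normalize(pred), normalize(ref)
        total_edits += edit_distance(p, r)
        total_chars += len(r)
        exact += int(p == r)
    n = len(references)
    return {
        "cer": total_edits / total_chars if total_chars else 0.0,
        "word_accuracy": exact / n if n else 0.0,
    }


if __name__ == "__main__":
    print(evaluate(["पुणे"], ["पुणे"]))
```

Because Python strings are sequences of code points, a Devanagari vowel sign (matra) counts as its own character in this distance; a grapheme-level comparison would need a different segmentation step.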

AI Use Cases

Scene text recognition
OCR for Indic scripts
Last verified: 2026-03-07