IIIT-INDIC-HW-WORDS - Large-Scale Handwritten Indic Words

MH Subset Needed

A large-scale handwritten word dataset containing 872,000 word instances across 10 Indic scripts, including Devanagari, written by 135 writers, which averages roughly 6,460 word instances per writer. It includes word-level bounding box annotations and Unicode transcriptions. Its scale and writer diversity make it well suited to training handwritten text recognition systems that generalize across writing styles. The Devanagari subset is directly applicable to Marathi handwriting recognition.

Train a writer-independent Marathi handwriting recognizer using the Devanagari subset of this large-scale dataset.
Maharashtra subset not yet extracted. This is a multi-script dataset, and the schema lists no geographic fields, so a Marathi-relevant subset is extracted by filtering on the script field (Devanagari) rather than on coordinates or administrative boundaries.
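Because each sample carries a script and a writer identifier, one way to prepare data for a writer-independent recognizer is to keep only Devanagari samples and split them so that no writer appears in both train and test. The sketch below assumes records are plain dictionaries with the `script` and `writer_id` fields from the schema; the function name, the `test_fraction` parameter, and the toy records are illustrative, not part of the dataset release.

```python
import random
from collections import defaultdict


def devanagari_writer_split(records, test_fraction=0.2, seed=0):
    """Keep Devanagari records and split them so train and test share no writer."""
    by_writer = defaultdict(list)
    for rec in records:
        if rec["script"].lower() == "devanagari":
            by_writer[rec["writer_id"]].append(rec)

    # Shuffle writers (not samples) so every writer lands on exactly one side.
    writers = sorted(by_writer)
    random.Random(seed).shuffle(writers)

    n_test = 0
    if len(writers) > 1:
        n_test = min(len(writers) - 1, max(1, round(len(writers) * test_fraction)))
    test_writers = set(writers[:n_test])

    train = [r for w in writers if w not in test_writers for r in by_writer[w]]
    test = [r for w in writers if w in test_writers for r in by_writer[w]]
    return train, test


# Toy records standing in for real metadata (all values are made up).
records = [
    {"text": "नमस्कार", "script": "Devanagari", "writer_id": "w01"},
    {"text": "शाळा", "script": "Devanagari", "writer_id": "w01"},
    {"text": "पुस्तक", "script": "Devanagari", "writer_id": "w02"},
    {"text": "पाणी", "script": "Devanagari", "writer_id": "w03"},
    {"text": "বই", "script": "Bengali", "writer_id": "w04"},
]
train, test = devanagari_writer_split(records, test_fraction=0.5)
print(len(train), "train samples;", len(test), "test samples")
```

Splitting on writers rather than on individual samples keeps a held-out writer's style entirely unseen during training, which is what a writer-independent evaluation is meant to measure.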
Homepage: https://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-indic-hw-words

Quick Start

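# Request access from CVIT, IIIT Hyderabad:
# https://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-indic-hw-words
from pathlib import Path

from PIL import Image

# After downloading, keep the Devanagari subset (applicable to Marathi).
# devanagari_words/ is expected to contain word images with transcription files.
data_dir = Path("devanagari_words")
image_paths = sorted(p for p in data_dir.rglob("*") if p.suffix.lower() in {".png", ".jpg", ".jpeg"})
print(f"Found {len(image_paths)} Devanagari word images in {data_dir}/")
if image_paths:
    # Open one sample to confirm the images are readable.
    with Image.open(image_paths[0]) as img:
        print(image_paths[0].name, img.size)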
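The page does not say how the transcription files are laid out, so the following is only a sketch of pairing images with their labels. It assumes a hypothetical tab-separated file in which each line holds a relative image path and its Unicode transcription; the function name `parse_labels` and the sample lines are made up for illustration and should be adapted to the actual release.

```python
from pathlib import Path


def parse_labels(lines, root="devanagari_words"):
    """Turn 'relative/path<TAB>transcription' lines into (Path, text) pairs."""
    root = Path(root)
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        rel_path, text = line.split("\t", 1)
        pairs.append((root / rel_path, text))
    return pairs


# Made-up sample lines in the assumed format.
sample_lines = [
    "w01/0001.png\tनमस्कार\n",
    "\n",
    "w01/0002.png\tशाळा\n",
]
pairs = parse_labels(sample_lines)
for path, text in pairs:
    print(path, text)
```

With real data, the same function could be fed an open file handle, for example `parse_labels(open(labels_path, encoding="utf-8"))`, where `labels_path` points at whatever label file the download actually provides.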
Modality: Image (handwritten word crops with transcriptions)
Size: 872K word instances; 135 writers; 10 Indic scripts
License: (not listed)
Format: PNG/JPEG with text labels
Language: mr, hi, bn, ta, te, kn, ml, gu, pa, or
Update Frequency: static
Organization: CVIT, IIIT Hyderabad

Schema

Field | Type | Description
image | image | Cropped handwritten word image
text | string | Unicode ground-truth transcription
script | string | Script identifier (Devanagari for Marathi/Hindi)
writer_id | string | Unique writer identifier
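As a rough illustration of how these four fields might be represented in code, the sketch below mirrors the schema in a small dataclass and counts samples and distinct writers per script. The class name `WordSample`, the helper `summarize`, the use of a file path for the `image` field, and the "Bengali" script label are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class WordSample:
    image: Any      # cropped handwritten word image (here: a path placeholder)
    text: str       # Unicode ground-truth transcription
    script: str     # script identifier, e.g. "Devanagari"
    writer_id: str  # unique writer identifier


def summarize(samples):
    """Return (samples per script, distinct writers per script)."""
    per_script = Counter(s.script for s in samples)
    writers = {}
    for s in samples:
        writers.setdefault(s.script, set()).add(s.writer_id)
    return per_script, {script: len(ids) for script, ids in writers.items()}


# Made-up samples; real values come from the downloaded dataset.
samples = [
    WordSample("w01/0001.png", "नमस्कार", "Devanagari", "w01"),
    WordSample("w01/0002.png", "शाळा", "Devanagari", "w01"),
    WordSample("w02/0001.png", "पुस्तक", "Devanagari", "w02"),
    WordSample("w07/0001.png", "বই", "Bengali", "w07"),
]
per_script, writers_per_script = summarize(samples)
print(dict(per_script))
print(writers_per_script)
```

A summary like this is a quick sanity check that a filtered subset still contains enough distinct writers before training on it.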

Build With This

Create a handwriting-to-text API for Marathi documents using IIIT-INDIC-HW-WORDS as the primary training corpus
Develop a writer identification system for Marathi handwriting that can distinguish between 100+ writing styles
Build a handwriting difficulty analyzer scoring legibility of Marathi handwritten samples for quality control

AI Use Cases

Large-scale handwritten Marathi word recognition
Writer-independent handwriting model training
Cross-script transfer learning for handwritten OCR
Handwriting style analysis and generation
Last verified: 2026-03-12