IIIT-INDIC-HW-WORDS is a large handwritten word dataset containing 872,000 word instances across 10 Indic scripts, including Devanagari, written by 135 writers (an average of roughly 6,460 word instances per writer). It includes word-level bounding box annotations and Unicode transcriptions. Its scale and writer diversity make it well suited to training handwritten text recognition systems that generalize across writing styles. The Devanagari subset is directly applicable to Marathi handwriting recognition, since Marathi is written in Devanagari.

```python
# Request access from CVIT, IIIT Hyderabad:
# https://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-indic-hw-words
from PIL import Image  # for opening word images once the data is downloaded

# After downloading, filter the Devanagari subset (applicable to Marathi).
# devanagari_words/ contains word images with transcription files.
print("IIIT-INDIC-HW-WORDS: 872K handwritten word instances")
print("Filter Devanagari script for Marathi OCR training")
```

| Field | Type | Description |
|---|---|---|
| image | image | Cropped handwritten word image |
| text | string | Unicode ground-truth transcription |
| script | string | Script identifier (Devanagari for Marathi/Hindi) |
| writer_id | string | Unique writer identifier |
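
The fields above could be consumed along the following lines. This is a minimal sketch, not the dataset's official loader. It assumes a hypothetical tab-separated labels file with one line per word (`relative/image/path<TAB>text<TAB>script<TAB>writer_id`). The actual release may organize transcriptions differently, so the parsing step would need adapting. The names `WordSample`, `read_samples`, `filter_script`, and `split_by_writer` are illustrative only. The sketch filters the Devanagari subset and builds a writer-disjoint split, so that validation measures generalization to unseen writing styles.

```python
import random
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WordSample:
    image_path: Path  # location of the cropped word image (the `image` field)
    text: str         # Unicode ground-truth transcription
    script: str       # script identifier, e.g. "Devanagari"
    writer_id: str    # unique writer identifier


def read_samples(labels_file, image_root):
    """Parse a labels file into WordSample records.

    ASSUMPTION: each line is `path<TAB>text<TAB>script<TAB>writer_id`.
    This layout is hypothetical and not confirmed by the dataset docs.
    """
    samples = []
    with open(labels_file, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            rel_path, text, script, writer_id = line.split("\t")
            samples.append(
                WordSample(Path(image_root) / rel_path, text, script, writer_id)
            )
    return samples


def filter_script(samples, script="Devanagari"):
    """Keep only samples whose script matches (case-insensitive)."""
    return [s for s in samples if s.script.lower() == script.lower()]


def split_by_writer(samples, val_fraction=0.1, seed=0):
    """Writer-disjoint split: no writer appears in both train and validation."""
    writers = sorted({s.writer_id for s in samples})
    random.Random(seed).shuffle(writers)
    # Hold out at least one writer whenever there is more than one.
    n_val = max(1, round(len(writers) * val_fraction)) if len(writers) > 1 else 0
    val_writers = set(writers[:n_val])
    train = [s for s in samples if s.writer_id not in val_writers]
    val = [s for s in samples if s.writer_id in val_writers]
    return train, val
```

Once the records are built, an individual image can be opened with Pillow via `Image.open(sample.image_path)`. Splitting by `writer_id` rather than by word instance avoids leaking a writer's style from the training set into validation.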