DevChar is a large-scale handwritten Devanagari character dataset containing approximately 4 million character samples. It is explicitly designed to address the limitations of existing datasets that fail on text containing matras (vowel modifiers) and conjuncts (jodakshara). It covers characters with the varying combinations of matras and conjunct forms that appear in real Marathi and Hindi text. Its scale and explicit focus on matras and conjuncts make it particularly valuable for training robust Devanagari OCR systems.

```python
# DevChar dataset
# Paper: https://link.springer.com/chapter/10.1007/978-981-16-2911-2_13
# Contact authors for access to DevChar2020
print("DevChar: ~4M handwritten Devanagari character images")
print("Explicitly covers matras and conjunct characters")
print("Addresses key weakness of standard 46-class datasets")
```

| Field | Type | Description |
|---|---|---|
| image | image | Handwritten Devanagari character image (may include matra modifiers) |
| character_label | string | Unicode character with matra/conjunct annotation |
| has_matra | boolean | Whether character includes a matra modifier |
| is_conjunct | boolean | Whether character is a conjunct form |
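
To illustrate how the `has_matra` and `is_conjunct` fields relate to `character_label`, here is a minimal sketch that derives both flags from a Unicode label string. It is not the dataset authors' annotation procedure. The helper names (`has_matra`, `is_conjunct`, `annotate`) and the code point ranges chosen are assumptions made for illustration: matras are taken to be the Devanagari dependent vowel signs (U+093E to U+094C, plus U+0962 and U+0963), and a conjunct is taken to be any virama (U+094D) immediately followed by a consonant.

```python
VIRAMA = "\u094d"  # Devanagari sign virama (halant)


def _is_consonant(ch: str) -> bool:
    # Assumed consonant ranges: KA..HA plus the nukta forms QA..YYA.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095f"


def _is_matra(ch: str) -> bool:
    # Assumed matra set: dependent vowel signs AA..AU, vocalic L and LL.
    return "\u093e" <= ch <= "\u094c" or ch in ("\u0962", "\u0963")


def has_matra(label: str) -> bool:
    """True if any code point in the label is a dependent vowel sign."""
    return any(_is_matra(ch) for ch in label)


def is_conjunct(label: str) -> bool:
    """True if a virama is immediately followed by a consonant."""
    return any(
        a == VIRAMA and _is_consonant(b)
        for a, b in zip(label, label[1:])
    )


def annotate(label: str) -> dict:
    """Build a hypothetical record with the non-image fields of the table."""
    return {
        "character_label": label,
        "has_matra": has_matra(label),
        "is_conjunct": is_conjunct(label),
    }


if __name__ == "__main__":
    # KA, KA + vowel sign I, KA + virama + SSA, and the word "stri".
    for label in ["\u0915", "\u0915\u093f", "\u0915\u094d\u0937",
                  "\u0938\u094d\u0924\u094d\u0930\u0940"]:
        print(annotate(label))
```

This sketch ignores zero-width joiner and non-joiner sequences and treats a trailing virama (a half form with no following consonant) as a non-conjunct. The actual DevChar labels may follow a different convention.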