Handwritten Devanagari character dataset designed to include compound (conjunct) characters (jodakshara) alongside basic characters. Contains 36,000 images across 60 classes (10 numerals, 13 vowels, 17 similar-looking consonants, and 20 compound character classes), balanced at 600 images per class. One of the few publicly available datasets that explicitly addresses conjunct character recognition, a major challenge for Marathi/Devanagari OCR because conjuncts such as क्ष, ज्ञ, and त्र merge into single glyphs. A 2D CNN is reported to reach 99.66% accuracy on this dataset.
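
The class composition stated above can be checked against a local copy before training. The following is a minimal sketch, assuming labels are available as a flat list of integer class identifiers in the range 0-59; `CLASS_GROUPS` and `check_balance` are illustrative names and not part of the dataset repository.

```python
from collections import Counter

# Class groups as stated in the description; the dict name is illustrative.
CLASS_GROUPS = {"numeral": 10, "vowel": 13, "consonant": 17, "compound": 20}
IMAGES_PER_CLASS = 600

NUM_CLASSES = sum(CLASS_GROUPS.values())        # 10 + 13 + 17 + 20
TOTAL_IMAGES = NUM_CLASSES * IMAGES_PER_CLASS   # classes x images per class


def check_balance(labels, per_class=IMAGES_PER_CLASS, num_classes=NUM_CLASSES):
    """Return {class_id: count} for every class whose count differs from per_class."""
    counts = Counter(labels)
    off = {c: n for c, n in counts.items() if n != per_class}
    # Classes with no images at all (assumes identifiers 0..num_classes-1).
    for c in range(num_classes):
        if c not in counts:
            off[c] = 0
    return off
```

An empty result means every class has exactly 600 images. When the images are loaded with torchvision's `ImageFolder`, its `targets` attribute could be passed as `labels`.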

```python
# Clone the dataset first:
# https://github.com/MKI-26/Devanagari-handwritten-character-dataset-with-Compound-characters
from torchvision import datasets, transforms

# Convert each image to a single-channel 32x32 tensor with values in [0, 1]
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

# ImageFolder expects one subdirectory per class; 'mki26_dataset/' should
# point at the image folders in the local clone
dataset = datasets.ImageFolder('mki26_dataset/', transform=transform)
print(f"Total images: {len(dataset)}, Classes: {len(dataset.classes)}")
# 60 classes including 20 compound characters
```

| Field | Type | Description |
|---|---|---|
| image | image | Handwritten Devanagari character image |
| character_class | string | Character or compound character label |
| class_type | string | Type (numeral, vowel, consonant, compound) |
| class_id | int | Numeric class identifier (0-59) |
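
Because every class holds the same number of images, a per-class (stratified) split keeps the train and test sets balanced as well. The following is a minimal sketch, assuming each record carries the `class_id` field from the table; `stratified_split`, the 80/20 ratio, and the seed are illustrative choices, not the dataset authors' evaluation protocol.

```python
import random
from collections import defaultdict


def stratified_split(class_ids, test_fraction=0.2, seed=0):
    """Split sample indices per class so each class keeps the same ratio."""
    by_class = defaultdict(list)
    for index, class_id in enumerate(class_ids):
        by_class[class_id].append(index)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for class_id in sorted(by_class):
        indices = by_class[class_id]
        rng.shuffle(indices)
        n_test = int(round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Synthetic labels with the documented shape: 60 classes x 600 images.
class_ids = [c for c in range(60) for _ in range(600)]
train_idx, test_idx = stratified_split(class_ids)
print(len(train_idx), len(test_idx))
```

The returned index lists could be wrapped with `torch.utils.data.Subset` to build the actual train and test datasets.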