MKI-26 Devanagari Handwritten Characters with Compound Characters

A handwritten Devanagari character dataset designed to cover compound (conjunct) characters (jodakshara) alongside basic characters. It contains 36,000 images across 60 classes: 10 numerals, 13 vowels, 17 similar-looking consonants, and 20 compound characters, with 600 images per class, so the classes are balanced. It is one of the few publicly available datasets that explicitly targets conjunct character recognition, a major challenge for Marathi/Devanagari OCR because characters such as क्ष, ज्ञ, and त्र merge into single glyphs. A 2D CNN is reported to reach 99.66% accuracy on it.

Build a Devanagari character recognizer that handles both basic and compound characters for robust Marathi OCR.

Quick Start

# Clone the dataset from https://github.com/MKI-26/Devanagari-handwritten-character-dataset-with-Compound-characters
from torchvision import datasets, transforms

# Convert every image to a single-channel 32x32 tensor
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])
# 'mki26_dataset/' is a placeholder path; ImageFolder expects one sub-folder per class
dataset = datasets.ImageFolder('mki26_dataset/', transform=transform)
print(f"Total images: {len(dataset)}, Classes: {len(dataset.classes)}")
# Expect 60 classes, including the 20 compound characters

Modality: Image (handwritten character crops with class labels)
Size: 36,000 images; 60 classes (including 20 conjunct classes); 600 images per class
License: not specified
Format: PNG/JPEG
Language: mr, hi
Update Frequency: static
Organization: Research community
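
Because every class has the same number of images, a train/test split is easiest to reason about when it preserves that balance. The following is a minimal sketch of a stratified split using only the standard library; the function name `stratified_split`, the 80/20 ratio, and the stand-in label list are assumptions for illustration, not part of the dataset or its paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test so each class keeps the same ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Stand-in labels with the shape described on this card: 60 classes x 600 images.
labels = [class_id for class_id in range(60) for _ in range(600)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))  # 28800 7200
```

With the Quick Start loader, `dataset.targets` could be passed as `labels`, and the returned index lists wrapped in `torch.utils.data.Subset`.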

Schema

| Field | Type | Description |
|---|---|---|
| image | image | Handwritten Devanagari character image |
| character_class | string | Character or compound character label |
| class_type | string | Type (numeral, vowel, consonant, compound) |
| class_id | int | Numeric class identifier (0-59) |
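
The schema can be mirrored in code as a small validated record. This is a sketch under assumptions: `CharacterRecord`, the `image_path` stand-in for the `image` field, the sample file path, and the class ID given to क्ष are all hypothetical, since the card does not say how IDs map to characters. Only the field names, the four class types, the per-type class counts, and the 0-59 ID range come from the card.

```python
from dataclasses import dataclass

# Class counts per type as stated on this card; they sum to 60.
CLASSES_PER_TYPE = {"numeral": 10, "vowel": 13, "consonant": 17, "compound": 20}
NUM_CLASSES = sum(CLASSES_PER_TYPE.values())


@dataclass(frozen=True)
class CharacterRecord:
    """Hypothetical in-memory record mirroring the schema fields."""

    image_path: str  # stand-in for the `image` field
    character_class: str
    class_type: str
    class_id: int

    def __post_init__(self):
        # Reject values outside the ranges the schema describes.
        if self.class_type not in CLASSES_PER_TYPE:
            raise ValueError(f"unknown class_type: {self.class_type!r}")
        if not 0 <= self.class_id < NUM_CLASSES:
            raise ValueError(f"class_id out of range: {self.class_id}")


# The path and class_id below are invented for illustration only.
record = CharacterRecord("samples/ksha_0001.png", "क्ष", "compound", 40)
print(record.class_type, record.class_id)
```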

Build With This

Create a conjunct-aware Marathi OCR system that uses MKI-26 for compound character classification within a word-segmentation pipeline
Develop an augmented training pipeline that uses font rendering to generate conjunct classes beyond the 20 in MKI-26
Build a confusable-character analyzer that tests OCR robustness on similar-looking Devanagari consonant pairs
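
As a starting point for the confusable-character analyzer idea, one simple approach is to count how often each pair of classes is mistaken for one another in a classifier's predictions. The sketch below assumes a hypothetical helper named `most_confused_pairs`; the sample labels are made up for illustration, are not results from MKI-26, and the characters shown are not confirmed to be among its 17 consonant classes.

```python
from collections import Counter


def most_confused_pairs(true_labels, predicted_labels, top_k=5):
    """Rank unordered class pairs by how often one is mistaken for the other."""
    confusions = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            # Sort the pair so (a, b) and (b, a) are counted together.
            confusions[tuple(sorted((truth, guess)))] += 1
    return confusions.most_common(top_k)


# Made-up ground truth and predictions, for illustration only.
truth = ["घ", "ध", "घ", "भ", "म", "ध"]
guess = ["ध", "घ", "घ", "म", "म", "ध"]
pairs = most_confused_pairs(truth, guess)
for (first, second), count in pairs:
    print(f"{first} <-> {second}: {count} errors")
```

Counting unordered pairs keeps the report short; tracking ordered pairs instead would show which direction of confusion dominates.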

AI Use Cases

Devanagari conjunct character recognition
Compound character classification for OCR pipelines
Similar-looking character disambiguation
Handwritten character recognition with conjunct support

Last verified: 2026-03-12