MODI-HChar - Historical Modi Script Handwritten Character Dataset

MH Specific

MODI-HChar is a large-scale dataset of handwritten Modi script characters: 575,920 images across 57 classes (10 numerals, 12 vowels, and 35 consonants). Modi script was the primary writing system for Marathi from the 12th to the 20th century. The character-level labels support training classifiers for the character recognition stage of historical Marathi document OCR, and the scale of the dataset (about 575K images) should expose a classifier to a wide range of historical writing styles.

Train a Modi script character classifier achieving high accuracy across all 57 character classes.
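
Because the goal is accuracy across all 57 classes, a per-class breakdown is usually more informative than a single aggregate number. The following is a minimal sketch, not part of the dataset tooling: the function names are hypothetical, and labels are assumed to be integer class ids in the range 0-56.

```python
from collections import Counter

NUM_CLASSES = 57  # 10 numerals + 12 vowels + 35 consonants, per the description


def per_class_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Return {class_id: accuracy} for every class that occurs in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in range(num_classes) if total[c]}


def macro_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Average the per-class accuracies so rare classes count as much as common ones."""
    acc = per_class_accuracy(y_true, y_pred, num_classes)
    if not acc:
        raise ValueError("no labelled samples to evaluate")
    return sum(acc.values()) / len(acc)


# Toy labels for illustration only (assumed, not taken from the dataset)
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0]
toy_per_class = per_class_accuracy(y_true, y_pred)
toy_macro = macro_accuracy(y_true, y_pred)
```

Classes with low per-class accuracy are the natural place to look for visually similar characters that the classifier confuses.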

Quick Start

# Download from IEEE DataPort or Kaggle
# https://www.kaggle.com/datasets/msd6013/modi-hdc-historical-handwritten-modi-script
from torchvision import datasets, transforms

# Convert each crop to a single-channel 32x32 tensor with values in [0, 1]
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])
# Load as an image folder dataset (one subdirectory per class, 57 in total).
# 'modi_hchar/' is the local directory where the download was extracted.
dataset = datasets.ImageFolder('modi_hchar/', transform=transform)
print(f"Total images: {len(dataset)}, Classes: {len(dataset.classes)}")
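
The page does not mention a predefined train/validation split, so one may need to be created locally. The sketch below is one possible approach under that assumption: it splits sample indices class by class so every class appears in both parts. The function name, the 20% validation fraction, and the toy labels are all illustrative choices, not properties of the dataset.

```python
import random
from collections import defaultdict


def stratified_split(targets, val_fraction=0.1, seed=0):
    """Split sample indices into (train, val) lists, preserving class proportions."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out at least one sample per class when the class has more than one
        n_val = max(1, round(len(indices) * val_fraction)) if len(indices) > 1 else 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


# Toy labels standing in for the real label list (assumed for illustration)
toy_targets = [0] * 10 + [1] * 20 + [2] * 30
train_idx, val_idx = stratified_split(toy_targets, val_fraction=0.2)
```

With the `dataset` object from the Quick Start, `dataset.targets` can be passed as `targets`, and the resulting index lists can be wrapped with `torch.utils.data.Subset(dataset, train_idx)`.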

Modality: Image (handwritten character crops)
Size: 575,920 character images; 57 classes
License: (not stated)
Format: PNG/JPEG
Language: mr
Update Frequency: static
Organization: Research community

Schema

| Field | Type | Description |
| --- | --- | --- |
| image | image | Cropped handwritten Modi script character image |
| character_class | string | Modi character label (vowel, consonant, or numeral) |
| class_id | int | Numeric class identifier (0-56) |
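
One practical caveat when loading with torchvision's `ImageFolder`: it assigns label indices by sorting the class directory names as strings, so the loader's index need not match the `class_id` field above. The sketch below illustrates the issue and a remapping. The folder names are hypothetical (the page does not say how the class directories are named), and both function names are invented for this example.

```python
def imagefolder_class_to_idx(folder_names):
    """Mimic ImageFolder's labelling: indices follow the sorted folder names."""
    return {name: i for i, name in enumerate(sorted(folder_names))}


def remap_to_class_id(folder_names, folder_to_class_id):
    """Build {ImageFolder index: dataset class_id} from a folder -> class_id table."""
    loader_idx = imagefolder_class_to_idx(folder_names)
    return {loader_idx[name]: folder_to_class_id[name] for name in folder_names}


# Hypothetical numeric folder names; "10" sorts before "2" as a string
folders = ["0", "1", "2", "10"]
loader_idx = imagefolder_class_to_idx(folders)
index_to_class_id = remap_to_class_id(folders, {name: int(name) for name in folders})
```

If the directories turn out to be named by character rather than by number, the same remapping applies with a different `folder_to_class_id` table.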

Build With This

Create a Modi-Devanagari character mapping tool that visually maps Modi characters to their Devanagari equivalents
Develop a Modi script handwriting recognition model combining character classification with word-level context
Build a historical document transcription assistant that segments Modi text and classifies individual characters
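
As a starting point for the Modi-Devanagari mapping idea, the sketch below pairs characters by their Unicode names using Python's standard `unicodedata` module. This is a name-based heuristic, not a verified correspondence table, and it assumes each dataset class has already been associated with a code point in the Unicode Modi block, which the page does not describe. The function name is hypothetical.

```python
import unicodedata


def modi_to_devanagari(ch):
    """Return the Devanagari character with the matching Unicode name, or None."""
    name = unicodedata.name(ch, "")
    if not name.startswith("MODI "):
        return None  # not a Modi character
    try:
        # e.g. "MODI LETTER KA" -> "DEVANAGARI LETTER KA"
        return unicodedata.lookup("DEVANAGARI " + name[len("MODI "):])
    except KeyError:
        return None  # no Devanagari character carries the same name


modi_ka = unicodedata.lookup("MODI LETTER KA")
modi_zero = unicodedata.lookup("MODI DIGIT ZERO")
devanagari_ka = modi_to_devanagari(modi_ka)
devanagari_zero = modi_to_devanagari(modi_zero)
```

Characters for which the function returns `None` would need manual entries, and any pairs it does produce should be reviewed by someone who reads both scripts.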

AI Use Cases

Modi script character classification
Historical Marathi OCR character recognition stage
Script identification (Modi vs. Devanagari)
Historical handwriting analysis

Last verified: 2026-03-12