Handwritten Devanagari character dataset with the widest class diversity available: 602 character classes covering basic vowels, consonants, modifiers, and hundreds of conjunct (compound) character combinations found in Sanskrit texts. It contains 7,702 images (about 12.8 per class). Although the per-class sample count is low, the class inventory is a valuable reference for which conjuncts actually appear in real Devanagari text. Many Sanskrit conjuncts carry over into Marathi vocabulary (e.g., विद्या, संस्कृत, शास्त्र), which makes the dataset useful for building comprehensive conjunct recognition models.
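
To make the notion of a conjunct concrete, here is a minimal sketch that pulls the conjunct clusters out of the example words above. It relies on the fact that Unicode encodes a Devanagari conjunct as consonants joined by the virama (U+094D). The helper names `is_consonant` and `find_conjuncts` are hypothetical and not part of the dataset, and the consonant range U+0915 to U+0939 is a simplifying assumption that ignores the extended nukta consonants at U+0958 to U+095F.

```python
VIRAMA = "\u094D"  # Devanagari sign virama (halant)


def is_consonant(ch):
    # Assumption: treat U+0915..U+0939 (KA..HA) as the consonant range.
    return "\u0915" <= ch <= "\u0939"


def find_conjuncts(word):
    """Return every consonant + virama + consonant (+ ...) cluster in word."""
    clusters = []
    i = 0
    while i < len(word):
        if is_consonant(word[i]):
            j = i + 1
            # Extend the cluster while a virama followed by a consonant continues it.
            while (j + 1 < len(word) and word[j] == VIRAMA
                   and is_consonant(word[j + 1])):
                j += 2
            if j > i + 1:
                clusters.append(word[i:j])
            i = j
        else:
            i += 1
    return clusters


for word in ["विद्या", "संस्कृत", "शास्त्र"]:
    print(word, find_conjuncts(word))
```

Each returned cluster is the kind of label the conjunct classes in this dataset represent, so a function like this could be used to check which conjuncts in a text corpus have a matching class.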
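
```python
# Clone from https://github.com/avadesh02/Sanskrit-letter-dataset
import os

dataset_dir = 'Sanskrit-letter-dataset/'

# Assumes one subdirectory per character class; skip plain files and hidden
# entries such as .git so they are not counted as classes.
classes = [
    c for c in os.listdir(dataset_dir)
    if os.path.isdir(os.path.join(dataset_dir, c)) and not c.startswith('.')
]
print(f"Total character classes: {len(classes)}")
# Expected: 602 classes, including hundreds of conjunct characters

total = sum(len(os.listdir(os.path.join(dataset_dir, c))) for c in classes)
print(f"Total images: {total}")
```

| Field | Type | Description |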
|---|---|---|
| image | image | Handwritten Devanagari character image |
| character_class | string | Unicode character or conjunct sequence label |
| class_id | int | Numeric class identifier (0-601) |
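
The fields in the table above can be assembled from the cloned repository roughly as follows. This is a sketch under stated assumptions, not the dataset's official loader: it assumes one subdirectory per class whose name serves as the `character_class` label, and it assigns `class_id` by sorted directory order, which may not match the numbering the dataset authors intended. The function name `build_index` is hypothetical.

```python
import os


def build_index(dataset_dir):
    """Return one record per image with the image, character_class, and class_id fields."""
    # Assumption: each non-hidden subdirectory is one character class.
    class_names = sorted(
        name for name in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, name))
        and not name.startswith(".")
    )
    records = []
    # Assumption: class_id is the position of the class in sorted order.
    for class_id, name in enumerate(class_names):
        class_dir = os.path.join(dataset_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            records.append({
                "image": os.path.join(class_dir, fname),  # path; open lazily
                "character_class": name,
                "class_id": class_id,
            })
    return records
```

Calling `build_index('Sanskrit-letter-dataset/')` returns a list of dictionaries. Storing the image path instead of the decoded image keeps the index light, and each file can be opened with `PIL.Image.open` when it is needed.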