Sanskrit Letter Dataset (602 Character Classes)

Handwritten Devanagari character dataset with very broad class coverage: 602 character classes spanning basic vowels, consonants, modifiers, and hundreds of conjunct (compound) characters found in Sanskrit texts. It contains 7,702 images, about 12.8 per class on average. The per-class sample count is low, but the class inventory is a valuable reference for which conjuncts actually appear in real Devanagari text. Many Sanskrit conjuncts carry over into Marathi vocabulary (e.g., विद्या, संस्कृत, शास्त्र), which makes the inventory useful when building conjunct recognition models with broad coverage.

Use the 602-class inventory to build a comprehensive conjunct coverage test suite for Marathi OCR evaluation.
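
A coverage check of this kind can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: it assumes the class labels are available as Unicode strings, treats a conjunct as two or more consonants joined by the Devanagari virama (U+094D), and ignores ZWJ/ZWNJ and word-final halants. The names `extract_conjuncts`, `coverage_report`, and `sample_inventory` are hypothetical.

```python
import re

# Devanagari consonants U+0915..U+0939, each optionally followed by a nukta (U+093C).
CONSONANT = "[\u0915-\u0939]\u093C?"
VIRAMA = "\u094D"
# A conjunct here is two or more consonants joined by viramas.
CONJUNCT_RE = re.compile(f"{CONSONANT}(?:{VIRAMA}{CONSONANT})+")


def extract_conjuncts(text):
    """Return every conjunct cluster found in text, in order of appearance."""
    return CONJUNCT_RE.findall(text)


def coverage_report(words, inventory):
    """Split the conjuncts found in words into covered and missing sets."""
    found = {c for w in words for c in extract_conjuncts(w)}
    return {"covered": found & inventory, "missing": found - inventory}


# Hypothetical inventory: a real one would come from the dataset's class labels.
sample_inventory = {"द्य", "स्क"}
sample_words = ["विद्या", "संस्कृत", "शास्त्र"]
report = coverage_report(sample_words, sample_inventory)
print(report)
```

Running the same report over a Marathi word list with the full class inventory would show which conjuncts an OCR test suite can and cannot exercise.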

Quick Start

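# Clone from https://github.com/avadesh02/Sanskrit-letter-dataset
# Assumes each character class is stored as a subdirectory of dataset_dir.
import os

dataset_dir = 'Sanskrit-letter-dataset/'
# Keep only class directories; skip plain files (e.g. a README) and hidden entries such as .git
classes = sorted(
    c for c in os.listdir(dataset_dir)
    if os.path.isdir(os.path.join(dataset_dir, c)) and not c.startswith('.')
)
print(f"Total character classes: {len(classes)}")
# 602 classes including hundreds of conjunct characters
total = sum(len(os.listdir(os.path.join(dataset_dir, c))) for c in classes)
print(f"Total images: {total}")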

Modality: Image (handwritten character crops)
Size: 7,702 images; 602 character classes
License: (not listed)
Format: PNG/JPEG
Language: sa, mr, hi
Update Frequency: static
Organization: Research community (DAS 2018)

Schema

Field            Type    Description
image            image   Handwritten Devanagari character image
character_class  string  Unicode character or conjunct sequence label
class_id         int     Numeric class identifier (0-601)
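
The three schema fields can be mirrored in a small record type. The sketch below is illustrative only: it assumes one subdirectory per class, takes the directory name as the `character_class` label, and assigns `class_id` by sorted order, none of which is confirmed by the repository. `Sample` and `index_dataset` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass
class Sample:
    image_path: str       # path to the image file (stands in for the `image` field)
    character_class: str  # label, assumed here to be the directory name
    class_id: int         # 0-based index into the sorted class list


def index_dataset(dataset_dir):
    """Walk class subdirectories and build one Sample per image file."""
    classes = sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d)) and not d.startswith(".")
    )
    samples = []
    for class_id, name in enumerate(classes):
        class_dir = os.path.join(dataset_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            # Keep only the formats listed on this page (PNG/JPEG).
            if fname.lower().endswith((".png", ".jpg", ".jpeg")):
                samples.append(Sample(os.path.join(class_dir, fname), name, class_id))
    return samples


# Only runs if the repository has been cloned next to this script.
if os.path.isdir("Sanskrit-letter-dataset"):
    samples = index_dataset("Sanskrit-letter-dataset")
    print(len(samples), "samples")
```

Storing the path instead of the decoded image keeps the index light; images can be opened lazily with PIL when a sample is actually needed.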

Build With This

Create a data augmentation pipeline expanding each of the 602 classes to 500+ samples using font rendering and GAN generation
Develop a conjunct frequency analyzer mapping Sanskrit dataset classes against Marathi text corpus frequencies to prioritize OCR training
Build a few-shot conjunct recognizer using meta-learning to handle the long tail of rare Devanagari conjuncts
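
For the augmentation idea above, the size of the gap can be estimated from the figures quoted on this page. The sketch below is a back-of-the-envelope calculation, not a generation pipeline: `augmentation_budget` and the example counts are hypothetical, and the dataset-wide total assumes no class already exceeds the target.

```python
def augmentation_budget(class_counts, target=500):
    """Return how many synthetic samples each class needs to reach target."""
    return {label: max(0, target - n) for label, n in class_counts.items()}


# Hypothetical per-class counts, for illustration only.
example_counts = {"क": 20, "द्य": 9, "स्त्र": 510}
budget = augmentation_budget(example_counts)
print(budget)

# Dataset-wide estimate from the figures on this page: 602 classes, 7,702 images.
num_classes, num_images, target = 602, 7702, 500
total_needed = num_classes * target - num_images
ratio = total_needed / num_images
print(total_needed)     # 293298
print(round(ratio, 1))  # 38.1
```

At roughly 38 synthetic images per real one, most of the training signal would come from the generator, so holding out real samples for evaluation matters more than usual.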

AI Use Cases

Comprehensive Devanagari conjunct character recognition
Character class inventory for OCR model coverage testing
Low-shot learning for rare conjunct characters
Sanskrit-Marathi shared vocabulary OCR

Last verified: 2026-03-12