MoDeTrans - Modi Script Document Transliteration Dataset

MH Specific

A dataset of 2,043 historical Modi script document images paired with their Devanagari transliterations. Modi was the official script for writing Marathi from the 12th century until the British colonial period, when Devanagari replaced it. The dataset supports training vision-language models, such as the MoScNet architecture, to transliterate Modi documents into modern Devanagari. This opens up centuries of Marathi historical records, including Peshwa-era administrative documents, Shivaji Maharaj's correspondence, and Maratha empire legal records.

Build a Modi-to-Devanagari transliteration tool that makes historical Marathi documents readable to modern Marathi speakers.

Quick Start

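# MoDeTrans dataset for Modi script transliteration
# Paper: https://arxiv.org/abs/2503.13060
# Loading sketch: the folder name, *.jpg scans and same-stem .txt transcriptions
# are assumptions; check the official release for its actual layout.
from pathlib import Path
from PIL import Image

DATA_DIR = Path("MoDeTrans")  # hypothetical local copy of the dataset
pairs = []  # (Modi script page image, Devanagari ground-truth text)
for image_path in sorted(DATA_DIR.glob("*.jpg")):
    image = Image.open(image_path).convert("RGB")
    text = image_path.with_suffix(".txt").read_text(encoding="utf-8")
    pairs.append((image, text))

print(f"Loaded {len(pairs)} Modi/Devanagari pairs")
# Next step: train a VLM-based transliteration model (e.g. MoScNet) on `pairs`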

Modality: Image (historical document scans with Devanagari transliterations)
Size: 2,043 document images with parallel Devanagari text
License: not listed
Format: Image + text pairs
Language: mr (Marathi)
Update frequency: static
Organization: Research community
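
The card does not say whether the release ships with fixed train/validation/test splits. If it does not, or if you need your own splits for ablations, a seeded split over document IDs keeps experiments reproducible. The sketch below assumes an 80/10/10 ratio, a seed of 13, and hypothetical IDs of the form doc_0000; none of these come from the dataset itself.

```python
import random

def split_ids(ids, train_frac=0.8, val_frac=0.1, seed=13):
    """Shuffle document IDs with a fixed seed and cut them into train/val/test lists."""
    ids = sorted(ids)            # sort first so the result does not depend on input order
    rng = random.Random(seed)    # local RNG: leaves the global random state untouched
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to test, so nothing is dropped
    return train, val, test

# Hypothetical IDs standing in for the 2,043 document images
doc_ids = [f"doc_{i:04d}" for i in range(2043)]
train, val, test = split_ids(doc_ids)
print(len(train), len(val), len(test))  # 1634 204 205
```

Splitting at the document level means no page image can appear in more than one split.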

Schema

Field            Type     Description
image            image    Scanned historical Modi script document page
devanagari_text  string   Transliterated text in modern Devanagari script
document_type    string   Type of historical document
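
One way to use the schema above is a typed record plus a sanity check that devanagari_text really is Devanagari. This is a minimal sketch: representing image as a file path, the 0.9 threshold, and the example values are assumptions. The check counts how many non-whitespace characters fall in the Unicode Devanagari block (U+0900 to U+097F), which also covers the danda and Devanagari digits.

```python
from dataclasses import dataclass

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F  # Unicode Devanagari block

@dataclass
class MoDeTransRecord:
    image_path: str       # stands in for the `image` field (assumption: path to the scan)
    devanagari_text: str  # transliterated text in modern Devanagari
    document_type: str    # type of historical document

def devanagari_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(DEVANAGARI_START <= ord(c) <= DEVANAGARI_END for c in chars)
    return in_block / len(chars)

def validate(record: MoDeTransRecord, min_ratio: float = 0.9) -> list[str]:
    """Return the problems found in one record; an empty list means it passed."""
    problems = []
    if not record.devanagari_text.strip():
        problems.append("empty devanagari_text")
    elif devanagari_ratio(record.devanagari_text) < min_ratio:
        problems.append("devanagari_text is mostly outside the Devanagari block")
    if not record.document_type.strip():
        problems.append("missing document_type")
    return problems

# Hypothetical record; "letter" is an illustrative document_type value
print(validate(MoDeTransRecord("page_0001.jpg", "श्री", "letter")))  # []
```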

Build With This

Create a historical Marathi document archive with automatic Modi-to-Devanagari conversion for researchers
Develop a Maratha empire correspondence analyzer extracting names, places, and dates from transliterated Modi documents
Build a public-facing web tool where users can upload Modi script images and receive Devanagari transliterations
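
Any of these tools needs a way to measure transliteration quality against the devanagari_text ground truth. The card does not name an evaluation metric; character error rate (CER), a common choice for OCR and transliteration, is assumed here. This sketch computes CER as Levenshtein edit distance divided by the reference length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points: edits / reference length."""
    if not reference:
        # Convention for an empty reference: 0.0 if the output is also empty, else 1.0
        return float(len(hypothesis) > 0)
    return levenshtein(reference, hypothesis) / len(reference)

print(cer("राम", "रम"))  # one missing vowel sign out of 3 code points
```

Python strings count code points, so a Devanagari conjunct or a consonant plus vowel sign counts as several characters. If you want errors per visible syllable, segment into grapheme clusters first.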

AI Use Cases

Historical Modi script to Devanagari transliteration
Historical Marathi document digitization
Maratha empire document analysis
Cultural heritage preservation
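
For document analysis, such as pulling dates out of transliterated correspondence, numerals may appear as Devanagari digits. The card does not say how the transliterations write numbers. This sketch assumes Devanagari digits (० to ९) may be present and maps them to ASCII before extracting numbers; the example snippet is hypothetical.

```python
import re

# Devanagari digits U+0966..U+096F mapped to ASCII 0..9
DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")

def normalize_digits(text: str) -> str:
    """Replace Devanagari digits with ASCII digits; all other characters are unchanged."""
    return text.translate(DEVANAGARI_DIGITS)

def find_numbers(text: str) -> list[int]:
    """Return every run of digits (Devanagari or ASCII) in the text as an int."""
    return [int(m) for m in re.findall(r"[0-9]+", normalize_digits(text))]

print(find_numbers("शके १७१२ माघ ५"))  # [1712, 5]
```

Python's int() already accepts Devanagari digits, and `\d` in `re` matches them by default. Explicit normalization mainly helps when the text is passed on to tools that expect ASCII numerals.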
Last verified: 2026-03-12