MoDeTrans is a dataset of 2,043 historical Modi script document images paired with Devanagari transliterations. Modi was the official script for writing Marathi from the 12th century until the British colonial period, when Devanagari replaced it. The dataset supports training vision-language models (the MoScNet architecture) to transliterate Modi documents into modern Devanagari, opening up centuries of Marathi historical records, including Peshwa-era administrative documents, Shivaji Maharaj's correspondence, and Maratha Empire legal records.

```python
# MoDeTrans dataset for Modi script transliteration
# Paper: https://arxiv.org/abs/2503.13060
from PIL import Image

# Load Modi script document images paired with Devanagari ground truth
# Train VLM-based transliteration model (MoScNet architecture)
print("MoDeTrans: 2,043 Modi script documents with Devanagari transliterations")
print("Historical Marathi script preservation dataset")
```

| Field | Type | Description |
|---|---|---|
| image | image | Scanned historical Modi script document page |
| devanagari_text | string | Transliterated text in modern Devanagari script |
| document_type | string | Type of historical document |
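
The fields in the table above can be sanity-checked before training. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `devanagari_ratio` and `check_record`, the `min_ratio` threshold of 0.8, and the sample record (including the `"letter"` document type and the placeholder image bytes) are all hypothetical. The only external fact it relies on is that the Unicode Devanagari block spans U+0900 to U+097F.

```python
from typing import Any, Mapping

# Field names come from the table above; the checks themselves are assumptions.
REQUIRED_FIELDS = ("image", "devanagari_text", "document_type")


def devanagari_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Devanagari block (U+0900..U+097F)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for ch in chars if "\u0900" <= ch <= "\u097f")
    return hits / len(chars)


def check_record(record: Mapping[str, Any], min_ratio: float = 0.8) -> list[str]:
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if record["image"] is None:
        problems.append("image is empty")
    text = record["devanagari_text"]
    if not isinstance(text, str) or devanagari_ratio(text) < min_ratio:
        problems.append("devanagari_text is not mostly Devanagari")
    if not isinstance(record["document_type"], str):
        problems.append("document_type is not a string")
    return problems


# Hypothetical record: placeholder bytes stand in for a scanned page image.
sample = {
    "image": b"placeholder",
    "devanagari_text": "श्री गणेशाय नमः",
    "document_type": "letter",
}
print(check_record(sample))  # prints []
```

The threshold is deliberately below 1.0 so that transliterations containing occasional ASCII punctuation or digits still pass; a stricter or looser value may suit the actual data better.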