A collection of 3,350 handwritten historical Modi script document images for document-level recognition research. Modi script was used to write Marathi for over 700 years, and vast archives of administrative, legal, and literary documents remain undigitized. The dataset provides full-page document scans suitable for training document-level detection and recognition models for historical Marathi manuscripts.
# Download from https://data.mendeley.com/datasets/sg337vf6wn/1
# Also available on IEEE DataPort
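import os
from PIL import Image
# Directory holding the downloaded scans (adjust to your local path)
img_dir = 'modi_hhdoc/'
# Sort for a reproducible order and match extensions case-insensitively
images = sorted(f for f in os.listdir(img_dir) if f.lower().endswith(('.jpg', '.png')))
print(f"Historical Modi documents: {len(images)}")
# Open the first scan to inspect it
img = Image.open(os.path.join(img_dir, images[0]))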
print(f"Image size: {img.size}")

| Field | Type | Description |
|---|---|---|
| image | image | Full-page historical Modi script document scan |
| document_id | string | Unique document identifier |
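
The two fields above can be assembled from a local download with a small helper. The following is a minimal sketch, not the dataset's official loader: the function names `build_records` and `load_image` are hypothetical, and it assumes that `document_id` corresponds to the image file name without its extension, which the dataset documentation does not confirm.

```python
from pathlib import Path

# Assumption: scans are stored as .jpg or .png files, as in the loading snippet.
IMAGE_EXTENSIONS = {".jpg", ".png"}


def build_records(img_dir):
    """Return one record per image file in img_dir, sorted by file name.

    Assumption: document_id is the file name stem. The real identifier
    scheme may differ, so verify it against the downloaded data.
    """
    records = []
    for path in sorted(Path(img_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            records.append({"document_id": path.stem, "image_path": str(path)})
    return records


def load_image(record):
    """Open the full-page scan for a record (requires Pillow)."""
    from PIL import Image  # imported lazily so listing files works without Pillow

    return Image.open(record["image_path"])
```

Calling `build_records('modi_hhdoc/')` would return a list of dictionaries, and `load_image(records[0])` would open the first scan. Storing paths rather than opened images keeps memory use low when iterating over all pages.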