CHIPS - Corpus of Handwritten Indic Scripts (Page-Level OCR)


MH Subset Needed

Page-level handwritten OCR dataset for Indic scripts with both text detection and recognition annotations, part of the PLATTER (Page-Level Handwritten Text Recognition) project. Unlike word-level datasets, CHIPS provides full-page handwritten document images with bounding-box annotations for text regions plus Unicode transcriptions. This enables the training of end-to-end page-level OCR systems that handle detection and recognition jointly. It covers multiple Indic scripts, including Devanagari, the script used for Marathi.

Build a page-level Marathi handwritten OCR system that processes entire document pages without manual line segmentation.
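One simple way to process a whole page without manual line segmentation is a detect-then-recognize cascade. A detector proposes text-line regions, a recognizer reads each region, and the outputs are joined in reading order. The sketch below shows only that orchestration. `detect_lines` and `recognize_line` are hypothetical callables standing in for trained models, the box convention is assumed, and PLATTER's actual architecture may differ.

```python
from typing import Callable, List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels; an assumed box convention.
Box = Tuple[int, int, int, int]


def page_to_text(
    page: object,
    detect_lines: Callable[[object], Sequence[Box]],
    recognize_line: Callable[[object, Box], str],
) -> str:
    """Hypothetical two-stage page OCR: detect line regions, read each one,
    then join the results top-to-bottom (ties broken left-to-right)."""
    boxes: List[Box] = sorted(detect_lines(page), key=lambda b: (b[1], b[0]))
    return "\n".join(recognize_line(page, box) for box in boxes)


if __name__ == "__main__":
    # Toy stand-ins: a "page" is a dict mapping line boxes to their text.
    toy_page = {
        (10, 120, 400, 160): "दुसरी ओळ",  # "second line"
        (10, 20, 400, 60): "पहिली ओळ",   # "first line"
    }
    text = page_to_text(
        toy_page,
        detect_lines=lambda p: list(p.keys()),
        recognize_line=lambda p, box: p[box],
    )
    print(text)  # first line printed above the second
```

Sorting on (y_min, x_min) assumes single-column pages with one box per text line. Multi-column layouts or word-level boxes would need a proper line-grouping step.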
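The Maharashtra (Marathi) subset has not yet been extracted. CHIPS is a multi-script Indic handwriting corpus rather than a geospatial dataset, and its schema has no location fields. A Marathi subset therefore has to be selected by script (Devanagari) and language (mr) rather than by geographic coordinates or administrative boundaries. Script alone is not sufficient, because Hindi pages also use Devanagari.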
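A minimal sketch of selecting a Marathi subset from COCO-style annotations follows. It assumes each image entry carries the schema's script identifier under a hypothetical `"script"` key and, optionally, a hypothetical `"language"` key. Both key names and values are placeholders until the released files are inspected. Filtering on language as well matters because Hindi (hi) and Marathi (mr) share the Devanagari script.

```python
from typing import Any, Dict, Optional


def filter_subset(
    coco: Dict[str, Any],
    script: str = "Devanagari",
    language: Optional[str] = "mr",
) -> Dict[str, Any]:
    """Keep only images (and their annotations) matching script/language.

    Assumes hypothetical "script" and optional "language" keys on each
    entry of coco["images"].
    """
    def keep(img: Dict[str, Any]) -> bool:
        if img.get("script") != script:
            return False
        # If an image has no language tag, fall back to script-only filtering.
        return language is None or img.get("language", language) == language

    images = [img for img in coco.get("images", []) if keep(img)]
    ids = {img["id"] for img in images}
    annotations = [a for a in coco.get("annotations", []) if a["image_id"] in ids]
    return {**coco, "images": images, "annotations": annotations}


if __name__ == "__main__":
    sample = {
        "images": [
            {"id": 1, "file_name": "p1.png", "script": "Devanagari", "language": "mr"},
            {"id": 2, "file_name": "p2.png", "script": "Devanagari", "language": "hi"},
            {"id": 3, "file_name": "p3.png", "script": "Tamil", "language": "ta"},
        ],
        "annotations": [
            {"id": 10, "image_id": 1, "bbox": [5, 5, 100, 20]},
            {"id": 11, "image_id": 2, "bbox": [5, 5, 100, 20]},
            {"id": 12, "image_id": 3, "bbox": [5, 5, 100, 20]},
        ],
    }
    subset = filter_subset(sample)
    print([img["file_name"] for img in subset["images"]])  # ['p1.png']
```

For real files, the `coco` dict would come from `json.load` on the annotation file.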

Quick Start

# PLATTER project - page-level handwritten OCR
# Paper: https://arxiv.org/abs/2502.06172
# Contact authors for dataset access

print("CHIPS: Page-level handwritten Indic OCR dataset")
print("Supports detection + recognition jointly")
print("Filter Devanagari script pages for Marathi OCR")
Modality
Image (full-page handwritten documents with detection + recognition annotations)
Size
1,458+ page-level document images; 11 Indic languages; 463 writers
License
Not stated (contact the authors)
Format
PNG/JPEG with COCO-format annotations
Language
mr, hi, bn, ta, te, kn, ml, gu, pa, or, as
Update Frequency
static
Organization
Multi-institutional (PLATTER project)
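Since the annotations are listed as COCO-format, a loader might group region boxes and transcriptions by page. This sketch assumes the standard COCO `"images"`/`"annotations"` layout, where `bbox` is `[x, y, width, height]`. COCO itself defines no transcription field, so the `"text"` key below is a hypothetical placeholder for whatever key the released files actually use.

```python
import json
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple

Corners = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def group_regions(coco: Dict[str, Any]) -> Dict[str, List[Tuple[Corners, str]]]:
    """Group COCO-style annotations by page file name.

    COCO stores bbox as [x, y, width, height]; this converts it to corner form.
    The transcription key "text" is hypothetical, not a COCO standard field.
    """
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    pages: DefaultDict[str, List[Tuple[Corners, str]]] = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        corners = (x, y, x + w, y + h)
        pages[names[ann["image_id"]]].append((corners, ann.get("text", "")))
    return dict(pages)


def load_regions(path: str) -> Dict[str, List[Tuple[Corners, str]]]:
    """Read an annotation JSON file (path is hypothetical) and group it."""
    with open(path, encoding="utf-8") as f:
        return group_regions(json.load(f))
```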

Schema

Field | Type | Description
image | image | Full-page handwritten document scan
text_regions | json | Bounding box coordinates for text line regions
transcriptions | array | Unicode transcriptions for each annotated text region
script | string | Script identifier
writer_id | string | Writer identifier
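As an illustration of how the schema fields relate, the sketch below holds one page in a hypothetical `PageRecord` dataclass. It checks that every text region has exactly one transcription and that boxes are non-degenerate. The `image_path` field stands in for the `image` field, and the corner-form box convention is an assumption rather than the dataset's documented format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


@dataclass
class PageRecord:
    """Hypothetical in-memory form of one CHIPS page, mirroring the schema."""
    image_path: str            # stands in for the `image` field
    text_regions: List[Box]
    transcriptions: List[str]
    script: str
    writer_id: str

    def __post_init__(self) -> None:
        # Each text region must pair with exactly one transcription.
        if len(self.text_regions) != len(self.transcriptions):
            raise ValueError(
                f"{len(self.text_regions)} regions but "
                f"{len(self.transcriptions)} transcriptions"
            )
        # Reject zero-area or inverted boxes.
        for x0, y0, x1, y1 in self.text_regions:
            if x1 <= x0 or y1 <= y0:
                raise ValueError(f"degenerate box: {(x0, y0, x1, y1)}")

    def lines(self) -> List[Tuple[Box, str]]:
        """Return (region, transcription) pairs, e.g. for recognizer training."""
        return list(zip(self.text_regions, self.transcriptions))
```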

Build With This

Create an automatic Marathi handwritten exam paper grader that detects and reads handwritten answers from full pages
Develop a historical Marathi letter digitizer that handles full pages of cursive Devanagari handwriting
Build a handwritten form processor for Maharashtra government offices that extracts field values from handwritten forms

AI Use Cases

Page-level handwritten Marathi text detection and recognition
End-to-end handwritten document OCR
Text line detection in handwritten Indic documents
Handwriting segmentation model training
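For the detection and recognition use cases listed above, two common metrics are intersection-over-union (IoU) and character error rate (CER). IoU is used for matching predicted regions to ground-truth regions, and CER is used for transcriptions. The sketch below implements both from scratch. It is not the evaluation protocol used in the PLATTER paper, and the (x_min, y_min, x_max, y_max) box convention is an assumption.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length.

    Uses a rolling one-row dynamic program over Unicode code points.
    """
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, 1):
        cur = [i]
        for j, hc in enumerate(hypothesis, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (rc != hc),  # substitution (0 if equal)
            ))
        prev = cur
    return prev[-1] / max(len(reference), 1)


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # overlap 50 / union 150
    print(cer("kitten", "sitting"))            # 3 edits / 6 chars = 0.5
```

This CER counts Unicode code points, so a Devanagari conjunct such as क्ष (क + ् + ष) contributes three characters. Evaluations that count grapheme clusters instead will report different numbers.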
Last verified: 2026-03-12