MahaBhulekh - Maharashtra 7/12 Satbara Land Record Extracts

MH Specific MH Subset Needed

MahaBhulekh is Maharashtra's digitized land record system, containing 2.11 crore (21.1 million) 7/12 (satbara) extracts across 358 talukas. Each extract follows a standardized Marathi template with structured fields for survey number, land area, landowner details, crop information, and encumbrances. The records are generated dynamically in Marathi and form one of the largest standardized Marathi document sources available. The source is raw and unannotated, so OCR ground truth still has to be created, but the consistent template makes automated annotation feasible. A community scraping tool exists for aggregation.

Build an annotation pipeline converting MahaBhulekh 7/12 extracts into field-level OCR training data with bounding boxes and transcriptions.
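
Because every extract shares one template, field-level annotations could in principle be produced by mapping known field values onto fixed template regions. The following is a minimal sketch of that idea, not the portal's or any tool's actual method. The region coordinates in `TEMPLATE_REGIONS`, the `FieldAnnotation` class, and the `annotate` function are all hypothetical; real regions would have to be measured from rendered extracts.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical field regions on a rendered 7/12 template, expressed as
# fractions of page width/height: (left, top, right, bottom).
# These numbers are placeholders, not measurements of the real template.
TEMPLATE_REGIONS: Dict[str, Tuple[float, float, float, float]] = {
    "survey_number": (0.10, 0.08, 0.30, 0.12),
    "landowner_name": (0.10, 0.20, 0.60, 0.26),
    "land_area": (0.65, 0.20, 0.90, 0.26),
}


@dataclass
class FieldAnnotation:
    field: str
    box: Tuple[int, int, int, int]  # pixel coordinates (x0, y0, x1, y1)
    text: str                       # transcription used as OCR ground truth


def annotate(values: Dict[str, str], width: int, height: int) -> List[FieldAnnotation]:
    """Pair known field values with template regions scaled to pixel boxes."""
    annotations = []
    for field, (left, top, right, bottom) in TEMPLATE_REGIONS.items():
        if field not in values:
            continue  # skip fields that are absent from this extract
        box = (
            round(left * width),
            round(top * height),
            round(right * width),
            round(bottom * height),
        )
        annotations.append(FieldAnnotation(field, box, values[field]))
    return annotations


if __name__ == "__main__":
    # Illustrative values only; no real record is reproduced here.
    for ann in annotate({"survey_number": "123/4", "land_area": "1.25"}, 1000, 2000):
        print(ann)
```

Using fractional regions keeps the mapping independent of render resolution, so the same template definition can serve page images of different sizes.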

Quick Start

# MahaBhulekh - Maharashtra Land Records
# Browse: https://bhulekh.mahabhumi.gov.in/
# Digital Satbara: https://digitalsatbara.mahabhumi.gov.in/
# Aggregation tool: https://github.com/answerquest/mahabhulekh-7-12-aggregating

print("MahaBhulekh: 21.1M digitized 7/12 land records")
print("Standardized Marathi template - ideal for form OCR training")

Modality: Document images (standardized Marathi form template, unannotated)
Size: 21.1 million digitized extracts; 358 talukas; all 36 districts
License: (not specified)
Format: HTML / PDF (dynamically generated)
Language: mr
Update Frequency: continuous
Organization: Maharashtra Revenue Department, Government of Maharashtra

Schema

Field            Type     Description
survey_number    string   Land survey number
village          string   Village name in Marathi
taluka           string   Taluka name
district         string   District name
landowner_name   string   Landowner name in Marathi
land_area        float    Land area in hectares/ares
crop_type        string   Current crop or land use
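
The schema above can be mirrored as a typed record for downstream code. This is a sketch only: the `SatbaraRecord` class and `parse_record` helper are hypothetical names, and the parser assumes `land_area` has already been reduced to a single decimal number, since the schema does not say how hectares and ares are combined.

```python
from dataclasses import dataclass, asdict
from typing import Dict


@dataclass
class SatbaraRecord:
    survey_number: str
    village: str         # village name in Marathi
    taluka: str
    district: str
    landowner_name: str  # landowner name in Marathi
    land_area: float     # assumed to be one decimal value (unit handling not specified)
    crop_type: str


def parse_record(raw: Dict[str, str]) -> SatbaraRecord:
    """Build a record from scraped strings, trimming whitespace and
    converting land_area to float. Raises KeyError on a missing field."""
    return SatbaraRecord(
        survey_number=raw["survey_number"].strip(),
        village=raw["village"].strip(),
        taluka=raw["taluka"].strip(),
        district=raw["district"].strip(),
        landowner_name=raw["landowner_name"].strip(),
        land_area=float(raw["land_area"]),
        crop_type=raw["crop_type"].strip(),
    )


if __name__ == "__main__":
    # Illustrative placeholder values, not taken from a real extract.
    sample = {
        "survey_number": " 123/4 ",
        "village": "उदाहरण गाव",
        "taluka": "उदाहरण तालुका",
        "district": "उदाहरण जिल्हा",
        "landowner_name": "उदाहरण नाव",
        "land_area": "1.25",
        "crop_type": "ज्वारी",
    }
    print(asdict(parse_record(sample)))
```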

Build With This

Create a FUNSD-style key-value extraction dataset for Marathi using 7/12 extract templates as the annotation base
Develop an automated land record search system that OCRs scanned satbara documents and makes them searchable
Build a land dispute analyzer extracting ownership history from sequences of 7/12 extracts for the same survey number
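
For the first idea in the list above, a FUNSD-style annotation stores each form entity with an `id`, `text`, `box`, `label`, and `linking` pairs that connect a question (field label) to its answer (field value). The sketch below shows one way labeled key-value pairs might be converted into that shape. The `to_funsd` function is a hypothetical helper, the Marathi labels and box coordinates are illustrative and not verified against the actual template, and the word-level `words` entries that FUNSD also carries are omitted.

```python
import json
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def to_funsd(pairs: Sequence[Tuple[str, Box, str, Box]]) -> Dict[str, List[dict]]:
    """Convert (key_text, key_box, value_text, value_box) tuples into a
    FUNSD-style dict with linked question/answer entities."""
    form: List[dict] = []
    for key_text, key_box, value_text, value_box in pairs:
        q_id = len(form)   # ids are assigned sequentially
        a_id = q_id + 1
        form.append({
            "id": q_id,
            "text": key_text,
            "box": list(key_box),
            "label": "question",
            "linking": [[q_id, a_id]],
        })
        form.append({
            "id": a_id,
            "text": value_text,
            "box": list(value_box),
            "label": "answer",
            "linking": [[q_id, a_id]],
        })
    return {"form": form}


if __name__ == "__main__":
    # Illustrative labels and coordinates only.
    example = [
        ("भूमापन क्रमांक", (10, 10, 100, 30), "123/4", (110, 10, 200, 30)),
        ("क्षेत्र", (10, 40, 100, 60), "1.25", (110, 40, 200, 60)),
    ]
    print(json.dumps(to_funsd(example), ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` to `json.dumps` keeps the Devanagari text readable in the output instead of escaping it.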

AI Use Cases

Marathi government form OCR training data source
Structured document field extraction for Devanagari
Land record digitization and searchability
Template-based document understanding model training

Last verified: 2026-03-12