MahaBhulekh is Maharashtra's digitized land record system, containing 2.11 crore (21.1 million) 7/12 (satbara) extracts across 358 talukas. Each extract follows a standardized Marathi template with structured fields for survey number, land area, landowner details, crop information, and encumbrances. The records are dynamically generated in Marathi and represent one of the largest standardized Marathi document sources available. This is a raw, unannotated source that still requires OCR ground-truth annotation, but the consistent template format makes automated annotation feasible. A community scraping tool exists for aggregation.
```python
# MahaBhulekh - Maharashtra Land Records
# Browse: https://bhulekh.mahabhumi.gov.in/
# Digital Satbara: https://digitalsatbara.mahabhumi.gov.in/
# Aggregation tool: https://github.com/answerquest/mahabhulekh-7-12-aggregating
print("MahaBhulekh: 21.1M digitized 7/12 land records")
print("Standardized Marathi template - ideal for form OCR training")
```

| Field | Type | Description |
|---|---|---|
| survey_number | string | Land survey number |
| village | string | Village name in Marathi |
| taluka | string | Taluka name |
| district | string | District name |
| landowner_name | string | Landowner name in Marathi |
| land_area | float | Land area in hectares/ares |
| crop_type | string | Current crop or land use |
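
The field table can be mirrored in code as a typed record. The sketch below is illustrative only: `SatbaraRecord`, `hectares_from`, and `to_ground_truth` are hypothetical names, the JSON layout is an assumed annotation format rather than anything published by MahaBhulekh, and the sample values and image path are made up. It assumes `land_area` is stored in decimal hectares and relies on the fact that 1 hectare equals 100 ares.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SatbaraRecord:
    """One 7/12 extract, mirroring the field table (hypothetical class name)."""
    survey_number: str
    village: str
    taluka: str
    district: str
    landowner_name: str
    land_area: float  # assumption: decimal hectares
    crop_type: str


def hectares_from(hectares: int, ares: float) -> float:
    """Combine a hectare/are pair into decimal hectares (1 hectare = 100 ares)."""
    return hectares + ares / 100.0


def to_ground_truth(record: SatbaraRecord, image_path: str) -> str:
    """Serialize a record as one JSON line pairing an image with its field labels."""
    payload = {"image": image_path, "fields": asdict(record)}
    # ensure_ascii=False keeps Devanagari text readable in the output file.
    return json.dumps(payload, ensure_ascii=False)


# Made-up sample values, for illustration only.
record = SatbaraRecord(
    survey_number="123/1",
    village="शिरूर",
    taluka="शिरूर",
    district="पुणे",
    landowner_name="उदाहरण नाव",
    land_area=hectares_from(1, 25),
    crop_type="ऊस",
)
line = to_ground_truth(record, "scans/example_0001.png")
print(line)
```

Writing one such line per extract yields a JSON Lines file that an OCR training pipeline could consume. Fields mentioned in the description but absent from the table, such as encumbrances, would need to be added to the dataclass separately.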