IndicGEC Marathi Grammar Error Correction

Maharashtra-specific

The first grammatical error correction (GEC) dataset for Marathi, released as part of a multilingual GEC benchmark covering Hindi, Bengali, Marathi, and Tamil. It provides pairs of erroneous source sentences and their corrected targets for training spelling and grammar checkers, filling a gap left by the absence of any earlier Marathi GEC resources.

Build a Marathi grammar correction API for word processors and messaging apps used in Maharashtra.

Quick Start

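# Source repository: https://github.com/AI4Bharat/IndicGEC
# Hub ID and 'mr' config as listed on this page; if unavailable, use the repository files
from datasets import load_dataset

ds = load_dataset('ai4bharat/IndicGEC', 'mr')
# Slicing (ds['test'][:5]) returns a dict of columns, so select the first 5 rows instead
for ex in ds['test'].select(range(5)):
    print(f"Incorrect: {ex['incorrect'][:60]}...")
    print(f"Correct: {ex['correct'][:60]}...\n")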
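To score a correction system against the `correct` references, GEC work usually compares edits rather than whole sentences (for example with the M2 scorer or ERRANT). The sketch below is a simplified stand-in for those tools, not a reimplementation of either: it assumes whitespace tokenization, derives edits with Python's `difflib`, and computes precision, recall, and F0.5 by exact edit match. The functions `edits` and `score_edits` are hypothetical, and `hypotheses` would be the outputs of your own system on the test split.

```python
from difflib import SequenceMatcher


def edits(source: str, target: str) -> set[tuple[int, int, str]]:
    """Return whitespace-token edits that turn source into target.

    Each edit is (start, end, replacement): start:end indexes source
    tokens and replacement is the joined target tokens ("" for a deletion).
    """
    src, tgt = source.split(), target.split()
    ops = SequenceMatcher(a=src, b=tgt, autojunk=False).get_opcodes()
    return {
        (i1, i2, " ".join(tgt[j1:j2]))
        for tag, i1, i2, j1, j2 in ops
        if tag != "equal"
    }


def score_edits(sources, hypotheses, references, beta=0.5):
    """Corpus-level precision, recall and F-beta over exactly matching edits."""
    tp = fp = fn = 0
    for src, hyp, ref in zip(sources, hypotheses, references):
        hyp_edits, ref_edits = edits(src, hyp), edits(src, ref)
        tp += len(hyp_edits & ref_edits)   # proposed and correct
        fp += len(hyp_edits - ref_edits)   # proposed but not in reference
        fn += len(ref_edits - hyp_edits)   # in reference but missed
    # Convention choice: empty denominators count as perfect precision/recall
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta * beta
    f = (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0
    return p, r, f
```

F0.5 weights precision above recall, the usual choice in GEC because proposing a wrong correction is considered worse than missing one.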
Modality: text
Size: Source-target sentence pairs for grammar correction
License: (not listed)
Format: Text (source-target pairs)
Language: mr (Marathi)
Update Frequency: static
Organization: EMNLP 2025

Schema

Field      Type    Description
incorrect  string  Marathi text with grammatical errors
correct    string  Grammatically corrected Marathi text
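Each row pairs an `incorrect` string with its `correct` counterpart. As a minimal sketch (the `GECPair` class and `normalize` helper are hypothetical, not part of the dataset's tooling), a typed record can NFC-normalize both sides so that visually identical Devanagari strings with different code-point sequences compare equal. Whether the released files actually contain such variants is an assumption worth checking.

```python
import unicodedata
from dataclasses import dataclass


def normalize(text: str) -> str:
    """NFC-normalize and trim surrounding whitespace."""
    return unicodedata.normalize("NFC", text).strip()


@dataclass(frozen=True)
class GECPair:
    """One row of the schema: an erroneous sentence and its correction."""

    incorrect: str
    correct: str

    @classmethod
    def from_row(cls, row: dict) -> "GECPair":
        # Field names follow the schema table: "incorrect" and "correct"
        return cls(normalize(row["incorrect"]), normalize(row["correct"]))

    @property
    def unchanged(self) -> bool:
        # True when the source already equals its correction (no edit needed)
        return self.incorrect == self.correct
```

Rows from the Quick Start loop can be wrapped with `GECPair.from_row(ex)`; counting `unchanged` pairs shows how many sources need no correction at all.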

Build With This

Create a real-time Marathi writing assistant browser extension that highlights grammar errors and suggests corrections
Develop a Marathi language learning app that provides grammar feedback on student writing exercises
Build an automated proofreading pipeline for Marathi government document publishing workflows
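Both the browser extension and the learning app need to show a writer what changed. A minimal sketch, assuming whitespace tokenization (real Marathi tooling may need a proper tokenizer for punctuation attached to words): the hypothetical `render_feedback` function aligns an `incorrect`/`correct` pair with Python's `difflib` and marks changes with the `[-removed-]` and `{+added+}` markers that `git diff --word-diff=plain` also uses.

```python
from difflib import SequenceMatcher


def render_feedback(incorrect: str, correct: str) -> str:
    """Mark word-level changes inline as [-removed-] and {+added+}."""
    src, tgt = incorrect.split(), correct.split()
    out = []
    ops = SequenceMatcher(a=src, b=tgt, autojunk=False).get_opcodes()
    for tag, i1, i2, j1, j2 in ops:
        if tag == "equal":
            out.extend(src[i1:i2])  # unchanged words pass through
            continue
        if i2 > i1:  # words deleted or replaced in the source
            out.append("[-" + " ".join(src[i1:i2]) + "-]")
        if j2 > j1:  # words inserted or used as the replacement
            out.append("{+" + " ".join(tgt[j1:j2]) + "+}")
    return " ".join(out)
```

For example, `render_feedback("मी शाळेत गेला", "मी शाळेत गेलो")` returns `मी शाळेत [-गेला-] {+गेलो+}`; a UI would map the markers to strike-through and highlight styles.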

AI Use Cases

Marathi spell checker development
Grammar correction for Marathi text editors
Writing assistance tools
Automated proofreading
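For spell-checker development, one simple (and admittedly crude) baseline is to mine one-word-for-one-word replacements from the training pairs and apply them as a lookup table. This is an illustrative sketch, not a method from the dataset's paper; `learn_substitutions`, `apply_substitutions`, and the `min_count` threshold are hypothetical, and whitespace tokenization is assumed.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def learn_substitutions(pairs, min_count=2):
    """Map each frequently corrected word to its most common correction.

    pairs: iterable of (incorrect, correct) strings.
    """
    counts = defaultdict(Counter)
    for incorrect, correct in pairs:
        src, tgt = incorrect.split(), correct.split()
        ops = SequenceMatcher(a=src, b=tgt, autojunk=False).get_opcodes()
        for tag, i1, i2, j1, j2 in ops:
            # Only 1:1 word replacements look like spelling fixes; skip the rest
            if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
                counts[src[i1]][tgt[j1]] += 1
    table = {}
    for wrong, fixes in counts.items():
        fix, n = fixes.most_common(1)[0]
        if n >= min_count:  # drop rare, possibly noisy substitutions
            table[wrong] = fix
    return table


def apply_substitutions(text, table):
    """Replace every whitespace token that appears in the table."""
    return " ".join(table.get(tok, tok) for tok in text.split())
```

Because the table ignores context, it cannot fix agreement errors such as गेला → गेलो, where the right form depends on the subject (गेला is correct with a third-person subject); those need a model trained on full sentence pairs.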
Last verified: 2026-03-09