IndicGEC Marathi Grammar Error Correction

Maharashtra (MH) Specific

The first Marathi grammatical error correction (GEC) dataset, released as part of a multilingual GEC benchmark covering Hindi, Bengali, Marathi, and Tamil. It provides source-target sentence pairs (a sentence containing errors and its corrected version) for training spelling and grammar checkers. It fills a critical gap: no GEC resources for Marathi existed previously.

Build a Marathi grammar correction API for word processors and messaging apps used in Maharashtra.

Quick Start

# Download from https://github.com/AI4Bharat/IndicGEC
from datasets import load_dataset
ds = load_dataset('ai4bharat/IndicGEC', 'mr')
# ds['test'][:5] returns a dict of columns, not rows; select() yields row dicts
for ex in ds['test'].select(range(5)):
    print(f"Incorrect: {ex['incorrect'][:60]}...")
    print(f"Correct: {ex['correct'][:60]}...\n")

Modality

text

Size

Source-target sentence pairs for grammar correction

License

Research (EMNLP 2025)

Format

Text (source-target pairs)

Language

Marathi (mr)

Update Frequency

static

Organization

EMNLP 2025

Schema

Field	Type	Description
incorrect	string	Marathi text with grammatical errors
correct	string	Grammatically corrected Marathi text
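
The two-field schema above could be checked before training with something like the following minimal sketch. The `GECPair` class and `parse_record` helper are hypothetical names introduced here, not part of the dataset or any library, and the sample record is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GECPair:
    """One source-target pair, mirroring the two schema fields."""
    incorrect: str
    correct: str

    @property
    def needs_correction(self) -> bool:
        # True when the target differs from the source after trimming whitespace.
        return self.incorrect.strip() != self.correct.strip()


def parse_record(record: dict) -> GECPair:
    """Validate a raw record and convert it to a GECPair (hypothetical helper)."""
    for field in ("incorrect", "correct"):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field {field!r} must be a non-empty string")
    return GECPair(incorrect=record["incorrect"], correct=record["correct"])


# Invented example record, not taken from the dataset.
sample = {"incorrect": "मी शाळा जातो", "correct": "मी शाळेत जातो"}
pair = parse_record(sample)
print(pair.needs_correction)
```

Pairs where `needs_correction` is false may still be worth keeping, since a corrector also has to learn when to leave text unchanged.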

Build With This

Create a real-time Marathi writing assistant browser extension that highlights and suggests grammar corrections

Develop a Marathi language learning app that provides grammar feedback on student writing exercises

Build an automated proofreading pipeline for Marathi government document publishing workflows
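
For the highlighting and feedback ideas above, one possible starting point is to derive word-level edits from an (incorrect, correct) pair. The sketch below uses Python's standard `difflib.SequenceMatcher`; the `word_edits` function is a hypothetical helper, the example pair is invented, and this is not the benchmark's own alignment or scoring method.

```python
from difflib import SequenceMatcher


def word_edits(incorrect: str, correct: str) -> list[tuple[str, str, str]]:
    """Return (operation, original_words, suggested_words) for each changed span."""
    src = incorrect.split()
    tgt = correct.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # unchanged words need no highlight
        edits.append((op, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


# Invented pair for illustration, not taken from the dataset.
edits = word_edits("मी शाळा जातो", "मी शाळेत जातो")
for op, original, suggestion in edits:
    print(op, repr(original), "->", repr(suggestion))
```

Comparing whitespace-separated words rather than individual code points keeps Devanagari vowel signs attached to their base consonants, which a character-level diff could split apart.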

AI Use Cases

Marathi spell checker development

Grammar correction for Marathi text editors

Writing assistance tools

Automated proofreading

Related Datasets

AI4Bharat BPCC (mr)

parallel-text

AI4Bharat IndicCorp v1 (mr)

text

AI4Bharat IndicCorp v2 (Marathi)

text

AI4Bharat IndicGLUE (mr)

text

Last verified: 2026-03-09