The first Marathi grammatical error correction (GEC) dataset, released as part of a multilingual GEC benchmark covering Hindi, Bengali, Marathi, and Tamil. It provides source-target sentence pairs (erroneous text and its correction) for training spelling and grammar checkers. It fills a gap: no GEC resources for Marathi existed previously.
# Download from https://github.com/AI4Bharat/IndicGEC
from datasets import load_dataset

# Load the Marathi ('mr') configuration
ds = load_dataset('ai4bharat/IndicGEC', 'mr')

# Slicing a split (ds['test'][:5]) returns a dict of columns, so select rows instead
for ex in ds['test'].select(range(5)):
    print(f"Incorrect: {ex['incorrect'][:60]}...")
    print(f"Correct: {ex['correct'][:60]}...\n")

| Field | Type | Description |
|---|---|---|
| incorrect | string | Marathi text with grammatical errors |
| correct | string | Grammatically corrected Marathi text |
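
Because each record pairs an `incorrect` string with its `correct` counterpart, the edits a checker is expected to make can be recovered by aligning the two sides. The following is a minimal sketch using Python's standard `difflib`, assuming whitespace tokenization is adequate for a first look; the helper `token_edits` and the sample pair are hypothetical illustrations, not part of the dataset or its tooling.

```python
from difflib import SequenceMatcher


def token_edits(incorrect: str, correct: str):
    """Return (operation, source_tokens, target_tokens) for each differing span.

    Hypothetical helper: aligns whitespace-separated tokens of the two
    sentences and keeps only the spans that are not identical.
    """
    src, tgt = incorrect.split(), correct.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (tag, src[i1:i2], tgt[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hand-written toy pair with one misspelled word; NOT taken from the dataset.
pair = {"incorrect": "मी शाळेत जतो", "correct": "मी शाळेत जातो"}
edits = token_edits(pair["incorrect"], pair["correct"])

for op, src_tokens, tgt_tokens in edits:
    print(op, src_tokens, "->", tgt_tokens)
```

The same function can be applied to each record's `incorrect` and `correct` fields to tally how many tokens change per sentence. Whitespace splitting is a simplification; a proper Marathi tokenizer may give cleaner alignments.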