AI4Bharat IndicGLUE (mr)

MH Specific

The Marathi (mr) portion of AI4Bharat's IndicGLUE benchmark for natural language understanding (NLU).

Benchmark Marathi language models on the IndicGLUE suite to establish performance baselines across NLU tasks.

Quick Start

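from datasets import load_dataset

# Load the Marathi WNLI task from IndicGLUE (downloaded on first use)
ds = load_dataset('ai4bharat/indic_glue', 'wnli.mr', split='train')
print(f"Task samples: {len(ds)}")
# ds[:5] returns a dict of columns, so select rows to iterate over examples
for ex in ds.select(range(min(5, len(ds)))):
    print(f"Text: {str(ex)[:100]}...")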

Modality

text

Size

Multi-task NLU benchmark covering 11 Indian languages

License

CC0-1.0

Format

CSV/JSON

Language

Marathi (mr)

Update Frequency

static

Organization

AI4Bharat, IIT Madras

Schema

Field	Type	Description
text	string	Input text for the NLU task
label	string	Task-specific label or target
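
The schema above is generic, and individual IndicGLUE tasks may expose different column names, so it can help to check the fields of whatever split you load. Below is a minimal sketch of such a check. It assumes only that each record behaves like a Python dict. The helper name `summarize_fields` and the two sample rows are hypothetical and not part of the dataset.

```python
from collections import Counter


def summarize_fields(records):
    """Map each field name to a Counter of the Python type names seen for it."""
    summary = {}
    for rec in records:
        for key, value in rec.items():
            summary.setdefault(key, Counter())[type(value).__name__] += 1
    return summary


# Hypothetical in-memory rows shaped like the schema table (not real dataset rows).
sample = [
    {"text": "उदाहरण वाक्य", "label": "0"},
    {"text": "दुसरे वाक्य", "label": "1"},
]
field_summary = summarize_fields(sample)
for name, types in field_summary.items():
    print(name, dict(types))
```

With the `ds` object from Quick Start, the same helper could be applied as `summarize_fields(ds.select(range(100)))`. The `ds.features` attribute also reports the declared column types directly.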

Build With This

Create an automated Marathi model evaluation pipeline that runs IndicGLUE benchmarks on new models and publishes leaderboard results

Develop a multi-task Marathi NLU model trained jointly on all IndicGLUE tasks for efficient deployment

Build a Marathi model distillation framework that compresses large models while maintaining IndicGLUE benchmark scores
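
As a starting point for the evaluation pipeline idea, here is a minimal sketch of a majority-class baseline scored by accuracy. It is an illustration under stated assumptions, not an official IndicGLUE metric implementation. The function names and the toy label lists are hypothetical, and labels are assumed to be plain comparable values such as integers.

```python
from collections import Counter


def majority_label(train_labels):
    """Return the most frequent label in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, references):
    """Fraction of predictions that exactly match the references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def evaluate_baseline(train_labels, eval_labels):
    """Score a model that always predicts the majority training label."""
    guess = majority_label(train_labels)
    return accuracy([guess] * len(eval_labels), eval_labels)


# Hypothetical toy labels, used only to show the call pattern.
train = [0, 1, 1, 1]
val = [1, 0, 1, 1]
baseline_acc = evaluate_baseline(train, val)
print(f"Majority-class baseline accuracy: {baseline_acc:.2f}")
```

Any model evaluated on a task should beat this number to count as having learned something. For a real run, the label lists would come from the task's train and validation splits, for example `ds["label"]` on a loaded split.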

AI Use Cases

NLU benchmark evaluation

Related Datasets

AI4Bharat BPCC (mr)

parallel-text

AI4Bharat IndicCorp v1 (mr)

text

AI4Bharat IndicCorp v2 (Marathi)

text

AI4Bharat IndicHeadlineGeneration (mr)

text

Last verified: 2026-03-07