IndicXNLI Marathi - Awesome Marathi Datasets

IndicXNLI Marathi

Natural Language Inference (NLI) dataset for 11 Indic languages, including Marathi, created by high-quality machine translation of the English XNLI dataset. It contains premise-hypothesis pairs labeled as entailment, neutral, or contradiction, for evaluating Marathi language understanding.

Build a Marathi fact-checking assistant that uses natural language inference to verify claims against known facts.
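The NLI framing behind such an assistant can be sketched as follows. Each known fact is treated as a premise and the claim as the hypothesis, and the per-fact predictions are aggregated into a verdict. This is a minimal sketch, assuming you already have an NLI classifier (for example, one fine-tuned on this dataset) that returns probabilities in the schema's label order (0=entailment, 1=neutral, 2=contradiction). The `verify_claim` function, the `predict` callable, and the 0.5 threshold are hypothetical choices for illustration, not part of the dataset.

```python
from typing import Callable, Sequence

# Probabilities in IndicXNLI label order: [entailment, neutral, contradiction].
Probs = Sequence[float]

def verify_claim(
    claim: str,
    facts: Sequence[str],
    predict: Callable[[str, str], Probs],  # hypothetical NLI model: (premise, hypothesis) -> probs
    threshold: float = 0.5,                 # hypothetical confidence cutoff
) -> str:
    """Return 'supported', 'refuted', or 'not enough info' for a claim."""
    best_entail = 0.0
    best_contra = 0.0
    for fact in facts:
        probs = predict(fact, claim)  # the fact is the premise, the claim is the hypothesis
        best_entail = max(best_entail, probs[0])
        best_contra = max(best_contra, probs[2])
    # Refute if the strongest contradiction clears the threshold and is at
    # least as confident as the strongest entailment.
    if best_contra >= threshold and best_contra >= best_entail:
        return "refuted"
    if best_entail >= threshold:
        return "supported"
    return "not enough info"
```

Taking the maximum over facts means a single strongly entailing (or contradicting) fact is enough to decide; averaging would instead dilute evidence across unrelated facts.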


Quick Start

from datasets import load_dataset

ds = load_dataset('Divyanshu/indicxnli', 'mr', split='test')
labels = ['entailment', 'neutral', 'contradiction']  # list index = label id
# ds[:5] returns a dict of columns, not rows; select() yields one dict per row
for ex in ds.select(range(5)):
    print(f"P: {ex['premise'][:60]}...")
    print(f"H: {ex['hypothesis'][:60]}...")
    print(f"Label: {labels[ex['label']]}\n")

Modality

text

Size

~393K train, ~2.5K dev, ~5K test pairs for Marathi

License

CC-BY-NC-4.0

Format

Parquet / JSON

Language

mr, en

Update Frequency

static

Organization

AI4Bharat

Schema

Field	Type	Description
premise	string	Premise sentence in Marathi
hypothesis	string	Hypothesis sentence in Marathi
label	int	Entailment label (0=entailment, 1=neutral, 2=contradiction)
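Because `label` is stored as an integer, a small helper that decodes it and checks rows against the schema above can be useful before training or evaluation. This is a minimal plain-Python sketch; `decode_label` and `label_distribution` are hypothetical helper names, and it works on any iterable of row dicts, such as the rows yielded when iterating over a loaded split.

```python
from collections import Counter
from typing import Iterable, Mapping

# Integer-to-name mapping as given in the schema table.
LABEL_NAMES = ("entailment", "neutral", "contradiction")
REQUIRED_FIELDS = ("premise", "hypothesis", "label")

def decode_label(label: int) -> str:
    """Map an integer label (0, 1, 2) to its name; reject anything else."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"unexpected label id: {label}")
    return LABEL_NAMES[label]

def label_distribution(rows: Iterable[Mapping]) -> Counter:
    """Count label names, checking that every row has the three schema fields."""
    counts = Counter()
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            raise KeyError(f"row missing fields: {missing}")
        counts[decode_label(row["label"])] += 1
    return counts
```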

Build With This

Create a Marathi reading comprehension evaluator that tests whether students can infer correct conclusions from passages

Develop a news consistency checker that identifies contradictory claims across Marathi news articles

Build a Marathi argument analysis tool that identifies supporting and contradicting evidence in debate transcripts
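The premise/hypothesis framing maps directly onto pairs of claims, which is the core of the news consistency checker idea. A minimal sketch, assuming claims have already been extracted from each article and that a hypothetical `predict` function (for example, a classifier fine-tuned on this dataset) returns an integer label in the schema's order. Each claim is compared against claims from other articles, and a pair is flagged when either direction is predicted as contradiction. `find_contradictions` is a hypothetical name.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

CONTRADICTION = 2  # label id from the IndicXNLI schema

def find_contradictions(
    articles: Sequence[Sequence[str]],   # extracted claims, one list per article
    predict: Callable[[str, str], int],  # hypothetical NLI model: (premise, hypothesis) -> label id
) -> List[Tuple[int, str, int, str]]:
    """Return (article_i, claim_i, article_j, claim_j) for contradictory cross-article pairs."""
    flagged = []
    # Compare every pair of distinct articles once; claims within one article are not compared.
    for i, j in combinations(range(len(articles)), 2):
        for a in articles[i]:
            for b in articles[j]:
                # NLI is directional, so test both premise/hypothesis orders.
                if predict(a, b) == CONTRADICTION or predict(b, a) == CONTRADICTION:
                    flagged.append((i, a, j, b))
    return flagged
```

Checking both directions roughly doubles the number of model calls; it is a design choice made here because a contradiction detected in one direction is not always detected in the other.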

AI Use Cases

Marathi natural language understanding evaluation

Textual entailment for document classification

Cross-lingual transfer learning benchmarking

Multilingual model evaluation
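For evaluation use cases, accuracy and per-class F1 over the three labels are common summaries for NLI benchmarks. Below is a dependency-free sketch computing accuracy and macro-averaged F1 from integer gold and predicted labels; `nli_metrics` is a hypothetical name, and libraries such as scikit-learn provide comparable metrics.

```python
from typing import Dict, Sequence

NUM_LABELS = 3  # 0=entailment, 1=neutral, 2=contradiction

def nli_metrics(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and macro-averaged F1 over the three NLI labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_scores = []
    for c in range(NUM_LABELS):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class is absent from both lists.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return {"accuracy": accuracy, "macro_f1": sum(f1_scores) / NUM_LABELS}
```

Macro F1 weights each label equally, so it exposes a model that does well on entailment but poorly on, say, neutral, which plain accuracy can hide.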

Related Datasets

BiasShades Marathi (LLM Bias Evaluation)

text

FLORES-200 Benchmark

Text (parallel, Marathi)

Google Fonts Devanagari Collection

Font files (TTF/OTF)

Indic NLP Library

Tools (Python)

Last verified: 2026-03-09