IndicXNLI Marathi

IndicXNLI is a natural language inference (NLI) dataset for 11 Indic languages, including Marathi, created by high-quality machine translation of the English XNLI dataset. Each example is a premise-hypothesis pair labeled entailment, contradiction, or neutral, and the Marathi subset is used to evaluate Marathi language understanding.

Build a Marathi fact-checking assistant that uses natural language inference to verify claims against known facts.
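
One way to structure such an assistant is to treat each known fact as a premise and the claim as the hypothesis, then aggregate the predicted labels. The sketch below assumes a hypothetical `nli_predict(premise, hypothesis)` callable that returns a label id in the dataset's order; it stands in for any NLI model fine-tuned on the Marathi split and is not part of the dataset or any library. The aggregation rule (any contradiction refutes, otherwise any entailment supports) is an illustrative choice.

```python
from typing import Callable, Iterable

# Label order matches the dataset schema: 0=entailment, 1=neutral, 2=contradiction.
LABELS = ["entailment", "neutral", "contradiction"]


def verify_claim(
    claim: str,
    facts: Iterable[str],
    nli_predict: Callable[[str, str], int],
) -> str:
    """Return 'supported', 'refuted', or 'unverified' for a claim.

    nli_predict is a hypothetical stand-in for a trained NLI model:
    it takes (premise, hypothesis) and returns a label id.
    """
    verdict = "unverified"
    for fact in facts:
        label = LABELS[nli_predict(fact, claim)]
        if label == "contradiction":
            # Assumed rule: a single contradicting fact refutes the claim.
            return "refuted"
        if label == "entailment":
            verdict = "supported"
    return verdict
```

A stricter assistant might require agreement from several facts or a confidence threshold before returning a verdict.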

Quick Start

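from datasets import load_dataset

# Load the Marathi ('mr') test split.
ds = load_dataset('Divyanshu/indicxnli', 'mr', split='test')

# Label ids follow the schema: 0=entailment, 1=neutral, 2=contradiction.
labels = ['entailment', 'neutral', 'contradiction']

# Slicing a Dataset returns a dict of columns, so use select() to iterate over rows.
for ex in ds.select(range(5)):
    print(f"P: {ex['premise'][:60]}...")
    print(f"H: {ex['hypothesis'][:60]}...")
    print(f"Label: {labels[ex['label']]}\n")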
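
After loading a split, a quick sanity check is to count how many examples carry each label. The sketch below works on any iterable of dicts with a `label` key; the helper name `label_distribution` and the toy rows are illustrative assumptions, not dataset content.

```python
from collections import Counter

# Label order matches the dataset schema: 0=entailment, 1=neutral, 2=contradiction.
LABELS = ["entailment", "neutral", "contradiction"]


def label_distribution(examples):
    """Count examples per label name for any iterable of dicts with a 'label' key."""
    counts = Counter(LABELS[ex["label"]] for ex in examples)
    # Report every label, including those with zero examples.
    return {name: counts.get(name, 0) for name in LABELS}


# Toy rows standing in for a real split (placeholder text only).
sample = [
    {"premise": "p1", "hypothesis": "h1", "label": 0},
    {"premise": "p2", "hypothesis": "h2", "label": 1},
    {"premise": "p3", "hypothesis": "h3", "label": 0},
]
print(label_distribution(sample))
```

With the real data, the loaded split can be passed in directly, since iterating over a `Dataset` yields one dict per row.
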

Modality: text
Size: ~393K train, ~2.5K dev, ~5K test pairs for Marathi
License: not specified
Format: Parquet / JSON
Language: mr, en
Update Frequency: static
Organization: AI4Bharat

Schema

Field       Type    Description
premise     string  Premise sentence in Marathi
hypothesis  string  Hypothesis sentence in Marathi
label       int     Entailment label (0=entailment, 1=neutral, 2=contradiction)
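
The schema above can be checked record by record before training or evaluation. This is a minimal sketch; the helper name `validate_example` and the non-empty-string requirement are assumptions made for illustration, not rules stated by the dataset.

```python
def validate_example(ex: dict) -> list:
    """Return a list of problems in one record; an empty list means it matches the schema."""
    problems = []
    for field in ("premise", "hypothesis"):
        value = ex.get(field)
        # Assumption: text fields should be non-empty strings.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: expected a non-empty string")
    label = ex.get("label")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(label, bool) or not isinstance(label, int) or label not in (0, 1, 2):
        problems.append("label: expected an int in {0, 1, 2}")
    return problems
```

For example, `validate_example({"premise": "p", "hypothesis": "h", "label": 3})` reports only the label problem.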

Build With This

Create a Marathi reading comprehension evaluator that tests whether students can infer correct conclusions from passages
Develop a news consistency checker that identifies contradictory claims across Marathi news articles
Build a Marathi argument analysis tool that identifies supporting and contradicting evidence in debate transcripts

AI Use Cases

Marathi natural language understanding evaluation
Textual entailment for document classification
Cross-lingual transfer learning benchmarking
Multilingual model evaluation
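
For the evaluation use cases, model predictions on the test split are typically scored with accuracy and a confusion matrix over the three labels. The sketch below assumes gold and predicted label ids are already available as plain lists; the function name `evaluate` and its parameters are illustrative, not part of any library.

```python
def evaluate(gold, pred, num_labels=3):
    """Return (accuracy, confusion) where confusion[g][p] counts gold label g predicted as p."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    confusion = [[0] * num_labels for _ in range(num_labels)]
    for g, p in zip(gold, pred):
        confusion[g][p] += 1
    # Correct predictions lie on the diagonal of the confusion matrix.
    correct = sum(confusion[i][i] for i in range(num_labels))
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, confusion


# Label ids follow the schema: 0=entailment, 1=neutral, 2=contradiction.
accuracy, confusion = evaluate([0, 1, 2, 2], [0, 1, 1, 2])
print(f"accuracy={accuracy:.2f}")
```

The off-diagonal cells show which label pairs a model confuses, which a single accuracy number hides.
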
Last verified: 2026-03-09