AI4Bharat IndicParaphrase, Marathi (`mr`) configuration: a paraphrase dataset for Marathi-language NLP.
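Stream the training split and preview the first five sentence pairs:

```python
from datasets import load_dataset

# Stream the Marathi ("mr") training split instead of downloading it in full
ds = load_dataset('ai4bharat/IndicParaphrase', 'mr', split='train', streaming=True)

# Print the first five pairs, truncating each sentence to 60 characters
for i, ex in enumerate(ds):
    print(f"S1: {ex['sentence1'][:60]}...")
    print(f"S2: {ex['sentence2'][:60]}...")
    print(f"Paraphrase: {bool(ex['label'])}\n")
    if i >= 4:
        break
```

| Field | Type | Description |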
|---|---|---|
| sentence1 | string | First Marathi sentence |
| sentence2 | string | Second Marathi sentence; may or may not paraphrase `sentence1` |
| label | int | `1` if the pair is a paraphrase, `0` otherwise |
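Because `label` is a binary flag on each sentence pair, a quick label tally or a filter for positive pairs is often the first thing to compute. The sketch below assumes records follow the field names and types in the table above; the helper names `label_counts` and `paraphrase_pairs` are hypothetical, and the toy records are placeholders rather than real dataset rows.

```python
from collections import Counter
from itertools import islice
from typing import Any, Iterable, Iterator, Mapping, Optional, Tuple

# Assumed schema (from the table): sentence1 (str), sentence2 (str),
# label (int: 1 = paraphrase, 0 = not a paraphrase).
Record = Mapping[str, Any]


def label_counts(examples: Iterable[Record], limit: Optional[int] = None) -> Counter:
    """Count labels over at most `limit` records (all records if limit is None)."""
    counts: Counter = Counter()
    # islice keeps this lazy, so it also works on a streaming dataset
    for ex in islice(examples, limit):
        counts[int(ex["label"])] += 1
    return counts


def paraphrase_pairs(examples: Iterable[Record]) -> Iterator[Tuple[str, str]]:
    """Yield (sentence1, sentence2) for records labelled as paraphrases."""
    for ex in examples:
        if int(ex["label"]) == 1:
            yield ex["sentence1"], ex["sentence2"]


if __name__ == "__main__":
    # Hypothetical placeholder records, not real dataset rows
    toy = [
        {"sentence1": "a1", "sentence2": "a2", "label": 1},
        {"sentence1": "b1", "sentence2": "b2", "label": 0},
    ]
    print(label_counts(toy))
    print(list(paraphrase_pairs(toy)))  # [('a1', 'a2')]
```

With the streaming `ds` from the loading example, `label_counts(ds, limit=1000)` would tally the first 1,000 training pairs while reading only as many records as it needs. If the hosted dataset exposes different field names, adjust the keys accordingly.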