AI4Bharat BPCC (mr): an English-Marathi parallel dataset for NLP.
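
    from datasets import load_dataset

    # Stream the English-Marathi config so the full split is not downloaded up front
    ds = load_dataset('ai4bharat/BPCC', 'mr-en', split='train', streaming=True)

    # Print the first five sentence pairs, each truncated to 60 characters
    for i, ex in enumerate(ds):
        print(f"EN: {ex['src'][:60]}...")
        print(f"MR: {ex['tgt'][:60]}...\n")
        if i >= 4:
            break

| Field | Type | Description |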
|---|---|---|
| src | string | Source sentence in English |
| tgt | string | Target sentence in Marathi |
| domain | string | Text domain (general, government, etc.) |
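
The `domain` field can be used to select sentence pairs from a single domain. The following is a minimal sketch, assuming each record is a dict with the `src`, `tgt`, and `domain` keys listed in the table. The helper name `take_domain` and the sample records are hypothetical, written for illustration only, and are not rows from the dataset.

```python
from itertools import islice
from typing import Dict, Iterable, List


def take_domain(examples: Iterable[Dict[str, str]], domain: str, limit: int) -> List[Dict[str, str]]:
    """Return up to `limit` records whose `domain` field equals `domain`."""
    # The generator filters lazily; islice stops once `limit` matches are found.
    matching = (ex for ex in examples if ex.get("domain") == domain)
    return list(islice(matching, limit))


# Hypothetical stand-in records (not real dataset rows) using the table's fields.
sample = [
    {"src": "Hello.", "tgt": "नमस्कार.", "domain": "general"},
    {"src": "The office opens at ten.", "tgt": "कार्यालय दहा वाजता उघडते.", "domain": "government"},
    {"src": "Thank you.", "tgt": "धन्यवाद.", "domain": "general"},
]

general_pairs = take_domain(sample, "general", limit=2)
for ex in general_pairs:
    print(f"EN: {ex['src']} | MR: {ex['tgt']}")
```

Because the helper only iterates over its input, it should also accept the streaming `ds` object from the loading snippet in place of `sample`, provided the records really carry a `domain` key with values like those in the table.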