A large-scale Marathi question-answering dataset modeled after SQuAD, with 118,516 training, 11,873 validation, and 11,803 test QA samples.

```python
from datasets import load_dataset

# Load all splits of MahaSQuAD from the Hugging Face Hub
ds = load_dataset('l3cube-pune/MahaSQuAD')

# Inspect the first training example
qa = ds['train'][0]
print(f"Q: {qa['question']}")
print(f"A: {qa['answers']['text'][0]}")
print(f"Context: {qa['context'][:100]}...")
```

| Field | Type | Description |
|---|---|---|
| context | string | Context paragraph in Marathi containing the answer |
| question | string | Question in Marathi about the context |
| answers | object | Answer object with 'text' (list of answer strings) and 'answer_start' (list of character offsets) |
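Because `answer_start` holds character offsets into `context`, each answer string should be recoverable by slicing the context. The sketch below checks that invariant for one record. It assumes records have exactly the shape described in the table above; the helper name `answer_spans_match` and the toy Marathi record are hypothetical illustrations, not part of MahaSQuAD or the `datasets` library.

```python
def answer_spans_match(example: dict) -> bool:
    """Return True if every answer text appears in the context at its recorded offset.

    Assumes the SQuAD-style layout from the field table: example["answers"] is a dict
    with parallel lists "text" and "answer_start" (character offsets into "context").
    A record with empty answer lists trivially passes.
    """
    context = example["context"]
    answers = example["answers"]
    for text, start in zip(answers["text"], answers["answer_start"]):
        # Python str slicing counts code points, matching character-offset semantics
        if context[start:start + len(text)] != text:
            return False
    return True


# Hypothetical toy record in the same shape as a dataset row (not taken from MahaSQuAD)
context = "पुणे हे महाराष्ट्रातील एक शहर आहे."
answer = "महाराष्ट्र"
record = {
    "context": context,
    "question": "पुणे कोणत्या राज्यात आहे?",
    "answers": {"text": [answer], "answer_start": [context.find(answer)]},
}
print(answer_spans_match(record))  # True
```

On the real data, the same function could be passed to `ds['train'].filter(lambda ex: not answer_spans_match(ex))` to surface any rows whose offsets do not line up, assuming the loaded `answers` feature follows the layout in the table.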