L3Cube-MahaSQuAD

MH Specific

Large-scale Marathi question answering dataset with 118,516 training, 11,873 validation, and 11,803 test QA samples, modeled after SQuAD.

Build a Marathi question-answering system for agricultural advisory queries from farmers.

Quick Start

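from datasets import load_dataset

# Load all splits of the dataset from the Hugging Face Hub
ds = load_dataset('l3cube-pune/MahaSQuAD')

# Inspect the first training sample
qa = ds['train'][0]
answers = qa['answers']['text']
print(f"Q: {qa['question']}")
# Guard against an empty answer list, in case a sample has no answer
print(f"A: {answers[0] if answers else '(no answer)'}")
print(f"Context: {qa['context'][:100]}...")
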
Modality: text
Size: 142,192 QA samples
License:
Format: JSON
Language: mr
Update Frequency: static
Organization: L3Cube, Pune

Schema

| Field | Type | Description |
| --- | --- | --- |
| context | string | Context paragraph in Marathi containing the answer |
| question | string | Question in Marathi about the context |
| answers | object | Answer object with 'text' (list of answer strings) and 'answer_start' (list of character offsets) |
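
The relationship between 'text' and 'answer_start' can be checked with a few lines of Python. The sketch below is illustrative only: `answer_spans` and the sample record are hypothetical and not taken from the dataset, and it assumes the offsets count Python string positions (Unicode code points), which for Devanagari means each vowel sign and virama counts as its own position.

```python
def answer_spans(record):
    """Return (start, end, text, matches) for each answer in one record.

    `matches` is True when slicing the context at the stored offset
    reproduces the answer text exactly.
    """
    context = record["context"]
    answers = record["answers"]
    spans = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        spans.append((start, end, text, context[start:end] == text))
    return spans


# Hypothetical record that follows the schema above (not real dataset content).
context = "पुणे हे महाराष्ट्रातील एक शहर आहे."
answer = "महाराष्ट्रातील"
sample = {
    "context": context,
    "question": "पुणे कोणत्या राज्यात आहे?",
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}

for start, end, text, ok in answer_spans(sample):
    print(start, end, text, "OK" if ok else "MISMATCH")
```

Running a check like this over a split is a cheap way to spot misaligned offsets before training an extractive model. A record with empty lists simply yields no spans.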

Build With This

Create a RAG pipeline for answering questions over Marathi Wikipedia and government documents
Develop a customer support chatbot that answers FAQs in Marathi using extractive QA
Build an educational assistant that answers student questions about Marathi literature and history
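
For the RAG idea above, the retrieval step can be prototyped before committing to any particular search library. The following is a minimal sketch using only the standard library: it ranks passages by character trigram overlap with the question, as a stand-in for a real retriever such as BM25 or dense embeddings. The function names and the two passages are hypothetical examples, not dataset content.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def overlap_score(query, passage, n=3):
    """Fraction of the query's n-grams that also occur in the passage."""
    q, p = char_ngrams(query, n), char_ngrams(passage, n)
    total = sum(q.values())
    shared = sum((q & p).values())  # multiset intersection
    return shared / total if total else 0.0


def retrieve(query, passages, k=1):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


# Hypothetical passages standing in for dataset contexts.
passages = [
    "पुणे हे महाराष्ट्रातील एक शहर आहे.",
    "गोदावरी ही भारतातील एक नदी आहे.",
]
query = "पुणे कोणत्या राज्यात आहे?"
print(retrieve(query, passages, k=1)[0])
```

Character n-grams are used here because they need no Marathi tokenizer and tolerate inflected word forms. The retrieved passage would then be handed to an extractive QA model as its context.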

AI Use Cases

Question answering
Reading comprehension
RAG knowledge base evaluation
Conversational AI
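
For the evaluation use cases, predictions are usually scored against the reference answers with exact match and token-level F1, in the spirit of SQuAD evaluation. The sketch below is not the official SQuAD script: the normalization is an assumption tailored to Marathi (it strips ASCII punctuation and the danda marks and skips lowercasing and article removal), and treating an empty reference list as an unanswerable question is also an assumption.

```python
import string
from collections import Counter

# Characters removed before comparison (assumed normalization, see above).
_DROP = set(string.punctuation) | {"।", "॥"}


def normalize(text):
    """Strip punctuation and collapse whitespace."""
    cleaned = "".join(ch for ch in text if ch not in _DROP)
    return " ".join(cleaned.split())


def exact_match(prediction, references):
    """1.0 if the prediction equals any reference after normalization."""
    if not references:  # assumed to mean "no answer expected"
        return float(normalize(prediction) == "")
    return float(any(normalize(prediction) == normalize(r) for r in references))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall on whitespace tokens."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        return float(pred == ref)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("महाराष्ट्रातील.", ["महाराष्ट्रातील"]))
print(token_f1("एक शहर", "एक मोठे शहर"))
```

When a question has several reference answers, the usual convention is to take the maximum F1 over the references and then average both metrics over the evaluation split.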
Last verified: 2026-03-07