Marathi Wikipedia

Maharashtra (MH) specific

Knowledge Base for RAG: a full Marathi Wikipedia dump with broad encyclopaedic coverage across diverse topics, suitable as a knowledge base for retrieval-augmented generation (RAG) systems.

Build a Marathi question-answering system trained on Wikipedia articles for general knowledge queries.

Quick Start

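from datasets import load_dataset

# Stream the Marathi snapshot so the full dump is not downloaded up front.
ds = load_dataset('wikimedia/wikipedia', '20231101.mr', split='train', streaming=True)

# Print the title and the first 80 characters of the first five articles.
for i, ex in enumerate(ds):
    print(f"Title: {ex['title']}")
    print(f"Text: {ex['text'][:80]}...\n")
    if i >= 4:
        break
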
Modality: Text (Marathi)
Size: 90,000+ articles
License: (not specified)
Format: CSV/JSON
Language: mr
Update Frequency: static
Organization: Wikimedia Foundation

Schema

Field   Type    Description
text    string  Full Wikipedia article text in Marathi
title   string  Article title
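
Full articles are usually too long to pass to a retriever or a language model in one piece, so a common preprocessing step is to split each record into overlapping passages. The following is a minimal sketch of that step, assuming records shaped like the schema above (a dict with `title` and `text`). The function name `chunk_article` and the `max_words` and `overlap` parameters are hypothetical and are not part of the dataset or of any library.

```python
from typing import Dict, Iterator


def chunk_article(
    record: Dict[str, str], max_words: int = 100, overlap: int = 20
) -> Iterator[Dict[str, str]]:
    """Yield overlapping word-window passages from one {'title', 'text'} record.

    Hypothetical helper: the window size and overlap are illustrative defaults,
    not values recommended by the dataset.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = record["text"].split()  # whitespace tokenisation, no language-specific rules
    step = max_words - overlap
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        yield {"title": record["title"], "passage": " ".join(window)}
        # Stop once this window has reached the end of the article.
        if start + max_words >= len(words):
            break


if __name__ == "__main__":
    # Synthetic record standing in for one dataset row.
    demo = {"title": "demo", "text": " ".join(f"w{i}" for i in range(250))}
    for chunk in chunk_article(demo):
        print(chunk["title"], len(chunk["passage"].split()))
```

Each passage keeps the article title, so a retrieved passage can be traced back to its source article.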

Build With This

Create a Marathi knowledge base from Wikipedia for building a retrieval-augmented generation (RAG) system
Develop a Marathi educational chatbot that answers student questions using Wikipedia as the knowledge source
Build a Marathi entity linking system that maps entity mentions in text to their Wikipedia articles
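
The retrieval half of the RAG idea above can be sketched with the standard library alone. This example ranks articles against a query by cosine similarity over character trigram counts, a simple stand-in for a real retriever such as BM25 or a dense embedding index; it is not a method prescribed by the dataset. The names `char_ngrams`, `cosine`, and `retrieve` are hypothetical, and `toy_articles` is three hand-written records shaped like the schema, not rows taken from the dump.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(
    query: str, articles: List[Dict[str, str]], k: int = 3
) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, title) pairs for the query."""
    q = char_ngrams(query)
    scored = [
        (cosine(q, char_ngrams(a["title"] + " " + a["text"])), a["title"])
        for a in articles
    ]
    return sorted(scored, reverse=True)[:k]


# Hand-written toy records (assumption: illustrative only, not dataset rows).
toy_articles = [
    {"title": "पुणे", "text": "पुणे हे महाराष्ट्रातील एक शहर आहे."},
    {"title": "मुंबई", "text": "मुंबई ही महाराष्ट्राची राजधानी आहे."},
    {"title": "गोदावरी", "text": "गोदावरी ही भारतातील एक नदी आहे."},
]

if __name__ == "__main__":
    for score, title in retrieve("गोदावरी नदी", toy_articles, k=2):
        print(f"{score:.3f}  {title}")
```

Character n-grams are used here because they tolerate Marathi inflection without a tokenizer or stemmer. For a corpus of this size, the pairwise scoring loop would normally be replaced by an inverted or vector index.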

AI Use Cases

Marathi knowledge QA, open-domain RAG, factual grounding for language models
Last verified: 2026-03-07