Knowledge Base for RAG: the full Marathi Wikipedia dump provides broad encyclopaedic coverage across diverse topics, which makes it suitable as a knowledge base for retrieval-augmented generation (RAG) systems.
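```python
from datasets import load_dataset

# '20231101.mr' is the Marathi ('mr') snapshot dated 2023-11-01;
# streaming=True iterates over articles without downloading the full dump first
ds = load_dataset("wikimedia/wikipedia", "20231101.mr", split="train", streaming=True)
# Print the title and the first 80 characters of the first five articles
for i, ex in enumerate(ds):
    print(f"Title: {ex['title']}")
    print(f"Text: {ex['text'][:80]}...\n")
    if i >= 4:
        break
```

| Field | Type | Description |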
|---|---|---|
| text | string | Full Wikipedia article text in Marathi |
| title | string | Article title |
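Before indexing for retrieval, full articles are usually split into shorter passages. The following is a minimal sketch in plain Python, assuming whitespace tokenization and illustrative window sizes; the function name `chunk_articles` and its `max_words`/`overlap` parameters are hypothetical and not part of the `datasets` library. Each passage keeps its article `title` so retrieved text can be traced back to its source.

```python
from typing import Iterable, Iterator


def chunk_articles(
    articles: Iterable[dict],
    max_words: int = 200,
    overlap: int = 50,
) -> Iterator[dict]:
    """Split each article's text into overlapping word windows.

    Assumes each article is a dict with 'title' and 'text' keys.
    The default window size and overlap are illustrative assumptions,
    not values recommended by the dataset authors.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    step = max_words - overlap
    for article in articles:
        # Rough tokenizer (assumption): Marathi separates words with spaces
        words = article["text"].split()
        for start in range(0, max(len(words), 1), step):
            window = words[start:start + max_words]
            if not window:
                break  # empty article: emit nothing
            yield {
                "title": article["title"],
                "chunk_id": start // step,
                "text": " ".join(window),
            }
            # Stop once this window already reaches the end of the article
            if start + max_words >= len(words):
                break


# Usage on a toy article with ten placeholder words
sample = [{"title": "पुणे", "text": " ".join(f"w{i}" for i in range(10))}]
chunks = list(chunk_articles(sample, max_words=4, overlap=1))
print(len(chunks))         # 3
print(chunks[-1]["text"])  # w6 w7 w8 w9
```

Because iterating the streaming `ds` from the loading snippet yields dicts with `title` and `text` keys, `chunk_articles(ds)` can be consumed lazily (for example with `itertools.islice`) without materializing the whole dump. Word-count windows are a simplification; a real pipeline would more likely size passages with the embedding model's own tokenizer.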