Marathi Wikipedia Articles Corpus

MH Specific

Full dump of Marathi Wikipedia articles, providing encyclopaedic coverage of diverse topics in the Marathi language.

Build a Marathi text classification model trained on Wikipedia article categories for document categorization.

Quick Start

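from datasets import load_dataset

# Load the Marathi ("mr") configuration of the 20231101 Wikipedia snapshot.
ds = load_dataset('wikimedia/wikipedia', '20231101.mr', split='train')
print(f"Total articles: {len(ds)}")

# ds[:5] returns a dict of columns, so select the first rows before iterating.
for ex in ds.select(range(5)):
    print(f"Title: {ex['title']}, Length: {len(ex['text'])} chars")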
Modality: Text (Marathi)
Size: 90,000+ articles
License:
Format: CSV/JSON
Language: mr
Update Frequency: static
Organization: Wikimedia Foundation

Schema

Field   Type    Description
text    string  Marathi Wikipedia article content
title   string  Article title in Marathi
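
Records with the two fields above can be sanity-checked before any downstream processing. The following is a minimal sketch, assuming each record is a plain dict (as yielded when iterating a `datasets` dataset). The helper name `is_valid_record` and the sample record are hypothetical and exist only for illustration.

```python
from typing import Any, Mapping

# Field names and types taken from the schema listed above.
EXPECTED_FIELDS = {"text": str, "title": str}


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if `text` and `title` are both present as non-empty strings."""
    for name, expected_type in EXPECTED_FIELDS.items():
        value = record.get(name)
        if not isinstance(value, expected_type) or not value.strip():
            return False
    return True


# Made-up record for illustration; not taken from the corpus.
sample = {"title": "पुणे", "text": "पुणे हे महाराष्ट्रातील एक शहर आहे."}
print(is_valid_record(sample))            # both fields present
print(is_valid_record({"title": "पुणे"}))  # `text` is missing
```

A check like this is cheap to run over the whole split and can be used to filter out empty articles before training or indexing.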

Build With This

Create a Marathi reading difficulty scorer that estimates text complexity of Wikipedia articles for educational content leveling
Develop a Marathi named entity resource by extracting structured entity data from Wikipedia infoboxes
Build a content gap analyzer comparing Marathi Wikipedia coverage against English Wikipedia for translation prioritization
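
The reading difficulty idea above could start from simple surface statistics. The following is a minimal sketch, assuming average sentence length and average word length as crude proxies for complexity. The function name `surface_stats`, the choice of sentence delimiters, and the returned features are illustrative assumptions, not a validated readability formula for Marathi.

```python
import re

# Sentence boundaries: the danda (U+0964), double danda (U+0965), and Latin
# punctuation, all of which can appear in Devanagari text. This is a rough
# rule: a decimal number such as "3.5" would also be split.
_SENTENCE_END = re.compile(r"[\u0964\u0965.!?]+")


def surface_stats(text: str) -> dict:
    """Compute crude complexity proxies for a passage of text."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    words = [w for s in sentences for w in s.split()]
    if not sentences or not words:
        return {"sentences": 0, "words": 0,
                "avg_sentence_len": 0.0, "avg_word_len": 0.0}
    return {
        "sentences": len(sentences),
        "words": len(words),
        # Words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # len() counts Unicode code points, so Devanagari vowel signs and
        # other combining marks each add to a word's length.
        "avg_word_len": sum(len(w) for w in words) / len(words),
    }


# Made-up passage for illustration; not taken from the corpus.
passage = "पुणे हे एक शहर आहे. ते महाराष्ट्रात आहे."
stats = surface_stats(passage)
print(stats["sentences"], stats["words"], stats["avg_sentence_len"])
```

These features would still need to be calibrated against texts of known reading level before being used for content leveling.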

AI Use Cases

Knowledge base construction, Marathi RAG systems, language model pre-training
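
For the RAG use case, articles are usually split into passages before indexing. The following is a minimal sketch, assuming fixed-size word windows with overlap. The function name `chunk_article`, the window size, and the overlap are illustrative choices, not values recommended by the dataset.

```python
from typing import Dict, Iterator


def chunk_article(record: Dict[str, str], size: int = 200,
                  overlap: int = 50) -> Iterator[Dict[str, str]]:
    """Yield overlapping word-window passages from one {title, text} record."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = record["text"].split()
    step = size - overlap
    for start in range(0, len(words), step):
        window = words[start:start + size]
        # Keep the title with every passage so retrieved text can be attributed.
        yield {"title": record["title"], "passage": " ".join(window)}
        if start + size >= len(words):
            break  # this window already reached the end of the article


# Synthetic record for illustration; not taken from the corpus.
demo = {"title": "demo", "text": " ".join(f"w{i}" for i in range(10))}
for chunk in chunk_article(demo, size=4, overlap=2):
    print(chunk["passage"])
```

Splitting on whitespace keeps the sketch independent of any tokenizer. A real pipeline would more likely size passages by the token count of its embedding model.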
Last verified: 2026-03-07