Marathi Wikipedia Dump

MH Specific

A dump of Marathi Wikipedia articles for natural language processing (NLP) work.

Use the raw Marathi Wikipedia dump to build a structured knowledge graph of Maharashtra-related entities and relationships.
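As a rough illustration of the knowledge-graph idea, the sketch below turns the internal links in one article's wikitext into (source, relation, target) triples. It assumes the article text is raw wikitext with `[[Target]]` or `[[Target|label]]` links; the function name `extract_edges`, the `links_to` relation label, and the sample sentence are hypothetical and not part of the dataset.

```python
import re

# Matches [[Target]], [[Target#Section]] and [[Target|label]];
# only the target part is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

def extract_edges(title, wikitext):
    """Return (source, 'links_to', target) triples for one article."""
    edges = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Skip namespaced links such as files or categories (they contain a colon).
        if target and ":" not in target:
            edges.append((title, "links_to", target))
    return edges

# Hypothetical sample wikitext, for illustration only.
sample = "[[पुणे]] हे [[महाराष्ट्र|महाराष्ट्र राज्यातील]] शहर आहे. [[वर्ग:शहरे]]"
edges = extract_edges("पुणे जिल्हा", sample)
print(edges)
```

Plain link edges are only a starting point; typed relationships (for example, a city being located in a district) would need further parsing of infoboxes or categories, which this sketch does not attempt.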
Homepage | HuggingFace

Quick Start

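# Download mrwiki-latest-pages-articles.xml.bz2 from https://dumps.wikimedia.org/mrwiki/
import bz2
import xml.etree.ElementTree as ET

# Stream the compressed dump and print each article title as it is parsed.
with bz2.open('mrwiki-latest-pages-articles.xml.bz2', 'rb') as f:
    for event, elem in ET.iterparse(f, events=('end',)):
        if elem.tag.endswith('}title'):  # tags carry the MediaWiki export namespace
            print(elem.text)
        elif elem.tag.endswith('}page'):
            elem.clear()  # free memory held by the finished page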
Modality: text
Size: ~101,000 articles
License: (not specified)
Format: CSV/JSON
Language: mr
Update Frequency: static
Organization: Wikimedia Foundation

Schema

Field   Type    Description
text    string  Raw Wikipedia dump text in Marathi
title   string  Article title
id      string  Wikipedia article ID
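The three fields above can be produced from the XML dump with a streaming parser. The following is a minimal sketch, assuming the standard MediaWiki export layout (`<page>` containing `<title>`, `<id>`, and `<revision><text>`); the helper names `local` and `iter_records` and the in-memory sample are hypothetical stand-ins for the real dump.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the '{namespace}' prefix that the MediaWiki export schema adds."""
    return tag.rsplit("}", 1)[-1]

def iter_records(xml_file):
    """Yield {'id', 'title', 'text'} dicts, one per <page> element."""
    for _, page in ET.iterparse(xml_file, events=("end",)):
        if local(page.tag) != "page":
            continue
        record = {"id": None, "title": None, "text": ""}
        for child in page:
            name = local(child.tag)
            if name == "title":
                record["title"] = child.text
            elif name == "id":
                record["id"] = child.text  # page ID, kept as a string
            elif name == "revision":
                for sub in child:
                    if local(sub.tag) == "text":
                        record["text"] = sub.text or ""
        yield record
        page.clear()  # release the finished page to keep memory use flat

# Tiny in-memory stand-in for the real dump (hypothetical content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Pune</title>
    <id>42</id>
    <revision><id>7</id><text>Sample text</text></revision>
  </page>
</mediawiki>"""

records = list(iter_records(io.BytesIO(SAMPLE)))
print(records)
```

For the real dump, the same function can be given the file object returned by `bz2.open(..., 'rb')`, so the archive never has to be decompressed to disk.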

Build With This

Create a Marathi Wikipedia vandalism detector that identifies and reverts malicious edits in real time
Develop a cross-lingual knowledge alignment tool linking Marathi Wikipedia articles to their Hindi and English counterparts
Build an automated Marathi Wikipedia article expander that suggests missing content sections based on English article structure
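For the article-expander idea, one possible first step is to compare top-level section headings between an English article and its Marathi counterpart. The sketch below assumes both articles are available as raw wikitext and that a hand-built heading translation table exists; `section_titles`, `missing_sections`, the `en_to_mr` mapping, and the sample texts are all hypothetical.

```python
import re

# A level-2 wikitext heading looks like "== Heading ==" on its own line.
# Deeper levels ("=== ... ===") are deliberately not matched.
HEADING = re.compile(r"^==[ \t]*([^=].*?)[ \t]*==[ \t]*$", re.MULTILINE)

def section_titles(wikitext):
    """Return the level-2 section headings in document order."""
    return [m.group(1) for m in HEADING.finditer(wikitext)]

def missing_sections(mr_wikitext, en_wikitext, en_to_mr):
    """List English sections whose Marathi equivalent is absent.

    Headings without an entry in en_to_mr are reported as missing,
    since their presence cannot be confirmed.
    """
    present = set(section_titles(mr_wikitext))
    missing = []
    for heading in section_titles(en_wikitext):
        mr_heading = en_to_mr.get(heading)
        if mr_heading is None or mr_heading not in present:
            missing.append(heading)
    return missing

# Hypothetical sample articles and translation table.
en_text = "Intro\n== History ==\ntext\n== Geography ==\ntext\n=== Climate ===\ntext\n== Economy ==\n"
mr_text = "प्रस्तावना\n== इतिहास ==\nमजकूर\n"
en_to_mr = {"History": "इतिहास", "Geography": "भूगोल", "Economy": "अर्थव्यवस्था"}

suggestions = missing_sections(mr_text, en_text, en_to_mr)
print(suggestions)
```

Exact heading matching is brittle, so a practical tool would likely need fuzzy matching or a larger translation table before its suggestions are shown to editors.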

AI Use Cases

Knowledge extraction, LM pretraining
Last verified: 2026-03-07