The largest Marathi news summarization dataset: 25,374 news articles from Lokmat and Loksatta, each paired with a manually verified abstractive summary. It covers politics, economics, culture, sports, and more, and is the first large-scale abstractive summarization dataset for Marathi.

    # Download from https://github.com/l3cube-pune/MarathiNLP
    import pandas as pd

    # Assumes MahaSum.csv has been saved to the working directory.
    df = pd.read_csv('MahaSum.csv')
    print(f"Article-summary pairs: {len(df)}")

    # Preview the first 80 characters of three article-summary pairs.
    for _, row in df.head(3).iterrows():
        print(f"Article: {row['article'][:80]}...")
        print(f"Summary: {row['summary'][:80]}...\n")

| Field | Type | Description |
|---|---|---|
| article | string | Full Marathi news article text |
| summary | string | Human-written summary of the article |
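
Before training on the pairs, it can help to check that both fields from the table are present and to see how much shorter the summaries are than the articles. The following is a minimal sketch, not part of the dataset's own tooling: the helper name `length_stats` is hypothetical, word counts use plain whitespace splitting (an assumption that may not match a proper Marathi tokenizer), and the two toy rows are stand-ins written for illustration, not dataset records.

```python
import pandas as pd

# Field names taken from the table above.
REQUIRED_COLUMNS = ("article", "summary")


def length_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-row word counts and the summary/article length ratio."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Drop rows where either field is missing.
    clean = df.dropna(subset=list(REQUIRED_COLUMNS))

    # Assumption: whitespace-separated tokens approximate words.
    out = pd.DataFrame({
        "article_words": clean["article"].str.split().str.len(),
        "summary_words": clean["summary"].str.split().str.len(),
    })

    # Skip empty articles to avoid dividing by zero.
    out = out[out["article_words"] > 0].copy()
    out["ratio"] = out["summary_words"] / out["article_words"]
    return out


# Toy stand-in rows (not from the dataset), used only to show the call.
toy = pd.DataFrame({
    "article": ["एक दोन तीन चार", "पाच सहा"],
    "summary": ["एक दोन", "पाच"],
})
stats = length_stats(toy)
print(stats)
```

On the real file, the same call would be `length_stats(pd.read_csv('MahaSum.csv'))`, and `stats["ratio"].describe()` gives a quick view of how compressed the summaries are.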