L3Cube-MahaSum

MH Specific

Largest Marathi news summarization dataset containing 25,374 news articles from Lokmat and Loksatta with manually verified abstractive summaries. Covers politics, economics, culture, sports, and more. First large-scale abstractive summarization dataset for Marathi.

Build a Marathi news summarization API that generates concise summaries from full-length news articles for mobile news apps.

Quick Start

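# Download the dataset from https://github.com/l3cube-pune/MarathiNLP
# and adjust the filename below to match the file you saved locally.
import pandas as pd

df = pd.read_csv('MahaSum.csv')
print(f"Article-summary pairs: {len(df)}")

# Preview the first three pairs, truncated to 80 characters each
for _, row in df.head(3).iterrows():
    print(f"Article: {row['article'][:80]}...")
    print(f"Summary: {row['summary'][:80]}...\n")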

Modality: text
Size: 25,374 article-summary pairs
License:
Format: CSV / JSON
Language: mr
Update Frequency: static
Organization: L3Cube, Pune

Schema

Field    Type    Description
article  string  Full Marathi news article text
summary  string  Human-written summary of the article
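
Before training on the data, it can help to confirm that a loaded file matches the two-field schema above. The following is a minimal sketch, assuming the columns are named `article` and `summary` as listed; the helper `check_schema` and the toy rows are hypothetical and are not part of the dataset's own tooling.

```python
import pandas as pd

EXPECTED_COLUMNS = ["article", "summary"]  # field names taken from the schema


def check_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: verify the expected columns and drop unusable rows."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Keep only the schema columns and drop rows where either field is absent.
    clean = df[EXPECTED_COLUMNS].dropna().copy()
    # Cast to str, strip surrounding whitespace, and drop rows left empty.
    for col in EXPECTED_COLUMNS:
        clean[col] = clean[col].astype(str).str.strip()
    clean = clean[(clean["article"] != "") & (clean["summary"] != "")]
    return clean.reset_index(drop=True)


# Toy rows standing in for the real file (illustrative only, not dataset content).
toy = pd.DataFrame(
    {
        "article": ["पुण्यात आज जोरदार पाऊस झाला. अनेक रस्ते पाण्याखाली गेले.", None, "   "],
        "summary": ["पुण्यात जोरदार पाऊस.", "सारांश", "सारांश"],
    }
)
clean = check_schema(toy)
print(f"Usable pairs: {len(clean)} of {len(toy)}")
```

On the real file, the same call would be `check_schema(pd.read_csv('MahaSum.csv'))`, with the filename adjusted to the downloaded file.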

Build With This

Create a daily Marathi news digest service that summarizes top stories across categories for WhatsApp distribution
Develop a legislative bill summarizer for Maharashtra Assembly that condenses lengthy bills into accessible summaries for citizens
Build an extractive-abstractive hybrid summarizer for Marathi that first selects key sentences then paraphrases them
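
The first stage of the hybrid idea above (selecting key sentences) can be sketched with a simple word-frequency scorer. This is an illustrative baseline under stated assumptions, not a method from the dataset's authors: sentences are assumed to end with a danda (।), full stop, question mark, or exclamation mark, and the function names `split_sentences`, `tokenize`, and `select_key_sentences` are hypothetical. The abstractive paraphrasing stage would need a trained sequence-to-sequence model and is left out.

```python
import re
from collections import Counter

PUNCT = "।.,!?;:\"'()"


def split_sentences(text: str) -> list[str]:
    """Split after a danda, full stop, question mark, or exclamation mark."""
    parts = re.split(r"(?<=[।.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization with punctuation stripped from word edges.

    A regex such as \\w+ is avoided because Python's \\w does not match
    Devanagari vowel signs and would break Marathi words apart.
    """
    return [w.strip(PUNCT) for w in text.split() if w.strip(PUNCT)]


def select_key_sentences(article: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences in their original order.

    A sentence's score is the mean article-level frequency of its words.
    """
    sentences = split_sentences(article)
    freq = Counter(tokenize(article))
    scored = []
    for idx, sent in enumerate(sentences):
        tokens = tokenize(sent)
        if tokens:
            scored.append((sum(freq[t] for t in tokens) / len(tokens), idx))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return [sentences[idx] for _, idx in sorted(top, key=lambda s: s[1])]


# Made-up example text (not taken from the dataset).
article = "पुण्यात पाऊस झाला. आज शाळांना सुट्टी आहे. पाऊस उद्याही पडेल."
for sentence in select_key_sentences(article, k=2):
    print(sentence)
```

Mean frequency is used instead of a raw sum so that long sentences are not favored merely for their length; a production system would likely replace this scorer with a learned one.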

AI Use Cases

Marathi news article summarization
Abstractive text generation
Content condensation for Marathi news apps
Headline generation from articles
Last verified: 2026-03-09