XL-Sum Marathi (BBC)
10,903 article-summary pairs from the BBC Marathi website, each with a professionally written, highly abstractive summary. Part of the 45-language XL-Sum benchmark. Because the summaries are professionally written rather than crowdsourced, they serve as editorial-quality references.
Build a Marathi news summarization model trained on professional BBC Marathi summaries for high-quality output.
Quick Start
from datasets import load_dataset
ds = load_dataset('csebuetnlp/xlsum', 'marathi', split='train')
print(f"Articles: {len(ds)}")
# ds[:3] returns a dict of columns, not rows, so select the first three rows instead
for ex in ds.select(range(3)):
    print(f"Title: {ex['title'][:60]}")
    print(f"Summary: {ex['summary'][:80]}...\n")
Size
10,903 Marathi samples, with train/validation/test splits
Organization
BUET CSE NLP Group
Schema
| Field | Type | Description |
|---|---|---|
| text | string | Full BBC Marathi news article text |
| summary | string | Professional summary of the article |
| title | string | Article headline |
| url | string | Source BBC URL |
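The schema above can be checked on individual records before training. The following is a minimal sketch, not part of the dataset's tooling: the helper name `schema_problems` and the sample record are hypothetical, and the only assumption taken from the table is that all four fields are non-empty strings.

```python
from typing import Any, Mapping

# Field names and types copied from the schema table above.
EXPECTED_FIELDS = {"text": str, "summary": str, "title": str, "url": str}


def schema_problems(record: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the record looks fine."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
        elif not record[name].strip():
            problems.append(f"{name} is empty")
    return problems


# Placeholder record for illustration only; not real dataset content.
sample = {
    "text": "लेखाचा मजकूर",
    "summary": "सारांश",
    "title": "शीर्षक",
    "url": "https://example.com/article",
}
print(schema_problems(sample))  # -> []
```

Running this over a split and counting non-empty results is a cheap way to spot malformed rows before they reach a tokenizer.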
Build With This
Create a daily Marathi news brief generator that produces BBC-quality summaries of top Maharashtra stories
Develop a headline generation model trained on BBC Marathi articles for automated news headline creation
Build a cross-lingual news summarizer that generates Marathi summaries from English BBC articles using this as training data
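The summarization and headline-generation ideas above both reduce to mapping each record to a source/target pair. A minimal sketch of that mapping follows, assuming the field names from the schema. The function name `make_pair`, the task labels, and the character-based truncation limit are illustrative choices, not anything prescribed by the dataset; a real pipeline would usually truncate by tokens instead.

```python
# Which schema field serves as the training target for each (hypothetical) task label.
TARGET_FIELD = {"summarize": "summary", "headline": "title"}


def make_pair(record: dict, task: str = "summarize", max_source_chars: int = 2000) -> dict:
    """Turn one record into a source/target pair for sequence-to-sequence training."""
    if task not in TARGET_FIELD:
        raise ValueError(f"unknown task: {task!r}")
    # Crude character-level truncation, used here only to keep the sketch simple.
    source = record["text"].strip()[:max_source_chars]
    target = record[TARGET_FIELD[task]].strip()
    return {"source": source, "target": target}


# Placeholder record for illustration only; not real dataset content.
record = {"text": "  लेखाचा मजकूर  ", "summary": "सारांश", "title": "शीर्षक", "url": ""}
print(make_pair(record, task="headline"))
```

With a loaded split, the same function could be applied per row, for example through the `map` method of a Hugging Face `Dataset`.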
AI Use Cases
Marathi abstractive summarization benchmarking
News content summarization
Cross-lingual summarization evaluation
Fine-tuning LLMs for Marathi text generation
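For quick sanity checks while benchmarking, a unigram-overlap F1 score between a reference summary and a model output can be computed in a few lines. This is a simplified, ROUGE-1-style sketch that assumes whitespace tokenization is adequate for Marathi; it is not an official ROUGE implementation, and reported results should come from an established scoring package. The function name `unigram_f1` is hypothetical.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """F1 over whitespace-separated tokens, counting each repeated token at most as often as it appears in both texts."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Identical strings score 1.0; strings with no shared tokens score 0.0.
print(unigram_f1("a b c", "a b c"))
print(unigram_f1("a b", "c d"))
```

Whitespace tokens ignore Marathi morphology and punctuation attached to words, so scores from this sketch will differ from those of a language-aware scorer.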
Last verified: 2026-03-09