XL-Sum Marathi (BBC)

10,903 article-summary pairs from the BBC Marathi website, each with a professionally written, highly abstractive summary. The dataset is part of the 45-language XL-Sum benchmark. Its summaries are editorially written rather than crowdsourced, which is its main quality advantage over crowdsourced summarization datasets.

Use it to train a Marathi news summarization model on professionally written BBC Marathi summaries.
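
One rough way to put a number on "highly abstractive" is the share of summary n-grams that never occur in the article. The sketch below assumes whitespace tokenization and set-based n-grams, which is a simplification and not necessarily how the XL-Sum authors measured it. The function names and the toy strings are hypothetical and not taken from the dataset.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(article: str, summary: str, n: int = 1) -> float:
    """Fraction of distinct summary n-grams that never appear in the article."""
    # Assumption: plain whitespace tokenization, no punctuation or morphology handling.
    article_ngrams = ngrams(article.split(), n)
    summary_ngrams = ngrams(summary.split(), n)
    if not summary_ngrams:
        return 0.0
    novel = summary_ngrams - article_ngrams
    return len(novel) / len(summary_ngrams)


# Hypothetical toy strings, not real dataset rows
article = "मुंबईत आज जोरदार पाऊस झाला आणि रेल्वे सेवा उशिराने सुरू होती"
summary = "मुंबईत पावसामुळे रेल्वे सेवा विस्कळीत"
ratio = novel_ngram_ratio(article, summary, n=1)
print(f"Novel unigram ratio: {ratio:.2f}")
```

A higher ratio suggests the summary rephrases the article instead of copying from it. Averaging the ratio over a sample of rows gives a quick, approximate picture of the split.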

Quick Start

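from datasets import load_dataset

# Load the Marathi training split of XL-Sum from the Hugging Face Hub
ds = load_dataset('csebuetnlp/xlsum', 'marathi', split='train')
print(f"Articles: {len(ds)}")

# ds[:3] returns a dict of columns, so select the first three rows instead
for ex in ds.select(range(3)):
    print(f"Title: {ex['title'][:60]}")
    print(f"Summary: {ex['summary'][:80]}...\n")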

Modality: text
Size: ~10,903 Marathi samples with train/val/test splits
License: not specified
Format: JSON
Language: mr
Update Frequency: static
Organization: BUET CSE NLP Group

Schema

Field    Type    Description
text     string  Full BBC Marathi news article text
summary  string  Professional summary of the article
title    string  Article headline
url      string  Source BBC URL
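
The schema above can be mirrored in code to catch malformed rows before training. This is a minimal sketch using only the four fields listed here. The names `XLSumRecord`, `REQUIRED_FIELDS`, and `is_valid_record` are hypothetical helpers, and the sample row is a placeholder, not real data.

```python
from typing import TypedDict


class XLSumRecord(TypedDict):
    """The four fields listed in the schema."""
    text: str
    summary: str
    title: str
    url: str


REQUIRED_FIELDS = ("text", "summary", "title", "url")


def is_valid_record(record: dict) -> bool:
    """True if every schema field is present and is a non-empty string."""
    return all(
        isinstance(record.get(field), str) and bool(record[field].strip())
        for field in REQUIRED_FIELDS
    )


# Placeholder row for illustration only
sample: XLSumRecord = {
    "text": "article body",
    "summary": "short summary",
    "title": "headline",
    "url": "https://example.com/placeholder",
}
print(is_valid_record(sample))
print(is_valid_record({"text": "article body"}))
```

Rows that fail the check can be dropped or logged before tokenization.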

Build With This

Create a daily Marathi news brief generator that produces BBC-quality summaries of top Maharashtra stories
Develop a headline generation model trained on BBC Marathi articles for automated news headline creation
Build a cross-lingual news summarizer that generates Marathi summaries from English BBC articles using this as training data
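
The summarization and headline-generation ideas above differ mainly in which field is the target: `summary` or `title`. The sketch below shows one way to turn a row into a source/target pair for either task. The function name `make_pair`, the task prefix, and the `max_chars` cap are illustrative assumptions, not conventions of the dataset or of any particular model.

```python
def make_pair(record: dict, task: str = "summarize", max_chars: int = 2000) -> dict:
    """Map one dataset row to a source/target pair for the chosen task."""
    targets = {"summarize": "summary", "headline": "title"}
    if task not in targets:
        raise ValueError(f"Unknown task: {task!r}")
    # Assumption: a text prefix names the task, and long articles are cut by characters.
    source = f"{task}: {record['text'][:max_chars]}"
    return {"source": source, "target": record[targets[task]]}


# Placeholder row for illustration only
row = {"text": "article body", "summary": "short summary", "title": "headline"}
print(make_pair(row, task="summarize"))
print(make_pair(row, task="headline"))
```

With the Hugging Face `datasets` library, a function like this could be applied to every row through `Dataset.map` before tokenization.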

AI Use Cases

Marathi abstractive summarization benchmarking
News content summarization
Cross-lingual summarization evaluation
Fine-tuning LLMs for Marathi text generation
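
For the benchmarking use case, a quick sanity check is unigram overlap between a generated summary and the reference. The sketch below is a simplified, ROUGE-1-style F1 that assumes whitespace tokenization. It is not the official ROUGE implementation, and the function name and toy strings are hypothetical.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    # Assumption: whitespace tokenization, which keeps Devanagari tokens intact.
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy strings, not real dataset rows or model output
reference = "मुंबईत पावसामुळे रेल्वे सेवा विस्कळीत"
candidate = "मुंबईत पावसामुळे रेल्वे उशिराने"
score = unigram_f1(reference, candidate)
print(f"Unigram F1: {score:.2f}")
```

For reported results, a ROUGE implementation with tokenization suited to Marathi is the safer choice. This helper is only meant for fast checks during development.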
Last verified: 2026-03-09