L3Cube-MahaNews - Awesome Marathi Datasets

L3Cube-MahaNews

Maharashtra-specific

L3Cube-MahaNews is a Marathi news dataset for natural language processing (NLP), with each record labeled by news topic.

Build a Marathi news aggregation and auto-tagging service that categorizes articles by topic for personalized feeds.

Quick Start

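from datasets import load_dataset

# Download all available splits from the Hugging Face Hub
ds = load_dataset('l3cube-pune/MahaNews')
# ds['train'][:5] would return a dict of columns; select() yields the first five rows as dicts
for ex in ds['train'].select(range(5)):
    print(f"[{ex['label']}] {ex['text'][:80]}...")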

Modality

text

Size

1.05 lakh (about 105,000) records across 12 news categories

License

CC-BY-NC-SA-4.0

Format

CSV

Language

Marathi

Update Frequency

static

Organization

L3Cube, Pune
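The Format entry lists CSV. Below is a minimal sketch for reading a local CSV copy with Python's standard `csv` module and counting records per category. It assumes the file has a header row with columns named `text` and `label`, matching the Schema, but the actual release files may use different column names. The helper names are hypothetical, and the sample rows are made-up toy examples rather than records from the dataset.

```python
import csv
import io
from collections import Counter


def read_mahanews_csv(f):
    """Read rows from a CSV file object.

    Assumes (hypothetically) a header row with 'text' and 'label' columns; check the real file.
    """
    reader = csv.DictReader(f)
    return [{"text": row["text"], "label": row["label"]} for row in reader]


def label_counts(rows):
    """Count how many records fall into each news category."""
    return Counter(row["label"] for row in rows)


# Toy in-memory CSV standing in for a real file; these rows are illustrative, not from MahaNews.
sample = io.StringIO(
    "text,label\n"
    "क्रिकेट सामना आज,sports\n"
    "निवडणूक निकाल जाहीर,politics\n"
    "नवीन चित्रपट प्रदर्शित,entertainment\n"
    "विश्वचषक संघ जाहीर,sports\n"
)
rows = read_mahanews_csv(sample)
print(label_counts(rows).most_common())
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_mahanews_csv`. The `newline=""` argument is what the `csv` module documentation recommends.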

Schema

Field	Type	Description
text	string	Marathi news headline or article text
label	string	News topic category (one of 12 categories: politics, sports, entertainment, etc.)
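Because the schema has only two string fields, checking records and mapping the string labels to the integer ids that most classifiers expect takes only a few lines. The sketch below assumes records arrive as Python dicts keyed by `text` and `label`. The function names are hypothetical, and the sorted-label id assignment is an arbitrary choice made here, not an official label order.

```python
from typing import Any, Dict, List, Tuple


def validate_record(rec: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not match the two-field schema (text, label)."""
    for field in ("text", "label"):
        if field not in rec:
            raise ValueError(f"missing field: {field}")
        if not isinstance(rec[field], str):
            raise ValueError(f"field {field!r} must be a string")
    if not rec["text"].strip():
        raise ValueError("empty text")


def encode_labels(records: List[Dict[str, str]]) -> Tuple[List[int], Dict[str, int]]:
    """Map string labels to integer ids, sorting label names for a stable, reproducible order."""
    label2id = {lab: i for i, lab in enumerate(sorted({r["label"] for r in records}))}
    return [label2id[r["label"]] for r in records], label2id


# Made-up example records, not taken from the dataset.
records = [
    {"text": "क्रिकेट सामना आज", "label": "sports"},
    {"text": "निवडणूक निकाल जाहीर", "label": "politics"},
    {"text": "विश्वचषक संघ जाहीर", "label": "sports"},
]
for r in records:
    validate_record(r)
ids, label2id = encode_labels(records)
print(label2id, ids)
```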

Build With This

Create a breaking-news detection system that classifies and prioritizes incoming Marathi news by topic and urgency (the dataset supplies only topic labels, so urgency needs a separate signal).

Develop a personalized Marathi news digest app that learns user preferences and recommends stories from their favourite categories.

Build a media monitoring service for Maharashtra government officials that tracks coverage by topic across Marathi publications.
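The media-monitoring idea mostly comes down to tallying topic labels per publication. The sketch below assumes each incoming article has already been tagged by a topic classifier and paired with a publication name. MahaNews itself has no publication field, so the publication names, labels, and helper functions here are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def coverage_by_topic(items: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Group (publication, predicted_topic) pairs into per-publication topic counts."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for publication, topic in items:
        table[publication][topic] += 1
    return dict(table)


def topic_share(table: Dict[str, Counter], topic: str) -> Dict[str, float]:
    """Fraction of each publication's tagged articles that fall under `topic`."""
    return {pub: counts[topic] / sum(counts.values()) for pub, counts in table.items()}


# Hypothetical publications and classifier outputs, for illustration only.
tagged = [
    ("Paper A", "politics"), ("Paper A", "sports"), ("Paper A", "politics"),
    ("Paper B", "entertainment"), ("Paper B", "politics"),
]
table = coverage_by_topic(tagged)
print(topic_share(table, "politics"))
```

Reporting shares rather than raw counts keeps large and small publications comparable.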

AI Use Cases

News topic classification
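A simple bag-of-words baseline can prototype news topic classification before you reach for larger models. The sketch below is a pure-Python multinomial Naive Bayes classifier with add-one smoothing. It is an illustrative baseline chosen here, not the method used by the dataset authors. It also assumes naive whitespace tokenization, which ignores Marathi morphology. The class name is hypothetical, and the training sentences are made-up toy examples.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesTopicClassifier":
        self.class_counts = Counter(labels)  # documents per topic, used for the prior
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())  # token counts per topic
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> Optional[str]:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label_docs in self.class_counts.items():
            score = math.log(n_label_docs / n_docs)  # log prior
            for w in text.split():
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # smoothed log likelihood of the word under this topic
                score += math.log((self.word_counts[label][w] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made-up headlines, not taken from MahaNews).
texts = ["क्रिकेट सामना आज", "विश्वचषक क्रिकेट संघ", "निवडणूक निकाल जाहीर", "मुख्यमंत्री निवडणूक प्रचार"]
labels = ["sports", "sports", "politics", "politics"]
clf = NaiveBayesTopicClassifier().fit(texts, labels)
print(clf.predict("क्रिकेट संघ जाहीर"))
```

To train on the real data, pass the `text` and `label` columns of the train split to `fit`. Then compare its accuracy against stronger models on a held-out split.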

Related Datasets

AI4Bharat BPCC (mr)

parallel-text

AI4Bharat IndicCorp v1 (mr)

text

AI4Bharat IndicCorp v2 (Marathi)

text

AI4Bharat IndicGLUE (mr)

text

Last verified: 2026-03-07