iNLTK Marathi News Headlines

MH Specific

Marathi news classification dataset containing ~12,000 news article headlines collected from a Marathi news website and labeled with one of 3 categories (state 62%, entertainment 27%, sports 10%). Part of the iNLTK (Indic Natural Language Toolkit) project.
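
The class split is skewed: a model that always answers "state" would already score about 62% accuracy, so per-class metrics and class weighting matter here. The sketch below turns the rounded shares into approximate counts and "balanced" inverse-frequency weights, w_c = N / (K * n_c). It assumes the label strings are exactly "state", "entertainment" and "sports". Because the published shares are rounded (they sum to 99%), the derived counts total about 11,880 rather than 12,000; for real training, count the category column of the loaded data instead.

```python
# Rounded class shares from the dataset description (assumed label strings).
shares = {"state": 0.62, "entertainment": 0.27, "sports": 0.10}
total = 12_000  # approximate size ("~12,000 headlines")

# Approximate per-class counts implied by the rounded shares.
counts = {label: round(share * total) for label, share in shares.items()}

# Inverse-frequency ("balanced") weights: w_c = N / (K * n_c),
# where N is the number of examples and K the number of classes.
n = sum(counts.values())
k = len(counts)
weights = {label: n / (k * c) for label, c in counts.items()}

# Accuracy of a classifier that always predicts the majority class.
majority_baseline = max(counts.values()) / n

for label in counts:
    print(f"{label:>13}: ~{counts[label]} headlines, weight {weights[label]:.2f}")
print(f"majority-class accuracy: {majority_baseline:.1%}")
```

Sports headlines end up weighted roughly six times more heavily than state headlines, which offsets their smaller share when the weights are applied to the training loss.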

Build a Marathi news topic classifier that automatically categorizes incoming news articles for a Marathi news aggregator app.
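
As a starting point for such a classifier, a multinomial Naive Bayes model over character n-grams is a cheap, dependency-free baseline. Character n-grams tolerate the inflected word forms common in Marathi (Devanagari) text better than whole-word features. This is a minimal sketch: the class and function names are hypothetical, and it is not the iNLTK toolkit's own model.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams of a space-padded, lowercased headline."""
    padded = f" {text.strip().lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NaiveBayesHeadlineClassifier:
    """Multinomial Naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, headlines, labels):
        self.class_counts = Counter(labels)          # documents per label
        self.ngram_counts = defaultdict(Counter)     # label -> n-gram frequencies
        for text, label in zip(headlines, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counter in self.ngram_counts.values():
            self.vocab.update(counter)
        self.totals = {label: sum(c.values()) for label, c in self.ngram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)     # log prior
            counter, total = self.ngram_counts[label], self.totals[label]
            for gram in char_ngrams(text):
                if gram in self.vocab:               # skip n-grams never seen in training
                    score += math.log((counter[gram] + self.alpha) / (total + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With the dataset loaded as in the Quick Start below, training could look like `NaiveBayesHeadlineClassifier().fit(ds['train']['headline'], ds['train']['category'])`. Given the class imbalance, evaluate it with per-class or macro-averaged scores rather than accuracy alone.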

Quick Start

from datasets import load_dataset

ds = load_dataset('inltk/marathi-news-headlines')
print(f"Headlines: {len(ds['train'])}")
# ds['train'][:5] would return a dict of columns, so select the first 5 rows instead
for ex in ds['train'].select(range(5)):
    print(f"[{ex['category']}] {ex['headline'][:80]}")
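
With only about 10% sports headlines, a random held-out split can leave the minority class under-represented. The following minimal sketch performs a stratified split on plain dict rows, such as those produced by `list(ds['train'])`. The function name, the 10% test fraction, and the seed are illustrative choices, not part of the dataset.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="category", test_fraction=0.1, seed=42):
    """Split dict rows so each label keeps roughly the same share in train and test."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):          # fixed label order for reproducibility
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)                      # mix labels back together
    rng.shuffle(test)
    return train, test
```

If you would rather stay within Hugging Face Datasets, the library's `Dataset.class_encode_column` followed by `train_test_split(..., stratify_by_column=...)` serves the same purpose.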

Modality: text
Size: ~12,000 headlines; 3 categories
License: (not listed)
Format: CSV
Language: mr (Marathi)
Update Frequency: static
Organization: iNLTK / DISISBIG

Schema

Field | Type | Description
headline | string | Marathi news headline text
category | string | News category (state, entertainment, or sports)
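
A quick per-row check against this schema can catch empty headlines, non-Devanagari text, or unexpected labels before training. This is a sketch that assumes the category values are exactly the three lowercase strings from the description; adjust `EXPECTED_CATEGORIES` (a hypothetical name) if the loaded data uses different spellings. Marathi is written in Devanagari, so the check looks for at least one character in the Unicode block U+0900 to U+097F.

```python
EXPECTED_CATEGORIES = {"state", "entertainment", "sports"}  # assumed label strings


def has_devanagari(text):
    """True if the text contains at least one character from the Devanagari block."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def validate_row(row):
    """Return a list of schema problems for one row (an empty list means it looks valid)."""
    problems = []
    for field in ("headline", "category"):
        if not isinstance(row.get(field), str):
            problems.append(f"{field!r} missing or not a string")

    headline = row.get("headline")
    if isinstance(headline, str):
        if not headline.strip():
            problems.append("'headline' is empty")
        elif not has_devanagari(headline):
            problems.append("'headline' contains no Devanagari characters")

    category = row.get("category")
    if isinstance(category, str) and category not in EXPECTED_CATEGORIES:
        problems.append(f"unexpected category {category!r}")
    return problems
```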

Build With This

Create a Marathi trending topics detector that identifies emerging news stories by clustering headlines
Develop a Marathi headline style transfer tool that rewrites headlines for different audience segments (formal vs casual)
Build a fake news detector for Marathi by analyzing headline linguistic patterns and comparing against verified sources
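
For the trending-topics idea, one simple way to cluster headlines is greedy "leader" clustering on word overlap. Each headline joins the existing cluster whose first (leader) headline is most similar by Jaccard similarity, provided the similarity clears a threshold; otherwise it starts a new cluster. This is a minimal sketch: the 0.3 threshold and the punctuation list are illustrative assumptions. Detecting *emerging* stories would additionally require comparing cluster sizes across time windows, which is not shown.

```python
PUNCT = ".,:;!?'\"()-।"  # includes the Devanagari danda


def tokens(headline):
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    return {t.strip(PUNCT) for t in headline.lower().split()} - {""}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_headlines(headlines, threshold=0.3):
    """Greedy leader clustering; returns clusters as lists of headlines, largest first."""
    clusters = []  # each: {"leader": tokens of the first headline, "members": [...]}
    for h in headlines:
        toks = tokens(h)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(toks, c["leader"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(h)
        else:
            clusters.append({"leader": toks, "members": [h]})
    # Larger clusters correspond to stories covered by more headlines.
    return sorted((c["members"] for c in clusters), key=len, reverse=True)
```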

AI Use Cases

Marathi news topic classification
Short text classification
News content categorization
Headline generation training
Last verified: 2026-03-09