Marathi news classification dataset containing ~12,000 news article headlines collected from a Marathi news website, labeled across 3 categories (state 62%, entertainment 27%, sports 10%). Part of the iNLTK (Indic Natural Language Toolkit) project.
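
```python
from datasets import load_dataset

ds = load_dataset('inltk/marathi-news-headlines')
print(f"Headlines: {len(ds['train'])}")

# Slicing a Dataset (ds['train'][:5]) returns a dict of columns, so select rows instead.
for ex in ds['train'].select(range(5)):
    print(f"[{ex['category']}] {ex['headline'][:80]}")
```

| Field | Type | Description |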
|---|---|---|
| headline | string | Marathi news headline text |
| category | string | News category (state, entertainment, sports) |
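
Since the classes are imbalanced (state around 62%, sports around 10%), it may be worth checking the label distribution before training. The following is a minimal sketch, assuming the `category` column holds plain strings as described in the table; `category_shares` is a hypothetical helper, not part of iNLTK or `datasets`, and the sample labels are made up for illustration.

```python
from collections import Counter


def category_shares(categories):
    """Return {category: fraction of rows}, most frequent category first."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


# Toy labels for illustration only; with the real data,
# pass ds['train']['category'] instead.
sample = ['state', 'state', 'entertainment', 'sports']
shares = category_shares(sample)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

If the shares on the real data differ noticeably from the figures quoted in the description, that is a hint to recheck the split or the dataset version before relying on the stated proportions.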