L3Cube-MahaHate - Awesome Marathi Datasets

L3Cube-MahaHate

Maharashtra-specific

A Marathi hate speech detection dataset of tweets from L3Cube, Pune, released with both a 4-class and a binary labeling.

Build a Marathi content moderation API that flags hate speech in real time for community forum platforms.

Quick Start

from datasets import load_dataset
ds = load_dataset('l3cube-pune/MahaHate')
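# Slicing a split (ds['train'][:5]) returns a dict of columns, so select rows to iterate over records
for ex in ds['train'].select(range(5)):
    print(f"[{ex['label']}] {ex['text'][:80]}...")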

Modality

text

Size

25,000 tweets (4-class), 37,500 tweets (binary)
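The sizes above are totals per labeling scheme. Before training, it helps to check how rows are split across classes. A minimal sketch, assuming each row is a mapping with a `label` field as in the schema; `label_distribution` is a hypothetical helper, not part of the dataset or the `datasets` library:

```python
from collections import Counter
from typing import Iterable, Mapping


def label_distribution(rows: Iterable[Mapping[str, object]]) -> dict[str, float]:
    """Return each label's share of the rows (assumes a 'label' field per row)."""
    counts = Counter(str(row["label"]) for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}  # no rows, no distribution
    # Sorted by label name so the output order is stable across runs
    return {label: n / total for label, n in sorted(counts.items())}
```

Iterating a `datasets` split yields one dict per row, so the same helper should accept `ds['train']` from the Quick Start directly.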

License

CC-BY-NC-SA-4.0

Format

CSV

Language

Marathi

Update Frequency

static

Organization

L3Cube, Pune

Schema

Field	Type	Description
text	string	Marathi tweet text
label	string	Hate speech class label (4-class: hate, offensive, profane, not; binary: hate, not-hate)
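The schema can be enforced at load time so that malformed rows fail early. A minimal sketch, assuming the label strings are exactly those named in the table (the files may use different casing or abbreviations); `MahaHateRecord` and `parse_record` are hypothetical names:

```python
from dataclasses import dataclass

# Label sets as listed in the schema table. The exact strings stored in the
# files are an assumption and may need adjusting (e.g. upper-case codes).
FOUR_CLASS_LABELS = frozenset({"hate", "offensive", "profane", "not"})
BINARY_LABELS = frozenset({"hate", "not-hate"})


@dataclass(frozen=True)
class MahaHateRecord:
    text: str
    label: str


def parse_record(raw: dict, binary: bool = False) -> MahaHateRecord:
    """Validate one raw row against the documented schema and normalize the label."""
    text = raw.get("text")
    label = raw.get("label")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("record has no usable 'text' field")
    if not isinstance(label, str):
        raise ValueError("'label' must be a string per the schema")
    normalized = label.strip().lower()  # tolerate casing and stray whitespace
    allowed = BINARY_LABELS if binary else FOUR_CLASS_LABELS
    if normalized not in allowed:
        raise ValueError(f"unexpected label {label!r}; expected one of {sorted(allowed)}")
    return MahaHateRecord(text=text, label=normalized)
```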

Build With This

Create an automated moderation pipeline for Marathi-language social media communities that filters hateful content before it reaches users

Develop a caste-discrimination and communal-hate monitoring dashboard for civil society organizations tracking online abuse in Maharashtra

Build a browser extension that warns users about hateful Marathi content and suggests constructive alternatives
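The moderation ideas above share one core step: score a post, then decide whether to flag it. A minimal sketch, assuming a hypothetical `classify` callable that returns per-label probabilities (for example, a model fine-tuned on MahaHate) and an illustrative policy that treats hate, offensive, and profane as flaggable:

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical classifier interface: text -> per-label probabilities.
Classifier = Callable[[str], Mapping[str, float]]

# Illustrative policy (an assumption): which labels count as harmful.
FLAG_LABELS = frozenset({"hate", "offensive", "profane"})


@dataclass(frozen=True)
class ModerationDecision:
    flagged: bool
    top_label: str
    score: float


def moderate(text: str, classify: Classifier, threshold: float = 0.5) -> ModerationDecision:
    """Flag a post when its most likely label is harmful and meets the threshold."""
    scores = classify(text)
    # Pick the single most probable label (raises ValueError if scores is empty)
    top_label, score = max(scores.items(), key=lambda kv: kv[1])
    flagged = top_label in FLAG_LABELS and score >= threshold
    return ModerationDecision(flagged=flagged, top_label=top_label, score=score)
```

Flagging on the top label keeps the rule simple; a stricter policy could instead sum the harmful-label probabilities and compare that total to the threshold.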

AI Use Cases

Hate speech detection
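For hate speech detection, a simple bag-of-words baseline is a useful reference point before trying larger models. The sketch below is a multinomial Naive Bayes classifier with add-one smoothing and whitespace tokenization; it is a swapped-in illustration, not the method used by the dataset's authors, and `NaiveBayesBaseline` is a hypothetical name:

```python
import math
from collections import Counter, defaultdict


class NaiveBayesBaseline:
    """Multinomial Naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)  # documents per class, for priors
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total_tokens = {label: sum(c.values()) for label, c in self.token_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            for tok in text.split():
                if tok not in self.vocab:
                    continue  # skip tokens never seen in training
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (self.total_tokens[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Fit on the `text` and `label` columns of the training split and call `predict` on new posts; because unseen tokens are skipped, a post made entirely of new words falls back to the most frequent class.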

Related Datasets

AI4Bharat BPCC (mr)

parallel-text

AI4Bharat IndicCorp v1 (mr)

text

AI4Bharat IndicCorp v2 (Marathi)

text

AI4Bharat IndicGLUE (mr)

text

Last verified: 2026-03-07