L3Cube-MahaHate

MH Specific

L3Cube-MahaHate is a Marathi hate speech detection dataset of labeled tweets from L3Cube, Pune.

Build a Marathi content moderation API that flags hate speech in real time on community forum platforms.
Homepage | GitHub

Quick Start

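from datasets import load_dataset

ds = load_dataset('l3cube-pune/MahaHate')
# Slicing a Dataset (ds['train'][:5]) returns a dict of columns, not rows,
# so select the first five rows explicitly before iterating.
for ex in ds['train'].select(range(5)):
    print(f"[{ex['label']}] {ex['text'][:80]}...")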

Modality: text
Size: 25,000 tweets (4-class), 37,500 tweets (binary)
License:
Format: CSV
Language: mr
Update Frequency: static
Organization: L3Cube, Pune

Schema

Field | Type | Description
text | string | Marathi tweet text
label | string | Hate speech class label (hate, offensive, profane, or not in the 4-class set; hate or not-hate in the binary set)
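
To make the schema concrete, here is a minimal sketch that validates rows against the two fields and tallies labels. The exact label strings, the `to_binary` mapping (every class other than "not" collapsed to "hate"), and the sample rows are assumptions for illustration; the dataset's own spellings and its binary construction may differ.

```python
from collections import Counter

# Assumed label strings; the real dataset may spell them differently.
FOUR_CLASS_LABELS = {"hate", "offensive", "profane", "not"}


def validate_row(row):
    """Return True if the row has a string `text` and a known `label`."""
    return (
        isinstance(row.get("text"), str)
        and row.get("label") in FOUR_CLASS_LABELS
    )


def to_binary(label):
    """Collapse a 4-class label to hate / not-hate (assumed mapping)."""
    return "not-hate" if label == "not" else "hate"


def label_counts(rows):
    """Count labels over the rows that pass validation."""
    return Counter(row["label"] for row in rows if validate_row(row))


# Hypothetical placeholder rows, not taken from the dataset.
sample_rows = [
    {"text": "उदाहरण एक", "label": "not"},
    {"text": "उदाहरण दोन", "label": "offensive"},
    {"text": "उदाहरण तीन", "label": "not"},
    {"text": "उदाहरण चार", "label": "unknown"},  # rejected by validate_row
]

counts = label_counts(sample_rows)
print(counts)
print({label: to_binary(label) for label in counts})
```

Running the same `label_counts` over rows loaded from the dataset is a quick way to check class balance before training.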

Build With This

Create an automated moderation pipeline for Marathi-language social media communities that filters hateful content before it reaches users
Develop a caste-discrimination and communal-hate monitoring dashboard for civil society organizations tracking online abuse in Maharashtra
Build a browser extension that warns users about hateful Marathi content and suggests constructive alternatives
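
As a rough sketch of the moderation pipeline idea above, the snippet below filters posts through a pluggable classifier before they are shown. Everything here is hypothetical: `moderate`, `Decision`, `BLOCKED_LABELS`, and `stub_classifier` are illustrative names, and the stub stands in for a model trained on this dataset.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Labels treated as blockable; an assumption for this sketch.
BLOCKED_LABELS = frozenset({"hate", "offensive", "profane"})


@dataclass
class Decision:
    text: str
    label: str
    allowed: bool


def moderate(
    posts: Iterable[str],
    classify: Callable[[str], str],
    blocked=BLOCKED_LABELS,
) -> List[Decision]:
    """Label each post and mark whether it may be shown to users."""
    decisions = []
    for text in posts:
        label = classify(text)
        decisions.append(Decision(text, label, label not in blocked))
    return decisions


def stub_classifier(text: str) -> str:
    """Hypothetical stand-in for a trained model: flags a placeholder token."""
    return "hate" if "BLOCKME" in text else "not"


results = moderate(["नमस्कार", "BLOCKME उदाहरण"], stub_classifier)
visible = [d.text for d in results if d.allowed]
print(visible)
```

Keeping the classifier as a plain callable means the stub can be swapped for a real model without touching the filtering logic.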

AI Use Cases

Hate speech detection
Last verified: 2026-03-07