L3Cube-MahaSocialNER

MH Specific

A Marathi Named Entity Recognition dataset built from social media, with entity annotations on informal, code-mixed text. It addresses the gap between formal-text NER resources (like Naamapadam) and real-world Marathi social media usage, where text is noisy and informal.

Build a social media entity tracker for Marathi Twitter/X that identifies mentions of people, organizations, and locations in real time.

Quick Start

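# Download the dataset from https://github.com/l3cube-pune/MarathiNLP
# and adjust the path below to wherever the CSV file is saved.
import pandas as pd

df = pd.read_csv('MahaSocialNER.csv')
print(f"Samples: {len(df)}")  # number of rows loaded
print(df.head())              # preview the first five rows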
Modality: text
Size: Social media NER annotations; BIO-tagged entities
License: (not specified)
Format: CoNLL / CSV
Language: mr
Update Frequency: static
Organization: L3Cube, Pune
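
Since the format is listed as CoNLL as well as CSV, a reader for the CoNLL variant may be useful. The sketch below is a minimal example under stated assumptions: one token per line, the tag in the last whitespace-separated column, and a blank line between sentences. The actual column layout of the released files is not confirmed here, and `read_conll` is a hypothetical helper, not part of any L3Cube tooling. The sample text and the `LOC`/`PER` labels are invented for illustration.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of {"tokens", "ner_tags"} dicts.

    Assumes (not verified against the released files): one token per line,
    tag in the last column, blank line between sentences.
    """
    sentences = []
    tokens, tags = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if tokens:
                sentences.append({"tokens": tokens, "ner_tags": tags})
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # first column: the token
        tags.append(parts[-1])    # last column: the BIO tag
    if tokens:
        # Flush the final sentence when the input lacks a trailing blank line.
        sentences.append({"tokens": tokens, "ner_tags": tags})
    return sentences


# Invented sample data, romanized for readability.
sample = """\
Pune B-LOC
madhe O

Sachin B-PER
Tendulkar I-PER
"""
parsed = read_conll(sample.splitlines())
print(parsed)
```

For a file on disk, the same function can be fed an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, since it only iterates over lines.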

Schema

Field | Type | Description
tokens | list[string] | List of word tokens from social media text
ner_tags | list[string] | Named entity tags in BIO format
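
To make the BIO scheme concrete, here is a minimal sketch of how the `tokens` and `ner_tags` fields could be combined into entity spans. The label names `PER` and `LOC` and the sample sentence are assumptions for illustration; the dataset's actual tag inventory is not listed here. `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tokens, ner_tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, ner_tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity (dropped here).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Invented example; label names are assumed, not taken from the dataset.
tokens = ["Sachin", "Tendulkar", "visited", "Pune", "today"]
ner_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, ner_tags))
# [('Sachin Tendulkar', 'PER'), ('Pune', 'LOC')]
```

Dropping a stray `I-` tag is one possible policy for noisy annotations; treating it as the start of a new entity is an equally common alternative.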

Build With This

Create a Marathi social media knowledge graph builder that links mentioned entities across posts to map public discourse networks
Develop a brand mention tracker for companies operating in Maharashtra that identifies entity references in informal Marathi text
Build a geo-tagging system for Marathi social media posts that extracts and resolves location mentions to coordinates
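
As a starting point for the knowledge graph and mention tracking ideas above, the sketch below counts how many posts mention each entity and how often two entities appear in the same post. It assumes entities have already been extracted per post as `(text, type)` pairs; `build_cooccurrence`, the sample posts, and the `LOC`/`ORG` labels are illustrative assumptions, not part of the dataset or its tooling.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(posts_entities):
    """Return (posts-per-entity counts, co-occurrence edge counts).

    posts_entities: one list of (entity_text, entity_type) pairs per post.
    """
    mention_counts = Counter()
    edge_counts = Counter()
    for entities in posts_entities:
        # Deduplicate within a post and sort so each pair has a stable order.
        unique = sorted(set(entities))
        mention_counts.update(unique)
        # Every unordered pair of entities in the same post forms an edge.
        edge_counts.update(combinations(unique, 2))
    return mention_counts, edge_counts


# Invented sample posts, already reduced to their extracted entities.
posts = [
    [("Pune", "LOC"), ("Infosys", "ORG")],
    [("Pune", "LOC"), ("Infosys", "ORG"), ("Mumbai", "LOC")],
]
mentions, edges = build_cooccurrence(posts)
print(mentions.most_common())
print(edges.most_common())
```

The edge counts can be loaded into a graph library as weighted edges; linking spelling variants of the same entity across posts is a separate normalization step not covered here.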

AI Use Cases

Social media entity extraction in Marathi
Informal text NER for trend analysis
Social media monitoring for Maharashtra brands
Code-mixed Marathi-English entity recognition
Last verified: 2026-03-09