A Marathi Named Entity Recognition (NER) dataset built from social media text, with entity annotations over informal, code-mixed posts. It addresses the gap between formal-text NER resources (such as Naamapadam) and real-world Marathi usage on social media, where text is noisy and informal.

```python
# Download from https://github.com/l3cube-pune/MarathiNLP
import pandas as pd

df = pd.read_csv('MahaSocialNER.csv')
print(f"Samples: {len(df)}")
print(df.head())
```

| Field | Type | Description |
|---|---|---|
| tokens | list[string] | List of word tokens from social media text |
| ner_tags | list[string] | Named entity tags in BIO format |
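
Because `ner_tags` uses the BIO scheme, consecutive `B-`/`I-` tags can be merged back into whole entity mentions. The following is a minimal sketch of that decoding step, not part of the dataset's own tooling. The helper name `bio_to_spans`, the label names `PER` and `LOC`, and the sample sentence are hypothetical and chosen only for illustration; the actual tag set should be checked against the dataset.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], ner_tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; assumes tags look like
    "B-XXX", "I-XXX", or "O".
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, ner_tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            # A new entity starts; close any entity that was still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I" and label and label == current_type:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag with no matching open entity (skipped here).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Illustrative input only; label names are assumptions, not the dataset's tag set.
example_tokens = ["Sachin", "Tendulkar", "lives", "in", "Mumbai"]
example_tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
example_spans = bio_to_spans(example_tokens, example_tags)
print(example_spans)
```

This sketch drops an `I-` tag that does not follow a matching `B-` tag. On noisy social media annotations, a more lenient policy (treating such a tag as the start of a new entity) may be preferable.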