L3Cube-MahaNER

MH Specific

Manually annotated Marathi named entity recognition dataset with 25,000 sentences tagged across 8 entity classes.

Build a named entity recognition service to extract person, organization, and location names from Marathi government documents

Quick Start

from datasets import load_dataset
ds = load_dataset('l3cube-pune/marathi-ner')
example = ds['train'][0]
print(f"Tokens: {example['tokens']}")
print(f"Tags: {example['ner_tags']}")
Modality: text
Size: 25,000 sentences
License:
Format: JSON
Language: mr
Update Frequency: static
Organization: L3Cube, Pune

Schema

Field | Type | Description
tokens | list[string] | List of tokenized words in the sentence
ner_tags | list[int] | BIO-scheme entity tags per token (8 entity classes including PER, ORG, LOC)
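
Because ner_tags holds one integer per token, a common first step is to map those integers back to BIO labels and group consecutive tokens into entity spans. The sketch below shows one way to do that. The LABELS list and the bio_to_spans helper are hypothetical names used for illustration only; the dataset's real label inventory and ID ordering are not stated here and should be read from the dataset itself.

```python
from typing import List, Tuple

# Hypothetical ID-to-label list for illustration only. The real dataset has
# 8 entity classes; its actual label names and ordering are an assumption here.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def bio_to_spans(tokens: List[str], tag_ids: List[int],
                 labels: List[str] = LABELS) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag_id in zip(tokens, tag_ids):
        tag = labels[tag_id]
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        # A B- tag always starts a new span; an I- tag whose type does not
        # match the open span is treated leniently as a new span too.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return spans


if __name__ == "__main__":
    tokens = ["शरद", "पवार", "पुणे", "येथे", "आले"]
    tag_ids = [1, 2, 5, 0, 0]  # B-PER I-PER B-LOC O O under the assumed LABELS
    print(bio_to_spans(tokens, tag_ids))
```

Under the assumed label list, the example sentence yields one PER span covering the first two tokens and one LOC span for the third. Filtering the result by entity type is then enough to keep only person, organization, and location names.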

Build With This

Create a knowledge graph from Marathi news articles using this NER dataset
Develop a Marathi legal document parser that extracts entities for contract analysis
Build a fine-tuned NER model for identifying crop names and agricultural organizations in Marathi advisory texts
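
For the fine-tuning idea, one practical detail is that the tags here are per word, while most transformer tokenizers split words into subwords. A minimal sketch of the usual alignment step follows. It assumes the tokenizer exposes a word index for each subword (for example, the list returned by word_ids() on a Hugging Face fast tokenizer encoding, with None for special tokens). The align_labels helper is a hypothetical name, and labeling only the first subword of each word is one convention among several.

```python
from typing import List, Optional

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]],
                 word_labels: List[int]) -> List[int]:
    """Expand word-level tag IDs to subword level.

    The first subword of each word keeps the word's tag; special tokens
    (word id None) and continuation subwords get IGNORE_INDEX.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned


if __name__ == "__main__":
    # Three words split into 2, 1, and 3 subwords, wrapped in special tokens.
    word_ids = [None, 0, 0, 1, 2, 2, 2, None]
    word_labels = [1, 2, 0]
    print(align_labels(word_ids, word_labels))
```

The returned list has one entry per subword, so it can be used directly as the labels for a token classification model.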

AI Use Cases

Named entity recognition
Information extraction
Knowledge graph construction
Document understanding
Last verified: 2026-03-07