A manually annotated Marathi named entity recognition (NER) dataset of 25,000 sentences tagged across 8 entity classes.
```python
from datasets import load_dataset

ds = load_dataset('l3cube-pune/marathi-ner')
example = ds['train'][0]
print(f"Tokens: {example['tokens']}")
print(f"Tags: {example['ner_tags']}")
```

| Field | Type | Description |
|---|---|---|
| tokens | list[string] | List of tokenized words in the sentence |
| ner_tags | list[int] | BIO-scheme entity tags per token (8 entity classes including PER, ORG, LOC) |
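Since `ner_tags` uses the BIO scheme, downstream code typically needs to group `B-`/`I-` tags back into entity spans. Below is a minimal sketch of that decoding step; the tokens and label strings in the example are illustrative, not taken from the dataset (with the `datasets` library, the authoritative id-to-label mapping is available via `ds['train'].features['ner_tags'].feature.names`).

```python
def bio_to_spans(tokens, tags):
    """Group BIO string tags into (entity_type, entity_text) spans.

    A span starts at a B- tag and extends over following I- tags of the
    same entity type; an O tag or a new B- tag closes the open span.
    """
    spans = []
    etype, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith('B-'):
            if etype is not None:  # close the previous span
                spans.append((etype, ' '.join(tokens[start:i])))
            etype, start = tag[2:], i
        elif tag.startswith('I-') and etype == tag[2:]:
            continue  # span continues
        else:
            if etype is not None:
                spans.append((etype, ' '.join(tokens[start:i])))
            etype, start = None, None
    if etype is not None:  # flush a span that runs to sentence end
        spans.append((etype, ' '.join(tokens[start:])))
    return spans

# Illustrative English tokens (the dataset itself is Marathi):
tokens = ['John', 'works', 'at', 'Acme', 'Corp']
tags = ['B-PER', 'O', 'O', 'B-ORG', 'I-ORG']
print(bio_to_spans(tokens, tags))  # [('PER', 'John'), ('ORG', 'Acme Corp')]
```

The same helper works on real examples once the integer `ner_tags` are mapped to their label strings via the dataset's `ClassLabel` feature.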