UD Marathi-UFAL Treebank

MH Specific

A Universal Dependencies treebank of Marathi sentences, annotated with part-of-speech tags and dependency relations for NLP.

Build a Marathi grammar checker using dependency parsing from the Universal Dependencies treebank.

Quick Start

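from datasets import load_dataset

ds = load_dataset('universal_dependencies', 'mr_ufal', split='train')
# Slicing a Dataset (ds[:5]) returns a dict of columns, so select the rows explicitly.
for ex in ds.select(range(5)):
    print(f"Text: {ex['text'][:60]}...")
    print(f"POS: {ex['upos'][:8]}...\n")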

Modality

text (CoNLL-U)

Size

466 sentences, 3,506 tokens

License

CC-BY-SA-4.0

Format

CSV/JSON

Language

Marathi

Update Frequency

static

Organization

UFAL, Charles University

Schema

Field	Type	Description
text	string	Marathi sentence text
tokens	list[string]	Word tokens
upos	list[string]	Universal POS tags for each token
deprel	list[string]	Dependency relation labels
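
The per-token fields in the table above are parallel lists, so position i in tokens, upos, and deprel all describe the same word. A minimal sketch of reading one record that way follows; the record is hand-written for illustration and is not taken from the treebank, and the helper name rows is hypothetical.

```python
# Hand-written illustrative record; NOT taken from the treebank.
example = {
    "text": "मी घरी जातो",
    "tokens": ["मी", "घरी", "जातो"],
    "upos": ["PRON", "NOUN", "VERB"],
    "deprel": ["nsubj", "obl", "root"],
}


def rows(record):
    """Pair each token with its POS tag and dependency relation."""
    tokens, upos, deprel = record["tokens"], record["upos"], record["deprel"]
    # The per-token lists are only meaningful if they line up one-to-one.
    if not (len(tokens) == len(upos) == len(deprel)):
        raise ValueError("per-token lists must have the same length")
    return list(zip(tokens, upos, deprel))


for token, pos, rel in rows(example):
    print(f"{token}\t{pos}\t{rel}")
```

The length check is there because a silent mismatch between the lists would shift every label onto the wrong word.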

Build With This

Create a Marathi syntactic complexity analyzer for readability scoring of educational and government texts

Develop a Marathi POS tagger and parser for integration into downstream NLP pipelines

Build a Marathi sentence simplification tool that restructures complex sentences based on dependency parse analysis
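
As a starting point for the complexity analyzer idea, one rough proxy is the number of dependent clauses per token, read straight from the deprel labels. The sketch below assumes deprel holds string labels as in the Schema table; the set of clausal relations and the per-token normalization are illustrative choices, and the function names are hypothetical.

```python
# UD relations that introduce a dependent clause.
CLAUSAL_RELATIONS = {"acl", "advcl", "ccomp", "csubj", "xcomp"}


def clause_count(deprel):
    """Count dependent clauses, ignoring subtypes such as 'acl:relcl'."""
    return sum(1 for rel in deprel if rel.split(":")[0] in CLAUSAL_RELATIONS)


def complexity(deprel):
    """Dependent clauses per token; 0.0 for an empty sentence."""
    return clause_count(deprel) / len(deprel) if deprel else 0.0


# Label sequences written by hand for illustration only.
simple = ["nsubj", "obl", "root"]
nested = ["nsubj", "obj", "advcl", "mark", "nsubj", "acl:relcl", "obj", "root"]

print(complexity(simple))
print(complexity(nested))
```

A readability score would likely combine this with other signals such as sentence length, and the same clause counts could flag candidate sentences for a simplification tool.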

AI Use Cases

POS tagging, dependency parsing

Related Datasets

AI4Bharat BPCC (mr)

parallel-text

AI4Bharat IndicCorp v1 (mr)

text

AI4Bharat IndicCorp v2 (Marathi)

text

AI4Bharat IndicGLUE (mr)

text

Last verified: 2026-03-07