Evaluation benchmarks, NLP toolkits, dialect resources, and fairness datasets for Marathi.
11 datasets
728 stereotypes, with contrasts, provided in parallel across 16 languages including Marathi. Annotated with regional and demographic features for evaluating bias in LLMs. The only bias and fairness evaluation dataset available in Marathi, which makes it critical for responsible AI development.
Marathi — Human-translated evaluation benchmark for machine translation covering 200+ languages including Marathi, with 3,001 sentences from diverse web articles
Python library for Indian language text processing including tokenisation, normalisation, script conversion, and transliteration with full support for Devanagari/Marathi
Marathi Subset — Natural language understanding benchmark for 11 Indian languages including Marathi, covering tasks like news categorisation, headline prediction, and paraphrase detection
Natural Language Inference (NLI) dataset for 11 Indic languages including Marathi, created by high-quality machine translation of the English XNLI dataset. Contains premise-hypothesis pairs with entailment, contradiction, and neutral labels for evaluating Marathi language understanding.
Marathi — Comprehensive NLU benchmark of 9 tasks across 20 Indian languages including Marathi, covering classification, structure prediction, QA, and sentence retrieval
Deep learning-based NLP library supporting Marathi with pre-trained language models, text generation, tokenisation, sentence embeddings, and data augmentation
Comprehensive Marathi NLP library including the MahaBERT, MahaALBERT, and MahaRoBERTa language models, MahaFT word embeddings, and tools for tokenisation, sentiment analysis, NER, and hate speech detection
Evaluation results and benchmark scores for the MahaBERT (L3Cube) and IndicBERT (AI4Bharat) models on Marathi NLU tasks including sentiment analysis, NER, and text classification
Regional dialect data and linguistic documentation for major Marathi dialect varieties including Varhadi (Vidarbha), Malvani (Konkan coast), and Deshi (Western Maharashtra)
Translated MMLU (Massive Multitask Language Understanding) benchmark in 10 Indian languages including Marathi. Contains multiple-choice questions spanning science, humanities, social sciences, and more. A standard benchmark for comparing how well LLMs perform in Marathi against how they perform in English.