L3Cube-MahaNLP Toolkit

MH Specific

A comprehensive Marathi NLP library that includes the MahaBERT, MahaALBERT, and MahaRoBERTa language models, MahaFT word embeddings, and tools for tokenisation, sentiment analysis, named entity recognition (NER), and hate speech detection.

Build an end-to-end Marathi NLP pipeline using L3Cube models for text classification, NER, and sentiment analysis.

Quick Start

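from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and encoder weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('l3cube-pune/marathi-roberta')
model = AutoModel.from_pretrained('l3cube-pune/marathi-roberta')

# Tokenize a Marathi phrase ("Marathi language processing") into PyTorch tensors
inputs = tokenizer('मराठी भाषा प्रक्रिया', return_tensors='pt')
outputs = model(**inputs)
# last_hidden_state has shape (batch_size, sequence_length, hidden_size)
print(f"Embedding shape: {outputs.last_hidden_state.shape}")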
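
The Quick Start stops at per-token embeddings. A common next step is to pool them into one vector per sentence. The sketch below shows attention-mask-aware mean pooling followed by cosine similarity. To stay dependency-free it works on plain nested lists, such as those obtained from `outputs.last_hidden_state[0].tolist()` and `inputs['attention_mask'][0].tolist()`. The helper names `mean_pool` and `cosine_similarity` are illustrative and not part of the toolkit, and mean pooling is only one of several reasonable pooling choices.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is non-zero."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    # Drop padding positions so they do not dilute the average.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    hidden_size = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(hidden_size)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-in for real model output: 3 tokens, hidden size 2.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]  # the third position is padding and is ignored

sentence_vector = mean_pool(toy_hidden, toy_mask)
print(sentence_vector)  # [2.0, 3.0]
```

When working directly with PyTorch tensors, the same pooling can be written as `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)`, where `h` is `last_hidden_state` and `m` is the attention mask.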

Modality: Models, Tools (Python)
Size: Models + MahaCorpus (752M tokens)
License: (not specified)
Format: Various
Language: mr
Update Frequency: static
Organization: L3Cube, Pune

Schema

Field | Type | Description
model_name | string | Name of the pre-trained Marathi NLP model
task | string | NLP task (classification, NER, embedding, etc.)
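
As an illustration of how the two schema fields might be used, the sketch below models each row as a small record and filters a catalog by task. The `ModelRecord` class, the `records_for_task` helper, and the catalog entries are assumptions made for this example. Only `l3cube-pune/marathi-roberta` comes from the Quick Start; its pairing with the `embedding` task and the two `example-org/...` names are placeholders, not real entries.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ModelRecord:
    """One schema row: a model name and the NLP task it serves."""
    model_name: str
    task: str


def records_for_task(records: Sequence[ModelRecord], task: str) -> List[ModelRecord]:
    """Return the records whose task matches, ignoring case and outer whitespace."""
    wanted = task.strip().lower()
    return [r for r in records if r.task.strip().lower() == wanted]


# Hypothetical catalog; only the first model name appears in this document.
CATALOG = [
    ModelRecord(model_name="l3cube-pune/marathi-roberta", task="embedding"),
    ModelRecord(model_name="example-org/marathi-ner-model", task="NER"),  # placeholder
    ModelRecord(model_name="example-org/marathi-sentiment-model", task="classification"),  # placeholder
]

for record in records_for_task(CATALOG, "ner"):
    print(record.model_name)
```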

Build With This

Create a Marathi text analytics API that combines L3Cube models into a single REST endpoint for multiple NLP tasks
Develop a benchmarking suite that compares L3Cube models against multilingual alternatives on Marathi tasks
Build a Marathi document understanding pipeline that chains L3Cube models for entity extraction, classification, and summarization
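
The first idea above, a single entry point serving several NLP tasks, can be sketched as a small task router. This is a minimal illustration and not the toolkit's own API. `TaskRouter` and the stub handlers are hypothetical names. In a real service each handler would wrap a loaded model (for example a Hugging Face `pipeline` for text or token classification), and `analyze` would be called from whatever web framework exposes the REST endpoint.

```python
from typing import Callable, Dict, Iterable, List

Handler = Callable[[str], object]


class TaskRouter:
    """Maps task names to handler callables and runs several on one text."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, task: str, handler: Handler) -> None:
        self._handlers[task] = handler

    def analyze(self, text: str, tasks: Iterable[str]) -> Dict[str, object]:
        requested: List[str] = list(tasks)
        # Fail fast on unknown tasks before running any handler.
        unknown = [t for t in requested if t not in self._handlers]
        if unknown:
            raise KeyError("no handler registered for: " + ", ".join(unknown))
        return {task: self._handlers[task](text) for task in requested}


# Stub handlers standing in for real model calls (assumed output shapes).
def stub_sentiment(text: str) -> dict:
    return {"label": "neutral", "score": 0.0}


def stub_ner(text: str) -> list:
    # Tags every whitespace-separated token as "O" (outside any entity).
    return [{"token": tok, "tag": "O"} for tok in text.split()]


router = TaskRouter()
router.register("sentiment", stub_sentiment)
router.register("ner", stub_ner)

result = router.analyze("मराठी भाषा", ["sentiment", "ner"])
print(result)
```

Keeping handlers behind plain callables means models can be swapped or mocked in tests without touching the routing logic.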

AI Use Cases

Marathi NLP pipeline, model fine-tuning, text classification, NER
Last verified: 2026-03-07