Indic NLP Library

MH Specific

Python library for Indian-language text processing, including tokenisation, normalisation, script conversion, and transliteration, with full support for Devanagari/Marathi.

Build a Marathi text preprocessing pipeline using Indic NLP Library for tokenization, normalization, and script conversion.

Quick Start

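from indicnlp.tokenize import indic_tokenize
from indicnlp.normalize import indic_normalize
text = 'मराठी भाषा प्रक्रिया'
# Normalize first so equivalent Devanagari sequences share one canonical form
normalizer = indic_normalize.IndicNormalizerFactory().get_normalizer('mr')
text = normalizer.normalize(text)
# Split on whitespace and punctuation; 'mr' is the Marathi language code
tokens = indic_tokenize.trivial_tokenize(text, 'mr')
print(f'Tokens: {tokens}')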
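
The Quick Start does not touch script conversion, which the description also lists. In the library this is handled by `UnicodeIndicTransliterator.transliterate(text, 'mr', 'gu')` from `indicnlp.transliterate.unicode_transliterate`. The sketch below does not call the library. It only illustrates the underlying idea, assuming a plain code-point offset between the Devanagari and Gujarati Unicode blocks, and the function name `devanagari_to_gujarati` is hypothetical.

```python
# Sketch only: offset-based conversion between two Unicode Indic script blocks.
# The major Indic blocks share a common layout, so a letter usually sits at the
# same offset in each block. Some slots are unassigned in the target script,
# so real code needs extra handling that this sketch omits.

DEVANAGARI_START = 0x0900
GUJARATI_START = 0x0A80
BLOCK_SIZE = 0x80

# Danda and double danda are shared punctuation and should not be shifted.
SHARED_PUNCTUATION = {"\u0964", "\u0965"}


def devanagari_to_gujarati(text: str) -> str:
    """Shift each Devanagari code point to the same slot in the Gujarati block."""
    out = []
    for ch in text:
        offset = ord(ch) - DEVANAGARI_START
        if 0 <= offset < BLOCK_SIZE and ch not in SHARED_PUNCTUATION:
            out.append(chr(GUJARATI_START + offset))
        else:
            # Spaces, Latin letters, and shared punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)


print(devanagari_to_gujarati("मराठी भाषा"))
```

For production use, prefer the library call: it knows which code points have no counterpart in the target script.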

Modality: Tools (Python)
Size: 20+ Indian languages
License: (not specified)
Format: Various
Language: mr
Update Frequency: static
Organization: AI4Bharat / Anuvaad

Schema

Field     Type    Description
function  string  NLP function name (tokenize, normalize, transliterate)
language  string  Supported language code

Build With This

Create a Marathi text cleaning service combining Indic NLP Library functions for web-scraped content normalization
Develop a Marathi script converter handling Devanagari to Roman and back for transliteration applications
Build a Marathi morphological analyzer extending Indic NLP Library with stemming and lemmatization
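
As a starting point for the first idea above, here is a minimal sketch of a cleaning step for web-scraped Marathi text. It uses only the Python standard library; the function name `clean_marathi_text` and the specific cleaning rules are assumptions for illustration, not part of Indic NLP Library. In a real service, the library's normalizer (`IndicNormalizerFactory().get_normalizer('mr')`) would run after this step.

```python
import html
import re
import unicodedata

# Crude tag stripper; adequate for a sketch, not for arbitrary HTML.
_TAG_RE = re.compile(r"<[^>]+>")
# Remove zero-width space and BOM. ZWJ/ZWNJ (U+200D/U+200C) are kept on
# purpose because they can affect how Devanagari conjuncts are rendered.
_INVISIBLE_RE = re.compile("[\u200b\ufeff]")
_SPACE_RE = re.compile(r"\s+")


def clean_marathi_text(raw: str) -> str:
    """Strip markup and invisible debris, then collapse whitespace."""
    text = _TAG_RE.sub(" ", raw)               # drop HTML tags
    text = html.unescape(text)                 # decode entities such as &amp;
    text = unicodedata.normalize("NFC", text)  # one canonical Unicode form
    text = _INVISIBLE_RE.sub("", text)         # remove zero-width debris
    return _SPACE_RE.sub(" ", text).strip()    # collapse runs of whitespace


print(clean_marathi_text("<p>मराठी   भाषा</p>\n"))
```

Keeping the HTML-level cleanup separate from script-level normalization makes each stage easy to test on its own.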

AI Use Cases

Text preprocessing pipeline
Script normalisation
Tokenisation for downstream NLP

Last verified: 2026-03-07