AI4Bharat Samanantar (Marathi)

Maharashtra-specific (MH)

English-Marathi subset of Samanantar, with 3.32 million sentence pairs for machine translation. AI4Bharat released Samanantar as the largest publicly available parallel corpora collection for Indic languages.

Build an English-to-Marathi translation API for government schemes
Links: Homepage | Hugging Face

Quick Start

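```python
from datasets import load_dataset
# The second argument selects the English-Marathi ("mr") configuration
ds = load_dataset("ai4bharat/samanantar", "mr")
print(ds["train"][0])
# Illustrative shape of one record:
# {'src': 'English sentence', 'tgt': 'मराठी वाक्य', ...}
```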
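Many MT training toolkits (fairseq's preprocessing step, for example) read a parallel corpus as two line-aligned plain-text files, one per language. The following is a minimal sketch of exporting the loaded pairs in that layout. It assumes each record carries the `src` and `tgt` fields listed in the schema. The function name `export_parallel` and the `train.en` / `train.mr` file names are hypothetical and are not part of the dataset or any toolkit.

```python
from pathlib import Path
from typing import Iterable, Mapping


def export_parallel(records: Iterable[Mapping[str, str]], out_dir: str,
                    prefix: str = "train") -> int:
    """Write line-aligned <prefix>.en / <prefix>.mr files; return pairs written.

    Hypothetical helper: expects dict-like records with 'src' (English)
    and 'tgt' (Marathi) keys, as described in the schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(out / f"{prefix}.en", "w", encoding="utf-8") as f_en, \
         open(out / f"{prefix}.mr", "w", encoding="utf-8") as f_mr:
        for rec in records:
            # Collapse newlines and runs of whitespace so that line i of
            # both files always refers to the same sentence pair.
            src = " ".join(rec["src"].split())
            tgt = " ".join(rec["tgt"].split())
            if not src or not tgt:
                continue  # drop pairs with an empty side to keep files aligned
            f_en.write(src + "\n")
            f_mr.write(tgt + "\n")
            written += 1
    return written
```

After the Quick Start, a call such as `export_parallel(ds["train"], "data/en-mr")` should work, because iterating a Hugging Face `Dataset` yields one dict per row.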
Modality: parallel-text
Size: 3.32M sentence pairs
License:
Format: Parquet
Language: mr, en
Update Frequency: static
Organization: AI4Bharat

Schema

Field | Type | Description
src | string | Source sentence in English
tgt | string | Parallel translation in Marathi
src_lang | string | Source language code (en)
tgt_lang | string | Target language code (mr)
data_source | string | Origin corpus the sentence pair was mined from
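The pairs were mined from several origin corpora (the `data_source` field), so some of them will be noisy. A common cleaning heuristic for parallel data is to drop pairs whose token lengths differ too much. The sketch below assumes whitespace tokenization. The name `keep_pair` and the thresholds `max_ratio=3.0` and `max_tokens=200` are illustrative guesses, not values recommended by AI4Bharat.

```python
from typing import Mapping


def keep_pair(rec: Mapping[str, str], max_ratio: float = 3.0,
              max_tokens: int = 200) -> bool:
    """Return True if an en-mr pair passes simple length-based sanity checks.

    Hypothetical helper with illustrative thresholds; tune them on a
    sample of your own data.
    """
    src_len = len(rec["src"].split())  # whitespace tokens, English side
    tgt_len = len(rec["tgt"].split())  # whitespace tokens, Marathi side
    if src_len == 0 or tgt_len == 0:
        return False  # one side is empty
    if src_len > max_tokens or tgt_len > max_tokens:
        return False  # overly long lines are often mis-segmented
    # Reject pairs where one side is much longer than the other.
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio
```

With the Quick Start dataset loaded, `ds["train"].filter(keep_pair)` applies the check row by row. `Dataset.filter` passes each example dict to the function and keeps the rows for which it returns True.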

Build With This

WhatsApp bot that translates government scheme notifications to Marathi
Browser extension that translates English web pages to Marathi in real time
Bilingual Marathi-English chatbot for tourism and hospitality

AI Use Cases

Machine translation
Cross-lingual transfer learning
Bilingual dictionary extraction
Parallel corpus mining
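For bilingual dictionary extraction, one simple baseline scores each English-Marathi word pair by the Dice coefficient of its sentence-level co-occurrence, 2·c(e,m) / (c(e) + c(m)). It then keeps the best-scoring Marathi word for each English word. This technique is swapped in for illustration and is not AI4Bharat's method. The sketch assumes whitespace tokenization and plain `(src, tgt)` string tuples. The function name `dice_lexicon` is hypothetical, and the toy sentences are hand-written examples, not drawn from the corpus.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple


def dice_lexicon(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, float]]:
    """Map each English word to its highest-Dice Marathi word and score."""
    en_count: Counter = Counter()  # sentences containing each English word
    mr_count: Counter = Counter()  # sentences containing each Marathi word
    joint: Counter = Counter()     # sentences containing both words of a pair
    for src, tgt in pairs:
        en_words = set(src.lower().split())
        mr_words = set(tgt.split())
        en_count.update(en_words)
        mr_count.update(mr_words)
        joint.update(product(en_words, mr_words))
    best: Dict[str, Tuple[str, float]] = {}
    for (e, m), c in joint.items():
        score = 2 * c / (en_count[e] + mr_count[m])
        if e not in best or score > best[e][1]:
            best[e] = (m, score)  # ties keep whichever pair was seen first
    return best


toy = [
    ("I drink water", "मी पाणी पितो"),
    ("Water is cold", "पाणी थंड आहे"),
    ("I am here", "मी इथे आहे"),
]
lexicon = dice_lexicon(toy)
print(lexicon["water"])  # ('पाणी', 1.0)
print(lexicon["i"])      # ('मी', 1.0)
```

A word seen only once alongside a partner seen only once also scores a perfect 1.0. On real data you would therefore usually drop rare words and stopwords before trusting the scores.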
Last verified: 2026-03-07