AI4Bharat BPCC (mr)

MH Specific

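The Marathi (mr) portion of AI4Bharat's Bharat Parallel Corpus Collection (BPCC): English-Marathi parallel text for NLP tasks such as machine translation.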

Build a domain-specific English-to-Marathi translation model fine-tuned on government and legal text for Maharashtra administrative use.

Quick Start

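from datasets import load_dataset

# Stream the English-Marathi config so the full corpus is not downloaded up front.
ds = load_dataset('ai4bharat/BPCC', 'mr-en', split='train', streaming=True)

# Print the first five sentence pairs, truncated to 60 characters each.
for i, ex in enumerate(ds):
    print(f"EN: {ex['src'][:60]}...")
    print(f"MR: {ex['tgt'][:60]}...\n")
    if i >= 4:
        break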

Modality: parallel-text
Size: Part of 230M bitext pairs across 22 languages
License: (not listed)
Format: CSV/JSON
Language: mr (Marathi)
Update Frequency: static
Organization: AI4Bharat, IIT Madras

Schema

Field    Type     Description
src      string   Source sentence in English
tgt      string   Target sentence in Marathi
domain   string   Text domain (general, government, etc.)
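
The domain field is what makes a government-focused subset possible. The following is a minimal sketch, not an official recipe, of filtering rows by domain and reshaping them into English-Marathi pairs. It assumes rows are dictionaries with the src, tgt, and domain fields listed above. The helper names (filter_by_domain, to_pair), the sample rows, and the exact domain label strings are hypothetical and should be checked against the real data.

```python
from typing import Dict, Iterable, Iterator, Set


def filter_by_domain(rows: Iterable[Dict[str, str]], wanted: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield only rows whose `domain` value (case-insensitive) is in `wanted`."""
    wanted_lower = {d.lower() for d in wanted}
    for row in rows:
        # Rows with a missing domain are treated as having an empty label.
        if row.get("domain", "").strip().lower() in wanted_lower:
            yield row


def to_pair(row: Dict[str, str]) -> Dict[str, str]:
    """Reshape a row into an English/Marathi pair with surrounding whitespace removed."""
    return {"en": row["src"].strip(), "mr": row["tgt"].strip()}


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"src": "The office is closed on Sunday.", "tgt": "कार्यालय रविवारी बंद असते.", "domain": "government"},
    {"src": "How are you?", "tgt": "तू कसा आहेस?", "domain": "general"},
    {"src": " Submit the form by Friday. ", "tgt": " शुक्रवारपर्यंत अर्ज सादर करा. ", "domain": "Government"},
]

# Assumed label: "government" (the schema lists it as an example domain value).
gov_pairs = [to_pair(r) for r in filter_by_domain(sample_rows, {"government"})]
print(len(gov_pairs))
```

Because filter_by_domain accepts any iterable of dictionaries, the streaming dataset object from the Quick Start could be passed in place of sample_rows, provided its rows carry the same field names.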

Build With This

Create a real-time English-to-Marathi translation Chrome extension for Maharashtra government employees reading central government communications
Develop a bilingual document alignment tool for Maharashtra legislative assembly to maintain English-Marathi parallel versions of bills
Build a reference-free translation quality estimator that scores English-to-Marathi machine translations without needing reference translations

AI Use Cases

Machine translation
Last verified: 2026-03-07