Google Dakshina (mr)

MH Specific

The Marathi (mr) subset of the Google Dakshina dataset, for language NLP.

Build a Marathi transliteration engine that converts Roman-script Marathi (commonly typed on phones) to proper Devanagari.

Quick Start

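from datasets import load_dataset

# Load the Marathi ('mr') configuration of Dakshina.
ds = load_dataset('google/dakshina', 'mr', split='train')

# Slicing a Dataset (ds[:5]) returns a dict of columns, not rows,
# so select the first five rows explicitly before iterating.
for ex in ds.select(range(5)):
    print(f"Devanagari: {ex['native']}")
    print(f"Roman: {ex['romanized']}\n")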

Modality: text, transliteration
Size: 25,000 lexicon entries + 10,000 romanized sentences
License:
Format: CSV/JSON
Language: mr
Update Frequency: static
Organization: Google Research

Schema

Field      Type    Description
native     string  Text in Devanagari script
romanized  string  Latin-script (Roman) transliteration of the text
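
As a rough illustration of the schema above, the following sketch models one record and checks that each field is in the expected script. The `Pair` class and the helper names are hypothetical, the sample record is hand-written rather than taken from the dataset, and the check assumes that `native` stays within the Unicode Devanagari block (U+0900 to U+097F) and that `romanized` is plain ASCII. Real data may contain joiners or punctuation outside those ranges, so treat it as a starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One record with the two schema fields (hypothetical helper class)."""
    native: str      # text in Devanagari script
    romanized: str   # Latin-script transliteration


def is_devanagari(text: str) -> bool:
    """True if every non-space character is in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all("\u0900" <= c <= "\u097f" for c in chars)


def is_valid(pair: Pair) -> bool:
    """Check that each field is in the script the schema describes."""
    has_roman = bool(pair.romanized.strip()) and pair.romanized.isascii()
    return is_devanagari(pair.native) and has_roman


# Hand-written sample for illustration; not taken from the dataset.
sample = Pair(native="मराठी", romanized="marathi")
print(is_valid(sample))
```

A check like this can help catch rows where the two columns were swapped or where a field is empty.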

Build With This

Create a Marathi keyboard input method that auto-corrects Roman-to-Devanagari transliteration in real time
Develop a social media normalizer that converts code-mixed Roman-Marathi text to standard Devanagari for NLP processing
Build a Marathi OCR post-processor that handles mixed-script documents with both Roman and Devanagari text
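
A simple baseline for the project ideas above is a word-level lookup built from (native, romanized) pairs. The sketch below is a minimal, assumed approach and not a method prescribed by the dataset: `build_lookup` and `transliterate` are hypothetical helper names, and the toy pairs are hand-written stand-ins for lexicon entries. Words missing from the lookup are passed through unchanged, which is where a learned model would take over.

```python
from collections import Counter, defaultdict


def build_lookup(pairs):
    """Map each romanized spelling to its most frequent Devanagari form."""
    counts = defaultdict(Counter)
    for native, romanized in pairs:
        counts[romanized.lower()][native] += 1
    return {rom: forms.most_common(1)[0][0] for rom, forms in counts.items()}


def transliterate(sentence, lookup):
    """Replace known words; leave out-of-vocabulary words untouched."""
    return " ".join(lookup.get(word.lower(), word) for word in sentence.split())


# Hand-written toy pairs for illustration; not taken from the dataset.
toy_pairs = [
    ("मी", "mi"),
    ("मी", "mee"),       # several Roman spellings can map to one native form
    ("मराठी", "marathi"),
    ("बोलतो", "bolto"),
]

lookup = build_lookup(toy_pairs)
print(transliterate("mi marathi bolto", lookup))
```

Keeping counts per spelling lets the lookup pick the most common native form when one Roman spelling is attested for several Devanagari words.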

AI Use Cases

Transliteration
Last verified: 2026-03-07