Marathi Alpaca Instruction Dataset

MH Specific

Marathi translation of the Stanford Alpaca instruction-tuning dataset for fine-tuning instruction-following capabilities in Marathi language models

Fine-tune a Marathi instruction-following LLM using this Alpaca-format dataset for building a Marathi AI assistant.

Homepage HuggingFace

Quick Start

from datasets import load_dataset
ds = load_dataset('ravithejads/marathi-alpaca')
for ex in ds['train'][:5]:
    print(f"Instruction: {ex['instruction'][:60]}...")
    print(f"Output: {ex['output'][:60]}...\n")

Modality

Text (Marathi)

Size

~52K instructions

License

Open Research

Format

CSV/JSON

Language

Update Frequency

static

Organization

Open-Source Community (Translated from Stanford Alpaca)

Schema

Field	Type	Description
instruction	string	Task instruction in Marathi
input	string	Optional input context for the task
output	string	Expected response in Marathi

Build With This

Create a Marathi chatbot for Maharashtra government services that answers citizen queries about schemes and procedures

Develop a Marathi writing assistant that helps with grammar correction, paraphrasing, and style improvement

Build a Marathi educational tutor that generates explanations, quiz questions, and study materials on demand

AI Use Cases

Marathi instruction fine-tuningconversational AI trainingtask-following model development

Related Datasets

AI4Bharat IndicQA

Text (Marathi)

Government Scheme Documents for RAG

Text (PDF, web)

Maharashtra Government Resolutions (mahGRs)

Text (Marathi + English)

Marathi Wikipedia

Text (Marathi)

Last verified: 2026-03-07