OpenAssistant OASST1/OASST2

OpenAssistant OASST1/OASST2

MH Specific

Multilingual (incl. Marathi) — Human-generated, human-annotated assistant-style conversation corpus in 35 languages including Marathi conversation trees with quality ratings

Fine-tune a Marathi conversational AI model using OpenAssistant multilingual data including Indic languages.
HomepageHuggingFace

Quick Start

from datasets import load_dataset
ds = load_dataset('OpenAssistant/oasst2')
print(f'Total messages: {len(ds["train"])}')
mr = [ex for ex in ds['train'] if ex.get('lang') == 'mr']
print(f'Marathi messages: {len(mr)}')
Modality
Text (multilingual)
Size
161K messages in 35 languages
License
Format
CSV/JSON
Language
mr
Update Frequency
static
Organization
LAION / Open Assistant Community

Schema

FieldTypeDescription
textstringConversation message text
rolestringMessage role (prompter, assistant)
langstringLanguage code
rankintQuality rank from human evaluation

Build With This

Create a Marathi RLHF training pipeline using human preference data from OpenAssistant for chatbot alignment
Develop a cross-lingual conversation quality evaluator benchmarking Marathi chatbot responses against other languages
Build a Marathi conversation dataset by translating high-ranked OpenAssistant dialogues for local context

AI Use Cases

Conversational AI trainingRLHF data for Marathimultilingual chat model fine-tuning
Last verified: 2026-03-07