L3Cube-MahaParaphrase

MH Specific

L3Cube-MahaParaphrase is a Marathi paraphrase detection dataset: pairs of sentences, each labeled as a paraphrase or not.

Build a semantic search engine for Marathi educational content that finds similar questions and answers across textbooks.

Quick Start

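from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
ds = load_dataset('l3cube-pune/MahaParaphrase')

# Slicing a split (ds['train'][:5]) returns a dict of columns, so select rows instead
for ex in ds['train'].select(range(5)):
    print(f"S1: {ex['sentence1'][:60]}...")
    print(f"S2: {ex['sentence2'][:60]}...")
    print(f"Paraphrase: {bool(ex['label'])}\n")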
Modality: text
Size: 8,000 sentence pairs
License:
Format: CSV
Language: mr
Update Frequency: static
Organization: L3Cube, Pune

Schema

Field      Type    Description
sentence1  string  First Marathi sentence
sentence2  string  Second Marathi sentence
label      int     Whether the sentences are paraphrases (1) or not (0)
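
To make the schema concrete, here is a minimal sketch of reading pairs from CSV text and checking each row against the three fields listed above. It assumes the CSV has a header row named after those fields; the sample rows are invented for illustration and are not taken from the dataset, and `parse_pairs` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample in the documented layout; these rows are
# illustrative only and do not come from the dataset.
SAMPLE_CSV = """sentence1,sentence2,label
मी शाळेत जातो,मी शाळेला जातो,1
मी शाळेत जातो,आज पाऊस पडत आहे,0
"""


def parse_pairs(text):
    """Parse CSV text into (sentence1, sentence2, label) tuples.

    Raises ValueError if a label is not 0 or 1.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(text)):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        pairs.append((row["sentence1"], row["sentence2"], label))
    return pairs


pairs = parse_pairs(SAMPLE_CSV)
for s1, s2, label in pairs:
    print(s1, "|", s2, "|", bool(label))
```

If the real file uses different column names or a different delimiter, adjust the keys and the `csv.DictReader` arguments accordingly.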

Build With This

Create a question deduplication system for Marathi Q&A platforms that merges duplicate questions and consolidates answers
Develop a FAQ matching service that instantly finds whether a customer's Marathi query already has a documented answer
Build a plagiarism detection tool for Marathi academic submissions that identifies paraphrased content
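
As a rough illustration of the deduplication idea, the sketch below groups questions whose token overlap (Jaccard similarity) exceeds a threshold. Token overlap is only a stand-in: a real system would likely replace `jaccard` with a paraphrase classifier trained on this dataset. The function names, the threshold value, and the sample questions are all assumptions made for the example.

```python
def jaccard(a, b):
    """Token-overlap similarity between two whitespace-tokenized sentences."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def group_duplicates(questions, threshold=0.7):
    """Greedily put each question in the first group whose
    representative (first member) is similar enough; otherwise
    start a new group."""
    groups = []
    for q in questions:
        for g in groups:
            if jaccard(q, g[0]) >= threshold:
                g.append(q)
                break
        else:
            groups.append([q])
    return groups


# Illustrative questions, not taken from the dataset.
questions = [
    "भारताची राजधानी कोणती आहे",
    "भारताची राजधानी कोणती",
    "महाराष्ट्राची राजधानी कोणती आहे",
]

groups = group_duplicates(questions)
for g in groups:
    print(g)
```

The third question shares three of its four tokens with the first yet asks something different, which is why plain token overlap needs a fairly strict threshold and why a learned paraphrase model is the more plausible choice in practice.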

AI Use Cases

Paraphrase detection
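
For paraphrase detection, predictions are binary and can be scored against the `label` field. The following is a minimal sketch of the usual metrics, written in plain Python; `binary_metrics` is a hypothetical helper and the gold and predicted labels are made-up values, not results from any model.

```python
def binary_metrics(gold, pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels,
    treating 1 (paraphrase) as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
metrics = binary_metrics(gold, pred)
print(metrics)
```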
Last verified: 2026-03-07