Patram-7B is an open-source, 7B-parameter vision-language model trained specifically for Indian document understanding, from government forms to handwritten pages. It handles the varied structure of scanned and photographed Indian documents, including Devanagari text. It reports strong scores on DocVQA (0.855), VisualMRC (0.851), and the custom Patram-Bench. It can be fine-tuned for specific Marathi document types such as 7/12 extracts, certificates, and forms. It is positioned as the current state of the art in open-source Indian document AI.
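
```python
# Patram-7B: Indian document VLM for Marathi form understanding
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

MODEL_ID = "bharatgenai/patram-7b-instruct"

# Assumption: the checkpoint loads through the generic Vision2Seq auto classes.
# If the model card ships custom code, follow its loading snippet instead.
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open("marathi_document.jpg").convert("RGB")
prompt = "Extract all field values from this Marathi document"

# Preprocess the image and prompt, generate, and decode the answer.
inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

| Field | Type | Description |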
|---|---|---|
| input_image | image | Document image (scanned or photographed) |
| query | string | Question or extraction instruction about the document |
| response | string | Extracted text, field values, or answer |
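
The three fields in the table can be carried around as a small typed record, which may help when collecting examples for fine-tuning on a particular Marathi document type. The sketch below is illustrative only: the `DocumentExample` class, the helper functions, the file path, and the JSON Lines layout are assumptions made for this example, not a format the model or its tooling is known to require. The image is stored as a file path here for simplicity.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DocumentExample:
    """One record mirroring the field table (hypothetical container)."""
    input_image: str  # path to the scanned or photographed document (assumed layout)
    query: str        # question or extraction instruction about the document
    response: str     # extracted text, field values, or answer


def to_jsonl(examples: List[DocumentExample]) -> str:
    """Serialize records as JSON Lines, keeping Devanagari text unescaped."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


def from_jsonl(text: str) -> List[DocumentExample]:
    """Parse JSON Lines back into records, skipping blank lines."""
    return [
        DocumentExample(**json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]


examples = [
    DocumentExample(
        input_image="scans/certificate_001.jpg",  # hypothetical file name
        query="Extract all field values from this Marathi document",
        response="नाव: उदाहरण",  # placeholder text, not real model output
    ),
]

serialized = to_jsonl(examples)
restored = from_jsonl(serialized)
print(serialized)
```

Using `ensure_ascii=False` keeps Devanagari strings human-readable in the output file, which makes manual review of annotated examples easier.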