A collection of 16 Indic-language datasets from IIT Bombay, hosted on IndiaAI's AIKosh platform as part of the BharatGen initiative. It includes handwritten and printed Devanagari script images, scanned-table recognition data, 78+ hours of multilingual audio, QA pairs, and math word problems. The collection covers Marathi plus 9 other Indian languages.
# Access from https://aikosh.indiaai.gov.in/
print('AIKosh - India AI Indic Datasets')
print('Register at aikosh.indiaai.gov.in for access')

| Field | Type | Description |
|---|---|---|
| image | image | Image file for vision tasks |
| label | string | Classification label or annotation |
| language | string | Language code for text components |
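
The three fields in the table can be modelled as a small record type. The sketch below is illustrative only: the class name `IndicImageRecord`, the helper `group_labels_by_language`, the use of raw bytes for `image`, and the ISO 639-1 style codes (`"mr"`, `"hi"`) are assumptions, not part of the published datasets, and the sample rows are made up. It shows one way to group labels by language once records have been obtained from AIKosh.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicImageRecord:
    """Hypothetical row type mirroring the three fields in the table."""
    image: bytes   # raw image bytes; the actual storage format is an assumption
    label: str     # classification label or annotation
    language: str  # language code, e.g. "mr" (ISO 639-1 style is assumed)


def group_labels_by_language(records):
    """Return {language code: [labels]}, keeping the input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.language].append(record.label)
    return dict(grouped)


# Made-up sample rows for illustration; not taken from the datasets.
sample_records = [
    IndicImageRecord(image=b"", label="printed", language="mr"),
    IndicImageRecord(image=b"", label="handwritten", language="hi"),
    IndicImageRecord(image=b"", label="handwritten", language="mr"),
]

labels_by_language = group_labels_by_language(sample_records)
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch so that rows can be shared safely between processing steps; a real loader may use a different representation.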