Sarvam AI Launches Indic Language Model ‘Sarvam-1’

SUMMARY

Sarvam-1 is optimised for 10 Indian languages, including Hindi, Bengali, Tamil, and Telugu, besides English.

The model aims to tackle two key challenges – token inefficiency and poor data quality for Indic languages.

Sarvam AI also announced a partnership with Yotta Data Services for the Indic language model.

Sarvam AI has launched Sarvam-1, a 2 Bn parameter large language model built specifically for Indian languages.

In a blog post, the startup said that the model is optimised for 10 Indian languages, including Hindi, Bengali, Tamil, and Telugu, besides English.

The model aims to tackle two key challenges – token inefficiency and poor data quality for Indic languages. 

Token inefficiency arises when a language model's tokenizer has to break a word into many pieces (tokens) in order to process it. For instance, an English word like “apple” might be processed as one token, whereas its equivalent in some Indian languages might get split into 4-8 tokens. The more tokens a word needs, the slower and less efficient the processing becomes.
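The tokens-per-word ratio described above can be measured directly. The following is a minimal sketch and not Sarvam AI's method: `tokens_per_word` is a hypothetical helper, and the two toy tokenizers (whole-word and UTF-8 byte-level) are stand-ins chosen only to show how the ratio can differ between an English word and a Hindi word.

```python
from typing import Callable, List


def tokens_per_word(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


def word_tokenizer(word: str) -> List[str]:
    """Toy tokenizer (assumption): the whole word is a single token."""
    return [word]


def byte_tokenizer(word: str) -> List[str]:
    """Toy tokenizer (assumption): one token per UTF-8 byte."""
    return [f"{b:02x}" for b in word.encode("utf-8")]


english = "apple"
hindi = "सेब"  # "apple" in Hindi, written in Devanagari

for name, tok in [("word-level", word_tokenizer), ("byte-level", byte_tokenizer)]:
    print(name, tokens_per_word(english, tok), tokens_per_word(hindi, tok))
```

Each Devanagari character takes several bytes in UTF-8, so the byte-level toy tokenizer yields a higher ratio for the Hindi word than for the English one even though the Hindi word has fewer characters. Real tokenizers sit somewhere between these two extremes, depending on how well their vocabulary covers the script.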

Sarvam AI claims that Sarvam-1 achieves a token efficiency of 1.4-2.1 tokens per word (vs. 4-8 in existing models). It said that the LLM is trained on Sarvam-2T, a 2-trillion-token dataset curated specifically for Indian languages. According to the startup, this improves performance in areas like cross-lingual translation and question-answering.
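To put the quoted tokens-per-word figures in perspective, the sketch below works out what they imply for a passage of a given length. It takes the two ranges at face value; the 1,000-word passage and the helper names are illustrative assumptions, not figures or code from Sarvam AI.

```python
from typing import Tuple

Range = Tuple[float, float]

# Tokens-per-word ranges as quoted in the article.
SARVAM_RANGE: Range = (1.4, 2.1)
EXISTING_RANGE: Range = (4.0, 8.0)


def token_count_range(words: int, per_word: Range) -> Range:
    """Lowest and highest token counts for a passage of `words` words."""
    low, high = per_word
    return words * low, words * high


def reduction_factor_range(existing: Range, new: Range) -> Range:
    """Smallest and largest factor by which the token count shrinks."""
    smallest = existing[0] / new[1]  # best existing case vs. worst new case
    largest = existing[1] / new[0]   # worst existing case vs. best new case
    return smallest, largest


WORDS = 1000  # hypothetical passage length

print(token_count_range(WORDS, EXISTING_RANGE))
print(token_count_range(WORDS, SARVAM_RANGE))
print(reduction_factor_range(EXISTING_RANGE, SARVAM_RANGE))
```

Under these assumptions, the same passage would need roughly 1.9 to 5.7 times fewer tokens, which is where the speed and efficiency gain would come from.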

Sarvam AI claims that Sarvam-1, despite being smaller than models like Meta’s Llama-3.2-3B, outperforms them on several industry benchmarks.

Sarvam-1 is now available for download on Hugging Face.

Earlier on Thursday (October 25), chip giant Nvidia’s CEO Jensen Huang said that a Hindi language model is the hardest to develop.

Meanwhile, Sarvam AI also announced its partnership with Yotta Data Services. The Sarvam-1 model has been trained on Yotta’s Shakti Cloud infrastructure, the startup said. 

Earlier this year, the startup launched its full-stack GenAI platform comprising multiple products — Sarvam Agents, Sarvam 2B, Shuka 1.0, Sarvam Models, and A1.

The startup raised $41 Mn (around INR 342 Cr) in its Series A funding round in December last year. The round was led by Lightspeed Venture Partners, with participation from Peak XV Partners and Khosla Ventures.

At the heart of all these developments is the growing Indian GenAI market, which is expected to clock a CAGR of 48% between 2023 and 2030 to become an opportunity worth more than $17 Bn.
