BharatGen: Decoding India’s Bid To Build Maiden State-Funded Multimodal LLM

SUMMARY

The key “distinguishing features” of BharatGen will include its multilingual and multimodal nature, indigenously built datasets, and open-source architecture

BharatGen’s roadmap sets out milestones up to July 2026, including extensive AI model development, experimentation, and the establishment of AI benchmarks tailored to India’s needs

One of the core features of BharatGen will be its focus on data-efficient learning, particularly for Indian languages with limited digital presence

It was December 2023, and Prime Minister (PM) Narendra Modi had just taken the stage to address the gathering at the Kashi Tamil Sangamam in Varanasi, Uttar Pradesh. As PM Modi began his address in Hindi, attendees plugged in their headphones to hear a real-time translation of his speech.

At work was Bhashini, the Centre’s ambitious AI-powered language translation platform, which aims to make digital services and the internet more accessible in Indian languages. Much has happened in the 10 months since.

The government has since taken a series of steps to bolster its AI offerings. Most recently, the Centre announced the BharatGen project, touted as the world’s first government-funded multimodal large language model (LLM) project.

The Ministry of Science and Technology said it will undertake the development of the multimodal LLM project, which will focus on creating “efficient and inclusive AIs” in Indian languages.

Once completed, BharatGen will be able to generate high-quality text and “multimodal content” in various Indian languages. 

For the uninitiated, a multimodal LLM can process multiple types of data, or modalities, such as text, images, audio, video, and 3D environments. Some such models can also generate content in several of these formats.

As per the government, BharatGen will have four key “distinguishing features”:

  • Multilingual and multimodal nature of foundation models
  • Indigenously built datasets, which will be leveraged to train the LLMs
  • Open-source architecture
  • Development of an ecosystem of GenAI research in India

The Making Of BharatGen

Slated to be completed in a span of two years, BharatGen will cater to both text and speech to ensure coverage across India’s “diverse linguistic landscape”. 

“Looking ahead, BharatGen’s roadmap outlines key milestones up to July 2026. These include extensive AI model development, experimentation, and the establishment of AI benchmarks tailored to India’s needs,” the government said in a statement.

Under the Department of Science and Technology’s (DST) National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), the development of BharatGen will be spearheaded by IIT Bombay. Other academic institutes, including IIIT Hyderabad, IIT Mandi, IIT Kanpur, IIT Hyderabad, IIM Indore, and IIT Madras, will also take part in executing the project.

The multimodal LLM will be trained on multilingual datasets to “deeply capture” the nuances of Indian languages. 

Training AI models requires large datasets, which are scarce for many Indian languages. To address this gap, BharatGen will also develop processes for collecting and curating India-centric data, gathered in a way that accurately represents the country’s diverse languages, dialects, and cultural contexts.

Notably, one of the core features of BharatGen will be its focus on data-efficient learning, particularly for Indian languages with limited digital presence. The government will partner with multiple academic institutions to develop AI models that are effective with minimal data. 

“This emphasis on data sovereignty strengthens India’s control over its digital resources and narrative,” the statement added.

LLMs are currently trained predominantly on English-language data, as English dominates online content. That said, players such as Google have rolled out their AI chatbots in multiple Indian languages, drawing on their vast troves of search data.

Smaller players, however, lack access to such resources. It is this gap that the government hopes to fill with its open-architecture LLM, which startups and academics can use to build products on top of the tech stack and its linguistic datasets.

“BharatGen will deliver generative AI models and their applications as a public good by prioritising India’s socio-cultural and linguistic diversity. It strives to address India’s broader needs such as social equity, cultural preservation, and linguistic diversity, while ensuring that GenAI reaches all segments of society,” as per the government. 

DST Secretary Professor Abhay Karandikar said that BharatGen will be used to address “national priorities” such as cultural preservation and inclusive technology development, going beyond simply making AI accessible to all and serving industrial and commercial purposes.

Aligned with the government’s ‘Atmanirbhar Bharat’ vision, one of the stated goals of the project is to reduce “reliance on foreign technologies” and strengthen the domestic AI ecosystem for startups, industries, and government agencies. 

The Centre also said BharatGen will democratise access to AI through foundation models, adding that the tech stack will allow innovators, researchers, and startups to build AI applications quickly and affordably.

The project will also look to foster a vibrant AI research community through training programmes, hackathons, and collaborations with global experts.

The proposed project is part of the Indian government’s overarching push for digital public infrastructure (DPI). Leveraging AI could further strengthen India’s existing digital public goods rails and pave the way for cost-effective solutions, not just in India but globally.

The BharatGen project also echoes India’s focus on fostering the adoption of AI technologies. Earlier this year, the Union Cabinet approved the IndiaAI Mission with an outlay of INR 10,372 Cr over the next five years. The funds will be used to support emerging AI startups and spur innovation in the sector.

In September, the government also invited applications from startups and researchers to build and deploy “impactful” AI solutions in critical areas. Meanwhile, the Centre has also constituted an advisory group to formulate a framework for regulating AI.

At the heart of all this is the Indian AI landscape, which is already home to more than 100 startups that raised more than $600 Mn between 2019 and H1 2024. As per Inc42 data, the Indian GenAI ecosystem is projected to be a $17 Bn market opportunity by 2030, driven by the growing adoption of the technology.
