BharatGen: 'This Isn't Day One Of AI; It Is Hour One'

'When a technology is this fundamental, a country should have its own version of it, rather than relying on whether someone else chooses to build it for you.'

Illustration: Dominic Xavier/Rediff

Key Points

BharatGen brings together nearly 10 academic institutions, including multiple IITs, IIITs, and IIM Indore.
The AI ecosystem is vast. You need application builders, model tuners, data infrastructure, and foundational research.
There is a lot of Indian data, but it needs to be unlocked.

With the government-backed BharatGen unveiling its next generation of large language models at the India AI Impact Summit, the idea of 'sovereign AI' is moving from slogan to substance.

BharatGen -- a consortium of leading academic institutions funded by the government -- is building foundational artificial intelligence (AI) models tailored for Indian languages, data, and use cases.

Rishi Bal, CEO, BharatGen, spoke to Khalid Anzar and Harsh Shivam/Business Standard in New Delhi about why India needs its own AI models, how the effort is being funded, what has been released so far, and what the upcoming 17-billion-parameter Param2 model is designed to achieve.

What is BharatGen?

BharatGen is a non-profit foundational AI startup that is entirely government-funded. It brings together nearly 10 academic institutions, including multiple IITs, IIITs, and IIM Indore.

The goal is to build a new generation of foundational AI models in India.

When you say 'foundational AI', what does that mean in practice?

AI has existed for decades, but a real shift happened around 2018-2019, when transformer architectures, large datasets, and large-scale compute came together.

That combination unlocked entirely new capabilities. We are building that foundational layer in India.

This is not Day One of AI; it is Hour One. When a technology is this fundamental, a country should have its own version of it, rather than relying on whether someone else chooses to build it for you.

India is often seen as a consumer of technology rather than a creator. Why does this moment feel different?

That view depends on how far back you look. Historically, India produced advances in science, mathematics, and technology. That changed due to long periods of disruption.

Today, India is the world's fourth-largest economy and is moving towards third. We need to start thinking of ourselves as creators and innovators again.

AI is one area where that shift is possible, alongside investments in quantum, semiconductors, and manufacturing. This is part of a broader movement already underway.

Where does BharatGen fit into this bigger picture?

The AI ecosystem is vast. You need application builders, model tuners, data infrastructure, and foundational research. BharatGen focuses on one part of that stack: building the core foundational models.

'Need model that supports Indianness'

What are the advantages of a sovereign AI model?

The first dimension is sovereignty as a guarantee of access. That includes supply security, transparency around what goes into the model, and serviceability -- the ability to fix or adapt the system when something goes wrong.

If you rely on an external model, even an open-weights one, do you truly control these aspects?

The second dimension is what I call 'Indianness'. We need models that support Indian languages, culture, and context. Global models often reflect a US-centric worldview.

That is not wrong, but we should also have models that speak and behave like us.

Data availability is often cited as a challenge. How do you see it?

Most global data is in English. Only a handful of countries, such as China, have large volumes of data in their own languages. This is a shared challenge.

If we solve it for India, we also create methods that can work for many other countries. That means new training techniques and new approaches to working with data.

What support has BharatGen received from the government?

We received seed funding of Rs 235 crore from the Department of Science and Technology, covering hiring, operations, and some compute.

We have also received graphics processing unit support from the Ministry of Electronics and Information Technology. BharatGen is fully government-funded.

Where does model development stand?

We launched BharatGen in June last year and released a 2.9-billion-parameter base model, Param, as open source.

We also released domain-specific models for agriculture, Ayurveda, finance, and legal use cases. We have now developed Param2, a 17-billion-parameter model.

How does Param2 compare globally?

It sits in the same range as other 17-20 billion parameter models worldwide. We will publish both the model and its benchmarks so that people can evaluate its performance.

'Indian data needs to be unlocked'

Where does Indian data come from?

There is a lot of Indian data, but it needs to be unlocked. We work with publishers, people holding old books, and community radio stations.

For speech data, India has enormous potential. The challenge is doing this while addressing privacy concerns or developing privacy-preserving learning methods. That is a core part of our work.

'Real progress takes time'

Most benchmarks are designed around Western datasets. How do you handle that?

When we built Ayur-Param, there were no meaningful benchmarks, so we built one. In another case, we strengthened a weak benchmark even though it meant our model ranked second instead of first.

Real progress takes time. What looks like an overnight breakthrough, including China's DeepSeek moment, is usually the result of a decade of work. We are on a similar journey.

Are there consumer-facing applications yet?

We are at the pilot stage across governance, education, culture, finance, and the private sector.

People often ask if we are building an 'Indian ChatGPT', but ChatGPT is a collection of systems (models, inference infrastructure, and data pipelines), not just a model. Those capabilities have to be built step by step.

Where can developers access BharatGen models?

They are available on AI Kosh and Hugging Face. We have already seen over a thousand downloads. Over time, we will expand inference access and partner platforms to maximise distribution and availability.

Any final thoughts?

This is just the beginning. The aim is to make Indian AI models in Indian languages accessible to everyone, and there is much more to come.

Feature Presentation: Aslam Hunani/Rediff