Sarvam AI To Launch India's First LLM In Feb

November 20, 2025 11:52 IST

Sarvam's LLM will be trained on more than 17 trillion tokens, with 17 to 20 per cent coming from Indian data.


India's first homegrown foundational large language model (LLM), built by Sarvam Artificial Intelligence, may come out early next year, company cofounder Vivek Raghavan said on Wednesday.

The launch will, in all likelihood, happen before or during the India AI Impact Summit.

The government will hold the flagship event to demonstrate India's capabilities and advancements in AI, and more specifically around sovereign models.

"We are trying to get the model out by February," Raghavan told Business Standard on the sidelines of the Bengaluru Tech Summit.

Sarvam AI was selected by the IndiaAI Mission this year to build the country's first sovereign LLM ecosystem.

It will develop an open-source 120-billion-parameter AI model to enhance governance and access to public services through use cases such as 2047: Citizen Connect and AI4Pragati.

In a panel discussion, Raghavan said, "The existing models have sub 1 per cent Indian data."

Sarvam's LLM will be trained on more than 17 trillion tokens, with 17 to 20 per cent coming from Indian data.

Besides Sarvam, Soket will develop India's first open-source 120-billion-parameter foundation model optimised for the country's linguistic diversity, targeting sectors such as defence, healthcare, and education.

Gnani, meanwhile, will build a 14-billion-parameter voice AI foundation model delivering multilingual, real-time speech processing with advanced reasoning capabilities.

Gan AI, another company, will create a 70-billion-parameter multilingual foundation model targeting text-to-speech capabilities.

When asked if the latest Digital Personal Data Protection (DPDP) Rules would make LLM makers tweak these models to comply with the regulations, Raghavan said it is unlikely to be the case.

"As of now, I do not see the problem because LLMs are memory-less systems which do not store data unlike apps which store consumer data. However, these rules are subject to interpretation," he added.

Retraining such models may require significant cost and effort.

Under the DPDP Rules, companies processing user data, known as data fiduciaries, must clearly explain to users, or data principals, how their personal data will be used.

The new rules require informed consent, along with easy provisions to revoke it, for any personal data processing.

Abhishek Singh, additional secretary, ministry of electronics and information technology (MeitY), said that while India was expected to be leading the AI race, it is behind the US and China.

"We have to realise our potential and so the gaps have been identified. We had a lack of ability in compute, data sets, capital, and foundation models.

"Even access to graphic processing units (GPUs) was a limiting factor.

"However, 38,000 GPUs have been empanelled which is solving part of the problem.

"We need thousands of GPUs to be somewhere near to the best in the world and for that we need more investment from the industry."

Feature Presentation: Ashish Narsale/Rediff

Avik Das