Inputs from religious texts, scriptures in local dialects and inspirational word-of-mouth stories will also be included.
The government is planning to use text, images, and other content from religious books and scriptures -- cutting across languages like Hindi, English, Tamil, Telugu, Kannada, Urdu, and some others -- as inputs for the AIKosh database, according to sources aware of the development.
Inputs from religious texts, scriptures in local dialects and inspirational word-of-mouth stories will also be included, a senior government official told Business Standard.
"These books have several thousands of years' wisdom and experience narrated and compiled. So, they can be an excellent source of not just accurate information, but also important context for all kinds of AI models, LLMs (large language models) which will be built by companies under the India AI Mission," the official said.
On March 6, the Ministry of Electronics and Information Technology (Meity) launched AIKosh -- a domestic datasets platform containing non-personal data that can be used to develop AI models, applications, LLMs, small language models (SLMs) and large reasoning models, among other things.
Apart from inviting companies and startups to voluntarily contribute their non-personal data for training and fine-tuning AI models, the ministry has also signed a memorandum of understanding with the Lok Sabha secretariat.
This is to leverage the extensive corpus of Parliament questions and answers, reports tabled by various governments over the years, ministry-wise agenda files and committee meetings for the purpose.
As of April 9, AIKosh offered around 350 datasets through its platform, while nearly 150 AI models, including both LLMs and SLMs, had been developed and registered on it.
The datasets available with various ministries could also be tapped. "Most of these datasets are present on the Open Governance Data Platform, but we are exploring the possibility of securing other non-personal and completely anonymised data that can be added (to the AIKosh platform)," another official said.
The India datasets platform -- one of the seven pillars of the ₹10,372 crore (Rs 103.72 billion) India AI Mission -- aims to enable access to non-personal data for building AI applications, LLMs, SLMs, and AI tools, among other things. In the Budget for 2025-2026, the government allocated ₹200 crore (Rs 2 billion) for the scheme.
The datasets present on the AIKosh platform will not, however, be available for monetisation, either by the government or private sector, one of the officials said, adding that Meity had clarified this in Parliament as well.
On March 21, Minister of State for Electronics and Information Technology Jitin Prasada informed the Rajya Sabha that the sole purpose of the AIKosh and India datasets platform was to enable access to non-personal public and private sector data for building AI applications and "not monetisation".
"The AIKosh Platform implements stringent data protection standards to ensure security and confidentiality of user information. It adheres to relevant regulations under Indian law, including the Information Technology Act, 2000, and the Data Protection Bill," Prasada said in a written Rajya Sabha reply.
Stating that this compliance governs how personal and usage information is handled, thereby ensuring legal accountability, the minister stressed that the platform did not engage in monetisation of data in any capacity and that there would be no purchasers or subscribers.
Feature Presentation: Rajesh Alva/Rediff