'The aim is to have curated datasets that create cultural and historical nuances that help us make our models understand these better.'

OpenAI has launched IndQA, a new benchmark designed to evaluate how well AI models understand and reason about questions in Indian languages across a wide range of cultural domains. The move deepens its focus on India, which has its second-largest user base after the US.
This is likely the first time that Indian languages have been benchmarked by a global large language model (LLM) platform.
IndQA evaluates knowledge and reasoning about Indian culture and everyday life in 12 Indian languages and 10 cultural domains, and was built in partnership with 261 domain experts across the country.
The benchmark currently comprises 2,278 questions.
"The aim is to have curated datasets that create cultural and historical nuances that help us make our models understand these better," Srinivas Narayanan, chief technology officer of B2B applications at OpenAI, said on the sidelines of the company's DevDay Exchange.
"That will not only give high-quality answers, but also create a rubric so that when AI gives answers, we can evaluate how good these answers are," Narayanan added.
By starting this initiative in India, which has numerous languages and dialects, the company wants to make its LLMs more context-aware, which will help it take the same approach to other countries, he added.
The move comes on the heels of OpenAI, one of the world's most valuable startups, making its ChatGPT Go plan free for Indian users starting November 4.
This is expected to allow users in the country to get more access to advanced AI features without paying the usual subscription fee.
IndQA covers topics such as architecture and design, arts and culture, everyday life, food and cuisine, history, law and ethics, literature and linguistics, media and entertainment, religion and spirituality, and sports and recreation.
Each data point includes a culturally grounded prompt in an Indian language, an English translation for auditability, rubric criteria for grading, and an ideal answer that reflects expert expectations.
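The structure of a data point described above can be sketched as a small record type. This is a minimal illustration, not OpenAI's actual schema or grading code: the class names, field names, the per-criterion `weight`, and the `rubric_score` function are all assumptions introduced here, and the placeholder strings stand in for real benchmark content.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricCriterion:
    """One grading criterion written by a domain expert (hypothetical shape)."""
    description: str
    weight: float = 1.0  # assumption: the article does not say criteria are weighted


@dataclass
class IndQAItem:
    """The four parts of a data point named in the article, plus two labels."""
    prompt: str               # culturally grounded prompt in an Indian language
    english_translation: str  # included for auditability
    rubric: List[RubricCriterion]
    ideal_answer: str         # reflects expert expectations
    language: str             # hypothetical label field
    domain: str               # hypothetical label field


def rubric_score(item: IndQAItem, met: List[bool]) -> float:
    """Return the weighted share of rubric criteria that an answer satisfied.

    `met[i]` says whether the answer satisfied `item.rubric[i]`. How that
    judgment is made (human or model grader) is outside this sketch.
    """
    if len(met) != len(item.rubric):
        raise ValueError("need exactly one judgment per rubric criterion")
    total = sum(c.weight for c in item.rubric)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c, ok in zip(item.rubric, met) if ok)
    return earned / total


# Placeholder item; none of this text comes from the real benchmark.
item = IndQAItem(
    prompt="<prompt in an Indian language>",
    english_translation="<English translation of the prompt>",
    rubric=[
        RubricCriterion("<first expert criterion>", weight=2.0),
        RubricCriterion("<second expert criterion>", weight=1.0),
    ],
    ideal_answer="<expert-written ideal answer>",
    language="<language>",
    domain="food and cuisine",
)

# An answer that meets only the first criterion earns 2.0 of 3.0 points.
print(round(rubric_score(item, [True, False]), 2))  # prints 0.67
```

Keeping the English translation alongside the original prompt means a reviewer who does not read the source language can still audit what was asked and how the answer was graded.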
The move is similar to attempts by a few Indian startups to build foundation models based on Indic languages as part of the India AI Mission.
Soket will develop India's first open-source 120 billion parameter foundation model optimised for the country's linguistic diversity, targeting sectors such as defence, healthcare, and education.
Gnani will build a 14 billion parameter voice AI foundation model delivering multilingual, real-time speech processing with advanced reasoning capabilities.
Gan AI will create a 70 billion parameter multilingual foundation model targeting text-to-speech capabilities.
While OpenAI does not break out India-specific user metrics, it said it has 4 to 5 million developers globally who use its platform to build apps, and 800 million weekly active users.
The company has also announced plans to open its first office in India and began hiring last month. It is looking for solution engineers and architects and has about seven open positions in the country, as per its website.
The roles include account director for startups, strategics, and large enterprises, an AI deployment manager and a solution architect for startups.
Narayanan said the nature of software engineering is changing as AI tools help engineers become more productive and improve efficiency.
This shift will also create a new generation of builders at a time when millions of lines of code are being written by machines.

Feature Presentation: Ashish Narsale/Rediff








