While data can empower communities, it also reinforces identities, making local politics more caste-centric, with decisions increasingly contested on the grounds of representation.
Such dynamics could lead to shifting alliances and, in the worst case, to paralysed local governance as each group demands proportional power-sharing, explain Amitabh Kundu and Mehebub Rahaman.
The demand for a caste census in India primarily stems from the inequitable access to resources and inadequate political representation of marginalised and vulnerable populations.
As affirmative action policies have evolved, the need to generate robust empirical data to assess the inequitable distribution of socio-political and economic benefits among individuals and caste groups has come to the fore.
The Socio-Economic and Caste Census (SECC) conducted in 2011 was an attempt to collect such data, aiming to identify the poor and determine household-level eligibility for various welfare programmes.
However, while the socio-economic data from the SECC was released, the caste data remains unpublished due to issues such as the lack of standardisation in caste classifications and inconsistencies in the data.
Concerns about the confidentiality of census data, potential biases in self-reporting due to caste identification, and the political ramifications of caste data at the grassroots level have unfortunately not received adequate attention.
Here we examine the operational complexity of these issues in the context of the ruling dispensation's continued hesitation to hold a caste census at the national level, despite the opposition's vociferous demand that it be held urgently.
The origins of caste enumeration in India trace back to the decennial Censuses conducted under British rule.
The last comprehensive collection of caste data as part of the main Census took place in 1931, when socio-economic information was recorded for all castes across the country.
This exercise was carried out not under any formal legislation but through an administrative order.
After Independence, successive Indian governments chose not to include detailed caste data -- except for the scheduled castes and scheduled tribes -- in the Census, citing concerns about the potentially divisive and politically sensitive nature of such data.
The SECC of 2011 marked the first attempt in post-Independence India to collect detailed caste data alongside economic information.
This decision was driven by longstanding demands for accurate data to facilitate the effective implementation and evaluation of affirmative action and welfare policies.
According to Constitutional provisions, only a central government agency is authorised to conduct a Census.
Accordingly, SECC 2011 was carried out by the ministry of rural development in rural areas and the ministry of housing and urban poverty alleviation in urban areas, in coordination with the office of the registrar general and census commissioner of India under the ministry of home affairs.
Confidentiality clauses in Census data
The Indian Census is governed by the Census Act of 1948, which strictly enforces the confidentiality of information collected from households.
Section 15 of the Act stipulates that no individual data gathered during the Census can be used as evidence in any legal proceedings or shared with any external agency, thereby ensuring the privacy of respondents.
This confidentiality is essential for building public trust, as it encourages individuals to provide accurate and honest information without any fear of exposure or reprisal.
The integrity of the data is further reinforced by adhering to a fundamental principle of data collection: Neither the enumerator nor the respondent should derive any personal benefit from the information generated.
Understandably, the SECC was not conducted under the Census Act but was instead carried out through an administrative order.
This distinction arose because the SECC's data collection involved a process of community validation at the level of gram sabhas and ward committees.
Draft caste-based data was publicly displayed for comments and corrections, allowing for collective scrutiny.
While this participatory approach enhanced transparency, it posed several challenges.
It raised concerns about privacy, deliberate bias in reporting, the influence of local political dynamics, and the potential for conflict.
Socio-economic data classified by caste remains unpublished, primarily due to the lack of standardisation in caste categories.
Over 4.7 million caste names were recorded, including numerous synonyms, spelling and language variations, as well as entries for sub-castes and professions.
The herculean task of organising this information within a codified national caste classification system, beyond the recognised SC, ST, and Other Backward Classes (OBC) categories, has proved so difficult that the data has remained largely unusable for policy purposes.
The agencies responsible for the SECC lacked a clear normative framework for validating, classifying, and publishing caste data, as there was no authoritative guideline to determine what constitutes an acceptable caste entry, beyond the official lists for SCs and STs.
There are additional challenges as well. Since welfare entitlements and the identification of the poor were expected to rely on the deprivation recorded at both the household and caste levels, concerns have been raised about the statistical robustness of the data.
In many cases, respondents have overstated their deprivation -- a phenomenon known as strategic response bias in economics and public policy (Bhalla 2015 and Kundu 2024).
This tendency appeared to be more pronounced in areas where the local validation process was less stringent, or where communities collectively coordinated their responses in the hope of securing higher group-level allocations.
Overreporting of deprivation in the SECC has been observed not only in the indicators used to identify the poor in rural and urban areas but also in other indicators, such as access to basic amenities and literacy rates.
This pattern of strategic misreporting, highlighted through comparisons between SECC data and data from the National Sample Survey (NSS) conducted around the same period, raised questions about the credibility of the data and its implications for effective policy targeting.
Globally, behavioural economics and survey research have extensively documented the risks of such response patterns, particularly when potential beneficiaries are aware of the eligibility criteria.
In the case of SECC 2011, these dynamics contributed to concerns about both exclusion and inclusion errors.
Moreover, households often responded to the questionnaire in ways that reflected the social norms associated with their caste -- a phenomenon known as social desirability bias in social research.
This bias tends to be particularly pronounced when sensitive topics such as caste, ethnicity, or gender are involved.
When respondents are aware that their household is being identified by caste, their answers may, to an extent, align with societal expectations or stereotypes of their caste, rather than accurately reflecting their actual personal or household situation.
State-level scenarios
States such as Bihar, Karnataka, Telangana, and Andhra Pradesh have undertaken their own initiatives to conduct caste surveys over the past year and a half.
While these surveys achieved near-complete coverage, they are still considered surveys rather than Censuses, as only the central government holds the mandate to conduct a Census under constitutional provisions, as mentioned earlier.
Bihar's 2022 caste-based survey, for example, involved greater participation from local officials and utilised digital tools to ensure verification and transparency, covering both economic and caste indicators.
Similarly, the caste surveys in Karnataka, Telangana and Andhra Pradesh were designed to integrate caste data into the planning and targeting of their welfare programmes.
However, it is important to recognise that these surveys vary significantly in scope and coverage, meaning the data collected is not strictly comparable.
As a result, they cannot be combined to create a national-level dataset.
Potential for data misuse
It has been argued that the caste data, if released at the granular level, could be used to further entrench caste identities, leading to increased social fragmentation.
This could shift the political landscape, sparking demands for new political alignments and de facto reservations even for non-backward castes.
Consequently, successive governments have refrained from publishing detailed caste data, mindful of its potentially disruptive effects.
The availability of such data at the gram panchayat or ward level might highlight the under-representation of certain castes in local governance, in the distribution of services and welfare, and in public as well as private institutions.
This transparency could, in turn, trigger movements protesting perceived caste domination or exclusion.
Political actors and civil society organisations might use the data to mobilise communities along caste lines, potentially undermining institutional neutrality.
These dynamics raise significant ethical concerns regarding how caste data is shared, interpreted, and used. Ensuring responsible data governance, therefore, is crucial.
Understandably, the availability of caste data could intensify demands for the distribution of political, economic, and social benefits across castes in proportion to their population share at the local level, in line with slogans such as 'Jiski jitni bhagidari, uski utni hissedari', often interpreted to mean that a community's share in benefits should be proportional to its share of the population.
Ground-level data would likely make these expectations more pronounced, leading to increased intolerance toward disparities that were once less visible or unknown.
This could potentially result in tensions in areas such as hiring, education admissions, and political ticket distribution.
While data can empower communities, it also reinforces identities, making local politics more caste-centric, with decisions increasingly contested on the grounds of representation.
Such dynamics could lead to shifting alliances and, in the worst case, to paralysed local governance as each group demands proportional power-sharing.
Public and private institutions alike may face greater scrutiny over their caste composition, risking protests, boycotts, or political pressure over allegations that they are dominated by a few castes, despite their efforts to maintain inclusivity and neutrality.
For a caste-based Census at the national level
Scholars such as Andre Beteille and Christophe Jaffrelot have observed that while caste is Constitutionally discouraged, it remains a fundamental organising principle in Indian society.
The connection between caste data and welfare distribution is significant, as caste data can be an invaluable tool for evidence-based policymaking.
It enables targeted interventions in areas like education, healthcare, and employment for historically marginalised groups.
While the availability of caste data at the macro level is crucial for this, it is important that the Caste Census be conducted independently of the Population Census and outside the purview of the Census Act of 1948, incorporating local-level validation.
Additionally, legal safeguards must be established to ensure that such data availability does not lead to disruptive local politics.
To minimise biases in data reporting, survey design should include indirect or less obvious questions.
Ultimately, data-driven programmes at both the national and state levels could strengthen affirmative action for underprivileged castes.
Amitabh Kundu is senior advisor at Development Alternatives, and Mehebub Rahaman is an independent researcher working on developmental issues.
Feature Presentation: Rajesh Alva/Rediff