By Arya Tripathy on 07 August, 2020
In the earlier post, Part I, we delved into some of the recommendations made by the Committee of Experts on Non-Personal Data (NPD Committee)[1] in its report of July 12, 2020 (Report)[2]. We discussed the genesis, the Committee’s rationale for regulating non-personal data (NPD), its scope, the distinction between categories of NPD based on sensitivity, and the consent mechanism for anonymisation.
In this post, we continue our analysis on a few other aspects.
1. Key stakeholders: The Report contemplates 4 key stakeholders in the NPD ecosystem and processing chain – data principal, data custodian, data trustees, and data trusts.
- Data principal: The Report observes that in the NPD context, the natural person cannot be the data principal as is the case for personal data. Determining the data principal will depend on the type of NPD. Accordingly, the data principal in the case of public and private NPD will refer to the natural or legal person to whom the data relates, such as government bodies, companies, etc. For instance, the Ministry of Health will be the data principal for anonymised health data collated through the Aarogya Setu app, and Uber will be the data principal for anonymised ride data collected through the Uber application. In the case of community data, the community from where NPD originates will be the data principal, and will be entitled to exercise economic and other key rights vis-à-vis the community NPD.
- Data custodian: This is proposed as the equivalent of the data fiduciary in the PDP Bill. The data custodian is the person who undertakes collection, storage or processing of NPD, or who uses NPD, and owes a duty of care towards the data principal. Custodians would be obligated to act in the data principal’s best interest.
- Data trustee: Data trustee is one who shall exercise all data principal rights vis-à-vis community NPD for and on behalf of the concerned community. While NPD regulations will specify who can act as data trustee, the Report indicates that the person/entity/body that is the closest and most appropriate representative body for a community can act as data trustee.
- Data trusts: The Report proposes new institutional structures called “data trusts”, which would receive and share NPD in accordance with NPD regulations. These would be NPD repositories designated to provide digital and data services, such as access, data sharing, etc. The Report also states that this essential infrastructure will be akin to public infrastructure and can be managed by public authorities or new neutral bodies, but remains silent on the specifics, which are likely to be fleshed out in the NPD regulations.
At this stage, several fundamental issues can be highlighted, which the Committee should have elaborated upon.
Firstly, there is no meaningful distinction between data principal and data custodian when it comes to private and public NPD. NPD is in anonymised format. This means that without de-anonymisation, the individual data principal cannot be discovered. Naturally, NPD will then relate to the legal entity or the government body that collected, processed and will use the NPD. Such entity will also qualify as the data custodian, who is required to work in the best interest of the data principal. But in the given scenario, the data principal will be the entity itself, which will determine its own best interest. To this extent, the Report creates a fictitious distinction between data principal and custodian, and the objective is unclear.
Secondly, for a given set of community NPD, there could be divergent data points which do not relate to the community. At the same time, a community’s data sets could be manifold, and consequently, there may be multiple data trustees. For instance, NPD collected from mobility aggregators could have information about a particular geographic population’s commute habits, and at the same time, the data set may capture unrelated data points such as traffic conditions, fares, preferred routes, accidents, etc. In this data set, the community can be determined based on the geographic location from where commuters used aggregator services. However, other data points such as traffic conditions, or accidents may or may not be attributable to the said community. Thus, determining the data trustee that has the most proximate connection with the community from where NPD originates will be a difficult task.
Assuming such trustee identification is feasible, it is ambiguous as to how the data trustee will determine the best interests of the community and exercise rights on their behalf. The ambiguity is further augmented by the fact that the Report is completely silent on what individual and community rights can be exercised vis-à-vis NPD. These necessarily need to be addressed in the NPD regulations.
Thirdly, the Report equates data trusts with public infrastructure. If that is the case, there may be no economic incentive for private organizations to set up data trusts. Further, it is indicated that where private bodies set up data trusts, they must exhibit a certain degree of neutrality. At the same time, the Report suggests that government and public authorities can manage and act as data trusts. This is contradictory, as the NPD regulations will also govern the government, and as such the government could be a data principal, data custodian or data trustee. Where the government is involved in any other stakeholder capacity, it can be questioned whether government-run data trusts are neutral. Hence, it is imperative that while drafting the NPD regulations, due care is given to the eligibility, manner and mechanism of creation of data trusts.
2. Ownership of NPD: While dealing with ownership of NPD, the Report adopts a beneficial ownership/interest and best interest logic to identify who controls and owns the NPD set. In an attempt at over-simplification, the Committee is guided by the ideology that the ultimate beneficiaries of the NPD should own and exercise rights over NPD. It proposes the following ownership matrix:
- Public NPD being derived from public efforts must be treated as a national resource, used for public good, and hence, ownership lies with the state.
- Community NPD can be viewed as a collective or shared asset with overlapping legitimate interests of various groups, and as such the community should be the beneficiary of such datasets. It does not clearly state that the community is the owner of such NPD, which is a correct approach, considering the overlapping nature of community NPD. It further states that community NPD could provide systemic intelligence about the community and accordingly, the community should exercise control on how such data is used in order to maximize benefits and minimize harms.
- Private NPD belongs to the individual, which means that the concerned individual should be the beneficiary of its NPD.
Relying on beneficial and best interest analogy to determine ownership of NPD may not be the best approach.
Firstly, if data is a non-rivalrous resource, it also means that the same data can be used for multiple beneficiaries and groups. For example, NPD derived from an e-commerce company could be used not only for individual benefits (such as providing new services), but also for the benefit of logistics and supply chain sectors, business strategy, better connectivity, and so on. In such a scenario, trying to identify ownership and beneficial interest at a granular level is unnecessary, and can give way to an overly complicated NPD ecosystem.
Secondly, under no situation is the data custodian considered the owner of NPD, but the fact remains that NPD is created through the efforts and resources of the data custodian.
Thirdly, the Report completely digresses from its main theme while dealing with the issue of ownership. One of the key objectives for regulation of NPD is to unlock economic potential and boost innovation. This will require that organizations have enough economic incentive to capitalise on NPD and enforceable rights over the NPD they create. If they are mandated to use NPD only in the best or beneficial interest of the owner, which could be the community or an individual, the whole objective is likely to be defeated. If the Committee’s concern is to ensure that the NPD owner can assert claims should there be a risk of re-identification and consequent harm, the same has already been provided for under the PDP Bill. Hence, it is imperative that ownership of NPD is revisited to account for business and economic realities.
3. Data Businesses: As businesses derive additional value from data, the Report proposes creating a new category of business called “data business”. These businesses could be engaged in any sector such as health, telecom, banking, consumer goods, etc., but will be treated as data businesses if they trigger certain thresholds, as will be elaborated in the NPD regulations, factoring in data volumes, traffic, context and necessity.
Upon reaching the prescribed thresholds, businesses will have to compulsorily register in the proposed data business registration system. During registration, details such as business identification number (like UIN, FID), digital platform name, associated brand names, rough data traffic, cumulative data collected in terms of number of users, nature of data business, kinds of data collected, aggregated, processed, used, sold, data-based services delivered will have to be disclosed. Further, once certain traffic or volume-based data thresholds are exceeded, data businesses would be required to share meta-data about data users and community from where data originates. The meta-data should be stored in meta-data directories in India, openly accessible to all Indian citizens and businesses. If meta-data reveals potential uses, data request for sharing underlying data can be processed.
While making this recommendation, the Committee seems to have overlooked the unique position of meta-data for organizations and individuals. Meta-data is generally understood as data that provides information about other data. A more apt description is to view meta-data as a statement about a potentially informative object, where meta-data provides context around the data object. It can be of various kinds such as structural, descriptive, reference, statistical, and administrative. Of these kinds, some sets of meta-data such as descriptive and statistical ones can reveal business processes and intellectual and proprietary information of an organisation. After all, meta-data and unstructured data account for up to 90% of a company’s data landscape, often referred to as ‘dark data’.
Another aspect worth considering here is the ongoing debate around whether meta-data qualifies as an identifier and hence, personal data. Organizations are increasingly focusing on efficient management of their meta-data to have better visibility on data flows and to implement controls, so as to mitigate chances of unauthorised access and use. Meta-data can reveal an individual’s identity, even where it is in a de-identified format, when combined with a few other data points.[3] Should the NPD regulations require hosting of significant volumes of meta-data on meta-data repositories accessible to multitudes, newer and heightened privacy risks emerge, and it will thus be crucial to revisit the proposal on who can access meta-data, how and when. An individual’s privacy cannot be put at stake in order to harness the value of dark data for common good.
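To make the re-identification concern concrete, the following is a minimal sketch in Python, not a reproduction of the study cited at [3]. The traces, pseudonyms and the helper `matching_users` are entirely hypothetical and made up for illustration. It shows how each additional known data point about a person narrows the set of de-identified records that could belong to them.

```python
# Hypothetical de-identified purchase meta-data: each pseudonym maps to a set
# of (shop, day) points. No names or card numbers are stored.
traces = {
    "u1": {("bakery", "Mon"), ("cafe", "Tue"), ("pharmacy", "Wed")},
    "u2": {("bakery", "Mon"), ("gym", "Tue"), ("pharmacy", "Wed")},
    "u3": {("bakery", "Mon"), ("cafe", "Tue"), ("bookshop", "Thu")},
}


def matching_users(traces, known_points):
    """Return the pseudonyms whose trace contains every known point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= points)


# An observer who knows a few facts about a person (for example from a
# receipt or a social media post) filters the traces with them.
one_point = matching_users(traces, [("bakery", "Mon")])
two_points = matching_users(traces, [("bakery", "Mon"), ("cafe", "Tue")])
three_points = matching_users(
    traces, [("bakery", "Mon"), ("cafe", "Tue"), ("pharmacy", "Wed")]
)

print(len(one_point), len(two_points), len(three_points))
```

In this toy data set, one known point matches all three pseudonyms, two points match two, and three points single out one record. Real data sets are far larger, but the narrowing effect is the reason openly accessible meta-data directories deserve scrutiny.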
On the overall nature of how data businesses will be regulated, the Report states that the registration and compliance process for data businesses should be easy, simplified, digital, transparent and “light weight”, without being subject to any license requirement. This does not necessarily mean that compliances will be akin to a self-regulated framework. At the same time, while discussing liability for data businesses, the Report observes that organizations that comply thoroughly with laid-down standards, self-report, and self-audit their digital compliances will be deemed to have exhibited good faith, and should be indemnified against any liability as long as they swiftly remedy it. It will be interesting to see how and to what extent the NPD regulations regulate data businesses and what consequences follow for organisations that do not fully comply with NPD regulations.
4. Data sharing: The Report contemplates a data sharing framework for NPD. In many ways, this is the crux of the Committee’s recommendations, and is likely to take up a major chunk of the NPD regulations.
Data sharing is defined as provision of controlled access to all kinds of NPD (private, public, community) by individuals and organizations for defined purposes with appropriate safeguards. The purposes for data sharing can be sovereign interests, community benefits and economic purposes. The Report illustrates each of these purposes. Sovereign interest could include national security, law enforcement, crime prevention, pandemic mapping, while community benefits could include public goods, research, policy making, etc. Regarding economic purposes, the list illustrates creating competitive markets, eliminating entry barriers, encouraging start-ups and even monetary consideration.
Data sharing can be initiated through data sharing requests, or mandated under NPD regulations. Shared data can be accessed through appropriate data infrastructure and made available to all relevant parties. Where a data request is rejected, the requesting party can approach the NPDA to assess the genuineness of the request and require data custodian to share. Some of the guiding principles around mechanism for data sharing, and necessary checks and balances suggested in the Report include:
- improving existing Open Government Data initiatives,
- prescribing a limited mandate of sharing raw data only when it relates to community data,
- following fair, reasonable and non-discriminatory (FRAND) pricing principles where NPD entails minimum value-add,
- enabling pricing as per market practices if NPD has higher value-add,
- mandatory storage of sensitive NPD in India, although the same can be transferred outside India,
- absolute localization, i.e., mandatory storage and processing of critical NPD in India,
- putting in place contractual terms for cloud service providers and data businesses around storage, processing and usage of NPD,
- following a data sharing technology architecture where all sharable NPD should be anonymised, have a representational state transfer (REST) API, use a distributed storage format, and be standardized to enable cross-sectoral access and utility, and
- developing testing and proving tools that continually run on data in secure clouds and generate compliance check reports.
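The localisation and pricing principles in this list can be read as a small rule set. The sketch below, in Python, is only one possible reading and not anything the Report prescribes. The names `SharingTerms` and `sharing_terms` are hypothetical, the `high_value_add` flag is an assumption because the Report does not define what counts as minimum or higher value-add, and treating absolute localization as barring transfer abroad is likewise an interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingTerms:
    store_in_india: bool           # must a copy be stored in India?
    process_only_in_india: bool    # must processing also stay in India?
    transfer_abroad_allowed: bool  # may the data leave India at all?
    pricing: str                   # "FRAND" or "market"


def sharing_terms(category: str, high_value_add: bool) -> SharingTerms:
    """Map an NPD category and a value-add flag to illustrative sharing terms.

    Only the two categories named in the list above are modelled; anything
    else is rejected rather than guessed at.
    """
    # Minimum value-add -> FRAND pricing; higher value-add -> market pricing.
    pricing = "market" if high_value_add else "FRAND"
    if category == "sensitive":
        # Stored in India, but transfer outside India remains possible.
        return SharingTerms(True, False, True, pricing)
    if category == "critical":
        # Absolute localization: storage and processing in India
        # (assumed here to rule out transfer abroad).
        return SharingTerms(True, True, False, pricing)
    raise ValueError(f"no rule modelled for category: {category!r}")


print(sharing_terms("sensitive", high_value_add=False))
print(sharing_terms("critical", high_value_add=True))
```

Even this small sketch needs two judgment calls that the Report leaves open, namely how value-add is measured and what absolute localization means for cross-border transfers.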
The proposal for data sharing is lofty and while NPD regulations should be light-weight, data sharing protocols and processes must be streamlined. In laying out some of the checks and balances, the Committee does not adequately address the concerns around deanonymisation risks, collective privacy harms, economic costs for data businesses, and an organization’s right to derive the economic value of datasets.
Firstly, there is no conclusive evidence so far that localization of sensitive and critical NPD will minimize the risks posed by deanonymisation and better safeguard privacy. The proposal comes despite severe resistance to localisation of personal data as proposed under the PDP Bill. Should NPD localisation become the law of the land, data custodians will not only be obligated to incur costs for localising, but will also have no flexibility in managing data assets, or even in determining whether access and storage of NPD as per the NPD regulations escalate the risk of reidentification and resultant breach of privacy.
Secondly, the proposed technology architecture, although not the only form, clearly indicates that in order to comply with data sharing regulations, data custodians will incur significant costs to standardize their NPD formats. This strikes at the fundamental idea that private and community NPD should ideally comprise raw data where custodians have not deployed any resources or skills.
Thirdly, while access and sharing of data for sovereign and community purposes could substantiate the need for sharing NPD, mandating data custodians to share data for the economic benefit of other organizations is fundamentally flawed. India already has well-established competition law mechanisms to ensure that anti-competitive activities are penalised, and as such, mandatory data sharing may not be the best way to create a level playing field. The Report also does not set out what will count as minimum or high value-add. Data valuation continues to be a tricky proposition, and is bound to be subjective. In reality, aggregated NPD could, in different contexts and applications, yield different economic outcomes for diverse organizations. For instance, aggregated health NPD collected by a telemedicine platform will have limited economic relevance for improving SaaS delivery to its customers and patients, but could provide valuable research insights to healthcare providers, facilitate new drug development for pharmaceutical companies, and offer invaluable risk dilution information to insurance companies.
Conclusion: The Indian government’s move to monopolize Indian data is a double-edged sword. Regulation of NPD as proposed under the Report is untested, and could have massive ramifications for businesses. Currently, the Report is open for public consultation till August 13, 2020. Needless to state, the Report has several gaps, and even before a new legal framework is proposed, it will be essential that the initial set of recommendations is revisited bearing in mind requirements under the PDP Bill, collective privacy dimensions, and the unique technical and business realities around NPD.
[1] Ministry of Electronics & Information Technology Office Memorandum No. 24(4)/2019-CLES dated September 13, 2019 accessible at https://www.meity.gov.in/writereaddata/files/constitution_of_committee_of_experts_to_deliberate_on_data_governance_framework.pdf (last accessed on August 2, 2020)
[2] Report by the Committee of experts on Non-Personal Data Governance Framework available at https://static.mygov.in/rest/s3fs-public/mygov_159453381955063671.pdf (last accessed on August 2, 2020)
[3] A 2015 MIT study showed that randomised meta-data derived from credit cards resulted in identification of 90% of the purchasers who transacted using the credit cards; to read more about how meta-data can disclose identities of individuals, access https://www.networkworld.com/article/2878394/mit-researchers-show-you-can-be-identified-by-a-just-few-data-points.html (last accessed on August 3, 2020)