The views expressed by contributors are their own and not the view of The Hill

What we’re missing in the CCPA de-identification debate


The California Consumer Privacy Act (CCPA) is predicted to reshape the practices of technology companies across the United States.

Unsurprisingly, it is under constant attack from both businesses and lawyers, despite ‘eloquent’ calls from big tech companies in favor of comprehensive privacy laws. As usual, it is relatively easy to build consensus around high-level pro-user principles.

But the devil is in the details.

When it comes to making clear decisions about what to protect, the temptation is to throw the baby out with the bathwater. A piece of legislation truly comes alive only once it has been interpreted by judges, who have the chance to be informed by experts. If those experts were the first generation of legal engineers, the soon-to-be-born CCPA could probably be saved.

The CCPA’s de-identification provisions are widely misunderstood. They identify the types of data that fall outside the CCPA’s scope or that carry a much lighter compliance burden.

As a result, companies have an incentive to de-identify their data to reduce their obligations and protect consumers. Yet, by focusing on the legalese and apparent inconsistencies of language between sections, lawyers often fail to see that de-identification is in fact a spectrum. Businesses using de-identified data should not be given ‘carte blanche.’

There is no doubt that the CCPA’s drafting can appear relatively clumsy at first glance. What’s more, it does not precisely identify the methods to be used to achieve de-identification, although it does list a series of key controls.

The truth is that this lack of clarity was to be expected. Regulators in different parts of the world have struggled to capture what the core components of an effective de-identification process look like and what its actual effect should be. Just look at the six-year debate over ‘pseudonymization vs anonymization’ in the wake of the adoption of the General Data Protection Regulation (GDPR) in Europe.

The main source of confusion

Confusion mainly stems from the fact that while the CCPA does provide a clear exclusion for “publicly available” data, it is more ambiguous when it comes to de-identified or aggregate consumer information (compare section 1798.140(o)(2) with section 1798.145(a)(5)). Nevertheless — and this bears emphasizing — the intent seems to be that the collection, use, retention, sale, or disclosure of de-identified information should not be restricted. Because de-identification is a spectrum, de-identification techniques should be coupled with other technical and organizational measures or safeguards, such as access control, auditing, and obligations not to re-identify.

California legislators seem to be aware of these ambiguities and have attempted to remove them from the bill. For instance, three amendment bills — AB 873, 874, and 1355 (which the California Legislature passed) — all sought to clarify that de-identified data is excluded from the definition of personal information, and thus from the scope of the CCPA.

Legal standards are progressively converging

Despite the difficulty of the drafting task, the good news is that standards seem to be progressively converging and the CCPA is in fact building on existing recommendations.

The FTC’s 2012 safeguards recommend three steps for de-identification: (1) take reasonable measures to ensure that the data is de-identified; (2) publicly commit not to try to re-identify the data; and (3) contractually prohibit downstream recipients from trying to re-identify the data.

The CCPA’s own recommendation for de-identification, in its 1.0 version, appears to overlap with the FTC’s framework:

  1. Implement technical safeguards that prohibit re-identification;
  2. Implement business processes that specifically prohibit re-identification;
  3. Implement business processes that prevent inadvertent release of de-identified information; and
  4. Make no attempts to re-identify the information.

The most recent draft of the new CCPA consumer privacy ballot initiative (“CCPA 2.0”) is even closer to the FTC’s three-prong test.

Best practices are progressively maturing

Best practices are maturing around this key insight: only when organizations enact varied controls together can regulators justify lifting restraints on the processing of personal information.

It is not enough to perturb the data to claim effective de-identification.
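To see why, consider a minimal sketch (a hypothetical toy dataset, not a real compliance example): even after noise is added to a sensitive attribute, untouched quasi-identifiers such as ZIP code, birth year, and gender can still single out an individual.

```python
# Toy illustration: perturbing one attribute while leaving quasi-identifiers
# intact does not, by itself, prevent re-identification by linkage.
import random

# Hypothetical records (illustrative data only).
records = [
    {"zip": "90210", "birth_year": 1975, "gender": "F", "salary": 95000},
    {"zip": "90210", "birth_year": 1982, "gender": "M", "salary": 60000},
    {"zip": "94105", "birth_year": 1975, "gender": "F", "salary": 120000},
]

def perturb_salary(record, scale=5000):
    """Naive 'de-identification': add random noise to the salary only."""
    out = dict(record)
    out["salary"] = record["salary"] + random.randint(-scale, scale)
    return out

perturbed = [perturb_salary(r) for r in records]

# An attacker who knows a target's ZIP code, birth year, and gender can still
# isolate a unique record, because the quasi-identifiers were never altered.
matches = [r for r in perturbed
           if (r["zip"], r["birth_year"], r["gender"]) == ("94105", 1975, "F")]
print(len(matches))  # a single unique match: perturbation alone failed
```

This is why the controls discussed below — contractual limits, auditing, and access restrictions layered on top of technical measures — matter as much as the perturbation itself.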

A successful de-identification strategy, as captured by both the CCPA and the FTC recommendations, combines control of downstream data usage — through contractual obligations and auditing — with a range of techniques applied to the data itself, in order to better balance utility and data perturbation.

Regulatory trends across jurisdictions support this conclusion.

Yet, this is not exactly how lawyers usually advise businesses. A more common approach is to insist on the uncertainty, rather than to identify the range of possible solutions, and thereby demotivate businesses.

The only way out of this vicious circle — and the only way to give privacy regulations a chance to survive the storm — is to call on legal engineers, who can combine legal and technical insights to support responsible decision-making within organizations. These new professionals should be tasked with helping build solutions, embedding as many legally relevant safeguards as possible within products, services, and organizational processes.

Sophie Stalla-Bourdillon is a leading expert on the EU GDPR, and Senior Privacy Counsel & Legal Engineer at Immuta, a leading automated data governance platform, where she works on tackling the ethical challenges of AI. She holds a master’s degree in English and North-American Business Law from Panthéon-Sorbonne University and an LL.M. from Cornell Law School. Follow her on Twitter @SophieStallaB

Dan Wu is a Privacy Counsel & Legal Engineer at Immuta, a leading automated data governance platform. He holds a J.D. & Ph.D. from Harvard University. His work on legal tech, data ethics, and inclusive urban innovation has been featured in TechCrunch, Harvard Business Review, and Bloomberg. Follow him on Twitter @danwu_danwu