How AI risks creating a ‘black box’ at the heart of US legal system

by Clayton Vickers - 04/07/24 6:00 AM ET

Artificial intelligence (AI) is playing an expanding — and often invisible — role in America’s legal system. While AI tools are being used to inform criminal investigations, there is often no way for defendants to challenge their digital accuser or even know what role it played in the case.

“Under current law in most jurisdictions, [prosecutors] don’t have to disclose artificial intelligence use to the judge or defense counsel,” Rebecca Wexler, professor of law at the University of California, Berkeley, told The Hill.

AI and machine learning tools are being deployed by police and prosecutors to identify faces, weapons, license plates and objects at crime scenes, survey live feeds for suspicious behavior, enhance DNA analysis, direct police to gunshots, determine how likely a defendant is to skip bail, forecast crime and process evidence, according to the National Institute of Justice.

But trade secrets laws are blocking public scrutiny of how these tools work, creating a “black box” in the criminal justice system, with no guardrails for how AI can be used and when it must be disclosed.

“There’s no standard at any level,” said Brandon Garrett of Duke University School of Law. “The big picture point is that just like there need to be standards for the product, there needs to be standards on how and when they’re used.”

Concerns about AI in the criminal justice system are compounded by research showing that tools like facial recognition are prone to bias. For example, they can misidentify people of color because they were trained on mostly white faces.

For the past three Congresses, Rep. Mark Takano (D-Calif.), joined twice by Rep. Dwight Evans (D-Pa.), has introduced legislation addressing testing and transparency in criminal justice, but the bill has so far failed to gain enough traction to pass.

“Nobody had really addressed this particular issue of black box technologies that are being marketed to prosecutors, police and law enforcement folks on the basis of their alleged efficacy,” Takano said in an interview with The Hill.

“Every American wants to feel that they can get a fair trial if they are accused of something wrong — that’s one of the hallmarks of being an American,” he added. “But what do you do when the witness and evidence brought against you is a machine protected as a trade secret, how do you contend with that?”

Trust but can’t verify

The term artificial intelligence refers to the broad discipline of making machines that learn from experience and mimic humanlike intelligence in making predictions. Unlike other forensic technologies law enforcement uses, AI is responsive to its environment and sensitive to its users, meaning it can produce different outcomes throughout its life cycle.

Without testing and transparency, these nuances are lost and the likelihood of error isn’t accounted for, Garrett said.

Currently, public officials are essentially taking private firms at their word that their technologies are as robust or nuanced as advertised, despite expanding research exposing the potential pitfalls of this approach.

Take one of AI's most common use cases in law enforcement: facial recognition.

Clearview AI, one of the leading contractors for law enforcement, has scraped billions of publicly available images of Americans' faces from social media to train its AI.

This initial training teaches an AI program a set of patterns and rules that will guide its predictions. Developers tweak the program by instructing it to consider some factors more than others. Theoretically, the AI becomes an expert at matching human faces — at a speed that far outpaces human capacity.

But when the machine goes out into the field, it may see a population that looks different from its training set. Individual facial recognition algorithms generate notably different findings from their peer products, a 2019 National Institute of Standards and Technology (NIST) report found.

Researchers have found that facial recognition AI has concerning failure rates when handling images of Black Americans, especially Black women, either failing to identify a person at all or making an inaccurate match.

The Gender Shades project from the Massachusetts Institute of Technology's Media Lab found consistently high error rates, as high as 33 percent, when AI systems analyzed the faces of women with darker skin tones.

Products from Amazon, IBM and Microsoft each exhibited this problem in the study, and some of their products have since been taken off the market. Multiple academic institutions — George Mason University, the University of Texas at Dallas, and New York University (NYU) — have corroborated persistent demographic disparities in facial identification rates.
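The kind of demographic disparity these studies describe can be illustrated with a small calculation. The sketch below is not the method used by Gender Shades or any of the institutions named above. It assumes a hypothetical audit in which each test image is labeled with a demographic group, the tool's answer and the correct answer, and it simply compares error rates by group. The function name, group labels and records are all invented for illustration.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return the share of wrong answers for each demographic group.

    `records` is an iterable of (group, predicted, actual) tuples.
    This is a hypothetical audit format, not one taken from any study.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up audit records, for illustration only.
sample = [
    ("group_a", "match", "match"),
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "no_match"),
    ("group_b", "match", "no_match"),
    ("group_b", "no_match", "match"),
    ("group_b", "match", "match"),
    ("group_b", "no_match", "no_match"),
]

rates = error_rate_by_group(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of answers were wrong")
```

In this invented sample the tool is wrong twice as often for one group as for the other, a gap that a single overall accuracy figure would hide.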

But studies like the Gender Shades project test facial recognition accuracy on comparatively ideal image quality.

Footage used by police is often far from ideal, and a selling point of AI to law enforcement is that it can make use of poor-quality data previously useless to human investigators or traditional forensic algorithms.

To account for the possibility of faulty matches, police commonly treat facial recognition matches as a tip for further investigation and not evidence against the person identified.

But tips still narrow law enforcement’s focus in an investigation, said Wexler at Berkeley. If supporting evidence against a suspect is found, that becomes the basis for an indictment while the use of AI is never disclosed.

That means the defense, the prosecution and the judge often do not know that police have used AI to guide an investigation, and they never get the chance to interrogate its findings.

“At no point, from pretrial investigations through to conviction, does law enforcement have any constitutional, legal, or formal ethical obligation to affirmatively investigate evidence of innocence,” Wexler said at a Senate Judiciary Committee hearing in January.

Catch-22 in court

Creators of the forensic machine learning models have defended the opaqueness of their products by arguing that disclosure will effectively require revealing trade secrets to competitors in their industry.

However, the companies have been largely supportive of government regulation of the technology's use in criminal justice settings.

Amazon’s Rekognition software “should only be used to narrow the field of potential matches,” according to its site.

Matt Wood, vice president of product at Amazon Web Services, is quoted by the company as saying it’s a “very reasonable idea for the government to weigh in and specify what temperature (or confidence levels) it wants law enforcement agencies to meet.”
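What a mandated confidence level would mean in practice can be sketched in a few lines. The example below is a simplified illustration, not Amazon's Rekognition interface. It assumes a hypothetical tool that returns each candidate with a similarity score from 0 to 100, and the function name, candidate labels and scores are all invented.

```python
def filter_candidates(candidates, min_confidence):
    """Keep candidates scoring at or above the threshold, best first.

    `candidates` is a list of (label, score) pairs with scores from 0 to 100.
    This is a hypothetical format, not any vendor's real output.
    """
    kept = [c for c in candidates if c[1] >= min_confidence]
    return sorted(kept, key=lambda c: c[1], reverse=True)


# Made-up scores, for illustration only.
candidates = [
    ("candidate_1", 99.2),
    ("candidate_2", 91.5),
    ("candidate_3", 80.1),
]

strict = filter_candidates(candidates, 99.0)  # a high required confidence
loose = filter_candidates(candidates, 80.0)   # a low required confidence

print("strict:", strict)
print("loose:", loose)
```

With the higher threshold only one candidate is passed on to investigators. With the lower one all three are, which is why the level at which the threshold is set, and who sets it, matters.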

IBM sunsetted its AI facial recognition products shortly after the Gender Shades study, and IBM CEO Arvind Krishna wrote a letter to Congress calling for “precision regulation” of the tech.

Microsoft discontinued sale of facial recognition AI to police departments in 2020, saying it wouldn’t budge “until strong regulation [on facial recognition AI], grounded in human rights, has been enacted.”

In March, Clearview AI obtained “awardable” status from the Department of Defense’s Tradewinds Solutions Marketplace, a vetting body that creates a suite of technologies ready for “rapid acquisition.”

In a statement to The Hill, Clearview AI CEO Hoan Ton-That said his product survived testing from NIST with higher than a 99 percent accuracy rate “across all demographics.”

“As a person of mixed race, having non-biased technology is important to me,” he said.

“According to the Innocence Project, 70% of wrongful convictions come from eyewitness lineups. Technology like Clearview AI is much more accurate than the human eye, and can be used to exonerate people and eliminate bias from the criminal justice system,” he added.

Still, defense counsel faces a high bar to prove errors in an AI lead. They often must show that AI source code was likely to be “necessary” for a criminal case, a higher standard than for most subpoenas in search of evidence.

“The reason that is so troubling is that it creates a Catch-22. It may be impossible to prove that information you’ve never seen is necessary to a case,” Wexler said.

Defense attorneys have already lost major cases seeking disclosure of non-AI algorithm source code. And in addition to fighting the “necessary” standard, defense counsel often meets resistance from the state, said Mitha Nandagopalan, staff attorney at the Innocence Project.

“In pretty much any case I’ve touched that has involved a request for underlying source code or machine learning model, prosecution has opposed it,” Nandagopalan told The Hill.

Judges frequently don’t see the relevance if AI-generated leads are not considered evidence, she said. And in her work as a defense attorney in Albuquerque, N.M., Nandagopalan said police often failed to disclose their use of AI.

“In a lot of cases, we got police reports that said, ‘We looked at the surveillance footage from the store, and using state mugshot databases or other databases, we found a match,’” she said. “Nowhere in their report did it say, ‘We used AI recognition software to identify the suspect.’”

Those concerns extend well beyond facial recognition, encompassing the risk of “dirty data” perpetuating injustices in various uses of AI tools.

The potential for biased AI predictions informed by dirty data is “enormous,” said Vincent Southerland, director of the Center on Race, Inequality, and the Law at NYU, in an article for the American Civil Liberties Union.

Southerland cited police behavior in Ferguson, Mo.; Newark, N.J.; Baltimore; and New York City as examples of biased policing that would give AI “a distorted picture” in its handling of risk assessments or crime forecasting.

Crime forecasting refers to AI that takes historical crime data in a community and makes predictions of where future criminal behavior will take place, allowing police, theoretically, to efficiently allocate scarce resources.

Risk assessments broadly refer to AI’s assignment of a risk score to a person based on factors like their criminal history. These scores inform decisions on worthiness for bail, parole and even the severity of sentences.

“The failure to adequately interrogate and reform police data creation and collection practices elevates the risks of skewing predictive policing systems and creating lasting consequences that will permeate throughout the criminal justice system and society more widely,” an NYU Law Review case study said.

Setting standards

Ideally, government users of AI would take an informed approach to AI’s conclusions that accounts for its specific features and limitations, Karen Howard, director of science, technology and analytics assessment at the Government Accountability Office, told The Hill.

But that’s often not possible as long as AI remains in a “black box,” she said, as public officials can’t even confirm the tools are reliable and unbiased in the first place.

Testifying before the Senate Judiciary Committee in January, Howard said any AI program in use by law enforcement without independent review “should set off alarms.”

“The riskiest AI tool would be one where the training data set is not understood, not representative and it’s being handled by somebody who really doesn’t understand what the technology is and isn’t telling them,” she said.

The Biden administration has announced a series of efforts to ensure AI tools aren’t hurting Americans, both in the legal system and elsewhere.

The National Institute of Standards and Technology released an AI Risk Management Framework in January 2023.

“Without proper controls, AI systems can amplify, perpetuate, or exacerbate inequitable or undesirable outcomes for individuals and communities,” it said. “With proper controls, AI systems can mitigate and manage inequitable outcomes.”

The White House Office of Science and Technology Policy also released the Blueprint for an AI Bill of Rights in October 2022, which includes “algorithmic discrimination protections.”

However, these measures do not have the force of law, and they place no binding mandate for testing or transparency on AI products the government uses in the criminal justice system.

The legislation sponsored by Takano and Evans would prohibit the use of trade secret privilege to deny cross-examination of forensic AI to defense attorneys, direct NIST to establish a testing program for forensic algorithms adopted by law enforcement and mandate vetting before use.

“AI would be another layer of source code that would be required to be open under my bill,” Takano said. “That technology is not infallible, that technology should be subjected to tests of reliability and fairness.”
