The newly formed International Association of Algorithmic Auditors (IAAA) is hoping to professionalize the sector by creating a code of conduct for AI auditors, training curriculums, and eventually, a certification program.
Over the last few years, lawmakers and researchers have repeatedly proposed the same solution for regulating artificial intelligence: require independent audits. But the industry remains a wild west; there are only a handful of reputable AI auditing firms and no established guardrails for how they should conduct their work.
Yet several jurisdictions, including New York City, have passed laws requiring tech firms to commission independent audits. The idea is that AI firms should have to demonstrate their algorithms work as advertised, the same way companies need to prove they haven’t fudged their finances.
The IAAA is being launched by a number of academic researchers, like the Mozilla Foundation’s Ramak Molavi Vasse’i, along with industry leaders like former Twitter executive Rumman Chowdhury and Shea Brown, CEO of the AI auditing firm BABL AI.
Brown told Semafor that the organization is trying to address a persistent problem with the way AI safety and transparency efforts are often framed. A plethora of think tanks and nonprofit groups have devised methods for evaluating AI tools, but those frameworks often overlook the people actually tasked with conducting the evaluations, who come from widely varied backgrounds.
“Everybody’s talking about standards,” said Brown, who also teaches astrophysics at the University of Iowa. “But what we’re thinking about is that there’s going to be people at the heart of that process. Those people need to be professionals.”
Since ChatGPT was released last year, a troubling norm has taken hold in the AI industry: that it’s perfectly acceptable to evaluate your own models in-house.
Leading startups like OpenAI and Anthropic regularly publish research about the AI systems they’re developing, including the potential risks. But they rarely commission independent audits, let alone publish the results, making it difficult for anyone to know what’s really happening under the hood.
“Our biggest battle is talking to industry leaders and telling them, ‘Unless you audit your AI systems, they will not work as intended and will have really bad consequences on society,’” said Gemma Galdon-Clavell, a founding member of the IAAA and CEO of the algorithm auditing firm Eticas Consulting.
Part of the challenge is that there’s not one agreed-upon method for testing whether an AI system is dangerous. New York City has established guidelines for determining whether an algorithm exhibits racial or gender bias, but only in the context of tools used for hiring or promotions, which is what its law is designed to address.
Several of IAAA’s founding members acknowledged that different approaches will be needed depending on the type of system in question. You can’t use the same method to audit an algorithm that approves healthcare claims that you would use for one that determines car insurance premiums.
But what the IAAA is really designed to do is establish the ethical ground rules for how algorithm auditors are supposed to do their jobs. One of the most important principles is ensuring they’re accountable only to the public or a regulatory body, not to the tech company whose product they’re examining. That means auditors can’t have financial ties or other conflicts of interest with the firms whose products they examine, even when those firms are their clients.
Another crucial idea both Galdon-Clavell and Brown mentioned is creating auditing procedures that take into account how an algorithm is actually being used in the real world, which is where many of the worst harms tend to arise.
For example, UnitedHealthcare allegedly pressured staff to strictly obey an algorithm that determined when critically ill patients should be cut off from care, even though the tool was intended only as a guideline. What mattered most was not necessarily the accuracy of the tool, but how it had been deployed.
“You can’t just look at the technical aspects, you have to consider the system and how the technical algorithms interface with people,” Brown said.
Room for Disagreement
Anthropic recently commissioned a third-party audit by a nonprofit called the Alignment Research Center (ARC), and the process didn’t go as smoothly as the company initially hoped. In an effort to remain impartial, the auditors didn’t share many details about their approach to probing Anthropic’s models. But when the company saw the results, it realized that disclosing more information ahead of time would have “helped ARC be more successful in identifying concerning behavior.”
The View From China
China’s AI ecosystem is also struggling with finding adequate ways to audit AI algorithms. There are currently many different testing standards that often yield different results, according to a policy paper published by researchers from the Chinese Academy of Sciences and a state-backed technology think tank last month. The industry has “not yet formed a unified evaluation method and system of indicators,” the researchers wrote, per a translation by George Washington University professor Jeffrey Ding.
- AI hiring firm Pymetrics paid a team of computer scientists from Northeastern University to audit its algorithm in 2020. The results showed that it selected roughly the same proportion of men and women, as well as people from different racial groups. But the audit didn’t prove that it “actually picks the most qualified candidates for any job,” according to an article in MIT Technology Review.
- New York City’s 2021 AI auditing law was part of a much broader movement to “better understand how AI works and hold users accountable,” Wired reported.