Date: 2023-12-07

Since ChatGPT was released last year, a troubling norm has been established in the AI industry, which is that it’s perfectly acceptable to evaluate your own models in-house.

Leading startups like OpenAI and Anthropic regularly publish research about the AI systems they’re developing, including the potential risks. But they rarely commission independent audits, let alone publish the results, making it difficult for anyone to know what’s really happening under the hood.

“Our biggest battle is talking to industry leaders and telling them, ‘Unless you audit your AI systems, they will not work as intended and will have really bad consequences on society,’” said Gemma Galdon-Clavell, a founding member of the IAAA and CEO of the algorithm auditing firm Eticas Consulting.

Part of the challenge is that there’s not one agreed-upon method for testing whether an AI system is dangerous. New York City has established guidelines for determining whether an algorithm exhibits racial or gender bias, but only in the context of tools used for hiring or promotions, which is what its law is designed to address.

Several of IAAA’s founding members acknowledged that different approaches will be needed depending on the type of system in question. You can’t use the same method to audit an algorithm for approving healthcare claims that you would for one determining car insurance premiums.

But what the IAAA is really designed to do is establish the ethical ground rules for how algorithm auditors are supposed to do their jobs. One of the most important principles is ensuring they’re only accountable to the public or a regulatory body, not to the tech company whose product they’re examining. That means auditors can’t have financial ties or other conflicts of interest with the firms they audit, which may also be their clients.

Another crucial idea both Galdon-Clavell and Brown mentioned is creating auditing procedures that take into account how an algorithm is actually being used in the real world, which is where many of the worst harms tend to arise.

For example, United Healthcare allegedly pressured staff to strictly obey an algorithm that determined when critically ill patients should be cut off from care, even though it was intended only as a guideline. What mattered most was not necessarily the accuracy of the tool, but how it had been deployed.

“You can’t just look at the technical aspects, you have to consider the system and how the technical algorithms interface with people,” Brown said.