Date: 2023-03-03

Wenbo’s job at a leading Chinese artificial intelligence company is to listen to what people in China say to their voice-activated smart assistants.

Every shift, Wenbo, who uses a wheelchair, sorts through audio clips, categorizing them based on things like the presence of speech or other sounds, the presumed age or gender of the speaker, and whether or not the person used the device’s “wake word” (like “Hey Google” or “Alexa” for U.S. smart assistants).

Wenbo, in other words, is playing a crucial role in the development of one of the world’s most promising technologies. Around the globe, hundreds of thousands of people like him — many of them independent contractors with few protections — create and organize the datasets that allow self-driving cars to recognize raindrops, help ChatGPT avoid spewing racial slurs, and ensure Facebook’s algorithms don’t mistake a pepperoni slice for a female nipple.

Despite the importance of data annotators (sometimes called labelers or content moderators), tech companies rarely acknowledge how much they rely on them. The audio clips are sent to Wenbo because artificial intelligence technology isn’t sophisticated enough to recognize what’s happening in them, perhaps because the speaker has a thick accent, or because what they said was confusing or unclear. The goal is to teach the AI model how best to make sense of these edge cases, so that it becomes more accurate in the future.