

In today’s edition, a closer look at the hidden workers in China influencing how AI is built.
 
March 3, 2023
semafor

Technology

 
Louise Matsakis

Hi, and welcome to Semafor Tech, a twice-weekly newsletter from Reed Albergotti and me that gives an inside look at the struggle for the future of the tech industry. I’ve spent the last few months obsessing over one of the most underappreciated labor forces in artificial intelligence: data annotators. They’re the people teaching chatbots how to recognize racist slurs, or ensuring medical software correctly identifies a tumor.

Tech companies rarely acknowledge these workers, but over the last few years, data annotating has become a billion-dollar business employing an enormous number of people around the world. Today, I look at what’s happening with a small group of them in China, where a remarkable organization has fought for disabled annotators to be treated with respect.

Plus, our Semafor Business colleague Bradley Saacks chats with Freshworks President Dennis Woodside about AI startups. There’s also bad news for Elon Musk’s brain implant startup, and a porn Zoom mishap involving the U.S. Federal Reserve, of all things.

We are also welcoming a new member of the Semafor newsletter family on Monday. Sign up for Semafor Security from Jay Solomon — showcasing the personalities, hot spots, and money flows driving global instability and conflict.

Are you enjoying Semafor Tech? Help us spread the word!

Move Fast/Break Things

Reuters/Dado Ruvic

➚ MOVE FAST: Activision. Microsoft has been playing an intense game of “Call of Duty: Bureaucratic Warfare,” and it may have paid off. The company appears to have won over the European Union, clearing the way for it to acquire the iconic video game company, according to a Reuters scoop. The hitch: Microsoft has to offer licensing deals to its rivals.

➘ BREAK THINGS: Vision Fund. The SoftBank investment vehicle is having a hard time getting Chinese officials to permit Arm to let go of its China division, according to the Financial Times. SoftBank, which owns the U.K.-based chip design firm, wants to transfer Arm China to a Vision Fund entity to clear the way for an Arm IPO, with a listing in New York.

Semafor Stat

The percentage of staff that the once-venerable tech news site CNET is laying off. The news is sad, but also interesting. The publication has been on the cutting edge of using artificial intelligence to pump out content that floats to the top of search queries. The layoffs foreshadow how the internet is about to be reshaped by AI.

Louise Matsakis

The hidden army of workers in China influencing how AI is built

THE SCENE

Wenbo’s job is to listen to what people in China say to their voice-activated smart assistants as part of his work for a leading Chinese artificial intelligence company.

Every shift, Wenbo, who uses a wheelchair, sorts through audio clips, categorizing them based on things like the presence of speech or other sounds, the presumed age or gender of the speaker, and whether or not the person used the device’s “wake word” (like “Hey Google” or “Alexa” for U.S. smart assistants).

Wenbo, in other words, is playing a crucial role in the development of one of the world’s most promising technologies. Around the globe, hundreds of thousands of people like him — many of them independent contractors with few protections — create and organize the datasets that allow self-driving cars to recognize raindrops, help ChatGPT avoid spewing racial slurs, and ensure Facebook’s algorithms don’t mistake a pepperoni slice for a female nipple.

Despite the importance of data annotators (sometimes called labelers or content moderators), tech companies rarely acknowledge how much they rely on them. The audio clips are sent to Wenbo because artificial intelligence technology isn’t sophisticated enough to recognize what’s happening in them, perhaps because the speaker has a thick accent, or what they said was confusing or unclear. The goal is to teach the AI model how best to make sense of these edge cases, so that it becomes more accurate in the future.

LOUISE’S VIEW

DALL-E/Woman in wheelchair on laptop in Matrix rain

A new study in the peer-reviewed journal First Monday shows why the data annotation industry shouldn’t be ignored or discussed only in the context of unfair labor practices. If the world keeps overlooking the role it plays in manufacturing AI, it will be unable to understand how complex models really work. People will likely believe instead in a popular illusion about AI: that it is solely the creation of elite engineers in places like Silicon Valley or Beijing.

The reality is that it is also built using the specialized labor of Wenbo and hundreds of thousands of his colleagues. The tiny decisions and judgment calls they make shape things like, say, the tone or rhythm of a chatbot’s replies.

As part of her research, Di Wu, a PhD candidate at the Massachusetts Institute of Technology, interviewed Wenbo and over a dozen other annotators working for a unique program run by a disabled persons’ organization in China she calls ENABLE. (That name and the ones given to all of her interviewees are pseudonyms used to protect them from political or economic blowback for speaking to a researcher.)

ENABLE helped Wenbo and his colleagues negotiate with the AI company for accommodations to make their jobs easier. For example, they asked for text-based datasets to be compatible with screen readers (software that converts on-screen text into synthesized speech) so that the datasets could be interpreted by annotators who are blind or have limited vision.

The AI company was happy to comply, because ENABLE “recently outperformed many non-disabled competitors” and became one of the firm’s major data annotation service providers, Wu writes. One of the reasons ENABLE is so successful is that the workers have honed their experience over relatively long periods of time.

KNOW MORE

That might sound like an obvious advantage in any job, but data annotation is often misunderstood as mindless “click work” or simply teaching an AI about objective “human preferences.” As previous research has shown, however, annotators incorporate their own individual experiences, and must also work within the structures tech companies have designed.

Multiple ENABLE annotators told Wu that they frequently heard different things in the audio clips than the quality assurance (QA) reviewers who checked their work. To get it approved, they trained themselves to be attuned to the ears of the QAs.

“There is no standard,” Wenbo told Wu. “For things like sound, everyone’s ears are different, and everyone’s accents are different.” For example, if someone spoke the smart assistant’s wake word quickly, Wenbo might find it clear enough to pass, but the QA could easily disagree.

If companies developing AI want to uncover the biases in their models, they will need to take the work done by data annotators seriously, and give them opportunities to provide feedback, which is what happens at ENABLE. In regular meetings with developers at the AI firm, Wu writes, the annotators discuss trends, emerging problems, and recommendations to improve the system.

Julian Posada, an incoming professor at Yale University who has studied the data annotation industry, said this is not the norm. In many cases, data annotators work as contractors on third-party platforms, where there is no way to communicate directly with their clients.

“What if the majority of workers don’t understand the guidelines properly? Then you will be giving the greenlight to things that are not properly data-annotated,” said Posada. “One of the problems that the platform industry faces is bias from ground truth data.” (Ground truth is the term AI engineers use to describe the target objective they want a model to achieve.)

ROOM FOR DISAGREEMENT

Data annotators will eventually eliminate their own jobs. Each time AI becomes more advanced, it needs less human input, or different kinds of it, to function. The inherently temporary nature of the work is a major reason why the data annotation industry may continue to struggle for recognition. “All these jobs, all these things, are produced in a way that’s not for the long term,” said Posada.

NOTABLE

  • In 2021, China’s Ministry of Human Resources and Social Security released occupational standards for the data annotation sector, including skill requirements, reported the Chinese online magazine Sixth Tone.
  • Rest of World traced how self-driving car companies helped spur the professionalization of the data annotation industry. More platforms now have “quality control measures to ensure jobs for autonomous vehicle clients come back with very few mistakes,” wrote reporter Vittoria Elliott.
Evidence

U.S. Republicans recently criticized Ford Motor for announcing its new $3.5 billion battery factory would rely on technology from the Chinese company CATL. But the truth is that Ford likely had little choice in the matter: CATL leads the world in production of lithium-ion batteries, far outpacing its competitors.

Watchdogs

Neuralink's Elon Musk
Reuters/Dado Ruvic

It appears the U.S. Food & Drug Administration doesn’t want Elon Musk to stick computer chips inside your brain…yet. The Twitter-Tesla-SpaceX mogul has a brain-implant startup called Neuralink, and it’s running into regulatory roadblocks, Reuters reports.

Musk has a love-hate thing going with the U.S. government. When wearing his space hat (helmet?), he plays an important national security role. When he’s got his Tesla hat on, he’s a watchdog provocateur, poking the Securities and Exchange Commission (and mostly getting away with it). Musk may have less luck humiliating the FDA into submission, so he may go with honey instead of vinegar, but Musk’s brain, implants or not, can be unpredictable.

Neuralink wants FDA permission to run clinical trials on humans. At first, it hopes to help disabled people type with their brains. Neuralink is following in the footsteps of another, lesser-known startup called Synchron, which we’ve covered in the past. Synchron already has the FDA’s blessing for trials and has had some impressive results. Semafor received the first-ever iMessages from a person’s brain. (We discussed Australian rules football.)

Synchron has taken an easier path than Neuralink. Instead of implanting chips via brain surgery, Synchron slides sensors up through the jugular vein, which gets them close enough to the brain without actually making contact. That process is more akin to inserting a heart stent.

Even if Neuralink did make an FDA-approved device, there aren’t enough brain surgeons in the world to install a lot of them. And each implant would be costly. Like Tesla-level costly. But that could change, and Neuralink may be betting that it won’t be as complicated as brain surgery to make that happen. 

— Reed

One Good Text

Dennis Woodside is president of the software firm Freshworks and formerly helped companies like Impossible Foods, Dropbox, and Google scale their businesses.

Crash Landing

Three years ago, the global economy was on the cusp of shutting down as the coronavirus spread. Our colleague at Semafor Business, Liz Hoffman, wrote a book that tells the story of how CEOs battled an economic catastrophe for which there was no playbook. Crash Landing comes out on March 7 and you can pre-order here.

Crown
Ahem

It was only a matter of time before the tech industry truly disrupted monetary policy. A virtual event featuring Federal Reserve Governor Christopher Waller yesterday was interrupted by an internet prankster who apparently thought interest rate talk pairs well with pornographic images. An executive at the Mid-Size Bank Coalition of America, which hosted and then canceled the Zoom video conference, told Reuters he thinks one of the security switches muting those watching the event was set incorrectly.

How Are We Doing?

Are you enjoying Semafor Tech? The more people read us, the better we’ll get. So please share it with your family, friends and colleagues to get those network effects rolling.

And hey, we can’t inform you on what’s happening in tech from inside your spam folder. Be sure to add reed.albergotti@semafor.com (you can always reach me by replying to these emails) and lmatsakis@semafor.com to your contacts. In Gmail, drag this newsletter over to your ‘Primary’ tab.

Thanks for reading.

Want more Semafor? Explore all our newsletters at semafor.com/newsletters

— Reed and Louise
