

Exclusive / Meta’s $15 billion investment in Scale AI comes with a hidden perk: data

Reed Albergotti
Tech Editor, Semafor
Jun 10, 2025, 8:25pm EDT
Reuters/Xavier Collin/Image Press Agency/NurPhoto
The Scoop

A big chunk of Meta’s $15 billion investment in Scale AI requires the startup to provide future work to Mark Zuckerberg’s firm, according to two people familiar with the matter, underscoring the importance of data in the pursuit of artificial general intelligence.

The unusual arrangement, which has not been previously reported, reflects how companies pushing the boundaries of AI are racing to gather more training data to improve their models, now that they have mostly tapped the existing supply.

Scale AI became a juggernaut in the data collection industry by amassing a global workforce of contractors capable of generating high-quality data. It pays everyone from university professors to coders to comedians to spend time converting their expertise into AI fodder.

That work is becoming increasingly expensive as AI companies seek more specialized data, and firms believe the kind of data they collect could amount to the secret sauce that catapults them ahead of competitors.

Meta’s investment in Scale AI is, in part, an advance payment on data collection fees and, for the first time, offers a glimpse into the mammoth costs associated with the endeavor.

As part of the deal, Meta will end up owning just under half of Scale, The Information first reported, and co-founder Alexandr Wang will take a role within Meta leading a new artificial intelligence effort, according to Bloomberg, which also first reported on the investment.

While the deal has been compared to an acquisition, Scale will continue operations as an independent company and Wang will still be its CEO.

The Meta investment in Scale is similar to other major AI investments where cash infusions are earmarked for specific purposes that benefit the investor.

For instance, part of Microsoft’s investment in OpenAI paid for the startup’s compute costs at Microsoft’s data centers. Amazon made a similar deal with Anthropic, which is now using Amazon Web Services’ custom silicon to train its next frontier model.

Know More

Wang founded the company in 2016 as a data labeling service, hiring low-wage workers around the world to help companies categorize data so that it could be used for machine learning.

During the pandemic, Wang lived with OpenAI co-founder Sam Altman and got a front-row seat to the next wave of artificial intelligence. Rather than label data, OpenAI wanted people to essentially grade the outputs of large language models, a process known as reinforcement learning from human feedback.

The technique, while it wasn’t a new invention, proved critical for generative AI. Where the performance of AI models once plateaued after a certain amount of training, new models kept getting better, with no end in sight.

Eventually, though, models hit a new bottleneck: data. As they grew larger, the sources of data did not grow as quickly. Without enough data, models tend to “overfit,” essentially memorizing training data instead of generating new outputs.

Foundation model companies are experimenting with new techniques that involve using highly specialized data, including video and sound, to boost model performance.

At Meta, Wang will run a new division focused on building advanced AI. While not an AI researcher himself, he understands the AI landscape perhaps better than most. Scale’s clients include almost the entire top tier of AI companies.

Wang also has deep connections in D.C., where Scale has been building up a consulting business similar to Palantir’s.

Reed’s view

Assuming this investment stands up to scrutiny from antitrust watchdogs in Washington, this is a masterful move by Zuckerberg. It brings one of the most important AI companies into Meta’s fold, which will help with recruiting and resources.

What’s more interesting is what it says about the role of data in that effort. Zuckerberg is making a big bet that the AI race can’t be won with tricks like synthetically created data alone.

Meta is not alone. Every frontier lab is racing to create new training data to match increases in compute capacity. And Scale’s competitors no doubt see an opportunity in the wake of the Meta investment.

“Some clients may prefer to work with a more neutral provider,” said Jonathan Siddharth, co-founder and CEO of Turing, one of Scale’s competitors, calling the deal “validation of the importance of a data partner. I’m excited about what it means for Turing.”

Frontier AI labs are constantly experimenting with new techniques, but the basic philosophy has remained constant: More compute and more data equate to more powerful models.

New AI research tends to flow freely from company to company. But proprietary data that cost billions to create is something companies can guard and use to get an edge.

Another factor is that Meta has chosen to open source its most powerful AI models, essentially giving away extremely expensive work products.

That strategy has worked for Meta so far. Its AI models are not the highest performing in the industry, but the fact that they are open source has made them a popular choice for companies all over the world, giving Meta credibility and allowing it to stay in the race.

As it continues to spend billions to develop increasingly performant models, Meta may have to keep some models closed.

Room for Disagreement

Some journalists commenting on the deal said that it was merely an acquisition of Scale disguised as an investment for the purposes of avoiding antitrust scrutiny. “re: Meta/Scale AI. U.S. antitrust regulators can review non-control deals,” tweeted Axios business editor Dan Primack.

Tech reporters Eric Newcomer and Tom Dotan said the deal was risky and compared it to Microsoft’s non-acquisition acquisition of Inflection AI, where almost the entire company was absorbed into Microsoft after failing to find traction.

“If the question is whether Meta-Scale is an OpenAI or an Inflection, the final answer will likely end up being: yes.

The idea is for the two to combine their visions and build their respective companies together, an approach that strikes us as both promising and risky. The combination could end up being a synthesis of the Inflection and OpenAI deals—taking the top executives out of a successful business and hoping the remaining company still flourishes.”
