

Meta releases new data set, AI model aimed at speeding up scientific research

May 14, 2025, 12:03pm EDT
Tech · North America
A Meta logo is seen in Menlo Park.
Peter Dasilva/File Photo/Reuters

The Scoop

Meta on Wednesday released a massive trove of chemistry data that it hopes will supercharge scientific research, and that it also sees as crucial to developing more advanced, general-purpose AI systems.

The company used the data set to build a powerful new AI model for scientists that can shorten the time it takes to create new drugs and materials.

The Open Molecules 2025 data set took 6 billion compute hours to create and is the result of 100 million calculations that simulate the quantum mechanics of atoms and molecules, spanning four key areas chosen for their potential impact on science.


“We’re talking about two orders of magnitude more compute than any kind of academic data set that’s ever been made,” said Sam Blau, a research scientist at the Lawrence Berkeley National Laboratory who worked with Meta on the project. “It’s going to dramatically change how people do computational chemistry.”

Scientists have long created mathematical representations of the way atoms and molecules interact, but it has been prohibitively expensive to do those calculations on a large scale. In recent years, Meta has been putting unused data center capacity to work on the problem, employing complex math known as Density Functional Theory to map out the way atoms physically interact with other atoms and molecules.
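
For a sense of what one of those calculations involves, here is a minimal sketch of a single DFT energy evaluation using the open-source Psi4 package. The article does not specify Meta's software, functional, or basis set, so the choices below are illustrative assumptions, not the project's actual settings.

```python
# Illustrative only: one DFT single-point energy calculation with Psi4.
# Open Molecules 2025 involved roughly 100 million calculations like this,
# many on systems far larger than this water molecule.
import psi4

psi4.set_memory("2 GB")

# Water, given as charge/multiplicity plus Cartesian coordinates in angstroms.
water = psi4.geometry("""
0 1
O  0.0000  0.0000  0.1173
H  0.0000  0.7572 -0.4696
H  0.0000 -0.7572 -0.4696
""")

# The functional/basis pair here is an assumption made for illustration.
energy = psi4.energy("b3lyp/6-31g*", molecule=water)
print(f"Electronic energy: {energy:.6f} Hartree")
```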

Those calculations help researchers come up with ideas for new drugs, battery technology, and other breakthroughs.

Meta’s Fundamental AI Research (FAIR) team also believes that the new data set and AI model will help advance overall AI technology. In order to reach what Meta calls “Advanced Machine Intelligence,” its researchers say AI must build a “world model,” or an understanding of its physical surroundings. That includes atoms, the building blocks of the physical world.


Know More

The Open Molecules 2025 data set doesn’t just include more of these DFT calculations. The calculations themselves are also larger. Blau said that most DFT calculations map out interactions of around 20 to 30 atoms. But the ones in the new data set include systems of up to 350 atoms.

More atoms means a more complete picture of what is actually happening in real-world atomic interactions, allowing an AI model to predict new, undiscovered interactions with more fidelity.

Using that extremely large data set, Meta trained a new AI model, known as UMA, short for Universal Model for Atoms. UMA, Meta says, can reach the same end result as Density Functional Theory, but 10,000 times faster.
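
As a rough sketch of how a model like this typically slots into existing workflows: machine-learned potentials are commonly exposed through the ASE calculator interface, so they can replace a DFT code without changing the surrounding script. The article does not describe UMA's actual API, so ASE's built-in EMT toy calculator stands in below purely to keep the example runnable.

```python
# Sketch: relax a molecule's geometry using a fast calculator in place of DFT.
# EMT is a crude built-in toy model; in practice an ML potential such as UMA
# would be attached here instead (its real interface isn't given in the article).
from ase.build import molecule
from ase.calculators.emt import EMT
from ase.optimize import BFGS

atoms = molecule("H2O")
atoms.calc = EMT()              # swap in the ML potential's calculator here

opt = BFGS(atoms)               # each optimization step needs energies and forces,
opt.run(fmax=0.05)              # which a 10,000x-faster model makes near-instant
print("Relaxed energy (eV):", atoms.get_potential_energy())
```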


“Instead of saying, ‘Oh, let me try this new molecule, and I’ll check back in a couple days,’ it’s saying, ‘Let me try these 10,000 different molecules and just run them all simultaneously’ and then getting an answer in a minute,” said Larry Zitnick, a research scientist on Meta’s FAIR team.
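
A sketch of the screening pattern Zitnick describes is below, with a placeholder `predict_energy` function standing in for a call to a pretrained model such as UMA; the scoring function and candidate list are assumptions made so the example runs.

```python
# Rank many candidate molecules with a fast surrogate instead of running one
# slow DFT job at a time. The surrogate below is a deliberately fake stand-in;
# a real workflow would call a pretrained ML potential.
from ase import Atoms
from ase.build import molecule

def predict_energy(atoms: Atoms) -> float:
    """Placeholder: a real implementation would return a DFT-quality energy
    from an ML model in milliseconds."""
    return -float(atoms.get_atomic_numbers().sum())   # not physics, just a demo

candidates = {name: molecule(name) for name in ["H2O", "CH4", "NH3", "CH3OH", "C6H6"]}

# Score every candidate cheaply, then rank them, instead of waiting days per molecule.
ranked = sorted(candidates, key=lambda name: predict_energy(candidates[name]))
print("Most promising candidates first:", ranked)
```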

The UMA family of models will come in three sizes, letting researchers choose between a more capable model and smaller ones that run faster and cost less. The models were trained on five data sets, including the Open Molecules data set.

While past data sets from Meta have focused on specific topics, such as materials science, the Open Molecules data set spans four broad areas: small molecules, biomolecules, metal complexes, and electrolytes.

Combining current and past data sets unlocked new capabilities, Meta told Semafor. Previously, when training on just one narrow data set, Meta researchers saw diminishing returns as the size of the model increased, ultimately leading to a problem called overfitting, where the model would start memorizing the training data instead of generalizing from it to create new outputs.

With the larger combined data set, the AI models could be scaled up further without running into the overfitting problem. Meta says it still has not hit the size limit of its new model.
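
As a generic illustration of the overfitting problem described above (a toy experiment, not Meta's training setup): fit a high-capacity model to a small data set and its error on unseen data balloons, while the same model trained on more data generalizes.

```python
# Toy overfitting demo: a degree-12 polynomial memorizes 15 noisy points but
# generalizes once it sees 500, mirroring why a broader data set let Meta keep
# scaling its models without overfitting.
import numpy as np

rng = np.random.default_rng(0)

def held_out_error(n_train: int, degree: int = 12) -> float:
    x_train = rng.uniform(-1, 1, n_train)
    y_train = np.sin(3 * x_train) + 0.05 * rng.normal(size=n_train)
    coeffs = np.polyfit(x_train, y_train, degree)        # high-capacity fit

    x_test = rng.uniform(-1, 1, 200)                     # unseen points
    y_test = np.sin(3 * x_test)
    return float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

print("held-out error with 15 training points: ", held_out_error(15))
print("held-out error with 500 training points:", held_out_error(500))
```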


Step Back

UMA borrows an architectural idea from large language models like ChatGPT: a “mixture of experts” technique that has become popular in AI because it makes models run more efficiently on smaller computers. In effect, researchers won’t need powerful computers to benefit from UMA models.
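
For readers unfamiliar with the term, a minimal, generic sketch of a mixture-of-experts layer in PyTorch follows. It is not Meta's UMA architecture, just the basic idea behind the technique: a router activates only a small number of expert sub-networks per input, so most parameters stay idle on any given example.

```python
# Generic mixture-of-experts layer: route each input to its top-k experts and
# combine their outputs, so compute per example stays small even as total
# parameters grow. Dimensions and expert count are arbitrary for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.router(x), dim=-1)         # score every expert
        top_w, top_idx = weights.topk(self.top_k, dim=-1)   # keep only the best k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                    # inputs routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MixtureOfExperts(dim=8)
print(layer(torch.randn(2, 8)).shape)   # torch.Size([2, 8])
```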

The model could be useful in everything from curing disease to finding a new electrolyte for advanced battery chemistries.

Meta hopes that open-sourcing the underlying data set will allow other researchers to build and innovate on even more models.

The company also deliberately left out data that could help bad actors use the data set or models to develop biological or radiological weapons.


Reed’s view

What Meta and other AI companies are doing will transform the world we live in. This will take time, and it might not be obvious to most people. Science is no longer about a lightbulb moment leading an inventor to create something novel. Every breakthrough takes lots of mini breakthroughs to bring it to market. A great example is the lithium-ion battery: the first breakthrough came in the late 1970s, and it took nearly 30 years before the technology became widely adopted, in part because there were all sorts of downstream challenges to make it practical. A half century later, it is still the state-of-the-art battery technology.

What if that process could be condensed to, say, 10 years or five? And it’s not just speeding up that process, but making it more accessible. Anyone with a moderately powerful computer can do the early part of that kind of research now.

That is a huge opportunity for innovation. But more infrastructure is needed around this kind of research. The world should be investing in more labs that can synthesize molecules faster and move new ideas from silicon to the physical world. That also means building bigger supply chains to gain economies of scale.
