Microsoft will pitch in $20 million in cloud computing credits for a U.S. government pilot program to democratize access to AI tools for scientists such as physicists and biologists.
The National AI Research Resource, launched by the National Science Foundation on Wednesday, will help establish shared infrastructure aimed at AI advancements in healthcare, education, the environment, privacy, and other areas.
Microsoft was joined by a long list of other tech companies, including Nvidia, which will contribute $30 million in compute power and other resources. The effort, which has been in the works for years, is a sign of Silicon Valley’s increasingly close relationship with the public sector and academia, once a characteristic of the country’s Cold War strategy against the Soviet Union and a driving force of innovation.
The U.S. government’s investment in R&D is still a shadow of what it was during that period. But the rising importance of AI, combined with a new cold war with China, has made that partnership important once again.
Tech companies have also been spawning their own scientific research divisions, creating proof of concept examples of how techniques pioneered in Silicon Valley can be applied to labs. One such example is Microsoft Research’s AI4Science initiative.
It focuses on the intersection of machine learning and the natural sciences, such as using AI to speed up drug discovery. Some of its targets include tuberculosis and COVID.
The initiative is led by Christopher Bishop, whose background is in theoretical physics. He pivoted to neural networks about 35 years ago and set up Microsoft’s first research lab outside the U.S. Excited about how artificial intelligence was going to impact scientific discovery, he pitched AI4Science a few years ago.
“I feel like I’m in the right place at the right time,” Bishop said. “My career, at last, makes sense.”
We spoke to Bishop about how AI could help treat COVID mutations and how the technology is giving scientists the luxury of thinking about the big picture.
The View From Christopher Bishop
Q: There are many companies now trying to use AI to come up with new molecules and to go after new targets. How should we think about this kind of drug discovery?
A: We’re seeking to provide AI-based, machine learning-based tools and technologies. This collaboration with [China’s Global Health Drug Discovery Institute] is really a proof point. It’s collaborating with a world class team of people who can actually synthesize these molecules, test them, and evaluate them. It’s really easy to run stuff in computers, produce numbers, plots, and so on. The key is making this real in the real world. That requires external validation, that partnership with domain experts, and ultimately, the real world validation in the laboratory. The ultimate arbiter is the experiment. It’s the evidence. It doesn’t matter how beautiful your theory is, or how clever your AI is. It’s the experiment that decides whether you got it right or not.
Q: Did you pick certain disease targets?
A: Yes, COVID-19 and tuberculosis are the world’s two biggest killers in terms of infectious diseases. So they’re very obvious targets to go after. And then GHDDI helped define targets for us, meaning proteins for which we can then try to build molecules that will dock with them. A real breakthrough here is the way in which we’re doing generative AI. We’ve had a partnership, for example, with Novartis for the last five years doing generative machine learning to design new drug molecules. That’s going extremely well.
What’s very interesting about this particular model is the way we brought together some real state of the art technology, and showed that it leads to this very significant acceleration compared to what we had even a year or 18 months ago.
The actual molecules that we’ve produced are very interesting. They are comparable to, or even superior to, the best known state of the art [ones]. They’re not themselves a final drug; they’re a lead to more optimization, testing, refinement. We’d love to see, eventually, some descendants of those molecules go into clinical practice.
By using this generative model, we’re able to design new molecules that are actually target aware. We’re not claiming that’s the final molecule that’s going to cure TB. But the fact we could do that so rapidly is the real excitement here.
We’re kind of on the beginning of this S-curve of research disruption in this space. I kind of wish I was 22 and doing a PhD again. The next decade is going to be phenomenal. And a year is a very long time in the field right now.
Q: And this is a small molecule, meaning it could be stored for long periods at room temperature and taken in pill form. In the case of your work on COVID, is that also a small molecule?
A: It’s the same story. The difference is the protein target. The starting molecules are different and the end molecules are different. But it’s the same architecture.
Q: The COVID vaccines that we’re all used to are not small molecules. They’re these mRNA vaccines. So why wasn’t it possible to have a small molecule drug treat COVID? And this technology would make that possible where it wasn’t before?
A: There are lots of factors there. Obviously, the power of the mRNA approach was incredibly impressive, so I’m not in any way belittling that. It’s an alternative approach to tackling COVID. We know that COVID is constantly mutating and changing, so it’s a different sort of vector. Whether this ultimately is how we tackle COVID long term remains to be seen. The thing we’re excited about is that we could take a disease that’s killing millions of people, and in a relatively short time, find a new state of the art molecule, compared to ones that were previously known.
Q: Do you see a world where you would basically design small molecule drugs for every iteration of COVID as it comes out?
A: We’re very interested in working on an adjacent project, which is trying to predict where these mutations are going to go. Because they have a sort of random element. There’s natural selection that picks some of them as the survivors, and they can escape drugs. It will always be an arms race.
But can we look one step ahead in the arms race, look one chess move ahead, and actually try to understand where it could mutate and anticipate that? The power of machine learning, deep learning techniques now to look across huge datasets of past mutations and understand the pattern of those mutations is really exciting.
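The approach Bishop describes, learning from datasets of past mutations to anticipate likely next ones, can be sketched at its simplest as a frequency model. This is a toy illustration, not Microsoft’s actual method; the mutation labels are well-known SARS-CoV-2 spike substitutions used only as sample data:

```python
from collections import Counter

def mutation_model(observed_mutations):
    # Count how often each substitution appears in past sequences;
    # a crude frequency stand-in for a learned deep model.
    counts = Counter(observed_mutations)
    total = sum(counts.values())
    return {mut: n / total for mut, n in counts.items()}

def most_likely_next(model, k=2):
    # Rank candidate substitutions by historical frequency.
    return sorted(model, key=model.get, reverse=True)[:k]

# Sample history: N501Y, E484K and D614G are known spike mutations.
history = ["N501Y", "E484K", "N501Y", "D614G", "N501Y", "E484K"]
model = mutation_model(history)
top = most_likely_next(model)  # ["N501Y", "E484K"]
```

A real system would condition on sequence context and fitness effects rather than raw counts, but the shape is the same: fit a model to observed mutations, then rank what is likely to come next.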
Q: You talked about testing this out in the wet lab and doing real world experiments. Where does that happen? I’ve heard a lot of times, it gets sent to China or there are robotic labs.
A: There are labs around the world with different expertise. The GHDDI lab is based in Beijing, so that experimental work is done in China. The other thing I showcased at Davos was the new solid state lithium battery electrolyte that we’ve discovered, which uses 70% less lithium. Again, we wanted real world validation of that. In that case, it was done with the Pacific Northwest National Lab, which is one of the U.S. national labs. They synthesized it, measured its properties and built some batteries. [At Davos,] I held up a little alarm clock that was actually being powered by one of these batteries. There’s lots of hype and lots of papers in this materials-discovery, drug-discovery space. For us, it’s got to have that real world test to validate it and to make it real, because that’s ultimately why we’re doing it. We’re doing it to change the real world, not just to write papers.
Q: We’re in this place now where you have the ability to come up with these drugs on a computer in the U.S., and then they can be synthesized in a remote lab. There are economies of scale happening where the iteration is faster because of that. You don’t have to have your own lab to get pretty far down the road of trying to figure this out.
A: It’s spot on that we have these two components. The machine learning really is a disruption. We’re seeing factors of [1,000-fold] or more acceleration or ability to do certain chemistry calculations across the spectrum. We now need to synthesize those compounds and measure them because the models are not perfect. They give us pretty good candidates, but they don’t exactly get it right. Where we have to test them, or we have to refine them, we need experiments.
In the more traditional approach, we could invent a new material and then phone up some national lab to see if we could place a contract and have them synthesize it. You can imagine that taking a very long time. The sticking point in the whole process could be that experimental step. So we want that experimental step to be done more rapidly and be more tightly integrated.
It’s a continuous iterative loop of refinement with machine learning driving the whole thing, driving that scientific hypothesis process. We’ve got some idea of what molecules might work, let’s synthesize them, get some measurements back, feed that back into the machine learning, and now do another iteration. It’s not just the AI. The AI is being accelerated phenomenally to the point now where we’re thinking, ‘we now need to get that experimental loop very tight.’
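The loop Bishop sketches, propose candidates with a model, test a small batch in the lab, feed the measurements back, is essentially active learning. A minimal sketch, where `propose_candidates`, `predict_score`, and `run_wet_lab_assay` are all hypothetical placeholders for the real generative model, surrogate scorer, and experiment:

```python
import random

def propose_candidates(model_state, n=100):
    # Hypothetical generative step: sample candidate molecules
    # (random IDs here stand in for molecular structures).
    return [f"mol-{random.randrange(10**6)}" for _ in range(n)]

def predict_score(model_state, molecule):
    # Hypothetical surrogate model: a cheap in-silico score.
    return random.random()

def run_wet_lab_assay(molecule):
    # Stand-in for the slow, expensive experimental measurement.
    return random.random()

def design_loop(rounds=3, batch_size=5):
    model_state = {"measurements": []}
    for _ in range(rounds):
        # 1. Generate many candidates cheaply in silico.
        candidates = propose_candidates(model_state)
        # 2. Rank them with the surrogate and keep a small batch.
        batch = sorted(candidates,
                       key=lambda m: predict_score(model_state, m),
                       reverse=True)[:batch_size]
        # 3. Send only that batch to the (slow) experiment.
        results = [(m, run_wet_lab_assay(m)) for m in batch]
        # 4. Feed measurements back to refine the model, then iterate.
        model_state["measurements"].extend(results)
    return model_state

state = design_loop()
```

The point of the structure is that the expensive step (the assay) runs only on a handful of model-selected candidates per round, which is why tightening the experimental loop matters so much once the in-silico side is fast.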
Q: There are people in Silicon Valley now with small teams trying to do drug discovery. They feel like they can take their AI expertise, hire someone who has more biology expertise, and create essentially a drug company started in the dorm room. Do you think that’s fantasy?
A: It’s a hugely exciting field. The only thing I feel really confident about is that the world will look very different 10 years from now. The nature of human expertise is changing. It’s tough for a human to know about every drug molecule that’s ever been synthesized, even the ones in the public domain. This is allowing the medicinal chemist, for example, to rise up a level and think more about that broad problem they’re trying to solve, how to tackle this disease. Then the machine learning can iterate very fast on very large numbers of molecules, far more than can be done by hand, as it were, in the more traditional paradigm. It’s very empowering because they can operate at a much more strategic level. Small startups could do very, very interesting things in the space. And I watch with great interest.
Q: A lot of startups are using [protein-structure prediction program] AlphaFold or some version of it. It’s not perfect but the predictions are becoming more accurate. How important is that?
A: We’ve worked quite a bit with David Baker’s lab, which is also in that space. The question of protein structure lies at the very heart of biology, but a protein is much more than a static structure. You can take a database of static structures, sometimes measured at cryogenic temperatures, where you’ve got a single static structure per protein, and then learn to map from the amino acid sequence to the structure. The fact that this can be done accurately is a very important step forward.
But it’s only really the first step on the road to understanding proteins. Proteins are very dynamic. Look at the Spike protein on SARS-CoV-2, for example. The spike protein opens and closes, it spends a long time wiggling and jiggling in one configuration, then it transitions and spends a long time wiggling and jiggling in the other configuration.
In principle, we can model all of those dynamics, but it would take a supercomputer the age of the universe just to model a spike protein properly. One of the areas we see for very big disruption from machine learning and AI is being able to understand those dynamic processes. How does that protein actually distort when the drug molecule binds with it? How do proteins interact? Some proteins don’t form static structures at all; parts of them are flailing around in there.
The behavior of proteins, as with everything in biology, is 10 times as complicated as you thought it was, even after taking this rule into account. There have been some impressive advances in the last few years, but there’s just a tremendous amount still to do before we really can be very predictive about biology.
Q: Moving over to the material science side of things, there seems to be less standing in the way there because you don’t have the FDA. So is that going to advance faster?
A: The field itself is a little less mature. The process of drug discovery, in a sense, is a very mature, well-established pipeline. And machine learning has been injected into that for a number of years. The current generation of AI techniques has proven to be far more powerful than what we had even a year ago. So you have a relatively mature, well-established process that’s been quite heavily disrupted.
Companies often don’t think of themselves primarily as a materials design company. But they will use materials and they will optimize them. On the other hand, as you say, there are some advantages. You don’t have to do all the human clinical trials, you don’t have to get the FDA approval. Obviously, you still think about safety checks and tests.
The other thing is just everything around us is made of atoms. Your own body is made of atoms; everything you can perceive, apart from light and gravity, is some assembly of atoms. How many different ways can I put atoms together? It makes the number of positions on a chessboard look small. It is truly beyond astronomical. Now you’ve got this powerful tool to search that space that is robustly three orders of magnitude faster, in some cases seven orders of magnitude.
For the battery electrolyte, the way that was discovered was we started with 32 million candidate materials — that is fictitious materials invented in a computer with a kind of random number generator — and then screened them with machine learning. The machine-learning screening pipeline is 1000 times faster than what people would have used until very recently. We could start from 30 million instead of 20,000. What we’re doing now is instead of generating these materials at random, generating them in a targeted way.
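The funnel Bishop describes, tens of millions of randomly generated candidates narrowed by a fast machine-learning filter before any expensive evaluation, can be sketched like this. The filter functions are hypothetical stand-ins; a deterministic hash plays the role of a trained screening model:

```python
import hashlib

def stable_hash(material):
    # Deterministic hash (Python's built-in hash() is salted per run).
    return int(hashlib.md5(material.encode()).hexdigest(), 16)

def cheap_ml_filter(material):
    # Fast learned surrogate: keeps roughly 1 in 1,000 candidates.
    return stable_hash(material) % 1000 == 0

def expensive_physics_check(material):
    # Slower, higher-fidelity check (e.g., a simulated stability test),
    # affordable only because the ML stage already shrank the pool.
    return stable_hash(material) % 7 == 0

def screen(candidates):
    # Stage 1: machine-learning screen over the full candidate pool.
    survivors = [m for m in candidates if cheap_ml_filter(m)]
    # Stage 2: costly evaluation only on the small surviving set.
    return [m for m in survivors if expensive_physics_check(m)]

# Candidate "materials" generated at random, as in the electrolyte search
# (scaled down from tens of millions for illustration).
pool = [f"material-{i}" for i in range(100_000)]
finalists = screen(pool)
```

The speedup Bishop cites comes from the same asymmetry: the cheap stage-one filter makes it practical to start from tens of millions of candidates instead of tens of thousands.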
Q: What other types of materials do you think are the most exciting that we might be able to build?
A: I’m particularly interested in sustainability. I’m of a generation that’s done a lot to help screw the planet up. And I’d like to undo some of the mess that we’ve created. There are some obvious things: CO2 capture, metal organic frameworks as a class of porous, crystalline materials, but with organics inside where there’s a lot of flexibility around the design. They already work, but it’s a huge space to explore, looking for more efficient ways to absorb the CO2. That’s one example. There’s the design of semiconductors, better lubricants — a lot of energy gets wasted in friction — the list is endless.
Q: So why is Microsoft funding this?
A: What Microsoft brings to the table is, first of all, AI expertise. The other is Microsoft’s compute platform. And there’s an opportunity to partner with organizations like GHDDI, with drug companies, with materials companies, and together make these transformational advances. There should be good business in that. But it also should be good for the planet and good for other companies as well. Everybody wins in this sort of partnership.
Q: In other words, there are revenue opportunities.
A: The potential is so huge. There are opportunities both for societal impact and for commercial purposes for many organizations, including Microsoft.