The next artificial intelligence frontier: Causal AI

The Scene

The advent of ChatGPT has ignited a debate over whether AI is advancing so rapidly that these powerful large language models will eventually evolve into artificial general intelligence, a computer algorithm that can think like a human. But a key missing component to AGI may be the ability for a computer program to determine cause and effect relationships. True intelligence, many argue, requires not just an ability to predict that A causes B, but to understand why.

While much of the media attention has been on large language models, the field of causal AI has gotten comparatively little. Darko Matovski, CEO of London-based causaLens, has become focused on this narrow field because he believes it’s the next frontier in AI.

Major advances in causal AI have allowed causaLens to offer some powerful analytics to a diverse range of companies. But the promise of the technology goes far beyond enterprise software. If causal reasoning is combined with large language models, it could have a major impact on humanity. Below is an edited conversation with Matovski.

The View From Darko Matovski

Q: How is causal AI different from generative AI used by ChatGPT and others?

A: The way correlation-based approaches work is you throw a lot of data at them and they learn a mathematical function. Ultimately, you’re saying, ‘okay, here’s a new data point, where does it land?’ And it can infer from that curve where it should land, and predict the next word and so on. That’s how all of today’s machine learning works.

With causal AI, you’re learning a cause and effect relationship from the data. You no longer throw a lot of data at a mathematical equation. You use a causal diagram, which is a lot more computationally intensive than learning a correlation. Is altitude causing pressure? Or is pressure causing altitude? You wouldn’t be able to tell with just a correlation, a number that goes between minus one and one. There’s no direction. A causal diagram has direction. So you know that it’s the altitude that causes pressure and not the other way around.
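The point about direction can be sketched in a few lines of Python. This is only an illustration, not causaLens's method: the readings are approximate standard-atmosphere values chosen for the example, and the `pearson` helper and `causal_diagram` dictionary are hypothetical names. The correlation comes out the same whichever variable is listed first, while the diagram records which one is the cause.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative readings (assumed): altitude in metres, pressure in hPa.
altitude = [0, 500, 1000, 1500, 2000, 3000]
pressure = [1013, 955, 899, 846, 795, 701]

# Correlation is symmetric: swapping the arguments gives the same number.
r_ab = pearson(altitude, pressure)
r_ba = pearson(pressure, altitude)

# A causal diagram stores direction explicitly: cause -> list of effects.
causal_diagram = {"altitude": ["pressure"], "pressure": []}
```

Both `r_ab` and `r_ba` are the same strongly negative number, so the coefficient alone cannot say which variable drives the other. Only the diagram carries that information.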

Q: How are you using causal AI right now?

A: One Fortune 100 company runs its entire supply chain through us. Until now, they had basically just a lot of models in production and dashboards and things like that. We’ve been able to create a way for them to ask these millions of causal models questions, such as: ‘Across all my million products, what are the top three things I can do to make sure that my inventory is the right amount?’

Q: This stuff has been around for a long time, right? FedEx is known for these algorithms and Apple is known for its extremely efficient supply chain. How much better are these causal algorithms?

A: I’ll give you a very stupid example but I think it’s going to illustrate a point. If you collect data on the number of shark attacks and the number of ice cream cones sold at a beach in Sydney, you will find there’s a 99.9% correlation between the two. We, as humans, know that there’s no causal relationship between how many people get eaten by a shark and how many ice creams get sold. The causal driver is the warm weather.

So if you are trying to make a decision about how many ice creams to bring to the beach, you can’t really look at shark attacks. Causal models allow you to eliminate the shark attacks. When you throw a lot of data at mathematical functions, you are just going to get shark attacks and your predictions suffer in the real world. When you move this model to a beach in England where there are no sharks, this will fall over.
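The shark and ice cream example can be reproduced with synthetic data. In the sketch below, all numbers are invented: temperature drives both series, and neither affects the other. Partial correlation is used here as a standard statistical stand-in for "controlling for the weather". It is not claimed to be the technique causaLens uses, and it works in this toy case only because the simulated relationships are linear.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic world (assumed): warm weather is the only causal driver.
temperature = [rng.uniform(15, 35) for _ in range(2000)]
ice_creams = [10 * t + rng.gauss(0, 20) for t in temperature]
shark_index = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_creams, shark_index)                        # strongly positive
controlled = partial_corr(ice_creams, shark_index, temperature)  # close to zero
```

The raw correlation is high even though no arrow connects the two series. Once temperature is accounted for, the apparent link largely disappears, which is why a model built on the shark signal would not transfer to a beach without sharks.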

Q: So you still have a lot of noise in the data, but the model is just smarter?

A: That’s right. Causal models are not only better at predictions, they can answer what are called counterfactuals, like ‘what would happen if my customers asked for twice as much product as last quarter?’ A traditional machine learning model would not know what to do because it’s never seen that data before.

Q: Just order double the number of whatever you’re ordering, right?

A: There are real world constraints. For example, a supplier for part A may only be able to provide a million units a month. So your model says order 2 million, but you can only order a million.

Q: How does the model know they can only supply a million a month?

A: The human can actually come in and say this is a supplier’s capacity. So rather than saying, predict how many units I need to order with this supplier, a causal model will tell you, I recommend that you order a million from this supplier and another million from this other supplier. So it gives you tangible actions as opposed to just predictions.
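A rough sketch of how a human-supplied capacity limit turns a single prediction into per-supplier actions follows. The supplier names, the second supplier's capacity, and the `allocate_order` function are all hypothetical, and the simple fill-in-order rule is an assumption made for illustration, not a description of the causaLens product.

```python
def allocate_order(demand, capacities):
    """Split `demand` across suppliers without exceeding any capacity.

    Suppliers are filled in the order given. Returns (orders, unmet),
    where `orders` maps supplier -> units and `unmet` is leftover demand.
    """
    orders = {}
    remaining = demand
    for supplier, capacity in capacities.items():
        units = min(remaining, capacity)
        if units > 0:
            orders[supplier] = units
        remaining -= units
    return orders, remaining

# Capacities entered by a human expert (assumed figures, units per month).
capacities = {"supplier_a": 1_000_000, "supplier_b": 1_500_000}

orders, unmet = allocate_order(2_000_000, capacities)
```

With a demand of 2 million units, the result is 1 million from each supplier and no unmet demand, matching the recommendation described in the answer above.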

Q: What’s stopping customers from switching out legacy systems for newer causal AI algorithms?

A: A lot of companies do that. But before COVID-19, the legacy machine learning systems worked fine. They were just predicting these seasonal changes really well, because every year they were the same. And that’s when correlation-based machine learning does really well. It’s able to repeat the patterns that it has seen before.

When COVID-19 hit, supply chains got disrupted and everything failed. So they came to us and said, ‘we need causality here, because our correlation stuff is picking up the ice cream sales and shark attack stuff.’ Now they can navigate changing environments, because they have the true cause and effect relationships in the data, rather than the ice creams and the sharks hidden in there.

Q: Do you ask customers about a time they screwed up, and show them how your models would have gotten it right?

A: All the time. We had a customer in the telco space. They said we have machine learning that predicts who is going to churn [or cancel their service]. But we don’t actually care about that. What we care about is what can we do to prevent churn? What levers do we pull?

It was counterintuitive because the more usage customers had on the service, the greater the churn was. More usage usually means more retention, so they were very puzzled. We were able to show them with a causal model that customers were given promotional credits when the data showed they were about to churn. In other words, the credits weren’t working. And then you can say, well, actually, to prevent churn, maybe we need to do something else. Maybe we need to call them.

Q: How much historical data do you need to figure out cause and effect?

A: Causal models are less data hungry than traditional machine learning. In traditional machine learning, you need millions and billions of examples to learn the pattern. With causal models, you may get away sometimes with as little as 20 data points, because you can get the human to validate it. In fact, you can even build causal models with no data.

How is that possible? Imagine a manufacturing process. You have a machine with lots of cogs and wheels. Imagine you have a spinning thing connected to another spinning thing through a cable. This part of the machine has never failed, so as far as statistical correlation models are concerned, failure cannot happen.

In a causal model, an expert that understands the machine can say, this is a causal driver and there’s a thing that is connecting these two. So there’s a possibility that this cause and effect relationship can lead to failure. And we can encode this in the model, even though we’ve never seen it in the data before.
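Encoding expert knowledge without any data can be as simple as writing down the edges of the diagram. The sketch below is illustrative only: the component names and the `downstream_effects` helper are hypothetical. It records the cable link between the two spinning parts and then asks what a cable fault could affect, even though no such failure has ever been observed.

```python
from collections import deque

# Expert-specified edges (hypothetical names): cause -> direct effects.
machine_graph = {
    "drive_wheel": ["cable"],
    "cable": ["driven_wheel"],
    "driven_wheel": ["line_stoppage"],
    "line_stoppage": [],
}

def downstream_effects(graph, cause):
    """Return every node reachable from `cause`, in breadth-first order."""
    seen = []
    queue = deque(graph.get(cause, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(graph.get(node, []))
    return seen

# No failure data exists, yet the graph still shows what a cable fault can reach.
affected = downstream_effects(machine_graph, "cable")
```

Here `affected` lists the driven wheel and the line stoppage, so the possibility of failure is represented in the model from the expert's description alone.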

Q: Do you think these models will be applied in everything?

A: We’re focused on a couple of areas. Pricing and promotion in retail is very big for us. What if my competitor sells their toothpaste at $1.50? What drives the price of Colgate? If you understand that, you can set the right price.

In manufacturing, it’s figuring out the root cause of failure. Physically inspired systems are really good fits for causality. We can create a digital twin of an entire manufacturing process through cause and effect relationships. And then we can ask the digital twin how to tweak the machine in order to produce the most products.

We’ve also been able to help a leading U.S. asset manager save $240 million in customer churn. They’d been trying to predict it. We were able to decide what to do about it.

Q: What was the answer?

A: For you, maybe calling you three times this week will make you feel great. But there were these “sleeping dogs” where, if you call them, they’re more likely to churn. In that case, it was better not to do anything.
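The "sleeping dogs" idea amounts to comparing what is expected to happen with and without the intervention for each customer. In the sketch below, the churn probabilities are invented for illustration and the `should_call` rule is a hypothetical simplification. In practice such estimates would have to come from a causal model rather than be typed in.

```python
def should_call(churn_if_called, churn_if_not_called):
    """Call only when calling is estimated to lower the churn probability."""
    return churn_if_called < churn_if_not_called

# Assumed per-customer estimates of churn probability under each action.
customers = {
    "responsive": {"called": 0.10, "not_called": 0.30},
    "sleeping_dog": {"called": 0.40, "not_called": 0.15},
}

actions = {
    name: "call" if should_call(p["called"], p["not_called"]) else "do nothing"
    for name, p in customers.items()
}
```

The responsive customer gets a call, while the sleeping dog is left alone because contact is estimated to raise the chance of churn.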

Q: Could this be used in education, healthcare, and other areas?

A: We’ve done some work with the Mayo Clinic in the U.S. to show that causal AI can help discover cancer biomarkers much more effectively. So these are really, really complex datasets with loads of confounders, loads of spurious correlations.

Q: What about clinical trials?

A: We published a paper on this. The problem with clinical trials is they’re super expensive, and very slow. Causality helps you in two ways. One, to design the optimal trial. What is the sequence of steps in order to collect enough information to do causal inference on it? And then it helps you infer causality. Ultimately, we could halve the costs of a clinical trial.

Q: Do we need causality to create artificial general intelligence?

A: If we assume causal understanding is a hallmark of human intelligence, and LLMs can’t do causality, we’re saying they’re not intelligent. We really do not have artificial general intelligence today and we’re very far from it. Causal AI is just a building block towards AGI. But there are still a lot of other building blocks missing. Understanding natural language is also a building block, so you’re starting to get the building blocks.