The generative AI boom runs through this behind-the-scenes startup

The Scene

Anyscale is one of the most important artificial intelligence companies that you may have never heard of, though it’s used by the biggest names in the business, including OpenAI.

The “picks-and-shovels” startup makes it easy for other companies to train and deploy AI in the cloud while using its open-source software called Ray. That gives co-founder Robert Nishihara, who has a computer science PhD from Berkeley, a bird’s-eye view of the burgeoning field.

New AI products are built and launched on the Anyscale platform. Nishihara sees a future in which companies like OpenAI increase the size of their models exponentially and use Anyscale products to make that happen faster and more easily. I talked to him about what he’s seeing and where AI is going in the edited conversation below.

The View From Robert Nishihara

Q: What’s the next big frontier in AI?

A: The next generation of LLMs will be multimodal, meaning they’re not just working with text data, but also images and video. And this is compelling because it opens up a lot of applications. If I’m using a customer support chatbot, I may want to include a photo of a product that has some defect.

But text data is pretty small. Image data is bigger. Video data is even bigger. So these applications are going to become far more data intensive. Every time these innovations happen, the infrastructure challenges get harder. And they’re going to keep getting harder.

When OpenAI released GPT-4, they demonstrated these multimodal capabilities. But they haven’t shipped that yet. It was a great demo and they’re going to ship it, but it’s hard to make these things real. There are just a lot of infrastructure challenges.

Q: Is this a matter of ramping up to handle the demand? Or do we need some breakthroughs in energy production or GPU efficiency?

A: There will be breakthroughs on all of those fronts. I don’t think we need those breakthroughs in the near term to make it possible. To get to the point where AI is truly adopted by every business and every industry, you need them. But you don’t need them for OpenAI to ship their product.

These models are still inefficient compared to what we’ll be playing with in a few years. There’s significant R&D effort going into making these models faster and cheaper and more efficient. There’s a lot of low-hanging fruit there.

Q: What’s the killer use case for multimodal AI?

A: This might be a little further out, but everything related to robotics will be multimodal. If you want it to perform tasks in the real world, there are many tasks where text is not enough. A lot of stuff I say is going to be descriptions of the physical world. It’s going to have to fuse that language data and that video data.

Self-driving cars may be one of the earlier places because that’s the first place robotics is really being rolled out. You’ll have self-driving cars far before you have household robots. Autonomous driving is going to be one where a lot of the communication with the car is done in English, but it has to interpret that in the context of everything it sees visually.

Q: Are you seeing the size of these models get bigger or is it a bipolar world where some companies want to make them huge and some want to make them smaller?

A: OpenAI and some of the leaders in the space are going to continue to push the limits of building the most capable models out there. That generally means making them bigger and bigger.

Given the money and compute that you put into training these models, it’s not necessarily the case that bigger is always better. Maybe if you have infinite compute, then that’s the case. But if you have a given data set size, a given compute budget, then there’s going to be an optimal model size as well. But they are pushing the limits there.

And these models are getting bigger. For every given model, and given set of capabilities that people achieve, there’s also a huge amount of effort going into achieving the same thing with less. Because people want to run these on phones. They want to run them on edge devices. They want to pay less money. So you have both of these directions happening in parallel.

A lot of our energy goes into cost efficiency. We talk to businesses that often have two phases. One phase where what matters is validating that the product idea makes sense. They want to iterate quickly, they want to ship a product, they don’t want to spend a lot of time training a model.

So for that, you want to use just an API, integrate that into your products. Once you’ve proven it out, then it’s about making it scale. And that means now I have to make it cheaper. I have to reduce latency.
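As a rough illustration of the point above that a fixed compute budget implies an optimal model size, here is a minimal sketch. It is not from the interview. It assumes two commonly quoted rules of thumb from scaling-law work: training compute of about 6 FLOPs per parameter per token, and a compute-optimal ratio of roughly 20 training tokens per parameter. Both constants and the function name `compute_optimal_size` are assumptions chosen for illustration, and real budgets depend on data quality, architecture, and hardware.

```python
import math

# Assumed rules of thumb (illustrative, not taken from the interview):
FLOPS_PER_PARAM_TOKEN = 6   # training compute C is roughly 6 * N * D
TOKENS_PER_PARAM = 20       # compute-optimal D is roughly 20 * N


def compute_optimal_size(compute_flops: float) -> tuple:
    """Return (parameters, training tokens) that spend the whole budget
    under the two assumed rules of thumb above."""
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Substitute D = 20 * N into C = 6 * N * D and solve for N.
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal_size(budget)
        print(f"{budget:.0e} FLOPs: about {n / 1e9:.1f}B parameters, {d / 1e9:.0f}B tokens")
```

Under these assumptions the optimal parameter count grows with the square root of the budget, so 100 times more compute supports a model about 10 times larger, trained on about 10 times more data.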

Q: So are customers starting out just using something like OpenAI and then fine tuning their own open source models?

A: Fine tuning comes somewhere in between. We released a fine tuning API. People normally think of fine tuning as a technique for improving the results. That’s true, but another way to think about it is in the role it plays in reducing costs.

Because if you have this fine tuning ability to make the model better, then that can actually make small models, which aren’t usually as good, viable. And small models can be much cheaper.

Q: Are a lot of your customers in this transition phase right now?

A: I would say so. A lot of companies have at least shipped some kind of chatbot and built some internal tool, trying to streamline customer support or whatever. Airbnb, for example, used Ray to build their LLMs. And they’re now in this phase of making it faster and cheaper.

Q: One problem companies are having now is what model should I use. There are thousands of them. Will Anyscale be able to help with that?

A: Model evaluation in general is a big challenge and it’s going to continue to be a big challenge. We can help with that. But there won’t be one answer. It will be problem dependent. You’ll have a lot of different models that are fine tuned for different purposes. It’s not the case, for example, that GPT-4 is always better than GPT-3.5. There are some problems where GPT-3.5 can actually do better.

Q: Are there places you’re seeing limitations of what these LLMs can do?

A: There are a lot of limitations. Just knowing when they’re right or wrong is a limitation and a pretty important one. One interesting thing is that it’s possible that these models actually know when they’re hallucinating and we just haven’t figured out how to ask them or prompt them in the right way to get them to not hallucinate.

It’s always a battle to make your models perform better. With LLMs, you see huge gains just by phrasing the question in a different way. If you tell it to pretend to be Albert Einstein, it will be smarter. Or for math problems, if you tell it ‘think step by step,’ it does way better. It suggests that there’s a lot of latent capabilities in these things that are not being fully exploited. We can possibly get a lot of mileage from just better prompting.
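The prompt variations Nishihara describes can be sketched as plain string construction. The helper below is hypothetical: `build_prompt` and its parameters are names invented for illustration, it does not call any model, and it makes no claim about how much each phrasing helps. It only shows how the same question can be wrapped with a persona or a "think step by step" instruction before being sent to whatever LLM API is in use.

```python
from typing import Optional


def build_prompt(question: str,
                 persona: Optional[str] = None,
                 step_by_step: bool = False) -> str:
    """Wrap a question with optional prompt phrasing (hypothetical helper)."""
    parts = []
    if persona:
        # Persona framing, e.g. "Albert Einstein" as in the interview.
        parts.append(f"Pretend you are {persona}.")
    parts.append(question.strip())
    if step_by_step:
        # Instruction quoted in the interview for math problems.
        parts.append("Think step by step.")
    return "\n".join(parts)


if __name__ == "__main__":
    question = "What is 17 * 24?"
    print(build_prompt(question))
    print("---")
    print(build_prompt(question, persona="Albert Einstein", step_by_step=True))
```

Keeping the phrasing separate from the question makes it straightforward to send several variants of the same question to a model and compare the answers.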

Q: What are you using it for?

A: Certainly writing-related tasks. Such as ‘here’s my blog post. Help me think of a catchy title.’ Or ‘my intro paragraph is a bit verbose. Help me suggest some alternatives.’ Sometimes if I’m stuck, I’ll just ask it to generate an idea. I do ask it for advice. It’s hit or miss.

Q: What advice do you ask for?

A: Things like, ‘I’d like to give some feedback to an employee.’ I’ll write it the way I would say it and ask for advice on how to make it come across better. It’s pretty good at that stuff.

I do use it for coding-related tasks. I’ll try running some code and get an error message. I just copy and paste the error message, and 50% of the time, it’ll tell me what the solution is.

Q: What about at Anyscale?

A: We are using LLMs to basically build better debugging tools. I have a log file that’s thousands of lines long. And in fact, I have log files from all these different machines in my cluster. And I have these dashboards showing different metrics. Debugging can take hours or days. We’re trying to create a magical debugging experience for people.

Q: How is Anyscale doing from a business standpoint?

A: I can’t share revenue numbers, but we think about how we’re doing on two different axes. There’s open-source adoption with Ray. And there are our two products. With Ray, since the start of the year, adoption has grown by over six times. I attribute this to AI changing a lot and Anyscale becoming more important. Ray is kind of starting to emerge as just the standard for scaling AI, especially among tech companies. For the Ubers, the Spotifys, Instacarts, and Pinterests, Ray is quite successful in that space.

Q: You’re a picks-and-shovels company, which is a good place to be.

A: We’re at a slightly different layer in the stack, sitting above the cloud providers and above Nvidia. We’re trying to provide the absolute best infrastructure for LLMs and AI. The stuff we’re building is hard for us to build, which is also why it’s hard for our customers to build, which is why we can add a lot of value.