Updated Sep 13, 2023, 12:13pm EDT

Google gives public data project an AI makeover


The Scoop

Google’s effort to make public data easily accessible is getting an AI supercharge so anyone can conveniently parse complex information files, the company exclusively shared with Semafor.

On Wednesday, Google unveiled the new interface that allows Data Commons users to ask simple questions to retrieve and analyze data in ways that would have, in the previous version, required complex queries or coding.

The feature is an example of how large language models could transform the way people use computers. Since ChatGPT burst onto the scene just under a year ago, big companies and startups alike have been trying to incorporate the technology into existing products in order to reduce their complexity and save time.

“I think we’re going to see profound innovations around user interface and human-computer interaction,” said James Manyika, Google’s senior vice president for research, technology and society.

The new interface is based on Google’s large language model, known as PaLM, which can now open up complex data inquiries to anyone who’s curious about how the world works, he said. It can understand a question like, “what is the correlation between poverty and diabetes in India?” and then retrieve and analyze the relevant data.


But unlike chatbots such as Google’s Bard or ChatGPT, the interface is limited to what’s available in Data Commons, and its answers cite those sources. Google executives told Semafor that the technique eliminates the possibility of “hallucinations,” in which large language models return inaccurate information.

Know More

R.V. Guha, a Google Fellow who created Data Commons, said the goal is to help policymakers make decisions in everything from public health to climate change. “This data is sitting in so many different silos, it’s really very difficult for the people making long-term policies to put all of this stuff together easily,” he said.

Guha believes the easily accessible language interface will open up data analysis to more people. “What the language models and this whole layer does is it acts as a translator,” he said.

TechSoup, a nonprofit that helps other nonprofits incorporate the use of technology, has been testing out the new interface for Data Commons. “These small organizations are never going to have data engineers or data scientists working for them,” said Marnie Webb, TechSoup’s chief community impact officer. The new large language model makes it as if “they had somebody sitting there working to pull that data down,” she said.

Part of the excitement for Guha is not knowing exactly how the data will be used or by whom. Guha has had a long career in the industry creating impactful new technologies, such as RSS, which was integral to the growth of the web. “Somebody told me recently that RSS is keeping the podcast ecosystem open. In ’99, I would have never dreamt of something like a podcast,” he said. “It can go places where you cannot imagine.”


Guha and Manyika are quick to point out that Data Commons is still a work in progress and that the system is still being built out so that all of the data is accessible through the language interface.

Another challenge is that a lot of publicly available data is still not easily accessible. Data sources need to be standardized so they are readable by Data Commons, and municipalities, especially in developing countries, don’t always have the resources to make that happen.

When Semafor asked Guha whether LLMs could be used not just to access data, but to automatically standardize it as well, Guha smiled and said he may have more to share in the near future.

Reed’s view

The most significant part of Google’s new feature is what it suggests about the future of computing.

It’s not just that it’s easier to ask a computer to do something than it is to click on graphical user interfaces. It’s that it opens up the power of computers in ways that, in the past, were inaccessible to all but a small group of people who have the time to learn Python and other coding languages, or to take classes to learn every nook of powerful software.


Our computers, both the ones in our pockets and in our laps, are increasingly amazing machines, and we barely use a fraction of their power in our daily lives. And what we use them for is determined by a relatively small cross section of the population that happens to work in the software business. Imagine if that were to change and the unrealized creativity of humanity were more, well, realized.

The other takeaway is that when Google makes data available in new ways, strange and unexpected things tend to happen. I recently took my family to Grand Teton National Park. We hadn’t seen a grizzly bear after a few days in the park, and then someone gave us a tip to look at Google Maps in traffic view. There was a random red spot on the map, indicating people had stopped their cars for some reason. We drove to that spot and sure enough, a crowd of people had pulled over to watch a family of grizzly bears grazing in a meadow.

It will be interesting to see what people do with new access to public data.

Room for Disagreement

Christian Bauersachs, in his analysis of large language model interfaces, outlined challenges ahead in implementing them.

“The integration of LLMs as a primary user interface is exciting, but it’s not without challenges,” he wrote. “The accuracy and reliability of LLMs in understanding diverse user inputs are crucial. Misinterpretations could lead to errors, especially in data-sensitive tasks. Moreover, the shift to a verbal or text-based UI may not be suitable for all applications, especially where visual feedback is essential.”