A Stanford study made headlines this week with the prediction that the largest AI models could run out of new text to scrape by the end of this year — but the study’s leader said he believes AI companies won’t really feel the crunch until at least the end of the decade.

Epoch, an AI forecasting research institute, projected the amount of data required to train models as they continue to scale in size, and compared this to how much data is expected to be published online in the future. Its results appear in the AI Index Report published by the Stanford Institute for Human-Centered AI this week.

The amount of data on the internet is growing at a pace of about 7% per year, Epoch’s director, Jaime Sevilla, said, while the amount of data AI is being trained on is increasing at 200% per year. If the biggest models have ingested most of the content already, there won’t be much new information for them to learn from.
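To see why that gap matters, the two growth rates can be turned into a rough back-of-the-envelope calculation. The sketch below is only an illustration, not Epoch's model: it assumes both quantities compound annually, reads "increasing at 200% per year" as tripling, and takes a hypothetical `stock_ratio` (how many times larger the available text is than today's training sets), a figure the article does not give.

```python
import math


def years_until_exhaustion(stock_ratio, stock_growth=0.07, demand_growth=2.00):
    """Estimate years until training demand catches up with available text.

    stock_ratio is a hypothetical input: available text divided by the
    amount of text currently used for training. The growth rates default
    to the figures quoted in the article (7% and 200% per year), with
    200% interpreted here as a tripling each year (an assumption).
    """
    if demand_growth <= stock_growth:
        raise ValueError("demand must grow faster than the stock to catch up")
    if stock_ratio <= 1:
        # Demand already meets or exceeds the available stock.
        return 0.0
    # stock(t)  = S0 * (1 + stock_growth) ** t
    # demand(t) = D0 * (1 + demand_growth) ** t
    # Setting them equal and solving for t gives the expression below.
    yearly_gain = (1 + demand_growth) / (1 + stock_growth)
    return math.log(stock_ratio) / math.log(yearly_gain)


if __name__ == "__main__":
    for ratio in (10, 100, 1000):  # hypothetical starting ratios
        print(ratio, round(years_until_exhaustion(ratio), 1))
```

Under these assumptions, even a stock 100 times larger than today's training sets would be matched in under five years, because the starting ratio enters only through a logarithm while the growth gap compounds every year.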

But while the Stanford study says AI companies could run out of text within months, Epoch has since adjusted its predictions and plans to publish a new research paper updating its estimates. Sevilla said he now believes there will still be enough public data left to train AI models “five or six years from now.”

The shift comes because analysts initially considered only high-quality text from reputable sources that has been edited by people for accuracy, such as news articles and Wikipedia pages.

“We’re less sure about how important it’s going to be to train only on high-quality data. We think that broader kinds of data might still be useful, perhaps not to the same degree, but they might still be enough to continue the pace of scaling so we have become a bit more optimistic,” Sevilla said.