• D.C.
  • BXL
  • Lagos
  • Dubai
  • Beijing
  • SG
rotating globe
  • D.C.
  • BXL
  • Lagos
Semafor Logo
  • Dubai
  • Beijing
  • SG


Dec 1, 2023, 12:34pm EST
tech

Has ChatGPT really gotten ‘lazy’?

Screenshot/OpenAI
PostEmailWhatsapp
Title icon

The News

While OpenAI was undergoing a leadership crisis last week, the startup’s users were contending with a different problem: They felt like ChatGPT, now a year old, was getting lazier.

A number of entrepreneurs, tech executives, and other professionals say that OpenAI’s most advanced large language models have begun refusing to answer some prompts or are instead giving people instructions on how to complete the tasks by themselves.

When startup founder Matthew Wensing asked GPT-4 to generate a list of upcoming calendar dates earlier this week, it initially suggested that he try using a different tool to find the answer, according to screenshots he shared on X. In another case, Wensing asked the chatbot to produce roughly 50 lines of code. It responded with a few examples, which it said Wensing could use as a template to complete the work without the help of AI.

AD

“GPT has definitely gotten more resistant to doing tedious work,” Wensing wrote, echoing similar experiences described by OpenAI users across the internet.

It’s not clear how widespread these issues are, but at least some of them can be attributed to a known software bug, according to public statements made by two OpenAI employees. The company did not respond to requests for comment.

Title icon

Louise’s view

ChatGPT is inherently a black box, making it impossible for users to know exactly what’s going on under the hood. That mystery can be enticing for researchers and people using the tool for entertainment purposes, but it’s detrimental if you’re trying to incorporate AI into consistent, reliable workflows.

AD

OpenAI regularly announces new features, but it currently doesn’t disclose the fine-tuning adjustments it makes to its models over time, which can have a major impact on their performance. In another incident earlier this year, those small tweaks led some users to incorrectly conclude that ChatGPT was losing its capabilities and getting “dumber” over time.

But the current accusations over “laziness” show there’s a big difference between capabilities and behavior. ChatGPT clearly still has the ability to perform the same work, but the prompts people are using aren’t getting the same results. In some cases, asking the chatbot to complete the task another way is enough to overcome its supposed lethargy.

Figuring out new prompting strategies, however, takes time and experimentation. Going through that process periodically might not be worth it to some people, especially if you’re paying $20 a month for a subscription to OpenAI’s premium service, ChatGPT Plus.

AD

There’s also another thing worth taking into consideration: Running ChatGPT is incredibly expensive for OpenAI. The research firm SemiAnalysis estimated in February that ChatGPT was costing the startup nearly $700,000 a day, and that was before it released its most advanced models, GPT-4 and GPT-4 Turbo.

The “lazier” the chatbot gets, the more money OpenAI saves, and the less strain there is on its systems — a problem that OpenAI has admitted it is currently struggling to overcome. Two weeks ago, it temporarily halted people from signing up for ChatGPT Plus after a wave of new users “exceeded our capacity,” CEO Sam Altman said.

There’s no evidence that the current problems people are experiencing with OpenAI’s models are the result of a deliberate corporate strategy. But the reality is that in many ways, AI is currently limited not by what the technology is capable of, but the amount of resources it takes to make it work.

Title icon

Notable

  • Users struggling with ChatGPT’s “laziness” could try using ChatGPT Classic, the original version of GPT-4 with no additional updates, suggested The Neuron tech newsletter.
  • Google researchers recently got ChatGPT to spit out its training data by simply asking it to repeat the same word over and over again, showing there’s still a lot to learn about how exactly large language models work.
Semafor Logo
AD