Date: 2025-10-31

With all the talk of AI replacing humans at work, a new study suggests the technology couldn’t if it tried. Scale AI and the Center for AI Safety (CAIS) put AI to the test completing various real-world freelance projects, including tasks across product design, game development, data analysis, and scientific writing.

Manus performed best, yet a panel of 40 judges found that only 2.5% of its tasks were work a reasonable client would accept. Gemini 2.5 Pro came in last, with only 0.8% of its work meeting expectations. The data suggests that while models are improving on benchmarks, significant work remains before they meet on-the-ground quality demands.