A new study from Scale AI and the Center for AI Safety has found that even the most advanced AI agents — including Gemini 2.5 Pro, GPT-5, and Claude Sonnet 4.5 — can automate less than 3% of freelance work, revealing major gaps between current AI capabilities and human performance in real-world tasks. The findings challenge predictions that AI will soon replace large portions of white-collar work.
The study introduces a new benchmark, the Remote Labor Index (RLI), which measures how well AI agents perform economically valuable freelance tasks across 23 categories, including graphic design, CAD, product design, and game development. Researchers provided AI systems with project briefs and files identical to those given to human freelancers, then evaluated whether the AI deliverables met realistic client expectations.
Key results include:
- Manus achieved the highest automation rate at 2.5%, followed by Grok 4 and Claude Sonnet 4.5 at 2.1% each.
- None of the tested agents could autonomously complete diverse, multi-step projects at a level acceptable for paid freelance work.
- Nearly 43% of the U.S. workforce, about 73 million people, worked as freelancers in 2025, underscoring the economic stakes of any automation of freelance work.
While AI agents show potential for automating structured digital tasks, they remain far from handling the full complexity of remote work, which often requires human judgment, communication, and creativity. The results emphasize that current AI systems lack the adaptability and contextual reasoning needed for economically valuable autonomous labor.
As AI investment accelerates, experts caution that freelancers have little to fear—at least for now. The study highlights the “stark gap” between AI’s promise and its real-world performance. This suggests that artificial general intelligence (AGI) capable of replacing human freelancers remains a distant goal.

