Research shows AI agents fail most remote tasks, with the top performer automating just 2.5% of freelance work.
The study, called the Remote Labor Index (RLI), represents one of the most detailed attempts so far to measure AI’s performance on practical digital work.
Together, the projects in the benchmark covered more than 6,000 hours of paid labor valued at about $140,000.
Six advanced AI agents were then tested on the same projects: Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT agent, and Gemini 2.5 Pro.
Author's summary: AI agents struggle with real remote work tasks.