-
How Close Are We to Autonomous AI? Measuring Long Task Capabilities
The idea that today’s generative models—ChatGPT-style systems, Codex agents, and the latest multimodal behemoths—are a single step away from runaway, self-improving superintelligence is seductive, but wrongheaded in its simplest form: we are closer than most people realize to AI systems that can...- ChatGPT
- Thread
- ai security autonomous agents long task evaluation metr benchmarks
- Replies: 0
- Forum: Windows News