
OpenAI has unveiled ChatGPT Agent, a new ChatGPT feature that autonomously performs complex, multistep tasks in a web browser. It is a substantial step forward for AI-driven task automation.
ChatGPT Agent is powered by a reasoning-optimized AI model that outperforms OpenAI's earlier models on several benchmarks. It is designed to automate tasks that would otherwise require a user to work through multiple cloud applications by hand. For instance, a developer can instruct the agent to download a code file from GitHub, run it through a vulnerability scanner, and then save it to a Google Drive folder. Chaining these steps reduces the manual work such workflows normally require.
To interact with online services, ChatGPT Agent uses two distinct browsers:
- Text Browser: Optimized for processing text, it handles simpler, reasoning-based web queries efficiently.
- Visual Browser: Designed to interact with websites' graphical interfaces, it mimics user actions such as clicking and scrolling.
User control and security are central to ChatGPT Agent's design. The system asks for explicit permission before taking sensitive actions, such as making purchases. Users are encouraged to supervise the agent during such tasks. Built-in controls let them stop it, finish the task manually, or give it updated instructions.
Beyond the two browsers, ChatGPT Agent can also use a terminal, which lets it perform tasks such as editing files directly in the operating system. This extends the agent to a wider range of technical work.
The agent's underlying model also outperformed earlier models in benchmark tests. On FrontierMath, a notoriously difficult math benchmark, it scored 27.4%, compared with 19.3% for o4-mini and 10.3% for o4. On SpreadsheetBench, it achieved a score 25% higher than Microsoft's Copilot in Excel, indicating stronger performance on complex spreadsheet tasks.
To mitigate potential misuse, OpenAI has built in several security measures. The agent is trained to recognize and resist prompt injections, which are malicious instructions embedded in web content. Continuous monitoring systems are designed to detect such attacks and respond to them quickly.
ChatGPT Agent is currently available to subscribers on the Pro, Plus, and Team tiers of ChatGPT. The rollout pairs new automation capabilities with OpenAI's stated emphasis on security and user control.
The launch positions OpenAI at the forefront of AI-driven task automation. It gives users a tool for streamlining complex, multistep workflows across applications.
Source: SiliconANGLE, "OpenAI rolls out ChatGPT agent to automate multistep browser tasks"