Data has emerged that the launch of OpenAI’s artificial intelligence (AI) agent is imminent. Nevertheless, in response to the leaked information, the performance continues to be below expectations.
A soft engineer named Tibor Blaho reported on the twentieth (local time) through X (Twitter) that he discovered a shortcut key option within the ChatGPT client for Mac OS. This includes ‘Toggle Operator’ and ‘Force Quit Operator’.
Operator was generally known as the code name for the AI agent being developed by OpenAI through a Bloomberg report last November. It was also reported that this feature might be released in January 2025. The inclusion of this feature signifies that launch is imminent.
As well as, Nick Turley, OpenAI product manager, said in an interview with The Verge last December, “It’s unlikely that the OpenAI agent might be released individually,” and “Agent functions will regularly be added inside ChatGPT.”
As he said, the screenshot released by Blaho showed commands to call and block the operator in ChatGPT. Currently, this feature has reportedly disappeared.
Here, Blaho shared 4 screenshots on OpenAI’s website, saying that there are tables which have not yet been released. It is a comparison of the performance of OpenAI CUA, or OpenAI’s Computer Use Agent, Antropic’s ‘Claude 3.5 Sonnet Computer Use’, and Google’s ‘Mariner’.
Antropic and Google’s products are graphical user interface (GUI) agents. It handles tasks reminiscent of using a pc’s mouse and keyboard, searching web sites, and writing documents, identical to a human.
This can be information that’s already known. In October last yr, The Information reported that OpenAI demonstrated an internal agent function, through which it was in a position to display functions reminiscent of searching web sites and ordering pizza. The reason is that OpenAI’s agent can be a GUI agent.
Specifically, OpenAI Agent received a rating of 38.1% within the OSWorld benchmark, which imitates an actual computer environment, significantly surpassing Antropic’s 22%. Nevertheless, it falls far in need of the 72.4% recorded by humans.
The flexibility to perform actual tasks was found to be still significantly lacking. The command to create a Bitcoin wallet only had a ten% success rate, however the command to create an API proxy failed.
There was also news that OpenAI was delaying the discharge of its agent on account of concerns about jailbreak, and the leaked information also included data on safety. In line with this, most cases of stopping harmful output had a 100% success rate. It recorded 99% in stopping jailbreak and 97% in stopping prohibited financial activities.
It’s unclear what type of response might be received if the agent is released on this condition. Specifically, even though it is an early version, its performance continues to be far below that of humans.
Meanwhile, the day before, news broke that OpenAI CEO Sam Altman would present a ‘PhD-level super agent’ against the Trump administration on the thirtieth.
Also, per week ago, the initial agent function ‘Task’, which sets notifications and schedules recurring tasks, was integrated into ChatGPT.
Reporter Lim Da-jun ydj@aitimes.com