On Tuesday, OpenAI released new tools designed to help developers and enterprises build AI agents — automated systems that can independently accomplish tasks — using the company’s own AI models and frameworks.
The tools are part of OpenAI’s new Responses API, which lets businesses develop custom AI agents that can perform web searches, scan through company files, and navigate websites, much like OpenAI’s Operator product. The Responses API effectively replaces OpenAI’s Assistants API, which the company plans to sunset in the first half of 2026.
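For developers, the core primitive is a single API call that pairs a model with one or more built-in tools. As a rough sketch of what that looks like (assuming the official `openai` Python SDK; the vector store ID below is a placeholder, not a real store):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Web search: the model can query the web and cite sources in its reply.
web_answer = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What did OpenAI announce this week?",
)
print(web_answer.output_text)

# File search: the model scans documents in a vector store you have
# already created. "vs_abc123" is a hypothetical placeholder ID.
file_answer = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "file_search", "vector_store_ids": ["vs_abc123"]}],
    input="Summarize our internal travel policy.",
)
print(file_answer.output_text)
```

The design choice is notable: rather than exposing agents as a separate product, OpenAI folds capabilities like search into a tools list on an ordinary model call.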
The hype around AI agents has grown dramatically in recent years, even as the tech industry has struggled to show people, or even define, what “AI agents” really are. In the most recent example of agent hype running ahead of utility, Chinese startup Butterfly Effect went viral earlier this week for a new AI agent platform called Manus that users quickly discovered didn’t deliver on many of the company’s promises.
In other words, the stakes are high for OpenAI to get agents right.
“It’s pretty easy to demo your agent,” Olivier Godement, OpenAI’s API product head, told TechCrunch in an interview. “To scale an agent is pretty hard, and to get people to use it often is very hard.”
Earlier this year, OpenAI introduced two AI agents in ChatGPT: Operator, which navigates websites on your behalf, and deep research, which compiles research reports for you. Both tools offered a glimpse at what agentic technology can achieve, but left quite a bit to be desired in the “autonomy” department.
Now, with the Responses API, OpenAI wants to sell access to the components that power AI agents, allowing developers to build their own Operator- and deep research-style agentic applications. The company hopes developers will use these building blocks to create applications that feel more autonomous than what’s available today.
Using the Responses API, developers can tap the same AI models (in preview) that power OpenAI’s ChatGPT Search feature: GPT-4o search and GPT-4o mini search. The models can browse the web for answers to questions, citing sources as they generate replies.
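In practice, those citations surface as URL annotations attached to the model’s output message. A minimal sketch of pulling them out, assuming the response shapes the `openai` Python SDK documents for the Responses API:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="Who won the 2024 Nobel Prize in Physics, and for what work?",
)

print(response.output_text)

# Each message in the output can carry "url_citation" annotations that
# point back to the web sources the model consulted while answering.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            for annotation in getattr(part, "annotations", []):
                if annotation.type == "url_citation":
                    print(f"- {annotation.title}: {annotation.url}")
```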
OpenAI claims that GPT-4o search and GPT-4o mini search are highly accurate on factual questions. On the company’s SimpleQA benchmark, which measures a model’s ability to answer short, fact-seeking questions, GPT-4o search scores 90% and GPT-4o mini search scores 88% (higher is better). For comparison, GPT-4.5 (OpenAI’s much larger, recently released model) scores just 63%.
That AI-powered search tools outperform traditional models on factual accuracy isn’t necessarily surprising: in theory, GPT-4o search can simply look up the right answer. But web search doesn’t render hallucinations a solved problem. Strong benchmark scores aside, AI search tools tend to struggle with short, navigational queries (such as “Lakers score today”), and recent reports suggest that ChatGPT’s citations aren’t always reliable.