OpenAI launches new tools to help businesses build AI agents

On Tuesday, OpenAI released new tools designed to help developers and enterprises build AI agents — automated systems that can independently accomplish tasks — using the company’s own AI models and frameworks.

The tools are part of OpenAI’s new Responses API, which lets businesses develop custom AI agents that can perform web searches, scan through company files, and navigate websites, much like OpenAI’s Operator product. The Responses API effectively replaces OpenAI’s Assistants API, which the company plans to sunset in the first half of 2026.

The hype around AI agents has grown dramatically in recent years despite the fact that the tech industry has struggled to show people, or even define, what “AI agents” really are. In the most recent example of agent hype running ahead of utility, Chinese startup Butterfly Effect earlier this week went viral for a new AI agent platform called Manus that users quickly discovered didn’t deliver on many of the company’s promises.

In other words, the stakes are high for OpenAI to get agents right.

“It’s pretty easy to demo your agent,” Olivier Godement, OpenAI’s API product head, told TechCrunch in an interview. “To scale an agent is pretty hard, and to get people to use it often is very hard.”

Earlier this year, OpenAI introduced two AI agents in ChatGPT: Operator, which navigates websites on your behalf, and deep research, which compiles research reports for you. Both tools offered a glimpse at what agentic technology can achieve, but left quite a bit to be desired in the “autonomy” department.

Now with the Responses API, OpenAI wants to sell access to the components that power AI agents, allowing developers to build their own Operator- and deep research-style agentic applications. OpenAI hopes developers will use its agent technology to build applications that feel more autonomous than what’s available today.

Using the Responses API, developers can tap the same AI models (in preview) under the hood of OpenAI’s ChatGPT Search web search tool: GPT-4o search and GPT-4o mini search. The models can browse the web for answers to questions, citing sources as they generate replies.
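For a rough sense of what such a request might look like, here is a minimal sketch that only assembles a request body and sends nothing. The helper name `build_search_request`, the model identifier `gpt-4o`, and the tool type `web_search_preview` are assumptions for illustration and do not come from OpenAI's announcement as reported here; check the company's API reference before relying on them.

```python
import json


def build_search_request(question: str, model: str = "gpt-4o") -> dict:
    """Assemble a hypothetical Responses API request body with web search enabled.

    Nothing is sent over the network; this only builds the payload.
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    return {
        "model": model,  # assumed model identifier
        "input": question,  # the user's question, passed as plain text
        "tools": [{"type": "web_search_preview"}],  # assumed tool type name
    }


if __name__ == "__main__":
    # Print the payload a developer would hand to an API client.
    print(json.dumps(build_search_request("Lakers score today"), indent=2))
```

Keeping payload construction separate from the network call makes the request easy to inspect and test before any API usage is billed.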

OpenAI claims that GPT-4o search and GPT-4o mini search are highly factually accurate. On the company’s SimpleQA benchmark, which measures the ability of models to answer short, fact-seeking questions, GPT-4o search scores 90% while GPT-4o mini search scores 88% (higher is better). For comparison, GPT-4.5 – OpenAI’s much larger, recently released model – scores just 63%.
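Those scores can be flipped into error rates with simple subtraction. The sketch below uses only the three figures quoted above and assumes, for illustration, that every question not answered correctly counts as wrong; the helper name `error_rate` is hypothetical.

```python
# SimpleQA accuracy figures as reported in the article, in percent.
scores = {
    "GPT-4o search": 90,
    "GPT-4o mini search": 88,
    "GPT-4.5": 63,
}


def error_rate(accuracy_pct: int) -> int:
    """Return the share of questions missed, assuming anything not correct is wrong."""
    return 100 - accuracy_pct


error_rates = {name: error_rate(pct) for name, pct in scores.items()}

if __name__ == "__main__":
    for name, pct in error_rates.items():
        print(f"{name}: {pct}% of questions missed")
    # GPT-4o search: 10% of questions missed
    # GPT-4o mini search: 12% of questions missed
    # GPT-4.5: 37% of questions missed
```

Read this way, even the best-scoring model misses about one question in ten.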

The fact that AI-powered search tools are more accurate than traditional AI models is not necessarily surprising – in theory, GPT-4o search can just look up the right answer. However, web search does not render hallucinations a solved problem. Beyond their factual accuracy, AI search tools also tend to struggle with short, navigational queries (such as “Lakers score today”), and recent reports suggest that ChatGPT’s citations aren’t always reliable.

The Responses API also includes a file search utility that can quickly scan across files in a company’s databases to retrieve information. (OpenAI claims that it won’t train models on these files.) In addition, developers using the Responses API can tap OpenAI’s Computer-Using Agent (CUA) model, which powers Operator. The model generates mouse and keyboard actions, allowing developers to automate computer use tasks like data entry and app workflows.

Enterprises can optionally run the CUA model, which is being released in research preview, locally on their own systems, OpenAI said. The consumer version of the CUA available in Operator can only take actions on the web.

To be clear, the Responses API won’t solve all the technical problems plaguing AI agents today.

Even with the ability to look up answers, GPT-4o search still gets 10% of factual questions wrong, and its trouble with short, navigational queries and unreliable citations persists.

In a blog post provided to TechCrunch, OpenAI said that the CUA model is “not yet highly reliable for automating tasks on operating systems,” and that it’s susceptible to making “inadvertent” mistakes.

However, OpenAI said these are early iterations of its agent tools, and that it’s constantly working to improve them.

Alongside the Responses API, OpenAI is releasing an open-source toolkit called the Agents SDK, which offers developers free tools to integrate models with their internal systems, put in place safeguards, and monitor AI agent activities for debugging and optimization purposes. The Agents SDK is a follow-up of sorts to OpenAI’s Swarm, a framework for multi-agent orchestration that the company released late last year.

Godement said he hopes OpenAI can bridge the gap between AI agent demos and products this year, and that, in his opinion, “agents are the most impactful application of AI that will happen.” That echoes a proclamation OpenAI CEO Sam Altman made in January: that 2025 is the year AI agents enter the workforce.