Inside OpenAI’s ‘Deep Research’ Model


Good morning. It’s Monday, February third.

Did you know: On this day in 1986, the term “vaporware” was first used by Philip Elmer-DeWitt in a TIME magazine article? The term is now commonly used to describe software that has long been announced but never actually released.

  • OpenAI’s Deep Research

  • Reasoning Models Suffer From “Underthinking”

  • Stories You May Have Missed

  • 3 Latest AI Tools

  • Latest AI Research Papers

You read. We listen. Tell us what you think by replying to this email.

Transform your marketing effortlessly with NEX’s Marko.

Create on-point campaigns in minutes using AI-powered tools for content, strategy, and design, all in one platform.

Loved by 20k+ pros, it’s your shortcut to brand consistency and faster results.

Today’s trending AI news stories

OpenAI launches new ChatGPT agent for ‘deep research’ targeting professional analysts

OpenAI has introduced “deep research,” a new ChatGPT agent designed to tackle complex research tasks across fields like finance, science, policy, and engineering, as well as to assist consumers making high-stakes decisions. Unlike basic AI queries, the agent pulls from multiple sources, synthesising information to deliver more detailed and reliable insights.

Available now to ChatGPT Pro users with a limit of 100 queries per month, deep research will soon extend to Plus and Team users. The tool is currently text-only; future updates will add images, visualisations, and deeper analytic features. It is powered by OpenAI’s o3 “reasoning” model, optimised for web browsing and data analysis. While impressive, the model isn’t flawless: errors and misinterpretations still occur. To mitigate misinformation, all outputs include full citations. Read more.

Reasoning models like DeepSeek-R1 and OpenAI o1 suffer from ‘underthinking’, study finds

Figure: The number of tokens generated and the number of “thoughts” (solution approaches) for various models. On average, o1-like LLMs use 225 percent more tokens for incorrect answers than for correct ones, a consequence of 418 percent more frequent thought changes. | Image: Wang et al.

A recent study by Tencent AI Lab, Soochow University, and Shanghai Jiao Tong University reveals that reasoning models such as DeepSeek-R1 and OpenAI’s o1 fall victim to “underthinking”: prematurely discarding viable solutions, which leads to wasted compute and suboptimal accuracy. These models frequently switch their problem-solving approaches, especially on more complex tasks, producing a 225% increase in computational tokens and 418% more strategy shifts when delivering incorrect answers.

Astonishingly, 70% of these errors involved untapped lines of reasoning. To mitigate this, the researchers introduced a “thought switching penalty” (TIP) mechanism that discourages premature shifts, improving accuracy and consistency on math and science challenges without requiring significant modifications to the models. Read more.
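For intuition, here is a minimal sketch of how a decoding-time thought-switching penalty could work: for a window of tokens after the model last changed approach, the logits of tokens that signal a new switch are pushed down, making a premature change of strategy less likely during sampling. The function name, token ids, and parameter values below are illustrative assumptions, not the paper’s actual implementation.

```python
import torch

# Hypothetical token ids that signal a strategy switch (e.g. "Alternatively",
# "Wait"). The real set depends on the tokenizer and is an assumption here.
SWITCH_TOKEN_IDS = [9241, 14524]

def apply_thought_switching_penalty(
    logits: torch.Tensor,      # (vocab_size,) next-token logits
    step: int,                 # current decoding step
    last_switch_step: int,     # step at which the previous switch occurred
    alpha: float = 3.0,        # penalty strength (assumed value)
    beta: int = 600,           # penalty window in tokens (assumed value)
) -> torch.Tensor:
    """Discourage starting a new line of reasoning too soon after the last one.

    While fewer than `beta` tokens have been generated since the last switch,
    subtract `alpha` from the logits of switch-signalling tokens.
    """
    if step - last_switch_step < beta:
        logits = logits.clone()
        logits[SWITCH_TOKEN_IDS] -= alpha
    return logits
```

Because the penalty acts only on the decoding loop, a scheme like this would leave the model weights untouched, which matches the study’s claim that no significant model modifications are needed.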

3 latest AI-powered tools from around the net

arXiv is a free online library where researchers share pre-publication papers.

Your feedback is invaluable. Reply to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!
