OpenAI announced on Wednesday the launch of o3 and o4-mini, new AI reasoning models designed to pause and work through questions before responding.
The company calls o3 its most advanced reasoning model ever, outperforming the company’s previous models on tests measuring math, coding, reasoning, science, and visual understanding capabilities. Meanwhile, o4-mini offers what OpenAI says is a competitive trade-off between price, speed, and performance, three factors developers often consider when choosing an AI model to power their applications.
Unlike previous reasoning models, o3 and o4-mini can generate responses using tools in ChatGPT such as web browsing, Python code execution, image processing, and image generation. Starting today, the models, plus a variant of o4-mini called “o4-mini-high” that spends more time crafting answers to improve its reliability, are available for subscribers to OpenAI’s Pro, Plus, and Team plans.
The new models are part of OpenAI’s effort to beat out Google, Meta, xAI, Anthropic, and DeepSeek in the cutthroat global AI race. While OpenAI was first to release an AI reasoning model, o1, competitors quickly followed with versions of their own that match or exceed the performance of OpenAI’s lineup. In fact, reasoning models have begun to dominate the field as AI labs look to eke more performance out of their systems.
O3 nearly wasn’t released in ChatGPT. OpenAI CEO Sam Altman signaled in February that the company intended to devote more resources to a sophisticated alternative that incorporated o3’s technology. But competitive pressure seemingly spurred OpenAI to reverse course in the end.
OpenAI says that o3 achieves state-of-the-art performance on SWE-bench Verified (without custom scaffolding), a test measuring coding abilities, scoring 69.1%. The o4-mini model achieves similar performance, scoring 68.1%. OpenAI’s next best model, o3-mini, scored 49.3% on the test, while Claude 3.7 Sonnet scored 62.3%.
OpenAI claims that o3 and o4-mini are its first models that can “think with images.” In practice, users can upload images to ChatGPT, such as whiteboard sketches or diagrams from PDFs, and the models will analyze the images during their “chain-of-thought” phase before answering. Thanks to this newfound ability, o3 and o4-mini can understand blurry and low-quality images and can perform tasks such as zooming or rotating images as they reason.
Beyond image-processing capabilities, o3 and o4-mini can run Python code directly in your browser via ChatGPT’s Canvas feature, and search the web when asked about current events.
In addition to ChatGPT, all three models (o3, o4-mini, and o4-mini-high) will be available via OpenAI’s developer-facing endpoints, the Chat Completions API and Responses API, allowing engineers to build applications with the company’s models at usage-based rates.
OpenAI is charging developers a relatively low price for o3, given its improved performance, at $10 per million input tokens (roughly 750,000 words, longer than the Lord of the Rings series) and $40 per million output tokens. For o4-mini, OpenAI is charging the same as o3-mini, $1.10 per million input tokens and $4.40 per million output tokens.
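To make the usage-based rates concrete, here is a rough sketch of how a developer might estimate the cost of a request from the prices quoted above. The `PRICES` table and the `estimate_cost` helper are illustrative names, not part of any OpenAI library, and the figures are only the ones reported in this article; actual billing may differ.

```python
from decimal import Decimal

# Per-million-token prices in USD, as quoted in the article.
# (Hypothetical table for illustration; not fetched from OpenAI.)
PRICES = {
    "o3": {"input": Decimal("10.00"), "output": Decimal("40.00")},
    "o4-mini": {"input": Decimal("1.10"), "output": Decimal("4.40")},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for the given token counts."""
    rates = PRICES[model]
    total = rates["input"] * input_tokens + rates["output"] * output_tokens
    return total / Decimal(1_000_000)

if __name__ == "__main__":
    # Example: a request with 20,000 input tokens and 5,000 output tokens.
    for name in PRICES:
        print(name, estimate_cost(name, 20_000, 5_000))
```

By these figures, the same workload costs roughly nine times more on o3 than on o4-mini, since both the input and output rates differ by that factor.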
In the coming weeks, OpenAI says it plans to release o3-pro, a version of o3 that uses more computing resources to produce its answers, exclusively for ChatGPT Pro subscribers.
OpenAI CEO Sam Altman has indicated o3 and o4-mini may be its last stand-alone AI reasoning models in ChatGPT before GPT-5, a model that the company has said will unify traditional models like GPT-4.1 with its reasoning models.