OpenAI Introduces Two Compact Models Offering Near-Flagship Performance at Reduced Costs

Deep News
19 hours ago

On Tuesday, OpenAI launched its most capable small models to date, GPT-5.4 mini and GPT-5.4 nano. The new offerings significantly narrow the performance gap with flagship models while offering substantially lower latency and cost.

GPT-5.4 mini surpasses the previous-generation GPT-5 mini across core capabilities, including programming, reasoning, multimodal understanding, and tool usage. It runs at more than double the speed and approaches the performance of the larger GPT-5.4 model on benchmarks like SWE-bench Pro.

GPT-5.4 nano is positioned as the most cost-effective and lowest-latency lightweight option. It is available exclusively via API for developers and is specifically designed for data classification, extraction, and simple programming subtasks.

The two models target real-time interactive applications, where high latency has hindered the deployment of large models. This directly affects fast-growing commercial markets such as programming assistants, AI agent systems, and multimodal applications.

GPT-5.4 mini is available starting today across three channels: the OpenAI API, the Codex platform, and ChatGPT. Its API pricing is set at $0.75 per million input tokens and $4.50 per million output tokens. It supports text and image inputs, tool usage, function calling, web search, file retrieval, computer control, and skill expansion, with a context window of 400,000 tokens.
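At the quoted rates, the cost of a single request is straightforward to estimate. The sketch below uses the per-million-token prices stated above; actual pricing should be confirmed against OpenAI's pricing page, and the token counts are illustrative.

```python
# Rough per-request cost estimate for GPT-5.4 mini, using the rates quoted
# in the article ($0.75 per million input tokens, $4.50 per million output
# tokens). Rates are illustrative, not a live price feed.

MINI_INPUT_PER_M = 0.75
MINI_OUTPUT_PER_M = 4.50

def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = MINI_INPUT_PER_M,
                 output_per_m: float = MINI_OUTPUT_PER_M) -> float:
    """Dollar cost of one request given token counts and per-million rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# Example: a 10,000-token prompt producing a 2,000-token response.
cost = request_cost(10_000, 2_000)  # $0.0075 input + $0.0090 output = $0.0165
```

Even a long prompt against the 400,000-token context window stays in the cents range at these rates, which is what makes the model attractive for high-volume subtasks.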

On the Codex platform, GPT-5.4 mini consumes only 30% of the quota allocated to GPT-5.4, reducing the cost for developers handling simple programming tasks to approximately one-third of the flagship model's cost. Codex also supports delegating workloads to sub-agents running on GPT-5.4 mini, allowing tasks with lower reasoning density to be automatically routed to the more economical model.

In ChatGPT, Free and Go tier users can access GPT-5.4 mini via the "+" menu by selecting the "Thinking" function. For other paying users, the model serves as an automatic fallback when the rate limit for GPT-5.4 Thinking is reached.

GPT-5.4 nano is currently available only via API for developer use, priced at $0.20 per million input tokens and $1.25 per million output tokens, making it the lower-priced of the two new models. OpenAI stated that the nano model suits scenarios where sub-agents, orchestrated by higher-tier models, handle secondary support tasks.
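The pricing gap between the two models compounds quickly on bulk workloads. A minimal comparison, again using the article's quoted rates (which may not match current pricing) and an assumed classification-heavy token mix:

```python
# Comparing GPT-5.4 mini and nano on a bulk classification workload, using
# the per-million-token rates quoted in the article. The workload size is an
# assumption for illustration.

def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Total dollar cost for a workload at given per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# 1B input tokens with 50M output tokens -- short labels, long inputs.
mini_cost = workload_cost(1_000_000_000, 50_000_000, 0.75, 4.50)  # $750 + $225 = $975.00
nano_cost = workload_cost(1_000_000_000, 50_000_000, 0.20, 1.25)  # $200 + $62.50 = $262.50
savings_ratio = mini_cost / nano_cost  # roughly 3.7x cheaper on nano
```

For extraction and classification tasks where nano's accuracy is sufficient, routing this traffic to nano rather than mini cuts the bill by roughly a factor of 3.7 at these rates.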

Evaluation data released by OpenAI shows that GPT-5.4 mini performs particularly well in programming and multimodal tasks. On the SWE-bench Pro programming benchmark, the mini model scored 54.4%, narrowing the gap with GPT-5.4's 57.7% to just 3.3 percentage points, and significantly outperforming GPT-5 mini's score of 45.7%.

On the OSWorld-Verified computer control benchmark, the mini model achieved 72.1%, approaching GPT-5.4's 75.0% and well ahead of GPT-5 mini's 42.0%.

In tool usage, GPT-5.4 mini scored 93.4% on the τ2-bench telecom test, a marked improvement over GPT-5 mini's 74.1%. On the GPQA Diamond science benchmark, the mini model scored 88.0% and the nano model 82.8%, both exceeding GPT-5 mini's 81.6%.

Notably, GPT-5.4 nano lags behind GPT-5 mini on some visual tasks, scoring 39.0% on OSWorld-Verified versus the latter's 42.0%. The nano model nonetheless shows clear improvements over its predecessor in programming and tool-usage tasks.

OpenAI indicated that the nano model's design priorities are low latency and cost-effectiveness rather than comprehensive performance, and that developers should weigh these trade-offs against their specific task requirements when selecting a model.

OpenAI's release materials emphasized the role of the two new models within a multi-model hierarchical system. Using its self-developed programming assistant Codex as an example, GPT-5.4 handles planning, coordination, and final judgment, while GPT-5.4 mini sub-agents concurrently process more granular subtasks such as codebase retrieval, large file review, and auxiliary document processing.

OpenAI stated that as smaller models become faster and more capable, developers no longer need to use a single model for all tasks. Instead, they can build systems where large models are responsible for decision-making, and small models execute tasks rapidly and at scale. OpenAI described GPT-5.4 mini as its most powerful small model to date for such workflows.
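The decision-making-versus-execution split described above can be sketched as a simple routing rule. The model identifiers and the "reasoning density" heuristic below are assumptions for illustration, not OpenAI's actual routing logic or API model names.

```python
# A minimal sketch of the hierarchical multi-model pattern the article
# describes: a flagship model for planning and judgment, a small model for
# high-volume subtasks. Model names and task categories are hypothetical.

FLAGSHIP = "gpt-5.4"       # planning, coordination, final judgment
WORKER = "gpt-5.4-mini"    # fast, cheap execution of granular subtasks

# Subtask types the article cites as suited to mini sub-agents.
LOW_DENSITY_TASKS = {"codebase_search", "file_review", "doc_processing"}

def pick_model(task_type: str) -> str:
    """Route low-reasoning-density subtasks to the cheaper worker model."""
    return WORKER if task_type in LOW_DENSITY_TASKS else FLAGSHIP

def plan_and_delegate(subtasks: list[str]) -> dict[str, str]:
    """Assign each subtask to a model; the flagship keeps everything else."""
    return {task: pick_model(task) for task in subtasks}
```

In practice, the router would sit in front of the API client, so that a coordinating GPT-5.4 agent issues subtasks and the dispatcher sends each one to whichever model its category maps to.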

This architecture is particularly critical for high-concurrency workloads. In scenarios like programming assistants, screenshot analysis, and real-time image understanding, response latency directly impacts the user experience. The optimal choice is often not the most capable model, but the one that achieves the best balance between speed, tool reliability, and task performance.

For developers, the release of GPT-5.4 mini and nano signifies a clearer path to significantly reducing inference costs without compromising the overall intelligence level of their systems.

