Domestic Large Models See Intensive Releases, Driving Inference Demand Through Scale Applications

Zheshang Securities released a research report stating that domestic large models have recently been launched in rapid succession. The new DeepSeek model is in gray-release (limited rollout) testing and offers a 1 million token context window, a significant increase from the previous maximum of 128K. GLM-5 has been launched on the Zhipu website, focusing on enhancements in programming and intelligent agents. The usability of Agents is steadily improving, with large models moving from simple chat functions to collaborative roles. In the multimodal domain, Seedance 2.0 can substantially lower the barriers and costs associated with video creation. Initially, the primary token consumption for large models came from dialogue and image generation. As Agent technology and video production move towards large-scale application, the computational power required for large model inference is expected to rise rapidly. The main viewpoints from Zheshang Securities are as follows.

Domestic models have seen intensive releases around the Spring Festival period, marking the beginning of an AI arms race. The new DeepSeek model is in gray-release testing with a 1 million token context capability. GLM-5 launched on the Zhipu website, emphasizing programming and agent improvements; it ranked first globally in agent programming tests, surpassing Claude Opus 4.6, the latest model released by Anthropic in February. MiniMax's new model, MiniMax M2.5, is undergoing internal testing within its overseas MiniMax Agent product. ByteDance released Seedance 2.0, which significantly reduces the threshold and cost for video creation, potentially reshaping the video production industry. Alibaba released Qwen-Image-2.0, representing another evolution in image generation. Furthermore, the release of Qwen 3.5 is anticipated. Besides the video model Seedance 2.0 and the image model Seedream 5.0, ByteDance also plans to release a new large language model in February.

Model advancements are accelerating the implementation of Agent and multimodal applications. Agents are becoming increasingly usable, with large models shifting from chat to collaboration. Claude Opus 4.5 is already capable of programming autonomously for five hours at a stretch. Since 2024-2025, the length of task that AI coding agents can handle has been doubling roughly every four months, compared with every seven months between 2019 and 2024. OpenClaw is positioned as a personal AI agent with self-evolution capabilities and the ability to learn new skills. Its application cases include automating email processing, reading documents, writing code, publishing social media content, and drafting reports. In the multimodal space, Seedance 2.0 supports various combinations of video, audio, and text inputs. The generated videos exhibit excellent camera work, storyboarding, and realistic details, significantly lowering the barriers and costs for video creation.
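To make the two doubling rates easier to compare, the short Python sketch below converts each into an implied growth multiple over one year. It only restates the report's four-month and seven-month figures as arithmetic; the function name `growth_factor` and the 12-month horizon are illustrative choices, not something taken from the report.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after months_elapsed, given a fixed doubling period."""
    return 2 ** (months_elapsed / doubling_months)


# Doubling periods quoted in the report; the 12-month horizon is an assumption.
fast = growth_factor(12, 4)  # doubling every four months
slow = growth_factor(12, 7)  # doubling every seven months

print(f"4-month doubling: {fast:.1f}x per year")
print(f"7-month doubling: {slow:.1f}x per year")
```

Under these figures, a four-month doubling period compounds to 8x per year, while a seven-month period compounds to a little over 3x.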

Large-scale applications are driving inference demand, with a positive outlook on AI infrastructure. For early large models, token consumption came mainly from dialogue and image generation. As Agent technology and video production achieve widespread adoption, the computational consumption for large model inference is likely to increase quickly. To execute an Agent task, a large model must perform multiple rounds of reasoning and browse numerous webpages, so token consumption rises substantially compared with simple dialogue scenarios. At the same time, increased webpage browsing drives growth in overall web traffic. CDN services can help distribute the load from content providers' origin servers and reduce network traffic costs, positioning them to benefit from this traffic growth. In video creation, generating a 5-second, 720p video costs approximately 4 RMB with Kling and about 2.3 RMB with Seedance, indirectly reflecting significant computational expenses. Even so, these costs still compare favorably with manual production. The increasing penetration of AI in video creation is also expected to boost demand for computational power.
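The per-clip prices above can be turned into a rough per-second rate, as in the Python sketch below. The two prices and the 5-second clip length come from the report; the assumption that cost scales linearly with video length, the 60-second target, and the helper names are hypothetical and used only for illustration.

```python
CLIP_SECONDS = 5  # clip length quoted in the report (720p)
CLIP_COST_RMB = {"Kling": 4.0, "Seedance": 2.3}  # approximate prices from the report


def cost_per_second(clip_cost_rmb: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """Average generation cost per second of video, in RMB."""
    return clip_cost_rmb / clip_seconds


def estimated_cost(model: str, target_seconds: float) -> float:
    """Estimated cost for a longer video, ASSUMING cost scales linearly with length."""
    return cost_per_second(CLIP_COST_RMB[model]) * target_seconds


for model, clip_cost in CLIP_COST_RMB.items():
    per_second = cost_per_second(clip_cost)
    one_minute = estimated_cost(model, 60)  # 60 s is a hypothetical target length
    print(f"{model}: {per_second:.2f} RMB/s, about {one_minute:.1f} RMB per minute")
```

On these numbers the rates work out to about 0.80 RMB per second for Kling and 0.46 RMB per second for Seedance; real pricing for longer or higher-resolution videos may not scale linearly.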

Related investment targets include MINIMAX-WP, KNOWLEDGE ATLAS, Yunsai Zhulian, UCloud Technology, Capital Online, QingCloud Technology, Wangsu Technology, and Nanxing Shares.

Risk warnings include commercialization progress falling short of expectations, delays in model releases, competitive risks, and policy uncertainty.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.

Tiger Brokers
