High investment, small market, high growth, non-consensus. It is rare for a startup sector to combine all four. VAST founder and CEO Song Yachen has been fortunate enough to find one that does.
High investment and a small market keep the sector temporarily out of "range" for the tech giants, which have not yet fully committed to it. High growth signals an excellent historical opportunity. The lack of consensus gives entrepreneurs a brief window to gain an early advantage before intense competition emerges.
This field is 3D AI generation, something humanity invented just two short years ago.
During a recent in-depth conversation, Song Yachen, who introduced the product to the Premier at WAIC 25 and was featured on national news, was visibly excited. This young entrepreneur, born in 1997, is extremely passionate: he gestures enthusiastically when discussing topics that interest him, sometimes drawing "diagrams" in the air with his hands.
Since its establishment in March 2023, VAST has launched the world's first AI 3D workspace Tripo Studio and the Tripo model series, garnering significant attention. The company has attracted over 3 million global professional developers, more than 40,000 small and medium enterprise partners, and over 700 major clients—including NetEase, Tencent, Sony, Microsoft, and other major companies.
Tripo Studio is an AI-driven 3D content creation workflow that enables precise, controllable editing in the 3D modality. No similar product exists globally, which puts VAST at the forefront. Song Yachen likens Tripo Studio to "Cursor": users can rapidly generate 3D models and then refine them to an ideal state with a rich set of tools. Some call it the "cheat tool" of the AI modeling world, a replacement for traditional DCC (digital content creation) modeling software that is 10 times more efficient.
At the model algorithm level, VAST has continuously iterated on the Tripo large model, releasing the multi-billion-parameter Tripo 1.0 through Tripo 2.5 series, with Tripo 3.0 reaching 200 billion parameters. It has also released TripoSR, TripoSG, TripoSF, and other 3D foundation models that are widely recognized by the global open-source community.
VAST is also well known on overseas social media, where the GPT-4o + Tripo combination went viral. Song Yachen told me that overseas markets are growing organically, beyond expectations. He is also surprised by the product's growing popularity in smaller markets: the list of countries and regions that each account for 1% of users has expanded to include Ukraine, Thailand, and others.
Despite these achievements, the company currently has only two salespeople. Song Yachen explains that, in his view, the essence of sales isn't sales, the essence of products isn't products, and the essence of growth isn't growth. At least during 2023-2024, the essence of products was technology, and the essence of sales was products. In other words, he believes that while the industry is still in its early stages, with rapid iterations every 3-5 months, "patching old walls" isn't important: "not going all-in on sales and focusing on products is actually emphasizing sales performance."
Another sign of the company's product focus is its persistent daily "user interviews," also called the "CEO program." The team has interviewed over 1,000 creators, spending two hours with each, and shares the findings through company-wide emails. Song Yachen hopes this keeps the team sensitive to products and markets, so that VAST builds things that are truly valuable to users.
Over the past two years, VAST has continuously explored, shifting from initially aiming for the "ultimate goal" of a "3D TikTok" to starting as "tool sellers," developing model algorithms and creation platforms. Song Yachen maintains his own sense of rhythm.
VAST's latest move is the upcoming release of the long-rumored Tripo 3.0. Song Yachen describes Tripo 3.0 as something "very challenging"—something they didn't have complete confidence in "at the beginning."
The Tripo 3.0 model jumps from Tripo 2.5's tens of billions of parameters directly to over 200 billion, adopting an entirely new technical route and a new "expression form," that is, a new 3D representation: SparseFlex, from TripoSF. This representation allows significant improvements in model resolution and efficiency while scaling better to larger models.
Song Yachen gave an example: complex 3D structures that were previously difficult to express, such as the brakes under car seats, are handled easily by Tripo 3.0, at high resolution, high speed, and low inference cost.
More importantly, Song Yachen says Tripo 3.0 has been "injected" with more "product thinking" and is a model that truly understands creators and customers. One reason is that the team observed customers' extremely high requirements for "controllability." Professional creators care most about fine-grained editability and strong controllability, and neither can be missing.
Tripo 3.0 addresses these pain points. It adds pose control, so users can specify standard poses; symmetry control; multi-view control, so users generate what they intend; surface control, which adjusts the "roughness" and "smoothness" of generated surfaces; and two generation modes: a standard mode for fast one-click generation and a high-definition mode for superior clarity, detail, and effects.
This is a fast-growing company. Song Yachen divides the 3D generation industry into four stages: first, the "past" stage, represented by gaming companies; second, the "present" stage, represented by independent developers; third, the "native" stage, dominated by mass entrepreneurs; and fourth, the "future" stage, when a "3D TikTok" emerges and both the creation and consumption of generated 3D content flourish.
In Song Yachen's view, now is the time to pursue growth in the "native" and "future" stages, and this is precisely VAST's window of opportunity.
To meet its goals, the team is expanding, and Song Yachen spends his days on fundraising and recruitment. For the past two years VAST kept a headcount of around 30, growing to 50 in 2025. He told me the company has not deliberately capped team size but looks for the talent that fits each stage.
VAST's recruitment materials feature phrases like "witness the explosive growth of a leading AI company from 1 to 100," "change history," and "define the next decade's technological paradigm."
In 2023 and 2024, nearly all 30 VAST team members worked on technology and algorithms, and the company benefited greatly from this focus. Overseas competitors, meanwhile, bet on products and growth, "ignoring technology at the time, causing them to fall behind."
Starting in 2025, Song Yachen judges the timing is right to strengthen product and engineering capabilities. According to plans, the next step will see VAST strengthen operations and growth as 3D generation culture, community, and creation become important. When the "3D TikTok" era arrives, content and creative talent will become crucial.
He firmly believes "3D TikTok" is his opportunity, or at the very least an opportunity for entrepreneurs. Song Yachen says he's an entrepreneur who "sees because he believes," which is evident in his unwavering gaze and decisive speech.
However, reaching the peak requires more than belief. Two hard questions remain: What if big tech companies all enter 3D generation? What if the demand in this market turns out to be false?
Regarding the first question, Song Yachen says it's not a problem. In his account, no major UGC platform, in China or abroad, has ever been created by a big tech company: TikTok, Kuaishou, Douban, Zhihu, Xiaohongshu, Instagram, YouTube, and X all originated elsewhere, without exception. Moreover, big companies won't commit resources to a high-investment, small market whose growth hasn't yet become consensus. If they are only testing the waters and miss their KPIs the following year, they will cut investment.
For the second question, Song Yachen responds that the world consists of objects and laws. Objects are inherently 3D, while text, images, and videos are "dimensional reductions" and abstractions of 3D. Laws are code. Most of the world's information is natively 3D; what is missing is mass-market tools. When users can create 3D content in real time with almost zero barriers and zero cost, 3D UGC platforms will definitely be 5-10 times more explosive than TikTok is today.
He's adept at drawing lessons from history and offers this comparison: before short videos emerged, not many people watched movies; before Weibo appeared, not many people read text. Yet this did not prevent TikTok and Weibo from becoming mass content platforms. His logic is that when a new platform emerges, content begins to flourish, and flourishing content in turn makes the platform.
What excites him more is that even before 3D content platforms emerge, so many people are already engaging with 3D through gaming, and 3D can do much more: 3D can generate everything.
We also discussed the XR industry, which is closely related to 3D content. Song Yachen's judgment is that hardware progress is a nice-to-have rather than the lifeline the industry needs; 3D content is the necessary condition for XR to take off. The metaverse's biggest problem at the time was that it failed for lack of content. If developing an XR application costs one to two million, the economics don't work. Only when developing an application costs 200 yuan will an explosion become inevitable.