Goldman Sachs Silicon Valley AI Research: Foundation Models Converging, Competition Shifts to "Application Layer" as "Reasoning" Drives GPU Demand Surge

Deep News
Aug 25, 2025

From August 19-20, Goldman Sachs analyst teams conducted their second round of Silicon Valley AI field research, visiting leading AI companies including Glean, Hebbia, and Tera AI, as well as top-tier venture capital firms such as Lightspeed Venture Partners, Kleiner Perkins, and Andreessen Horowitz. They also held in-depth exchanges with professors from Stanford University and UC Berkeley.

The research finds that as open-source and closed-source foundation models rapidly converge in performance, raw model capability is no longer the decisive competitive moat. The focus of competition is shifting from the infrastructure layer to the application layer, where the real barriers lie in integrating AI deeply into specific workflows, leveraging proprietary data for reinforcement learning, and building durable user ecosystems.

The report cites top venture capital firms such as Andreessen Horowitz: open-source foundation models have matched closed-source models since mid-2024, reaching GPT-4-level performance, while leading closed-source models have shown virtually no breakthrough progress on benchmarks.

Meanwhile, reasoning models such as OpenAI o3 and Gemini 2.5 Pro are becoming the new frontier of generative AI. A single query can produce roughly 20 times as many output tokens as a traditional model, driving a corresponding 20-fold surge in GPU demand and supporting continued high AI infrastructure capital expenditure for the foreseeable future.

**Foundation Model Performance Convergence Shifts Competition Focus to Applications**

Goldman Sachs' research makes clear that the AI "arms race" is no longer centered solely on foundation models.

Multiple venture capitalists noted that foundation model performance is increasingly commoditized, with competitive advantage shifting up the stack toward data assets, workflow integration, and domain-specific fine-tuning capabilities.

Andreessen Horowitz partner Guido Appenzeller noted in discussions that the performance gap between open-source and closed-source large models closed in under twelve months, a testament to the open-source community's pace of development. Meanwhile, leading closed-source models have been virtually stagnant since GPT-4's release.

Against this backdrop, how AI-native applications establish competitive moats becomes the crucial question.

AI startup Hebbia argues that the real barrier for applications is not the technology itself, since top engineering teams can replicate any technology within 6-8 months, but rather cultivating user habits and establishing distribution channels. This logic mirrors Excel's success: creating irreplaceable network effects by embedding deeply in workflows and cultivating "power users."

Companies like Everlaw likewise emphasize that by embedding AI deeply into legal document processing workflows, they deliver an integrated convenience and efficiency that standalone AI models cannot match.

Notably, leading AI labs themselves are recognizing this shift. The report states that OpenAI, Anthropic, and Google DeepMind are increasingly venturing into the application layer, leveraging their insights into model internal structures and development roadmaps to build tighter product feedback and reinforcement learning loops, creating new competitive pressure for independent startups.

**Reasoning Models Emerge as New Frontier, Igniting GPU Demand**

Report data shows that over the past three years, the cost of running a model that achieves a given MMLU benchmark score has fallen from $60 per million tokens to roughly $0.06, a 1000-fold decrease. But this dramatic decline in per-unit operating costs does not mean overall computational spending will fall.
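As a quick sanity check, the implied pace of that decline can be worked out from the two price points alone (a back-of-envelope sketch; the annualized rate is derived arithmetic, not a figure stated in the report):

```python
# Back-of-envelope check on the reported inference cost decline.
# The two price points are from the report; the annualized rate is
# derived arithmetic, not a number the report itself states.

start_price = 60.0  # USD per million tokens, ~3 years ago
end_price = 0.06    # USD per million tokens today, at a comparable MMLU score

total_decline = start_price / end_price    # 1000x overall
annual_decline = total_decline ** (1 / 3)  # compounded over 3 years

print(f"Total decline: {total_decline:,.0f}x")            # Total decline: 1,000x
print(f"Implied annual decline: ~{annual_decline:.0f}x")  # ~10x per year
```

In other words, inference at constant quality has been getting roughly an order of magnitude cheaper every year.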

Multiple VCs point out that new drivers of demand growth are emerging rapidly. The research found that following DeepSeek R1's breakthrough, a new generation of reasoning models, represented by OpenAI o3, Gemini 2.5 Pro, and Claude 4 Opus, has emerged, marking a fundamental transformation in foundation models.

Traditional large models primarily recite memorized answers, while reasoning models simulate a thought process through deduction, verification, and iteration. As a result, outputs can run to 10,000 tokens, versus a typical 500 tokens for traditional LLMs. That 20-fold increase in output tokens translates roughly into a 20-fold increase in demand for GPU inference compute.
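The scaling logic is straightforward; a minimal sketch, assuming compute per generated token is roughly constant (a first-order approximation that ignores serving effects such as KV-cache growth and batching):

```python
# Illustrative scaling of inference compute with output length. Assumes
# compute per generated token is roughly constant, so total inference
# demand scales linearly with output tokens.

traditional_output_tokens = 500   # typical LLM response, per the report
reasoning_output_tokens = 10_000  # reasoning-model response, per the report

scale = reasoning_output_tokens / traditional_output_tokens
print(f"Output tokens per query: {scale:.0f}x")        # 20x

# Holding query volume and per-token compute fixed, GPU inference
# demand grows by the same factor.
print(f"Implied GPU inference demand: ~{scale:.0f}x")  # ~20x
```

If anything, the linear assumption understates the effect, since attention costs grow with sequence length.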

Goldman Sachs notes that while this transformation makes inference expensive, it also allows AI to be applied more reliably to complex domains that demand rigorous analysis, such as code synthesis and the legal, financial, and engineering fields. VCs therefore generally see today's high AI infrastructure capital expenditure as "appropriate and necessary": not a threat to profits but a prerequisite for competitive advantage, especially for the leading AI labs.

**AI-Native Application Moats: Workflows, Data, and Talent**

As models themselves are no longer scarce resources, successful AI application companies are building barriers through other means. Goldman Sachs' research summarized several key characteristics:

First is workflow integration and user ecosystems. Successful application companies can rapidly create value for enterprises, reducing deployment time from months to weeks.

For example, customer service AI company Decagon can help clients launch automated customer service systems within 6 weeks, saving $3-5 million for every $1 million invested. Such seamless integration with existing business processes is crucial.

Second is proprietary data and reinforcement learning. The report points out that static "walled garden" proprietary datasets hold tremendous value in vertical sectors like legal and financial services.

More valuable than static data, however, is dynamic user-generated data, which can power reinforcement learning loops. Companies that reach user scale early can use high-quality feedback signals to keep improving their models, creating a snowballing lead.
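The report does not describe any specific company's pipeline; the sketch below shows the general pattern, with every name and structure a hypothetical illustration:

```python
# Hypothetical sketch of a user-feedback reinforcement loop: the general
# pattern described above, not any particular company's pipeline. All
# names and structures here are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Interaction:
    prompt: str
    response: str
    accepted: bool  # implicit signal, e.g. the user kept the AI's draft


def reward(interaction: Interaction) -> float:
    """Turn an implicit user signal into a scalar reward."""
    return 1.0 if interaction.accepted else -1.0


def improve(model, interactions: list[Interaction]):
    """One loop iteration: score logged interactions, then fine-tune.

    `model.fine_tune` is a stand-in for whatever preference-tuning
    method (RLHF, DPO, etc.) the application actually uses.
    """
    labeled = [(i.prompt, i.response, reward(i)) for i in interactions]
    model.fine_tune(labeled)
    return model
```

The flywheel is the point, not the particular tuning method: more users produce more labeled interactions, which yield a better model, which attracts more users.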

Third is the strategic value of specialized talent. Unlike the previous SaaS wave, generative AI application success heavily depends on top engineering talent. Building efficient AI systems requires specialized skills in model encapsulation, agent reasoning, and reinforcement learning loop design.

VCs believe that AI talent capable of building self-improving systems is extremely scarce and has become the primary bottleneck for sustainable innovation.
