On October 18-19, the Global Wealth Management Forum's 2025 Shanghai Suhewan Conference was held in Shanghai's Jing'an District. Yang Qiang, a fellow of the Royal Society of Canada and the Canadian Academy of Engineering, attended the conference and gave an in-depth analysis of six major contradictions in the development of large models and potential ways to resolve them.
Yang Qiang first highlighted the contradiction between the “Scaling Law” and “Moore’s Law” in the AI field. The Scaling Law, as he characterized it, has AI costs falling by an order of magnitude each year while capabilities grow roughly tenfold; Moore’s Law, by contrast, has chip hardware performance doubling only about every 1.5 years. Over time, the gap between the blue curve representing AI and the red curve representing hardware keeps widening, meaning that hardware support struggles to keep pace with AI demand. As models grow more complex, their capabilities rise sharply: deep learning networks expanded to more than a thousand layers show 20-50 times improvement over baseline performance. To address this, he proposed optimizing AI infrastructure by integrating multiple expert systems into a single computing-power center and optimizing end to end across the training, communication, inference, and deployment phases. Companies such as Oracle that focus on infrastructure and compute optimization have seen significant stock-price gains, underscoring the importance of GPU optimization.
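To make that divergence concrete, a small illustrative calculation can compound the two growth rates quoted above (the absolute numbers are assumptions used only for illustration): AI demand growing tenfold per year against hardware doubling every 1.5 years.

```python
# Illustrative only: compound the two growth rates quoted in the talk.
# AI capability/demand: ~10x per year; hardware (Moore's Law): 2x every 1.5 years.
for year in range(6):
    ai = 10.0 ** year            # tenfold per year
    hw = 2.0 ** (year / 1.5)     # doubling every 1.5 years
    print(f"year {year}: AI x{ai:>9,.0f}  hardware x{hw:>5.1f}  gap x{ai / hw:>8,.1f}")
```

After only five years the gap between the two curves is already several thousandfold, which is the widening wedge the two curves in his chart depict.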
The second contradiction revolves around agents. Although agents are seen as an effective way to put large models to work, the accuracy of multi-agent pipelines is not 100%. As a Berkeley report noted, overall accuracy degrades when agents are chained, because multiplying numbers less than one yields an ever-smaller product. Yang Qiang said a belief mechanism can mitigate this, and he and his students have proposed the concept of an “Agent Factory,” in which agents produced natively by large models can substantially reduce the decline in accuracy.
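The “multiplication of numbers less than one” is simple arithmetic; with an assumed per-agent accuracy of 95% (a hypothetical figure, not one from the talk), a chained pipeline degrades quickly:

```python
# Assumed per-agent accuracy of 0.95; chaining agents multiplies success probabilities.
per_agent = 0.95
for n in (1, 3, 5, 10):
    print(f"{n:>2} chained agents -> end-to-end accuracy {per_agent ** n:.3f}")
# 1 -> 0.950, 3 -> 0.857, 5 -> 0.774, 10 -> 0.599
```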
The third contradiction is between human demands for safety, values, and ethical standards and the development of large models, covering issues such as model hallucinations, privacy protection, and intellectual property. Existing research addresses output safety, internal model security, and system auditing, enabling models to be tracked, data to be generated, and audits to be performed. However, a theorem by Yang Qiang shows an irreconcilable conflict between a model’s privacy protection and its inference capability: it is impossible to achieve 100% safety and 100% efficiency at the same time. To protect model intellectual property, he and his colleagues have proposed copyright-protection and watermarking schemes that are easy to detect yet hard to remove, which can also prevent model quality from being diluted in collaborations. The relevant findings are collected in the book "Watermarking AI Models."
The fourth contradiction concerns how computational resources are allocated between the training and inference phases of large models. With limited compute, a balance must be struck between investment in training and investment in inference, and this split affects overall model performance. Research indicates that when more resources are devoted to inference, training resources can be reduced accordingly, and that inference can run on CPUs rather than depending on GPUs.
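As a purely illustrative back-of-the-envelope (the budget and serving figures below are assumptions, not numbers from the talk), one can see how heavier test-time inference eats into a fixed total compute budget:

```python
# Hypothetical numbers: a fixed total FLOPs budget shared by training and serving.
total_flops = 1e23                            # assumed overall budget
queries_per_day, days = 1e7, 365              # assumed serving load over one year
for flops_per_query in (1e11, 1e12, 1e13):    # increasingly heavy test-time compute
    inference_flops = flops_per_query * queries_per_day * days
    training_share = (total_flops - inference_flops) / total_flops
    print(f"per-query {flops_per_query:.0e} FLOPs -> training share {training_share:.1%}")
```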
The fifth contradiction is the gap between the supply of human data and AI’s demand for it. Human-generated training data grows slowly, while the demand for AI training data grows rapidly; the two curves are projected to cross around 2028, at which point the exhaustion of public-domain data could stall AI’s growth. Yang Qiang pointed out that more than 90% of private-domain data (held by individuals, hospitals, financial institutions, and so on) has yet to be used to train large models. His research focuses on this problem, and he proposed a solution based on “federated learning plus teacher-student large and small models.” Cloud-based general-purpose large models lack industry knowledge, while local private data, constrained by limited compute, can only support domain-specific small models. Federated learning can bridge the two: the large model teaches the small model reasoning capability, the small model feeds domain knowledge back to the large model, and privacy is preserved throughout. He also mentioned “transfer learning,” which applies knowledge from data-rich domains to new areas with little data, allowing multiple “students” to support one another while learning from a “teacher.”
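As a minimal sketch of the teacher-student idea (an illustration in PyTorch under simplifying assumptions, not Yang Qiang’s actual federated-learning system), the cloud “teacher” and the local “student” exchange only predictions on a shared public proxy dataset, so the private data and labels never leave the local side:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a larger general model (teacher) and a small domain model (student).
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
student = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

proxy_x = torch.randn(256, 16)           # public proxy data both sides can see
private_x = torch.randn(256, 16)         # private data: stays local, used only by the student
private_y = torch.randint(0, 4, (256,))  # private labels (domain knowledge)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)

for step in range(100):
    # 1) Teacher -> student: distill the teacher's predictions on the proxy data.
    with torch.no_grad():
        t_logits = teacher(proxy_x)
    s_loss = F.kl_div(F.log_softmax(student(proxy_x), dim=-1),
                      F.softmax(t_logits, dim=-1), reduction="batchmean")
    # The student also learns domain knowledge from its private labels.
    s_loss = s_loss + F.cross_entropy(student(private_x), private_y)
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()

    # 2) Student -> teacher: only the student's soft predictions on proxy data
    #    are shared back, never the private data itself.
    with torch.no_grad():
        s_logits = student(proxy_x)
    t_loss = F.kl_div(F.log_softmax(teacher(proxy_x), dim=-1),
                      F.softmax(s_logits, dim=-1), reduction="batchmean")
    opt_t.zero_grad()
    t_loss.backward()
    opt_t.step()
```

The design point is that knowledge flows in both directions through soft predictions on shared data, which is what lets the large model gain domain knowledge while the private records remain on the local side.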
The sixth contradiction is between “catastrophic forgetting” and “active forgetting.” When a model keeps learning, new training can erase previously acquired knowledge, a phenomenon known as catastrophic forgetting. Active forgetting matters as well, particularly for compliance with data-security laws when customer data must be purged from a model. Yang Qiang has designed algorithms to address both.
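For intuition (a toy illustration, not an algorithm from the talk), a small classifier trained sequentially on two tasks exhibits catastrophic forgetting: accuracy on the first task collapses after training on the second.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

def make_task(shift):
    # Each task lives in its own region of input space with its own decision rule.
    x = torch.randn(512, 2) + shift
    y = (x[:, 0] > shift[0]).long()
    return x, y

task_a = make_task(torch.tensor([0.0, 0.0]))
task_b = make_task(torch.tensor([5.0, 5.0]))

def fit(x, y, steps=300):
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

def acc(x, y):
    return (model(x).argmax(-1) == y).float().mean().item()

fit(*task_a)
print("task A accuracy after training on A:", acc(*task_a))
fit(*task_b)  # sequential training on task B overwrites what was learned on A
print("task A accuracy after training on B:", acc(*task_a))
```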
Yang Qiang indicated that large models have found successful applications in the financial sector, including customer service, marketing, fraud detection, and anti-money laundering. If these contradictions are not effectively managed, AI may face a harsh winter or encounter significant challenges.