Establishing Standards for Multimodal AI in Credit: A Live Discussion on Creating a Financial Equivalent of ImageNet

A recent live discussion, co-hosted by Qifu Technology together with researchers from Fudan University and South China University of Technology, focused on setting standards for multimodal AI in the credit sector. The session offered an in-depth look at FCMBench-V1.0, the first multimodal evaluation benchmark designed specifically for credit scenarios. The benchmark defines assessment tasks around multimodal perception, reasoning, and decision-making, and it open-sources its dataset and evaluation tools in an effort to create a widely recognized "measuring stick" for financial AI. The one-hour session combined academic insight with practical industry experience, offering reference points for financial institutions, academic researchers, and industry professionals. The following is a summary of the core content from the discussion.

From an industry practice perspective, FCMBench provides a unified standard for measuring the capabilities of financial AI models. Yang Yehui, head of multimodal AI at Qifu Technology, began by analyzing the developmental challenges of financial AI and the motivations behind creating FCMBench-V1.0. He likened AI to a "hoe" and high-barrier industries like finance and healthcare to "fertile land" with great potential. Because financial services place high demands on privacy, security, and compliance, model validation cannot rely on self-assessment; it requires an objective, unified evaluation system. FCMBench-V1.0 aims to address the core dilemmas financial institutions face when selecting models. Yang pointed to two problems in the industry today: different models each claim high scores, but there is no consistent standard for comparing them, and models often suffer significant performance drops when they move from lab environments to real-world production. The core value of FCMBench is to serve as a "unified ruler" that places different models on a level playing field and tests their capabilities under realistic business conditions.

Regarding the design of this "ruler," Yang Yehui outlined three fundamental principles upheld by FCMBench: fairness, scientific rigor, and practical applicability. Fairness prevents self-serving claims by establishing a unified evaluation baseline. Scientific rigor is reflected in reasonable data distribution, task design, and difficulty settings that can effectively differentiate between algorithms. Practical applicability is the core principle, ensuring that excellent performance on the benchmark translates directly to real-world business scenarios. To make evaluations more realistic, FCMBench recreates various risk scenarios encountered in credit operations by simulating over ten types of real-world capture interferences and setting reasoning tasks such as judging the plausibility of document information and comparing multiple documents. For example, Yang cited a scenario where a user reports an annual income exceeding 500,000 RMB but has a tax payment ratio below 10%; such an obvious risk indicator is incorporated into FCMBench's reasoning tasks to test a model's risk identification and anti-fraud judgment capabilities, ensuring the practical value of the evaluation tasks.
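
To make the income example concrete, the kind of consistency check it describes can be written as a simple rule. The following is only an illustrative sketch, not FCMBench code: the benchmark tests whether a multimodal model can reach this judgment from document images, whereas the snippet merely encodes the rule itself. The class and function names are hypothetical, and "tax payment ratio" is assumed here to mean tax paid divided by reported income; the two thresholds are the ones quoted in the discussion.

```python
from dataclasses import dataclass

# Thresholds taken from the example in the discussion.
INCOME_THRESHOLD_RMB = 500_000
MIN_TAX_RATIO = 0.10


@dataclass
class IncomeClaim:
    """Hypothetical container for figures extracted from an applicant's documents."""
    reported_annual_income_rmb: float
    annual_tax_paid_rmb: float


def is_implausible(claim: IncomeClaim) -> bool:
    """Flag a high reported income paired with an unusually low tax ratio.

    Assumption: tax ratio = tax paid / reported income.
    """
    if claim.reported_annual_income_rmb <= 0:
        return False  # nothing to compare against
    tax_ratio = claim.annual_tax_paid_rmb / claim.reported_annual_income_rmb
    return (
        claim.reported_annual_income_rmb > INCOME_THRESHOLD_RMB
        and tax_ratio < MIN_TAX_RATIO
    )


if __name__ == "__main__":
    # 600,000 RMB income with 30,000 RMB tax gives a 5% ratio, which is flagged.
    print(is_implausible(IncomeClaim(600_000, 30_000)))
```

In a benchmark setting, a rule like this could plausibly serve as the reference answer against which a model's judgment on the underlying documents is compared, though the discussion does not say how FCMBench scores such tasks.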

In Yang's view, FCMBench was not created for its own sake. Its core objective is to give back to the business and the industry itself. It is positioned as a public resource for the financial sector, aiming to deeply align AI capabilities with business value through unified standards. Furthermore, FCMBench serves as a bridge between academic research and industrial application of large financial models. Technically, it will continue to expand its tasks, data types, languages, and modalities to achieve full-scenario coverage for credit AI. At the industry level, it will collaborate with universities to tackle technical challenges and invite deep participation from banks and various financial institutions to enrich real business data and scenarios. The goal is to evolve it into an industry-recognized evaluation standard, potentially even a group standard, becoming a practical threshold for model selection and collaboration among financial institutions.

From an academic research perspective, the "ImageNet moment" for financial AI is urgently needed. While the industry focuses on how to use the "ruler," academia is more concerned with why such a standard has been missing and how to create a truly credible one. Professor Chen Tao from Fudan University looked back at the history of AI to identify the core issue: "The development of large AI models highly depends on an open-source ecosystem, yet the financial sector currently lacks a universally recognized, unified evaluation dataset and standard, both domestically and internationally. Without a unified 'ruler,' it is difficult for companies and academia to coordinate research efforts and form a powerful developmental ecosystem, which fundamentally restricts the emergence of large financial models." He drew a parallel to ImageNet, a milestone in deep learning: "The ImageNet dataset catalyzed the explosion of deep learning and became the unified benchmark for image recognition. A similar evaluation standard is crucial for breakthroughs in the AI industry." In Chen's view, the financial field therefore urgently needs to create its own "ImageNet."

Regarding FCMBench-V1.0, launched by Qifu Technology, Professor Chen described it as one of the large-scale, authoritative unified evaluation benchmarks currently available in the global financial credit domain. Compared with the fragmented evaluation datasets found elsewhere in the industry, FCMBench-V1.0 is the first to unify modalities, and it covers multiple core tasks such as credit assessment and risk control. Its design is oriented entirely toward real business scenarios, and as an industry-first initiative by Qifu Technology it combines comprehensiveness with practicality, making it a significant step toward a dedicated "ImageNet" for finance.

From the perspective of industry-academia-research integration, financial AI has significant advantages in deployment, and FCMBench connects industry needs with talent development. Professor Xu Yanwu from South China University of Technology discussed the current state of financial AI in practice and its deployment advantages, and highlighted the value of FCMBench in cultivating industry talent. He first addressed a common misconception: "Many people intuitively feel that AI's 'presence' in finance is not strong, which is actually inaccurate. AI is already deeply involved in core scenarios like insurance pricing, asset valuation, and quantitative trading; it's just that this value isn't directly visible in consumer-facing products, so it goes 'unseen'." Xu also pointed out that, compared with other high-barrier industries such as healthcare, financial AI can be deployed far more efficiently, potentially tens or even hundreds of times faster. In financial credit, model effectiveness can be validated quickly through back-testing on historical data and by running two models in parallel, so model adjustment cycles are extremely short. In healthcare, by contrast, changing an algorithm requires restarting the entire validation process, including pre-clinical trials, which can take three to five years, a vast difference in practical cost.

For the creation of financial datasets, Professor Xu proposed three core elements: value-driven, comprehensive and well-crafted, and fair and inclusive. He believes that a high-quality financial dataset must first address a valuable and innovative topic that genuinely solves industry problems. Second, its design must be comprehensive and well-crafted, catering to the multi-dimensional application needs of the industry. Finally, the evaluation method must be fair and impartial, built on public industry value rather than private interests. The launch of FCMBench-V1.0 aligns well with these three elements and also plays an important role in cultivating talent for the financial industry. Xu described FCMBench as a vital link between talent cultivation and the needs of the financial industry. It gives students who study AI as a minor alongside finance real-world industrial practice scenarios, enhancing their employability. It also gives algorithm-focused students practical financial application contexts, helping them adapt quickly to job requirements in finance. In both ways it helps supply the financial sector with qualified talent and strengthens its talent pipeline.

During the live discussion, the three guests offered insights from three distinct angles (industry practice, academic research, and industry-academia-research integration), all centered on establishing standards for multimodal AI in credit. This gave the industry a clearer picture of the current state, pain points, and future direction of financial AI. Looking ahead, with the continued operation and collaborative development of FCMBench-V1.0, and with participation from more financial institutions and research universities, the financial sector is poised to gradually form an open-source ecosystem akin to the one around ImageNet. This would enable deeper integration of AI technology with financial services, move financial AI toward standardized and well-regulated practice, and ultimately let technological breakthroughs and industrial implementation reinforce each other.


Tiger Brokers
