ChinaTalk

Deep coverage of technology, China, and US policy. We feature original analysis alongside interviews with leading thinkers and policymakers.


Deepseek: From Hedge Fund to Frontier Model Maker

Part 2 of our AI Lab translation series

Lily Ottinger

Dec 09, 2024

Before he became the CEO of world-beating AI lab Deepseek, Liang Wenfeng 梁文锋 was best known for founding High-Flyer (幻方), one of China’s top hedge funds.

High-Flyer is a quantitative fund that manages around $8 billion worth of assets. In Mandarin, the company’s name is “magic square,” a reference to the quirky mathematical object thought to have been first discovered in China.

How and why did High-Flyer start down the path of frontier LLM research? In this interview from May 2023, translated here by Zihan Wang, a former Deepseek intern and first-year CS PhD student at Northwestern, Deepseek's CEO lays out a grand strategy for AGI development. It explores:

Why High-Flyer decided to make early GPU purchases,
Liang’s belief in LLMs and the linguistic nature of human intelligence,
Methods to sustainably manage high research costs, including innovative uses of philanthropic budgets,
How High-Flyer plans to democratize AI access,
Organizational designs that facilitate innovation, from unconventional hiring to rejecting KPIs,
How curiosity-driven startups can succeed in an era dominated by tech giants,
Why High-Flyer pursues “hardcore innovation” instead of a business model based on imitation.


WeChat, Archive link. Interview by An Yong Waves (暗涌Waves, a 36kr subbrand), published May 24, 2023. Text by Lily Yu 于丽丽. Edited by Liu Jing 刘旌. Translated by Zihan Wang 王子涵.

In the crowded battlefield of large models, High-Flyer stands out as perhaps the most unconventional player.

This is a game destined for a select few. As large corporations enter the market, many startups are adjusting their direction or even considering retreat, but this quant fund continues to forge ahead alone.

In May 2023, High-Flyer launched an independent new organization called DeepSeek for its large-model venture, emphasizing its dedication to building truly human-level AI. Their goal isn’t just to replicate ChatGPT but to research and unravel more mysteries of Artificial General Intelligence (AGI).

Moreover, in a field considered highly reliant on scarce talent, High-Flyer is striving to assemble a group of dedicated individuals, wielding what they believe to be their greatest weapon: the collective curiosity of the team.

In the quant investment field, High-Flyer is a top-tier fund that has surpassed 100 billion RMB in scale. However, the way it stepped into the spotlight of this new AI wave was quite dramatic.

As the shortage of high-performance GPU chips became a direct constraint on the development of generative AI in China, a report from Finance Eleven (财经十一人) revealed that fewer than five companies in the country owned over 10,000 GPUs. Besides the major tech giants, one of them was High-Flyer. Generally, 10,000 NVIDIA A100 chips are considered the computational power threshold for training large models.

In fact, High-Flyer, a company rarely scrutinized through the lens of AI, has long been a mysterious AI giant. In 2019, it launched an AI company and invested nearly 200 million RMB (28M USD) in developing its proprietary deep learning training platform, “Yinghuo 萤火 (Firefly) One,” equipped with 1,100 GPUs. Two years later, it invested 1 billion RMB (140M USD) in “Yinghuo Two,” which featured around 10,000 NVIDIA A100 GPUs.

This means that, in terms of computational resources alone, High-Flyer had secured its entry ticket to developing a ‘ChatGPT-like’ model earlier than many tech giants.

However, large-scale models are heavily dependent on computational power, algorithms, and data, making the initial investment as high as $50 million and each round of training costing tens of millions. Sustaining the race is nearly impossible for companies without multi-billion-dollar resources. Despite these challenges, High-Flyer remains optimistic. Founder Liang Wenfeng told us, “The key is that we want to do this, can do this, so we are one of the best-suited candidates.”

This inexplicable optimism stems first from High-Flyer’s unique growth path.

Quant investing originated in the United States, which is why almost all of the founding teams behind China's leading quant funds have some experience working at U.S. or European hedge funds. High-Flyer, however, is an exception: it was founded entirely by a local team and has grown independently through its own exploration.

By 2021, just six years after its founding, High-Flyer had surpassed the 100 billion RMB milestone and was recognized as one of the “Four Kings of Quant Investing.”

As an outsider breaking into the field, High-Flyer has always been viewed as a disruptor. Multiple industry insiders told us that High-Flyer consistently uses innovative approaches in research, product development, and sales to carve out its place in the industry.

A leading quant fund founder remarked that High-Flyer “has never followed conventional paths” and does things “in their own way.” Even when something is unorthodox or controversial, they will “boldly articulate their views and act accordingly.”

High-Flyer attributes its development to “selecting high-potential but less-experienced individuals, supported by an innovation-driven structure and culture.” They believe this approach could also enable startups to compete with tech giants in the large-model arena.

But perhaps the most critical factor is the vision of High-Flyer’s founder, Liang Wenfeng.

While pursuing an AI degree at Zhejiang University, Liang was convinced that “artificial intelligence would change the world” — a belief dismissed by many in 2008.

Upon graduation, instead of joining a tech giant as a programmer like his peers, he retreated to a cheap rental in Chengdu. There, he experienced multiple failures in applying AI to various fields before tackling one of the most complex areas: finance, leading to High-Flyer’s founding.

An interesting detail is that, in the early years, a similarly eccentric friend who was building “quirky” flying devices in an urban village in Shenzhen invited him to join his venture. That friend went on to create DJI, a company now valued at tens of billions of dollars.

Thus, beyond the discussions of funding, talent, and computational power, we also spoke with High-Flyer’s founder, Liang Wenfeng, about how to build an organization that fosters innovation and how long human “madness” can endure.

After more than a decade in entrepreneurship, this was the first public interview with this reclusive “tech nerd” founder.

Coincidentally, on April 11, when High-Flyer announced its entry into the large-model field, they quoted a remark by François Truffaut, a French New Wave director, who once advised young filmmakers: “Be desperately ambitious, and desperately sincere.”

On Research and Exploration

“Do the most important and difficult things.”

Waves: High-Flyer recently announced its entry into the large-model space. Why is a quant fund undertaking such an endeavor?

Liang Wenfeng: Our large-model project is unrelated to our quant and financial activities. We've established an independent company, DeepSeek, to focus on this.

Many in our High-Flyer team come from an AI background. Years ago, we experimented with various applications before entering the complex domain of finance. AGI may be among the next great challenges, so for us, the question is not “why” but “how.”

Waves: Are you training a general-purpose model, or focusing on vertical domains like finance?

Liang: We’re working on AGI — Artificial General Intelligence. Language models are likely a prerequisite for AGI and already exhibit some AGI characteristics. So we’ll start there and later expand into areas like computer vision.

Waves: With tech giants entering the field, many startups have abandoned a pure focus on developing general-purpose large models.

Liang: We won’t prematurely focus on applications. Our focus is solely on the large model itself.

Waves: Some say it’s too late for startups to enter this space after tech giants have reached a consensus.

Liang: Currently, neither tech giants nor startups have an unassailable lead. With OpenAI paving the way, everyone is working from published papers and open-source code. By next year, both groups will likely have their own large language models.

Both major corporations and startups have their own opportunities. Existing vertical scenarios are not in the hands of startups, making this phase less favorable for them. However, because these scenarios consist of dispersed, fragmented niche demands, they are actually better suited to the flexibility of entrepreneurial organizations. In the long term, as the barriers to applying large models continue to fall, startups will have opportunities to enter the field at any time over the next 20 years.

Our goal is clear: to focus on research and exploration rather than vertical domains and applications.

Waves: Why do you define your goal as “to focus on research and exploration”?

Liang: It’s driven by curiosity. From a broader perspective, we want to validate certain hypotheses. For example, we hypothesize that the essence of human intelligence might be language, and human thought could essentially be a linguistic process. What you think of as “thinking” might actually be your brain weaving language. This suggests that human-like AGI could potentially emerge from large language models.

From a closer perspective, GPT-4 still holds many mysteries waiting to be unraveled. While reproducing it, we are also conducting research to uncover these secrets.

Waves: But research comes at a higher cost.

Liang: Reproduction alone is relatively cheap: based on public papers and open-source code, a few training runs, or even just fine-tuning, will suffice. Research, however, involves extensive experiments and comparisons, and demands far more compute and talent.

Waves: How do you fund research?

Liang: High-Flyer is one of our investors, with ample R&D budgets. Additionally, we have several hundred million RMB allocated annually for philanthropy, which we could redirect if necessary.

Waves: However, building foundational large models requires at least two to three hundred million dollars just to get a seat at the table. How can you sustain such continuous investment?

Liang: We're in discussions with different funding sources. From our interactions so far, many VCs seem hesitant about investing in research. They have exit requirements and prioritize rapid product commercialization, which makes it difficult to secure VC funding given our research-first approach. But we already have computing power and an engineering team, which means we already hold half the cards.

Waves: What analyses and projections have been made regarding the business model?

Liang: What we’re considering now is to make most of our training results publicly available in the future, which could also align with commercialization efforts. We hope that more people, even small app developers, can access large models at a low cost, rather than the technology being controlled by only a few individuals or companies, leading to monopolization.

Waves: Tech giants will also offer services at later stages. What differentiates you from them?

Liang: Giants may integrate their models with their platforms or ecosystems. Our offering is entirely open and independent.

Waves: After all, a commercial company embarking on limitless research seems irrational.

Liang: It might be hard if we must find a commercial justification, because it’s not cost-effective.

From a business perspective, fundamental research has a very low return on investment. When early investors backed OpenAI, their motivation was certainly not about how much return they would get, but a genuine desire to pursue the mission.

What we are sure of now is that we want to do this, we can do this, and we are capable of doing this, so we're among the best-suited candidates to tackle it at this moment.

Ten Thousand GPUs and Their Cost

“An exciting pursuit can’t always be measured in money.”

Waves: GPUs are the scarce commodity in this wave of ChatGPT-related startups, yet you had the foresight to stockpile 10,000 of them as early as 2021. Why?

Liang: It was a gradual process — from a single card in the early days to 100 cards in 2015, 1,000 cards in 2019, and then 10,000 cards. Up to a few hundred cards, we relied on external Internet data centers. When the scale expanded, we began building our own facilities.

People may think there’s some hidden business logic behind this, but it’s mainly driven by curiosity.

Waves: What kind of curiosity?

Liang: Curiosity about the boundaries of AI capabilities. For many outsiders, the wave triggered by ChatGPT has been particularly disruptive; for those within the field, however, it was AlexNet in 2012 that ushered in a new era. AlexNet's error rate was significantly lower than that of other models at the time, reviving neural network research that had been dormant for decades.

While specific technical directions have constantly evolved, the combination of models, data, and computing power has remained a constant. Especially after OpenAI released GPT-3 in 2020, the direction became clear: massive computing power would be essential. Yet even in 2021, when we were investing in the construction of Yinghuo Two, most people still couldn’t grasp the rationale.

Waves: So you did start paying attention to computational power in 2012?

Liang: Researchers have an insatiable hunger for computational resources. Small experiments often lead to a desire for larger-scale trials, prompting us to continuously expand our capacity.

Waves: Some assumed your clusters were primarily for financial market predictions.

Liang: If purely for quant investing, even a small number of GPUs would suffice. Our broader research aims to understand what kind of paradigms can fully describe the entire financial market, whether there are simpler ways to express it, the boundaries of these paradigms’ capabilities, and whether they have broader applicability, among other questions.

Waves: But this process is also a money-burning endeavor.

Liang: An exciting endeavor perhaps cannot be measured purely in monetary terms. It's like a family buying a piano: first, they can afford it, and second, there are people in the house eager to play beautiful music on it.

Waves: GPUs typically depreciate at about 20% per year.

Liang: We haven’t calculated precisely, but it’s likely less. NVIDIA GPUs hold their value well, and older cards still find buyers. Our previously retired GPUs still held decent value when sold second-hand, so we didn’t lose too much.

Waves: Clusters require significant expenses — maintenance, labor, and even electricity.

Liang: Electricity and maintenance are relatively inexpensive, constituting about 1% of hardware costs annually. Labor is more significant but represents an investment in our future and a key asset for the company. The people we choose tend to be relatively humble and driven by curiosity, and here they get the opportunity to do research.

Waves: In 2021, High-Flyer was one of the first companies in the Asia-Pacific region to obtain A100 GPUs. How did you manage to acquire them earlier than some cloud providers?

Liang: We proactively tested and planned for new GPUs early on. Cloud providers historically catered to fragmented demand. It wasn't until 2022, with the rise of autonomous driving, that customers needed to rent machines for training and were able to pay for it, and only then did some cloud providers begin building that infrastructure. It is typically challenging for tech giants to focus purely on research or training, as their efforts are driven more by business needs.

Waves: What’s your view of the large-model competition?

Liang: Giants certainly have their advantages. However, without rapid application deployment, they may struggle to sustain their efforts, as they are more driven by the need to see results.

Leading startups also have solid technical foundations, but like the earlier wave of AI startups, they still face significant challenges in commercialization.

Waves: Some think High-Flyer’s AI emphasis is PR for its other businesses as a quant fund.

Liang: In reality, our quant fund has mostly stopped external fundraising.

Waves: How do you distinguish AI believers from opportunists?

Liang: Believers were here before and will remain after the hype. They’re the ones buying GPUs in bulk or signing long-term agreements, not just renting short-term resources.

Enabling True Innovation

“Innovation often arises naturally; it is not orchestrated, nor can it be taught.”

Waves: How is DeepSeek’s recruitment progressing?

Liang: The initial team is in place. We are borrowing temporary support from High-Flyer due to a shortage of human resources in the early stages. Since ChatGPT-3.5’s surge last year, we’ve been hiring actively, but we still need more people.

Waves: Talent in large-model startups is scarce. Investors say top talent is often confined to AI labs at giants like OpenAI and Facebook AI Research. Will you recruit from overseas AI labs?

Liang: For short-term goals, hiring experienced individuals makes sense. But long-term success does not depend that much on past experience. Rather, it depends more on foundational skills, creativity, and passion. In this sense, domestic candidates are abundant.

Waves: Why does experience matter less?

Liang: The right person doesn’t always need prior experience. High-Flyer prioritizes capability over credentials. Core technical roles are primarily filled by recent grads or those 1–2 years out.

Waves: Is experience sometimes a hindrance to innovation?

Liang: Experienced people will tell you how something should be done without hesitation, while those without experience will explore repeatedly, think carefully, and find a solution that fits the current situation.

Waves: High-Flyer went from outsider to top-tier quant fund within a few years. Is this hiring philosophy the secret to its success?

Liang: Our core team, including myself, initially lacked quant experience, which is unique. It’s not necessarily a “secret” but part of our culture. We don’t deliberately avoid experienced individuals, but we focus more on ability.

For example, our top two salespeople were outsiders — one came from exporting German machinery, and the other wrote backend code at a securities firm. When they entered this field, they had no experience, no resources, and no prior connections.

Today, we might be the only large private fund that relies primarily on direct sales. Because we don't need to share fees with intermediaries, our profit margins are higher at the same scale and performance. Many firms have tried to imitate us, but none have succeeded.

Waves: Why hasn’t this model been successfully replicated by others?

Liang: Because this alone isn’t enough to drive innovation. It requires alignment with the company’s culture and management.

In fact, our sales team achieved nothing in their first year, and it was only in the second year that they started to see some results. But our evaluation standards are quite different from those of most companies. We don’t have KPIs or so-called quotas.

Waves: So, what are your evaluation standards for them?

Liang: Unlike most companies that focus on order volume, we don’t predefine commissions based on sales figures. Instead, we encourage our salespeople to build their own networks, connect with more people, and create greater influence.

We believe that an honest and trustworthy salesperson may not immediately drive orders in the short term, but they can make clients see them as reliable and dependable.

Waves: After selecting the right person, how do you help them get into the groove?

Liang: Assign them important tasks and avoid interfering. Let them figure things out and unleash their potential.

In reality, a company’s core essence is incredibly difficult to replicate. For example, hiring inexperienced individuals requires judging their potential and figuring out how to help them grow after they join — none of which can be directly copied.

Waves: What do you think are the necessary conditions for building an innovative organization?

Liang: In our experience, innovation requires as little intervention and management as possible, giving everyone the space to explore and the freedom to make mistakes. Innovation often arises naturally — it’s not something that can be deliberately planned or taught.

Waves: This is unconventional. How do you ensure that people work efficiently and head in the desired direction under such circumstances?

Liang: We ensure value alignment when hiring and rely on culture to maintain direction. There’s no written corporate culture, as rules can stifle innovation. More often, it’s about leadership setting an example — how you make decisions can become an unspoken guideline.

Waves: In this AI wave, could this kind of innovative organizational structure give startups a decisive edge against tech giants?

Liang: Conventional wisdom often concludes that startups with such ambitions can’t survive. However, in an ever-changing market, true success hinges on adaptability and the ability to adjust, rather than on fixed rules or conditions. Many giants struggle with inertia and can’t respond quickly to change, and this wave of AI will undoubtedly birth new companies.

True Madness

“Innovation is expensive, inefficient, and sometimes wasteful.”

Waves: What excites you most about this endeavor?

Liang: Verifying whether our hypotheses are correct. If they are, that’s immensely satisfying.

Waves: What are your must-have criteria when hiring talent for large models this time?

Liang: Passion and solid foundational skills. Everything else is secondary.

Waves: Are such individuals easy to find?

Liang: Their passion usually shows — they genuinely want to do this and they are often the ones actively seeking you out as well.

Waves: Large models may require endless investment. Does the cost make you hesitant?

Liang: Innovation is inherently expensive and inefficient, often accompanied by waste. That’s why it only emerges when economic development reaches a certain level. When resources are scarce or in industries not driven by innovation, cost and efficiency become essential. Even OpenAI only succeeded after burning through substantial funding.

Waves: Do you see your endeavor as madness?

Liang: I’m unsure if it’s madness, but many inexplicable phenomena exist in this world. Take many programmers, for example — they’re passionate contributors to open-source communities. Even after an exhausting day, they still dedicate time to contributing code.

Waves: There is a sense of spiritual reward in it.

Liang: It’s like walking 50 kilometers — your body is completely exhausted, but your spirit feels deeply fulfilled.

Waves: Do you think curiosity-driven madness lasts long-term?

Liang: Not everyone can stay passionate their entire life. But most people, in their younger years, can wholeheartedly dedicate themselves to something without any materialistic aims.

For more, check out our translation of Liang’s 2024 longform interview.

Deepseek: The Quiet Giant Leading China’s AI Race

Jordan Schneider, Angela Shen, and 4 others

November 27, 2024


Deepseek is a Chinese AI startup whose latest R1 model beat OpenAI’s o1 on multiple reasoning benchmarks. Despite its low profile, Deepseek is the Chinese AI lab to watch.

