
Acquired by NVIDIA for $20 billion! A profound dialogue with Jonathan Ross, founder of Groq, you won’t want to miss.

Smart Investor · 00:24
  1. Rather than asking 'Is AI a bubble?', it is better to ask 'What are the smart money players doing?'. What is Google doing? What is Microsoft doing? What about Amazon? And what are some countries doing? The answer is: they are all making significant bets on AI.

  2. I would say that if the inference computing power of OpenAI and Anthropic were doubled today, then within a month from now, their revenue would almost double.

  3. Many people cannot truly comprehend the instinctive importance of 'speed' in human experience.

  4. I think there is a misconception about making chips: many believe the hardest part is manufacturing the chip. But once you get into it, you realize: the hardest part is software. Then as you continue, you will see: the truly hardest part is keeping up with the evolution of the industry.

  5. If we look at this from the perspective of value, or the 'long-term weighing machine,' the most valuable asset in the economy is labor. Now, by providing stronger computing power and better AI, we are effectively injecting more 'additional labor' into the entire economic system. This has never happened before in human economic history.

  6. You should maintain brand trust at as high a level as possible, because trust compounds like interest. Similarly, I believe profit margins should be low enough to make customers feel they are always getting a good deal.

  7. If NVIDIA hasn't reached a market cap of $10 trillion five years from now, I would be very surprised.

NVIDIA made a blockbuster announcement over Christmas: it will acquire AI chip startup Groq for approximately $20 billion, bringing founder Jonathan Ross and other key executives under its wing. Founded in 2016, Groq had just completed a new round of funding in mid-September this year, raising about $750 million at a post-money valuation of roughly $6.9 billion.

As some details emerged, industry analysts generally came to view this deal as NVIDIA essentially absorbing potential rival Groq through 'technology licensing, talent acquisition, and asset integration,' rather than a traditional merger-and-acquisition equity transaction. This approach helps bypass strict antitrust reviews and speeds up implementation, while preventing Groq’s technology from falling into the hands of other major companies.

According to an internal email from NVIDIA CEO Jensen Huang, the strategic objective of this transaction is clear: to integrate Groq's low-latency processors into NVIDIA’s 'AI factory' architecture. This move aims to address NVIDIA’s structural shortcomings in the AI inference domain, enabling it to serve a broader range of AI inference and real-time workloads.

Groq’s core competitiveness lies in its LPU chip, which is specifically designed for AI inference and operates independently of the CUDA ecosystem. As Jonathan Ross put it, 'One of our biggest selling points is that our supply chain is completely different from that of GPUs.'

In any case, this transaction is regarded as one of NVIDIA’s largest deals ever, underscoring a key point: the supply and efficiency of inference computing power are becoming central variables in the new wave of competition.

It is time to present a deep conversation with Jonathan Ross. This dialogue took place at the end of September, shortly after Groq’s financing round, and serves as a valuable window for understanding the underlying logic of the AI revolution.

His insights are worth pondering not only because he is at the center of the AI storm but also due to his professional background.

Before founding Groq, Jonathan was one of the key creators of Google’s TPU. His understanding of chip development cycles, first-pass success rates, and the concept of a 'time moat' is grounded in engineering implementation and supply chain constraints, rather than theoretical strategy.

During his conversation with Harry Stebbings, Jonathan repeatedly brought the discussion back to a pressing issue: there simply isn’t enough computing power. He noted that the market continues to heavily rent H100 chips released years ago because, as long as demand outstrips supply, even older chips can still generate profits.

Jonathan speaks frankly and articulates his ideas with precision. Many of his viewpoints are highly insightful, while some are somewhat counterintuitive.

For instance, he drew an analogy between fast-moving consumer goods and humanity’s instinctive, genetically encoded need for speed. He also discussed how profit enables a company to 'stay in the game,' and likened his approach to AI investment, which steers clear of Bitcoin-style speculation, to a 'weighing machine' methodology. These perspectives echo familiar principles of value investing. His analysis of the 'Magnificent Seven' US stocks was particularly incisive.

The smart money is making big bets on AI.

Harry: I’d like to start with a fundamental question to clarify where we stand right now. It feels like the world is changing faster than ever before. If we look at the current state of the market, how do you assess it?

Jonathan: You’re asking whether there’s a bubble, right?

Harry: Sort of.

Jonathan: Regarding the question of whether there’s a bubble, my perspective is this: if you keep asking a question and never get a clear answer, maybe it’s time to ask a different question. Instead of asking 'Is it a bubble?', ask 'What are the smart players doing?'

What is Google doing? What about Microsoft? And Amazon? And what are some countries doing?

The answer is: they are all making significant bets on AI. Their capital expenditures are increasing, and nearly every new round of investment announcements involves larger amounts than the previous one.

A compelling example of the value these investments generate can be seen with Microsoft: during a particular quarter, after deploying a batch of GPUs, they announced that these GPUs would not be made available to Azure customers because Microsoft was earning more by using them internally than they would by leasing them out.

This clearly demonstrates the enormous commercial potential in this market.

To me, the current market closely resembles the early days of oil drilling—plenty of dry wells but also occasional gushers.

I’ve heard a statistic indicating that currently, around 35 or 36 companies in the AI sector account for 99% of the revenue, or at least 99% of token consumption.

Harry: Yes, the market is highly concentrated. I’m even a bit surprised that this figure isn’t lower. Honestly, look at NVIDIA—its revenue concentration is extremely high; two customers might account for the vast majority of its revenue.

Jonathan: Yes, NVIDIA probably accounts for 98% of that 99%.

But when the market is this concentrated, it indicates that it’s still in a very early stage. It’s like in the past when people didn’t know how to find oil and could only rely on intuition, almost investing based on feel. Those with good instincts would make a fortune, while others might lose everything.

But over time, this will gradually become a science. It will become more predictable, more stable, and the market will also become more balanced.

However, once it reaches that stage, investor returns will actually decline, especially for those who could have made significant profits—opportunities will become fewer.

Therefore, I believe this is precisely the best time for investors. Right now, overall income from AI exceeds expenditures, though this distribution is extremely uneven.

Harry: Are you saying that, overall, earnings exceed spending?

Jonathan: Yes, that’s true overall. Of course, there will still be many who suffer heavy losses, but generally speaking, the funds invested are less than the returns generated.

Harry: But if you look at the capital expenditures of these major companies now, every one of them is reacting with 'okay, okay, okay' because they believe there will definitely be tangible results at the end. The problem, however, is that capital expenditures keep rising.

Jonathan: Exactly. Even if you consider this purely from a financial perspective, I think the financial returns are positive. And the real reason why many people are willing to invest now isn’t actually to make money.

I recently went to Abu Dhabi to attend the first event organized by Goldman Sachs there (we are also currently sponsoring the McLaren team).

At the event, Zak Brown gave a speech, and so did I. The atmosphere was quite good. Then someone asked me a similar question: 'Is AI a bubble?'

Instead of answering directly, I posed a question to the entire audience: Who here can be 100% certain that AI will not replace your job ten years from now?

Among the attendees were more than fifty investors managing assets exceeding $10 billion each, and not a single person raised their hand.

I said, 'Good, this reflects the current mindset of the tech giants.'

So they will continue to invest heavily like drunken sailors, or else risk being completely shut out of core business areas.

They are no longer calculating based on economic returns but instead addressing a more critical question: Can we maintain our industry leadership?

Future developments will increasingly reflect the influence of scaling laws. If you can't stay in the top ten, you will lose pricing power.

We often talk about the 'Magnificent Seven' in U.S. stocks these days; if you're not among these seven companies, you won’t even come close to their valuation range.

So how do you secure this position? You need to keep investing continuously.

As long as you remain in the top seven or top ten, the market will assign you a high valuation, so this investment is worthwhile.

Harry: But the problem is, ultimately, these investments must translate into real, tangible revenue. If they don’t deliver, whether you are part of the 'Magnificent Seven' becomes irrelevant, right?

Jonathan: You’re absolutely right. The reality, however, is that AI has already started to unleash enormous value. It’s just unevenly distributed across different applications, but it is indeed creating immense value.

Let me give you an example from our own experience.

I’ve tried coding through prompts. I’m not particularly good at it myself, but we have several interns who excel at it.

Once, during a client visit, I was in a meeting where the client proposed a feature request on the spot.

I described the functionality in very rough, prompt-based terms and handed those prompts over to the engineers. Four hours later, the feature was live. Not a single line of code was manually written, nor was there any manual debugging—everything was completed through prompts.

We have integrated code submission into Slack, so all of those operations were performed directly within Slack.

Just think about how much value this level of efficiency brings!

But if you look further ahead, six months from now, perhaps this feature could be completed and go live before the client meeting even ends. That would represent not just a difference in efficiency, but a qualitative leap forward—and it wouldn’t merely be about cost savings anymore.

Of course, from a financial perspective, the faster the development speed, the less money you spend pushing features into the production environment, which directly impacts the return on investment (ROI).

From a qualitative perspective, when you can turn requirements into reality before a client meeting ends, you secure orders that competitors could never win.

The current demand for computing power is both insatiable and unmet.

Harry: I want to go back to the topic of the 'Magnificent Seven' in the U.S. stock market. Do you think everyone realizes that to maintain their positions among the top seven, they must delve into the chip layer and achieve full vertical integration from start to finish?

Jonathan: I believe you won’t see many companies successfully breaking through at the chip level.

Many view Google's TPU as a massive success, but what they don’t know is that Google was simultaneously working on three chip projects at the time, and only one outperformed GPUs.

Looking across the industry, while several companies are developing AI chips, numerous projects have been canceled, such as the recent termination of Dojo.

Building chips is incredibly difficult.

When a company says, 'We will build our own AI chips to compete with NVIDIA,' it’s like saying, 'Hey, Google Search is great; let’s replicate it.'—it’s simply madness!

The level of optimization, design, and engineering required for those chips is extraordinarily high. Attempting to replicate it has an extremely low probability of success.

Of course, if there are many people attempting it in the market and you have the option to choose, then as long as one of them succeeds, you will have an additional chip to select from.

Harry: As we mentioned earlier, in order to maintain a position among the 'Magnificent Seven,' you must keep investing. For example, NVIDIA invests $100 billion into OpenAI, and then OpenAI uses that money to buy NVIDIA's chips. Isn't this just an endless cycle of a financial game?

Jonathan: If they were just circulating the money back to themselves, then yes, it would be. But the issue is that this money actually flows to suppliers to build chips, so it is not simply capital spinning its wheels.

Consider it from another perspective: what proportion of this investment truly flows into chip infrastructure construction? About 40%. Therefore, at least 40% of the funds enter the entire supply chain ecosystem.

This cannot be considered an endless cycle.

Harry: So is it a 'partial cycle'? 60% returns to NVIDIA, and 40% flows out? NVIDIA does indeed take back 60%, and their stock price has risen by hundreds of billions of dollars as a result. What is your view on this structure?

Jonathan: Well, we can look at it from different perspectives.

From an economic standpoint, it makes perfect sense. As long as it can create a customer lock-in effect, then naturally, you would be willing to do it.

Why can revenue growth drive market capitalization to increase more than the revenue itself? It is because the market believes that this growth is sustainable.

I think this logic holds true for NVIDIA. But it’s not only because NVIDIA is strong—although it certainly is—but also because the total global computing power is fundamentally insufficient.

Indeed, the current demand for computing power is extraordinary and insatiable.

I would say that if the inference computing power of OpenAI and Anthropic were doubled today, then within a month from now, their revenue would almost double.

Harry: Wait. Could you break down this statement for me? Are you suggesting that their revenue is currently constrained due to limited computing power? But if they were given double the computing power, why would their revenue also double?

Jonathan: One of the biggest complaints about Anthropic right now is the severe rate-limiting. Users simply can’t get enough tokens. If they had more computing power, they could generate more tokens, which would allow them to charge more.

As for OpenAI, which provides chat services, how do you control traffic for chat services? The way to do it is to slow it down, resulting in reduced user interaction.

Harry: How important do you think speed really is? After all, many people believe that latency doesn’t matter much: 'I’m perfectly fine with some delay. After I input a prompt, I can go do something else while waiting for the result to come out slowly.'

Jonathan: Hmm, that’s an interesting perspective...

But let’s look at an analogy, such as the CPG (consumer packaged goods) industry.

Let’s rank consumer packaged goods by profit margin: the highest margin is for cigarettes, followed by chewing tobacco, then carbonated beverages, and then bottled water and the like.

What factor correlates most strongly with high profit margins? It’s the speed at which the product’s ingredients take effect on you.

In other words, the faster that dopamine-triggering loop takes effect, the stronger your brand loyalty becomes.

The quicker you experience feedback, the more likely you are to form an emotional connection with the brand, which in turn helps build brand value more easily.

This is also why Google places such a strong emphasis on speed and why Facebook continuously optimizes response times. For every 100 milliseconds improvement in loading speed, conversion rates can increase by approximately 8%.

Therefore, the idea that 'it's fine if the prompt runs slowly in the background and users can just wait' is completely wrong. One hundred percent incorrect.

When we first started accelerating the chip, we actually knew how fast we could make it. We even created a demo video showcasing the actual speed of the chip.

But many people, after watching the video, would say: 'Why does it need to be faster than human reading speed? As long as it's readable, isn't that enough?'

I usually counter by asking them: 'Then why does webpage loading need to be faster than reading speed?'

There is a cognitive gap here; many people fail to truly grasp the instinctive importance of 'speed' in the human experience.

Humans actually find it difficult to determine which factors truly affect engagement and final outcomes, but we learned this lesson early on from the experiences of early internet companies.

The core motivation for independently developed chips

Harry: Do you think OpenAI will eventually develop its own chips? I believe NVIDIA would certainly be concerned that OpenAI might aim for vertical integration and bring the chip-making process in-house. Do you think they will succeed?

Jonathan: I think there's a misconception about making chips: many people assume the hardest part is producing the chip. But once you get into it, you realize the hardest part is actually the software.

And as you continue, you'll realize that the truly hardest part is keeping up with the industry's evolution.

I have no doubt that OpenAI will make its own chips, and so will Anthropic. In the future, every tech giant will develop its own chips.

I remember a lab visit from my time at Google. AMD wasn't particularly strong back then (although they are doing well now).

We deployed 10,000 servers at the time, all using AMD chips.

I walked through the lab and saw them pulling the servers out of the racks, removing the AMD chips, and throwing them straight into the trash bin.

Harry: Really...?

Jonathan: Really. And you could say it was fate, because everyone knew back then that Intel would win with that generation of chips.

So why did Google build 10,000 AMD servers? It’s simple—they did it to secure better pricing when purchasing Intel chips.

When you are a player of this magnitude, you will find that the cost of designing your own motherboard for AMD, building a testing platform, and deploying production is entirely worth the procurement discount from Intel.

Therefore, when a company decides to make its own chips, the motivation is not necessarily for mass production and deployment; often, it is driven by game theory and bargaining strategies.

There is another lesser-known fact: NVIDIA is now almost in a monopsony position in HBM (High Bandwidth Memory).

A monopsony is the opposite of a monopoly—not having only one seller, but only one buyer.

Currently, the supply of HBM in GPUs is limited. The GPU itself is manufactured using the same process as mobile phone chips, so if NVIDIA wanted, they could produce 50 million GPU cores annually.

However, the reality is that this year they will likely only manufacture 5.5 million GPUs due to the capacity limitations of HBM and interposers.

Thus, the situation becomes like this: a tech giant comes along and says, 'I want one million GPUs.'

NVIDIA responds, 'Sorry, I have other customers.'

Then the tech giant counters, 'No problem, I’ll make them myself.'

And then you will find that suddenly NVIDIA produces a batch of GPUs, prioritizing delivery to this company.

The reality is that the overall market capacity is limited. By manufacturing your own chips, what you truly gain is not just 'a chip of your own,' but control over your own destiny.

This is the core selling point of developing proprietary chips.

Harry: Taking control of one’s destiny—what does that imply?

Jonathan: It means NVIDIA can no longer dictate to you: 'This quarter, you can only purchase a certain number of GPUs.' Of course, this may come at a higher cost, as self-developed chips might not be as good as NVIDIA's.

Let’s think again: why can NVIDIA's GPUs dominate the market with only slightly better performance than AMD's?

If the total cost of deploying an entire system is many times higher than the price of the chip itself, even a small increase in the chip’s price will have a negligible impact.

For example: suppose I am deploying a system where the chip accounts for only 20% of the total BOM (Bill of Materials) cost. In that case, even if the chip becomes 20% more expensive, the impact on the total system cost would still be minimal; however, if the chip’s performance improves by 20%, the value of the entire machine increases by 20%.

Therefore, you’ll realize that even a slight advantage in chip performance leads to a significant increase in the value of the entire machine. Even minor performance differences can result in a substantial competitive edge in sales.
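A toy calculation makes this asymmetry concrete. All numbers here are hypothetical, chosen only to mirror the 20% figures in the example above:

```python
# Toy sketch of the BOM argument above (all numbers hypothetical):
# if the chip is only 20% of the total BOM, a 20% chip price hike
# barely moves total system cost, while a 20% performance gain
# lifts the value of the whole machine by 20%.

def total_cost(chip_price, rest_of_bom):
    return chip_price + rest_of_bom

chip, rest = 20, 80                     # chip is 20% of a 100-unit BOM
base = total_cost(chip, rest)           # 100

pricier = total_cost(chip * 1.2, rest)  # chip becomes 20% more expensive
cost_increase = (pricier - base) / base # only +4% total system cost

value_increase = 0.20                   # 20% faster chip ~ 20% more machine value
print(cost_increase, value_increase)    # +4% cost versus +20% value
```

A 20% chip price increase adds only 4% to the system, while a 20% performance edge raises the value of the whole machine by 20%, which is why small chip advantages translate into outsized sales advantages.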

HBM Supply Bottleneck and Data Center Amortization

Harry: You mentioned monopsony earlier. Under such circumstances, do companies like OpenAI, Anthropic, or other members of the 'Magnificent Seven' still have the opportunity to enter the chip layer?

Jonathan: If the HBM market is a monopsony, it is indeed challenging. However, from the perspective of HBM suppliers, they still have an incentive to distribute resources among more customers.

NVIDIA, due to its enormous purchasing volume, has exceptionally strong bargaining power and can push prices down. Suppose you are an HBM manufacturer planning a new packaging plant and the entire supply chain; when NVIDIA walks in with a large check, you naturally prioritize serving them.

Therefore, NVIDIA always secures the capacity it needs ahead of time. The issue is that this check must be issued more than two years in advance.

Currently, the AI market has entered a phase of explosive growth. Even for NVIDIA, which has ample cash reserves, it is difficult to calculate all future demand and sign contracts two years ahead.

Thus, supply bottlenecks are inevitable. This is not only because the market structure favors buyers but also because of the massive capital expenditure required, while HBM manufacturers tend to be very conservative.

Another point: the profit margin for HBM is so high that manufacturers are reluctant to expand production because any increase in capacity could lead to a price drop.

Harry: I fully understand. But I would like to ask, when we see OpenAI and Anthropic developing their own chips, does this also explain why they are raising funds?

For instance, Sam (Sam Altman) mentioned that they might need hundreds of billions of dollars in the future. Are these considerations already factored in?

Jonathan: Actually, no. The real cost driver is not purchasing chip systems but building data centers.

The cost of data centers is higher primarily because their amortization period is much longer: chips may be amortized over three to five years, while data centers can stretch to ten. Even though they may account for only one-third of your annual costs, the total capital committed to them ends up being the larger expense.

So you see those tech giants investing $75 to $100 billion annually; they are actually preparing for the next decade or even longer, expanding data center capacity.

Viewed this way, the figure doesn't seem so exaggerated.

Harry: But if the chip refresh cycle is faster than three to five years, has our current amortization model become obsolete?

Jonathan: Exactly. Many companies have indeed set overly optimistic amortization cycles.

Internally, we calculate using a more conservative approach: five to six years is the upper limit, and often we prefer to use three years.

We essentially plan based on a 'one generation of chips per year' rhythm.

You should understand the value of chips in two phases: one is whether it's worth deploying, and the other is whether it's worth continuing to operate. These are actually two completely different economic models.

At deployment, you need to ensure that the capital expenditure is recovered and there is a return on investment; but once deployed, as long as the chip generates enough value to cover electricity and hosting fees, it’s worth keeping it running.

Therefore, you can accept that the book value of the chip decreases over time, as long as it continues to function.

What everyone is betting on now is this: after the next generation of chips comes out, will the value of old chips drop so low that they can’t even cover operational costs?

In my view, the assumption of 'five-year amortization' is actually not very realistic. By the fifth year, the performance of new chips will have significantly surpassed that of older ones, and the value generated by the old chips may no longer justify their electricity costs and data center rental fees.
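The two-phase economics described above can be sketched in a few lines. The numbers and function names below are hypothetical, chosen only to illustrate the distinction between the deployment decision and the keep-running decision:

```python
# Minimal sketch (hypothetical numbers) of the two decisions above:
# "worth deploying" versus "worth continuing to operate".

def worth_deploying(annual_revenue, capex, opex, amortization_years):
    # At deployment, revenue must recover annualized capex plus running costs.
    annual_capex = capex / amortization_years
    return annual_revenue > annual_capex + opex

def worth_keeping_on(annual_revenue, opex):
    # Once deployed, capex is sunk: only electricity and hosting matter.
    return annual_revenue > opex

# An aging chip whose revenue no longer covers full-cost recovery...
print(worth_deploying(annual_revenue=30, capex=100, opex=20,
                      amortization_years=3))          # False
# ...but still covers electricity and hosting, so it keeps running.
print(worth_keeping_on(annual_revenue=30, opex=20))   # True
```

The bet Jonathan describes is about when the second condition also fails: the moment an old chip's revenue drops below its operating costs, keeping it powered on destroys value.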

Harry: What should we do then? Will there be a pile-up of obsolete chips that can't function properly?

Jonathan: Not necessarily. Many companies have signed long-term contracts, and the cost of breach of contract is also a variable to consider. Sometimes, 'running at a loss' might be more cost-effective than 'paying a penalty for breach of contract.'

Harry: I understand. But what should we really do when it comes to that point?

Jonathan: I can't tell you that because we are doing our best to avoid such a situation.

Therefore, in all our models, we aim for a faster payback period. I would never bet on an investment with too long a cycle. The shorter your betting cycle, the clearer and more controllable your outcome will be.

More computing power leads to better products.

Harry: So essentially, the strategy is to shorten the payback period as much as possible while minimizing operational costs, so that underperforming chips can be phased out more quickly?

Jonathan: Exactly. However, there is one rather crazy point that you might not have considered.

If you look at the accounts from an accountant's perspective, this model does seem quite poor and appears not to be worth pursuing.

However, empirical observations tell me that people are still renting H100 chips. These chips have been around for nearly five years, but they are still performing well and generating far more revenue than the cost of running them. You wouldn't deploy a new H100 today; you would keep it running because it is still profitable.

Harry: It has moved from the deployment phase into the maintenance phase.

Jonathan: Exactly. The fundamental reason is that people still can't get enough computing power.

If that weren't the case, the rental price of the H100 would have already dropped to rock bottom. But as long as computing power remains in short supply, this situation will persist.

The question is: Are there any alternative solutions that are less constrained by supply?

This is where we hope to step in.

Let’s go back to the 'speed' issue you initially asked about. Do you know how many clients initially approached us because they were pursuing speed?

Harry: How many?

Jonathan: All of them. But do you know how many of them continued to ask about speed after understanding the 'current state of market supply'?

Harry: Were there any?

Jonathan: None at all.

At first, everyone was saying, 'I care about speed; I know what this means for end customers.' But then they realized, 'Wait a minute, I can't even get the compute power I need.'

So the real value proposition becomes: Can you provide enough compute capacity?

Just two weeks ago, a customer came to us saying they needed five times the compute power of their current system. They had approached all the hyperscale cloud companies, and none could meet their demand. Neither could we; no one could. There simply isn't enough compute power in the entire market.

So the choice you face is, if I have this compute power, I can secure this customer. As you mentioned earlier, if OpenAI or Anthropic gets double the compute power, they will generate double the revenue.

If you are a company that doesn’t even have enough compute power to serve your customers, you would be willing to do whatever it takes to win over this client. Because you believe: securing the client first establishes long-term locked-in value.

One of our biggest selling points is: Our supply chain is entirely different from that of GPUs.

If you want to order GPUs, you have to write a check two years in advance; but if you want to order our system, just give us a purchase order for a million LPUs (Groq’s inference chips), and we can start delivery within six months.

Harry: Six months? That's a year and a half faster than GPUs!

Jonathan: Exactly, the difference is 18 months.

I previously met with the head of infrastructure at a hyperscale cloud computing company. I discussed performance, cost, and similar topics, and he remained relatively calm.

However, when I mentioned our six-month delivery capability, he interrupted me directly and began asking serious follow-up questions. The only thing he cared about was the supply chain.

Harry: Think about it from another perspective. With the current rapid pace of model iteration, do you think a 'two-year delivery cycle' still makes sense?

Jonathan: Have you heard of Sara Hooker?

Harry: No, I haven't.

Jonathan: She wrote a paper called 'The Hardware Lottery.' To summarize its core idea in one sentence: People design models based on hardware.

So theoretically, there may be architectures better than attention, but attention performs exceptionally well on GPUs.

This leads to a result: If you are the current dominant player, you have a natural advantage because everyone will design models around your hardware. Even if there are better architectures out there, they won’t be considered superior if they can’t run effectively on your hardware.

This creates a kind of small feedback loop.

Therefore, if you are an established player, planning according to a 'two-year horizon' isn’t problematic. But if you are a new player trying to enter the market, no one will adapt their models for your chips two years in advance.

You must accelerate the iteration cycle.

Harry: You just mentioned that OpenAI will develop its own chips, and so will Anthropic. In such a world, what is NVIDIA's position? Its customer base remains highly concentrated...

Jonathan: That's true. No one can truly predict the pace of AI development.

Didn't we start by discussing whether AI is a bubble? Just look at the pace of data center infrastructure construction over the past decade. You need to plan two, three, four, or even five years in advance, and what happens?

Almost everyone’s predictions were wrong. They always built too little.

This has become the pattern of the past decade. So after ten years of underbuilding, you say, 'This time I’ll build more, exceeding even my most optimistic projections.' Then, it still falls short.

So you raise your projections again, build more, and still fall short... And the cycle repeats.

That’s the reality. Moreover, the key point that most people haven’t realized is: AI doesn’t operate like SaaS (Software as a Service).

In the SaaS model, you have a team of engineers building a product, and the quality of the product depends on how they write the code.

But AI isn’t like that. In an AI model, I can run prompts multiple times to select better answers, directly improving the quality of the output.

I can also spend more money on each prompt to make the model perform better. I can even allocate compute based on the user's value: if a client is important, I can give them higher-quality results.
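The "run a prompt multiple times and keep the best answer" idea is essentially best-of-n sampling. A minimal sketch, where `generate_answer` and `score_answer` are hypothetical stand-ins for a model API and a reward model:

```python
import random

# Hypothetical stand-ins: in a real system, generate_answer would call a
# model API and score_answer would be a reward model or verifier.
def generate_answer(prompt: str, seed: int) -> str:
    rng = random.Random(seed)
    return f"answer-{rng.randint(0, 9)} to: {prompt}"

def score_answer(answer: str) -> float:
    # Toy quality score parsed from the fake answer text.
    return float(answer.split("-")[1].split(" ")[0])

def best_of_n(prompt: str, n: int) -> str:
    """Spend n times the compute on one prompt and keep the best sample."""
    candidates = [generate_answer(prompt, seed) for seed in range(n)]
    return max(candidates, key=score_answer)

print(best_of_n("summarize the quarterly report", n=8))
```

Raising `n` is exactly the trade Jonathan describes: linear extra compute per prompt in exchange for better expected output quality.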

OpenAI recently disclosed a similar strategy: they will launch some products that have very high operational costs, so they will only be available to a small number of users, and at a higher price. They want to see how much better their products can become when the model gains access to more computing power.

This is the future direction.

As long as you allocate more computing power to applications, product quality will improve. This is why you'll find that for many AI companies, their 'token-as-a-service' expenditures are almost equal to their revenue. The more they spend, the better their products become, and the more customers they attract.

The advantage of American models stems from an overwhelming superiority in computing power.

Harry: I fully understand this point. But let me ask you frankly: we see GPT-5 pursuing efficiency, and many people say that Sam has shifted from 'performance first' to 'efficiency first,' because investment in computing power does not bring linear performance returns.

Do you think this statement is correct? Would it contradict the view you just expressed?

Jonathan: I don't think it contradicts. You need to understand they are pursuing different outcomes.

For instance, OpenAI is now entering some extremely price-sensitive markets. Take India as an example. If you want to win the market in India, what is one core metric?

The monthly subscription fee must be kept around 99 rupees, which is approximately $1.13. You must keep the service priced at just over one dollar per month to serve those users who 'otherwise wouldn’t be able to use AI.'

Harry: Can Indian users also choose DeepSeek?

Jonathan: This is another common misunderstanding in the market. Let’s break down a few misconceptions.

Harry: Sure, I enjoy debunking misconceptions.

Jonathan: For example, when some open-source models from China are released, the immediate reaction is often, “Wow, they’ve trained models almost as good as those in the U.S.!”

We even discussed this topic and recorded a podcast episode about it. To be honest, I was initially swayed by the hype. But what many people overlook is: Are these models really cheaper? Are they truly more suitable for practical deployment?

Now that I have a better understanding of mainstream foundational models and Chinese models, I can confidently say: The operational costs of Chinese models are not low; in fact, they are about ten times higher.

Let’s take OpenAI’s recently released GPT-OSS model as an example. Its optimization goals differ from Chinese models, but its quality is exceptionally high. I would even argue that, within its specific focus area, it is clearly superior to Chinese models. Of course, Chinese models each have their own unique emphases.

But the key point is that running this OSS model costs approximately one-tenth of what it takes to run Chinese models.

So why did people think Chinese models were cheaper? They were misled by pricing. When a model is the "only available option" from a single provider, everyone standardizes on it and prices naturally go up; the higher price leads people to mistakenly assume the underlying cost is higher too.

In reality, the optimization focus of Chinese models is on saving costs during the training phase, rather than being more efficient during inference. However, if you compare the inference efficiency of the OSS model with that of Chinese models, a clear fact emerges: the U.S. still far surpasses China in the intensity and density of training.

This makes economic sense as well. The money spent on training a model ultimately needs to be offset by each inference operation. How to recover the cost more quickly? Each inference must be made cheaper.

How to achieve this? During the training phase, more resources must be invested to make the model more compact and efficient.

In this regard, the U.S. has a particularly clear advantage — we possess overwhelming computational power. We can leverage stronger computing resources to train models more thoroughly, resulting in faster inference and lower costs.
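The amortization argument above can be made concrete with toy numbers. The figures below are illustrative assumptions, not real Groq or OpenAI costs: spending more on training to make the model compact raises the training bill but halves per-call inference cost, so the bill is paid back sooner.

```python
def breakeven_inferences(training_cost: float,
                         price_per_call: float,
                         cost_per_call: float) -> float:
    """Number of inference calls needed to pay back the training bill."""
    margin = price_per_call - cost_per_call
    if margin <= 0:
        raise ValueError("inference must be sold above cost to amortize training")
    return training_cost / margin

# Illustrative assumptions only: a 20% larger training bill buys a model
# whose per-call cost is halved at the same price per call.
baseline = breakeven_inferences(100e6, price_per_call=0.002, cost_per_call=0.001)
compact = breakeven_inferences(120e6, price_per_call=0.002, cost_per_call=0.0005)
print(f"baseline: {baseline:.3g} calls, compact model: {compact:.3g} calls")
```

Under these assumed numbers the more heavily trained model breaks even in fewer calls, which is the economic logic Jonathan is pointing at.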

Harry: Why do we have such a computational advantage? Is it simply because we can obtain chips?

Jonathan: Yes, it's that straightforward.

Harry: Can't China rely on subsidies to absorb the high costs of inference? I understand they are indeed subsidizing now. Although the operational costs are higher, couldn't they sustain it through heavy investment?

Jonathan: This is where we need to distinguish between 'home ground' and 'away ground.'

The home ground refers to our own efforts to build sufficient computational capacity for the U.S. domestic market; away ground includes allies such as Europe, South Korea, Japan, and India — where we aim to provide computational support.

On home ground, China can certainly make significant investments. They plan to build 150 nuclear power plants. Although their chip energy efficiency may not be as high, abundant electricity, combined with government subsidies, can indeed lower operational costs.

But the situation differs when it comes to away ground. Consider a country with only 100 megawatts of electricity — could they easily build a nuclear power plant? Clearly not. This is something China can do but other countries cannot.

Therefore, whoever can provide more energy-efficient chips will gain an overwhelming advantage in the away game.

My assessment is that the United States will significantly outpace China in AI's away-game battle over the next two to three years.

If we act swiftly enough, we can bring a group of allies on board to enter the AI competition on the right track.

Harry: Then do you think we should open-source our models? After all, China has already become highly capable in this aspect of model development.

Jonathan: I believe the model itself may not necessarily be the decisive advantage.

Do you remember when you first invited me on the show? I predicted back then that OpenAI would open-source their models.

Harry: Yes, you did say that at the time.

Jonathan: The main reason for my judgment was actually based on OpenAI’s brand power.

To be honest, even if they simply rebranded Meta's LLaMA 2 model released two years ago, many people would still use it. That's the influence of branding.

Of course, their current models are indeed very strong, but even if one day they no longer lead in performance, people will still prefer to use their models.

In my view, Anthropic should open-source its previous-generation model. The purpose is not to compete for the state-of-the-art (SOTA) but to provide users with an alternative to Chinese models.

This way, those who are willing to use Chinese models can at least choose to use Anthropic's open-source model. This would bring several benefits:

First, prompts can be reused, similar to compatibility between software systems.

Second, if users start with Anthropic’s model and later decide to switch to their commercial model, the migration cost will be low. In contrast, switching from a Chinese model often involves incompatible prompt structures, making it harder to transition.

For instance, when OpenAI released its OSS model, many people began adopting it partly because the prompts they had written could be used directly without rewriting.

Of course, if you are developing a low-cost application and cannot yet afford OpenAI’s paid model, you might start with an open-source model. But as soon as your business grows and generates revenue, you will naturally want to upgrade to a better model.

At that point, prompt compatibility becomes a significant advantage.

Additionally, open-source models offer another benefit: they drive optimization across the entire infrastructure ecosystem, further reducing operational costs.

This positive feedback loop at the ecosystem level fosters substantial innovation.

Without energy, there is no computing power.

Harry: The topic we just discussed was building as much computing power as possible. But as you know, this requires a massive amount of energy. So let me ask a straightforward question: is nuclear energy the only viable solution to support this wave, or tsunami, of computing power?

Jonathan: No. While nuclear energy is indeed efficient and cost-effective, renewable energy can also be efficient and cost-controlled.

Let me give you a simple idea: if America's allies are willing to deploy computing power in places with cheap energy, they can actually access more energy resources than China.

Take the comparison between the US and Europe: overall, the US is a country that is more 'afraid of making mistakes.'

Harry: When you say 'afraid of making mistakes,' do you mean in the energy sector? Or is it a general statement?

Jonathan: It's a general statement. The US tends to be more cautious in many respects.

But we need to differentiate between two types of 'mistakes': one is doing the wrong thing, and the other is missing opportunities.

The US is particularly afraid of 'missing opportunities.' In a rapidly growing industry, the losses caused by 'not doing' something often outweigh those caused by 'doing the wrong thing.'

Europe, on the other hand, is more inclined to accept the risks of 'missing out.'

Look at how they are participating in the AI race now—by legislating boundaries, such as requiring that 'data must be stored domestically' or 'must remain within Europe,' and so on.

But in fact, if Europe really wants to participate in this AI race, it only needs to do one thing: let Norway fully develop wind energy.

Why Norway? Because their wind power availability rate is close to 80%. In other words, they generate stable electricity 80% of the time throughout the year.

Coupled with their already substantial hydropower infrastructure, if they scale up wind power to five times the current hydropower capacity, Norway alone could generate as much electricity as the entire United States—stable and green electricity at that.

Harry: So you mean... Norway alone could meet the entire electricity demand of the United States?

Jonathan: Yes, absolutely.

This also highlights how much potential non-nuclear clean energy resources we are still wasting. Of course, I also support the development of nuclear energy. Modern nuclear power technology has become very safe.

Harry: Then why don’t we massively adopt nuclear energy? Is it just because people are afraid?

Jonathan: It’s basically due to fear.

Harry: When you discuss these energy topics with European governments, how do they respond?

Jonathan: I generally avoid bringing up nuclear energy proactively. It’s a topic that will face widespread opposition the moment it’s mentioned, and I don’t want to trigger political resistance right from the start.

But I was recently in Japan, where they are seriously discussing restarting nuclear power. Outsiders often perceive Japan as being slow to act, but this is due to a lack of understanding of the details. Japan is indeed slow during the 'decision-making phase,' but once a decision is made, implementation is very swift.

For instance, when they decided to build a 2-nanometer wafer factory, on my last visit, they were already showcasing their produced 2-nanometer wafers.

Of course, the current yield rate is not yet sufficient for mass production, but the facility has been completed, and products have been manufactured; it only remains to further improve the yield rate.

They have also pledged to invest $65 billion in AI development, acting with remarkable speed.

Moreover, they will restart nuclear power.

Harry: Wow, if even Japan is restarting nuclear energy, then Europe really should be worried.

Jonathan: Exactly, they need to start catching up now, right?

Harry: However, I've been thinking about what you said regarding 'Norwegian wind energy.' The problem is, constructing that many wind turbines would take at least several years, wouldn't it? Do you think the Norwegian government would actually fund the construction of ten thousand turbines along the coastline?

Jonathan: Why does it have to be the Norwegian government funding it?

Harry: Then who else could it be?

Jonathan: For instance, tech giants could contribute, or countries that wish to deploy AI in Europe could also participate fully.

Take Saudi Arabia as an example. They have a substantial amount of gigawatt-level surplus power resources and are currently building a new generation of data centers around this electricity.

So why doesn't Europe collaborate with Saudi Arabia?

Saudi Arabia is currently advancing a plan called 'data embassies': the data remains under the ownership of the original sovereign nation but can utilize Saudi energy for operations. Why not collaborate? This way, the problem can be solved.

I estimate that they will soon complete the construction of 3 to 4 gigawatts of power generation capacity.

Therefore, the likely path forward is that tech giants will invest, lease clean energy from Norway or Saudi Arabia, and then deploy large-scale computing power based on these energy sources.

Harry: But these giants always complain, saying that administrative approvals are too slow and processes too complicated.

Jonathan: I previously spoke with a director of a large energy company involved in building nuclear power plants. He said that in the United States, the cost of the approval process is three times the construction cost of the nuclear power plant itself.

I am not sure about the situation in Europe, but generally speaking, the U.S. tends to be slightly faster than Europe in this regard.

Harry: In Europe, when building a nuclear power plant, what costs more: the engineering itself or the lengthy approval process?

Jonathan: I think what everyone should really remember is this: whoever controls computing power controls AI, and without energy, there is no computing power.

Harry: Let's bring it back to Europe itself – how do you see the current situation? How far behind is Europe? Is there still hope to catch up? I don’t want to be pessimistic, but do we really still have a chance to turn things around?

Jonathan: I don’t think Europe has missed its opportunity. As long as we start acting now, it’s not too late.

China is indeed faster in terms of execution efficiency, but let's not forget: Europe has a population of 500 million, the United States over 300 million, plus all their allies, including South Korea, which has mature experience building nuclear power plants; the UAE's nuclear plants, for example, were constructed by South Korea.

So why can’t we launch an 'Energy-focused Manhattan Project'?

I traveled to many European countries this summer; the summer heat was astonishing, and the winters are bone-chillingly cold.

This kind of extreme climate experience is almost nonexistent in other parts of the world. So why not take this opportunity to accelerate the construction of energy infrastructure?

Harry: I fully agree with your assessment. But the reality is that whether it’s a single government or multinational coordination, the speed of policy implementation will never keep up with the development of AI.

If we cannot advance our energy plans at a pace that matches AI, what will happen?

Jonathan: The future economic structure of Europe may degrade into a 'tourism economy,' where people come to see old buildings, take pictures, and that’s about it.

If you lack the foundational resources required to support the new generation of the economy, you are destined to be left out of the future.

And this future is built on the foundation of computational power for an AI-driven economy.

Harry: Is 'model sovereignty' enough? For example, if Europe trains its own models, can it seize the initiative?

Jonathan: Far from it. Without sufficient computational power, the model simply cannot function. Superior performance is useless without it.

Even if you train a model ten times better than OpenAI's, as long as OpenAI has ten times your computational power, their real-world performance will always surpass yours.

Harry: But aren't companies like Mistral promoting 'European model sovereignty'? They say, 'The German healthcare system and Croatia's Ministry of Transportation use our models because we are not an American company.'

Jonathan: This doesn’t actually constitute a true competitive moat. You need to ask: what is its unique value?

'We are a European company, not subject to U.S. jurisdiction'—this sounds persuasive, especially in certain political contexts. But it has nothing to do with whether you have sufficient computational power.

It addresses 'others can’t control me,' but not 'whether I am capable of getting things done.'

Of course, I'm not dismissing Mistral. We collaborate with them, and personally, I admire them.

I would like to emphasize that you need infrastructure to have real competitiveness. Without sufficient computing power, any model is just empty talk.

Computing power is the most predictable and certain link in the AI supply chain.

Harry: After hearing you talk about this, I feel like buying CoreWeave's stock right away (laughs). Their on-demand computing power sounds almost too ideal.

Jonathan: CoreWeave is indeed an excellent company, but their GPU quotas are also tight. Every company is facing quota issues now.

Harry: Didn't you tell me before that GPUs are not the most ideal inference infrastructure, right?

Jonathan: Yes. As training gradually matures, AI is entering a phase dominated by inference.

Harry: Does that mean NVIDIA's dominance will be weakened?

Jonathan: No. NVIDIA will sell every GPU it can produce.

Even if we eventually deploy ten times as many LPUs as GPUs, it will only increase market demand for GPUs, allowing NVIDIA to sell them at higher prices.

Harry: This sounds a bit counterintuitive. Why is that?

Jonathan: The more you deploy, the more training is required to support the inference performance; the more you train, the more inference capabilities need to be deployed to amortize the training costs.

The two are in a positive feedback loop.

Harry: Do you think the current development of the inference market aligns with your initial predictions? For example, in terms of maturity and deployment speed?

Jonathan: One thing that completely surprised me is that AI is based on language.

This allows people to interact with AI in the simplest way possible. I originally thought it would be like AlphaGo—an odd kind of intelligent system.

But it turned out to be language-based, meaning anyone can use it.

My original prediction was that AI would arrive earlier but grow at a slower pace. However, the reality is that AI came a bit later but has grown far faster than I anticipated.

The barrier to interaction is incredibly low. Currently, nearly 10% of the global population uses ChatGPT weekly. That’s astonishing.

Harry: Indeed. So, do you know what the main obstacles are that currently limit its further adoption?

Jonathan: It’s computational power.

Computational power limits the quality of AI. But a more practical issue is this: even if the quality isn't perfect, as long as it supports more languages, more people will be willing to use it.

Harry: This is also one of the most common complaints we hear from users in the global market.

Jonathan: So how do we solve it? The answer is simple: more computational power.

With more computational power, you can process more data; with more data, you can train more thoroughly.

You can even generate synthetic data, which is also an important resource for driving training.

The three core elements of AI are data, algorithms, and computational power. Any improvement in these three areas will enhance overall capabilities.

There are no 'hard bottlenecks' between them. It’s not the case that if your algorithm remains unchanged, you cannot use more data; nor is it true that if the data remains fixed, you cannot add more computational power. As long as there is progress in one area, AI will become stronger.

This is precisely why the development path of AI is not overly complex. As long as you continue to invest in one dimension, the whole system will advance.

Among these three, computational power is the easiest to improve.

Algorithmic progress is slow; high-quality data is difficult to collect, and synthetic data still has uncertainties.

But computational power is different. You simply need to write a sufficiently large check, wait for some time, and you will get more.

Computational power is the most predictable and certain link in the AI supply chain.

Harry: Even though we have realized the importance of computational power, people still greatly underestimate the scale truly required, right?

Jonathan: Absolutely correct, the underestimation is very severe.

Harry: How exaggerated do you think this underestimation is?

Jonathan: It’s the same point again: every time you increase computational power for a model, its performance improves. This means our underestimation of the computational power needed has no upper limit.

This is different from the Industrial Revolution.

In the industrial age, having energy alone was not enough; you also had to build machines, which takes time. For instance, if you wanted more cars on the road, it wasn’t enough to just extract oil—you first had to manufacture the cars.

But AI is different.

Although you can improve efficiency through algorithm optimization, the most straightforward method is to scale up computational power. Doubling your computational resources allows you to support more users, enhance model quality, and expand the boundaries of AI applications.

We've never encountered a situation like this before. It's not a bottleneck issue, but rather a system where adding more resources improves performance. Every bit of computational power we inject takes the overall performance of AI to the next level.
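The claim that adding compute keeps improving performance, with diminishing but never-zero returns, is commonly modeled as a power-law scaling curve. A toy sketch, with assumed constants rather than fitted values from any published study:

```python
def scaling_loss(compute: float, a: float = 1.0, alpha: float = 0.05) -> float:
    """Toy power-law scaling curve: loss falls as compute ** -alpha.

    a and alpha are illustrative constants, not measured scaling-law fits.
    """
    return a * compute ** (-alpha)

# Each doubling of compute lowers loss; the returns diminish but never
# hit a hard ceiling, matching the "no upper limit" framing above.
for c in (1e21, 2e21, 4e21):
    print(f"compute={c:.0e}  loss={scaling_loss(c):.4f}")
```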

In the future, there will be a 'labor shortage.'

Harry: You previously mentioned that AI can enhance economies. Is this based on the assumption that part of the $10 trillion currently spent on labor in global GDP will gradually shift toward AI? Do you think this structural transition will actually happen in the next five years?

Jonathan: I believe there will be a 'labor shortage' in the future. It's not that AI will take everyone's jobs, but rather that we won't have enough people to fill the new roles created by AI.

There will be three major changes ahead: First, strong deflationary pressures. Your coffee, your rent—almost everything will become cheaper.

Harry: Wait a minute, why will even coffee become cheaper?

Jonathan: Because AI and automation will significantly improve efficiency: robots will handle coffee cultivation, AI will manage supply-chain logistics, and even the coffee plant itself can be optimized through genetic engineering to increase yields.

From cultivation to transportation to sales, every step in the process will become more efficient, naturally reducing costs.

Thus, we are about to enter a phase of systemic deflation, where living costs decrease and expenditures required to sustain life diminish.

Second, people will gradually exit the labor market.

People will choose to work less or even retire early. With the lower cost of living, the motivation to work naturally declines.

Third, entirely new jobs and industries will be created.

Think about it: two hundred years ago, the overwhelming majority of Americans were engaged in farming. When agriculture came to require only 2% of the workforce, what did the rest do? They didn't become unemployed; instead, they moved into new industries that didn't exist back then.

The future will be similar: the profession of 'software engineer' was unheard of 100 years ago; the role of 'internet celebrity' was incomprehensible in the past. In the future, we might see 'vibe coding' become a new basic skill that everyone knows a bit about.

Thus, the conclusion is this: deflation lowers the cost of living; many people choose to work less or even leave the workforce; meanwhile, new jobs and companies emerge in large numbers, leading to a labor shortage.

Harry: This is completely counter to mainstream narratives. Everyone today is worried that AI will leave millions unemployed, but you’re saying the opposite — that we’ll actually lack people to fill new positions.

Jonathan: Exactly. It’s like how people once feared global famine on a massive scale, but as our productivity increased, humanity became abundant instead.

People always underestimate the impact of technological change on economic structures.

Harry: Speaking of change, how do you view the impact of Trump and his team on the development of AI in the U.S.? Is it an accelerator or an obstacle?

Jonathan: Overall, it’s an accelerator. They have indeed simplified certain institutional aspects, such as approval processes. From a policy perspective, their attitude toward AI has been positive.

Harry: You just mentioned vibe coding. I have to follow up on that. Do you think this will become a long-term trend? Many people believe these new tools are just transitional products. What's your take?

Jonathan: I think the development path of vibe coding is akin to literacy.

In the early days, only a small portion of the population was literate, and those who could read and write became scribes—a skilled profession with good compensation.

Programming is in a similar situation today.

But in the future, everyone will be able to read and write code—not to become programmers, but because it will become a fundamental skill across all industries.

People in marketing will need to write automation scripts; customer service staff may also need to handle system logic.

I have a friend who owns 25 coffee shops and has never written a single line of code. He used vibe coding to create a simple inventory management system that tracks stock levels across all his stores.

Not a single line of code was written—it was all generated through natural language.

Later, he started encountering various minor bugs—'This button doesn’t work,' 'That page is stuck'—exactly the scenarios we engineers deal with daily.

He even began fixing these issues himself, still using vibe coding.

This is the real trend: everyone knows a little bit of coding, and it becomes part of daily life just like literacy.

Why strive to maintain a low profit margin

Harry: In a world of exponential growth, I want to ask a practical question. Does a strategy of maintaining low profit margins make sense? For instance, your self-developed AI platform doesn’t have a particularly high gross margin. Could low profits be considered a disadvantage?

Jonathan: This is how I see it: the true purpose of profit is to give you a buffer against volatility.

If your profit margin is very low, you may not be able to withstand challenges such as financing difficulties or market fluctuations. You might fail to secure the next round of funding; banks might be unwilling to lend.

But with profit, you have the strength to stay in the game.

Of course, high profits also attract competitors. As the saying goes, 'Your profit is my opportunity.'

So it’s always a trade-off: you need to choose between 'stability' and building a 'moat.'

Harry: So how does your company internally think about profit margins?

Jonathan: I believe companies must have profit margins, but that space should primarily be given back to customers.

You need to be flexible, able to tighten and loosen as needed. This way, you will be in a favorable position in the market.

I once interviewed a CFO candidate, and although we ultimately hired someone else, that candidate made an interesting statement: We can use price increases to lower demand.

Harry: It sounds quite reasonable and makes sense from an economic perspective.

Jonathan: Yes, it logically makes sense.

But I countered: Should we also abuse brand trust? Should we leverage customers’ trust in us to sell them things they don’t need?

Brand value is valuable. You should maintain brand trust at the highest possible level.

Because trust has a compounding effect.

By the same token, I believe profit margins should be low enough that customers feel you are always giving them a good price. If you make too much profit, you are positioning yourself against your customers.

I hope our profit margin is as low as possible while still ensuring the stable operation of the company.

Where does my cash flow come from? From increased sales. And the reason I like the computing power business is that the demand for computing power is endless.

This is the Jevons Paradox, a concept proposed by the British economist William Stanley Jevons in 1865. He observed that as steam engines became more efficient, doing more work with less coal, people expected coal consumption to fall; instead, total coal consumption rose. Applied here: for every tenfold increase in computing power you produce, you can sell ten times the amount.

As long as we continue to reduce costs, users will buy more.

So my goal is to continuously lower costs, increase sales volume, and allow users to obtain more value at a lower price, thereby encouraging them to keep buying more. This flywheel will keep spinning.
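The flywheel described here can be sketched under a constant-elasticity demand assumption; the elasticity value and volumes below are illustrative, not measured:

```python
def units_demanded(base_units: float, base_cost: float, new_cost: float,
                   elasticity: float = 1.0) -> float:
    """Units sold after a cost drop, under constant-elasticity demand.

    elasticity >= 1 is the Jevons case: cheaper compute raises total
    consumption enough that spending does not shrink.
    """
    return base_units * (base_cost / new_cost) ** elasticity

# Assumed toy numbers: a 10x cost reduction with unit elasticity sells
# ten times the volume, so revenue holds while each unit gets cheaper.
print(units_demanded(base_units=1e6, base_cost=1.0, new_cost=0.1))
```

With elasticity above 1, the same cost cut grows total revenue, which is the "endless demand" case Jonathan is betting on.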

Harry: Where are we now in this cost-reduction process?

I still remember that during our early shows I would ask questions that now seem amusing in hindsight, such as whether you were concerned that Canva's profit margin would be dragged down after implementing AI. As it turns out, the cost of running AI has dropped by 98%!

Where do you think we are in this current cycle of cost reductions?

Jonathan: Let's take a step back and use Canva as an example again, as you mentioned earlier.

Truly successful companies don’t obsess over their income statements; they focus on their customers. What they do is solve the problems their customers face.

If you’re always competing on price, you’re bound to lose. What you should really be doing is differentiation—solving problems that customers haven’t been able to resolve and others can't either.

Customers are more than willing to pay for that.

If you only look at the income statement, this AI expenditure may seem unreasonable; but if customers can solve an otherwise unsolvable problem through this AI service, then it is entirely justified.

Moreover, typically, AI can bring an additional benefit: expanding market size. For example, two years ago, using Photoshop was difficult; now, you just need to provide a prompt, and an image can be generated.

The barrier to entry has been lowered, naturally expanding the market size. You may charge less for a single image, but your total revenue increases because you are serving more people.

The talent war in the AI field has entered its most frenzied phase in history.

Harry: Let me ask a financial question. The S&P 500 is almost hitting 7,000 points, and the 'Magnificent Seven' of US stocks have been on a relentless upward trajectory. We haven't seen such a concentrated, structural rally in a long time. Yet when you talk about AI, it feels like this is just the beginning.

How should I reconcile these two seemingly contradictory sentiments? On one hand, the market seems to be nearing its peak, while on the other, the potential of AI appears limitless?

Jonathan: Value has two dimensions: the weighing machine and the beauty contest.

Some assets are purely about the beauty contest, like cryptocurrencies. I’ve never bought Bitcoin. Why did I miss out? Because I’m not good at playing the 'beauty contest.' I don’t know what will catch on and what won’t. It’s something I can’t navigate.

The only thing I can do is see tangible value.

And when I look at AI, I see real, realized value.

The best example is that private equity (PE) firms are now flocking in. They want access to low-cost AI computing power because as long as they can secure more affordable computing resources, it will directly enhance the bottom line of their portfolio companies.

This is tangible value. When PE firms start chasing something, it’s not about popularity contests; it’s driven by value.

There are two reasons why a company might receive a high valuation: one is that the market believes the company will deliver value in the future; the other is that it has entered a hype cycle, becoming a pure 'beauty contest.'

Market participants are diverse—some focus solely on popularity voting, while others genuinely analyze fundamentals. They may arrive at the same investment conclusion, but their starting points are entirely different.

If we look at this from the perspective of value, or through the lens of the 'long-term weighing machine,' the most valuable asset in the economy is actually labor. And now, by providing greater computing power and better AI, we are effectively injecting more 'additional labor' into the entire economic system.

Something like this has never occurred in human economic history.

Harry: Are you concerned that, in the short term, if we hit some setbacks, a significant portion of the economy could be derailed, given how highly concentrated value is right now?

Everyone is currently experiencing skyrocketing growth, but what if NVIDIA, Meta, Google, or Microsoft suddenly hits a 'speed bump'? If the high-speed AI train slows down, its multiplier effect would be astonishing. Are you worried?

Jonathan: I am indeed concerned. But it’s not because AI itself lacks value—it’s a natural reaction after the system overheats.

You can think of the market as a system that could grow along a healthy trajectory but may also spiral out of control due to overheating.

Once overheated, people will keep adding fuel to the fire until, at some point, they suddenly realize it is unsustainable. Then the entire market rapidly corrects, potentially even dropping below a reasonable range.

In this process, many originally sound companies may collapse due to a loss of confidence or broken financing channels.

But we also know that after every market adjustment, a new batch of truly outstanding companies will emerge.

Harry: Do you think such a correction might occur in the next year?

Jonathan: I really can't predict that.

The premise of making a prediction is that the prediction itself should not alter the outcome. However, within economic systems, predictions often trigger feedback loops, which then make the situation unpredictable.

For example, if a small asteroid is heading toward Earth and we are powerless to stop it, we can accurately predict that it will collide with our planet.

But if we can predict its trajectory and develop interception measures as a result, then the act of prediction itself changes the outcome.

Do you see where the problem lies now?

Harry: Yes, I understand.

Jonathan: In the economic system, you don’t even need to change the 'actual elements'; just a shift in the direction of capital flow can cause significant fluctuations in the system.

Because these feedback mechanisms are so sensitive, it is very difficult to predict economic trends.

I can’t tell you what the economy will be like next year. The only thing I can say is the biggest problem I currently see in the AI field:

If you find an excellent engineer, even if you are willing to hire him, he may not join you because he can go out and raise $100,000, $20 million, or even $100 million on his own.

What would he do? Of course, he would start his own business.

This makes it very difficult for us to gather key talents in a startup company.

But from another perspective, AI itself is also improving the efficiency of small teams. Even if talent is scattered, tasks can still be accomplished.

Harry: Do you think the current market is getting a bit overheated?

Jonathan: To determine whether the market is overheated, there is a simple indicator: Is the economic system hindering companies from succeeding? If not, then I don’t think it’s overheated.

Harry: But look, now there is too much capital available, which is making it hard for companies like yours to recruit good people. Many people, after receiving funding, are unwilling to join Groq and instead choose to start their own businesses…

Jonathan: Exactly, so now I have to say, 'Please, stop doing this (laughs).'

That said, AI has indeed made everyone more efficient.

So perhaps, even as the economy continues to boom, companies can still achieve sustained success.

Who knows? We’ve never experienced an era like this before.

Harry: Hasn’t the war for talent gone completely insane now?

Jonathan: Absolutely. It’s the most intense it has ever been, but it’s only happening within the tech industry.

OpenAI and Google enter a 'two-power rivalry.'

Harry: If you look at the sports industry, it has always been pretty crazy, especially in recent years.

If you look back two or three decades, the salaries of top athletes were comparable to what tech professionals earn today. It’s just that people are gradually realizing that the real value lies with those top talents.

But the sports industry has many limitations: a limited number of teams and salary caps. The tech industry is entirely different—there’s no cap. You can have an unlimited number of 'teams' and startups.

Just imagine, if anyone could form their own football team, how high would player salaries rise? And what would the valuation of the entire league become?

Is there any company that has left a deep impression on you now? And is there any that concerns you?

Jonathan: I think the most significant change has been with Google.

Google has long enjoyed a structural advantage: its corporate culture allows engineers to freely propose good ideas, while management's role is to stay out of the way.

This actually represents a strong systemic advantage.

Harry: Do you consider Gemini a success for them?

Jonathan: Yes. Just looking at user adoption rates, it’s quite impressive.

Harry: What do you think about the integration of Gemini into consumer-grade products?

Jonathan: To be honest, it’s not very good. Although it has been integrated into Gmail, hardly anyone can actually use it. It also appears in other products, but it feels like it was hastily inserted, and the overall experience is still immature.

However, it’s too early to draw conclusions. At least they are collecting user data through these touchpoints, which will be very helpful for future product directions.

This reminds me of Google TV back in the day. It was a complete failure at first, but they kept iterating and eventually it became Chromecast.

This is a typical product trajectory: a company launches a half-finished product, gets criticized, endures it, keeps optimizing, and finally creates something truly remarkable.

Harry: But the premise is that you need to have an advantage in distribution channels to withstand those criticisms. And now, OpenAI has significantly narrowed that first-mover advantage.

Jonathan: Exactly. Google may indeed be a step behind. This is actually a classic question: can large companies innovate before startups gain control of distribution?

The reality now is that startups have already seized distribution. OpenAI’s products are used by 10% of the global population, which is astonishing. It’s hard to imagine OpenAI suddenly disappearing now; I think it’s basically impossible.

Harry: So are we now entering a 'duel between two giants' phase?

Jonathan: Exactly, it’s OpenAI and Google. Anthropic is actually doing something different.

Harry: Are you saying that Anthropic is more focused on code generation, while both OpenAI and Google are concentrating on chatbots?

Jonathan: OpenAI does chatbots, and so does Google; Google works on code generation, and so does OpenAI.

Harry: So which tool does your team use most frequently now?

Jonathan: Recently, engineers have shown a stronger preference for Codex over Anthropic's tools.

Harry: Wow.

Jonathan: Moreover, this preference seems to change on almost a monthly basis.

Our principle is: we do not mandate which tool to use, but there is one steadfast rule—AI must be utilized.

Without AI, you simply lack competitiveness.

We observed that they previously used Sourcegraph, then switched to Enro, and now moved to Codex. Next month, they might switch back to Sourcegraph.

Harry: In such frequent switching scenarios, do these tools still hold long-term value?

Jonathan: Our engineers are all on the front lines. If a tool proves better, they immediately switch to it.

Of course, not everyone behaves this way, but many within our circle do.

However, enterprise clients are different. They sign contracts lasting a year or even longer and generally do not switch tools easily.

Both OpenAI and Anthropic are significantly undervalued.

Harry: If you had to choose now between OpenAI at a $500 billion valuation and Anthropic at $180 billion, which would you invest in?

Jonathan: I'd invest in both.

Harry: You would?!

Jonathan: Of course. They're both significantly undervalued.

Many people still view them as competitors fighting for market share in a limited market. However, what they are actually doing is continuously raising the ceiling of this market.

Harry: If we follow this bull market logic further: how far can these two companies go?

Jonathan: I believe the current tech giants can continue to grow significantly.

But that doesn't mean the AI labs can't catch up. The 'Magnificent Seven' rising and the AI labs catching up are not contradictory.

The real question is: will the AI labs eventually surpass the 'Magnificent Seven'?

(Cong's Note: Like Ilya Sutskever, former Chief Scientist and co-founder of OpenAI, who started anew in 2024 by establishing a new AI lab called Safe Superintelligence (abbreviated as SSI) with his team. Its goal is highly ambitious: instead of rushing to develop products and commercialize, it focuses on building stronger models and achieving safer alignment, following the classic route of 'top-tier research teams laying the foundation first.')

Harry: What do you think the deciding factor is?

Jonathan: I’m not sure. But honestly, I think they might eventually merge into a new 'Mag Nine' (nine giants).

Harry: Then it might not just be a Mag 9, but a Mag 11 or even a Mag 20. Do you think AI labs will eventually enter the application layer and take over that space as well?

Jonathan: That’s the natural trajectory for all successful tech companies. They start doing what their customers were originally doing, moving up the stack and swallowing their customers’ businesses.

And then a new generation of entrepreneurs will build new layers on top of them.

Take OpenAI, for example. Sam Altman said on your show: if you’re just making minor tweaks on top of OpenAI’s work, you’ll quickly become obsolete. He was just honestly stating a fact.

On our end, we’ve chosen a different path: we don’t build models ourselves or train our own large models.

It’s like drawing a line in the sand: you can confidently build on top of our infrastructure because we won’t compete with you.

Of course, maybe we’re wrong. Perhaps one day we’ll be reverse-acquired by our own clients.

But it also means you can trust us not to eat your lunch.

Harry: But that sounds like a very expensive decision. How much would it cost to train a model of your own?

Jonathan: Extremely expensive.

Harry: Speaking of money, didn’t you just raise funds recently?

Jonathan: We just raised $750 million, with a valuation of around $7 billion.

Harry: That’s impressive! Congratulations! But seriously, is it enough?

Jonathan: Actually, we initially planned to raise only $300 million.

You just mentioned profitability. For companies like ours that produce hardware, it’s actually manageable. Unlike companies that develop models, we can make a profit by selling hardware.

Harry: I thought you were selling hardware at a loss?

Jonathan: No, we make money from selling hardware.

It depends on the software. Some models are profitable when run on our chips, while others can cover operational costs but do not provide sufficient return on capital expenditure. We are very cautious about capital spending.

We definitely make money selling hardware. Even for the models with the thinnest profit margins, as long as our chips have a long enough lifespan, there is still hope for profitability. Right now, we just don’t know how long these chips will last.

Harry: In the long term, what is the trend for your gross margin? Will your profit margins increase in the future?

Jonathan: That’s one of the small advantages of being a private company—I can choose not to tell you. (Laughs)

Harry: But if you’re willing to share, that would be great. (Laughs)

Jonathan: That’s about the only benefit we have left!

Harry: Indeed, you don’t have a lock-up period, and your exit strategies are more flexible.

Jonathan: But I’ve never sold a single share. Truly, not once.

Harry: Then you really don’t understand the rules of the game. (Laughs) Returning to the topic of profit margins, how do you view them?

Jonathan: I still stand by my previous point: as long as there are no drastic fluctuations in the business, I hope our profit margins remain as low as possible.

The significance of profit margins lies in cushioning uncertainty; when you need funds, you can raise prices moderately to generate cash flow. However, under normal circumstances, prices should be kept as low as possible.

There is currently an extremely high demand for computing power. If a particular customer is in urgent need, we can slightly increase the price for them, thereby maintaining lower prices for other customers.

NVIDIA’s market capitalization will certainly reach $10 trillion.

Harry: Then can you predict the landscape of the chip industry five years from now?

Jonathan: I believe that five years from now, NVIDIA’s revenue share will still exceed 50%, but its shipment volume may account for less than half of the market.

Harry: You mean that it will capture more than half of the revenue but sell less than half of the chips?

Jonathan: Exactly. The brand itself holds value and can support higher pricing. But this also means it will 'lose its hunger'—with high profit margins, customers will still buy because purchasing NVIDIA products minimizes the risk of making mistakes or being fired.

It’s a good business with enduring value. If you invest in NVIDIA, it is highly likely to be a sound decision.

Harry: However, from the customer’s perspective, the current market concentration is very high—with just around thirty companies accounting for 90%, or even 99%, of procurement spending. For them, decisions won’t be based solely on branding but rather on which product truly enhances their business performance. Therefore, in the future, there will undoubtedly be increasing competition from various chips, as these large clients possess significant bargaining power.

You mentioned earlier that ‘investing in NVIDIA is almost certainly a safe bet.’ So, what are your thoughts on whether NVIDIA could reach a $10 trillion market capitalization five years from now?

Jonathan: If it doesn't happen within five years, I'll be surprised. But the real question is: Can Groq reach ten trillion?

Harry: Do you think it's possible?

Jonathan: Of course. Unlike NVIDIA, we are not constrained by the supply chain. Right now, we are the company with the greatest ability to scale up computational power globally.

And computational power is the scarcest resource of this era. Everyone is scrambling to acquire computational power at high prices, while we can supply it almost infinitely.

Harry: So what value of Groq do you think the market hasn't fully understood yet?

Jonathan: That depends on which month you're referring to. (Laughs)

For example, in terms of 'multi-user capability,' everyone originally thought we couldn't achieve it, but we demonstrated the ability to support multiple users in parallel on a single chip.

Harry: Is that because you use SRAM architecture?

Jonathan: Exactly. But do you know what the most frequently asked question I receive is? 'Isn't SRAM much more expensive than DRAM?'

The answer is yes, it is indeed more expensive.

Roughly speaking, the cost per bit of SRAM is about three to four times that of DRAM. This comparison is based solely on structure and does not account for other expenses.

Harry: Can you briefly explain the difference between the two?

Jonathan: To use an informal analogy, SRAM is the internal memory within the chip, while DRAM is external memory.

In terms of design, SRAM requires 6 to 8 transistors, whereas DRAM only needs one capacitor and one transistor. As a result, SRAM occupies a larger silicon area, making it naturally more expensive. We are still deploying SRAM at the 3-nanometer process, which further increases costs.

Overall, the cost per bit of SRAM could be up to ten times that of DRAM.

However, what we truly focus on is the total system-level cost.

For instance, when running the Kimi model, we deploy it across 4,000 Groq chips, while others might run a copy on eight GPUs. At that scale we only need to maintain one copy of the model, whereas a GPU-based system of 4,000 chips would hold 500 copies, consuming 500 times the memory.

So, although SRAM is expensive, the cost of DRAM may end up being higher. This exemplifies the principle of 'Don’t just look at the unit price of the chip; consider the overall system efficiency.'
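The arithmetic above can be sketched as a back-of-the-envelope comparison. All of the numbers here (the model size, the 10x per-bit SRAM premium, the 4,000-chip versus 8-GPU split) are illustrative assumptions taken from the interview, not Groq's actual figures:

```python
# Back-of-the-envelope system-level memory cost, illustrating the point above.
# Every figure is an assumption for illustration, not a real measurement.

def total_memory_cost(model_size_bits, copies, cost_per_bit):
    """Total memory spend = model size x number of copies x per-bit cost."""
    return model_size_bits * copies * cost_per_bit

# Hypothetical model: 1 trillion parameters at 8 bits per parameter.
model_bits = 1e12 * 8

# SRAM-based system: one large deployment, a single copy of the weights.
sram_cost_per_bit = 10  # relative units: "up to ten times" DRAM, per the text
sram_total = total_memory_cost(model_bits, copies=1,
                               cost_per_bit=sram_cost_per_bit)

# DRAM/GPU-based system: 4,000 chips organized as 8-GPU nodes,
# so 4000 / 8 = 500 independent copies of the model weights.
dram_cost_per_bit = 1
gpu_copies = 4000 // 8
dram_total = total_memory_cost(model_bits, copies=gpu_copies,
                               cost_per_bit=dram_cost_per_bit)

# Despite SRAM's 10x per-bit premium, 1 copy vs. 500 copies dominates.
print(gpu_copies)                # 500
print(sram_total < dram_total)   # True under these assumptions
```

The point of the sketch is only that copy count can outweigh per-bit price: a 10x premium on one copy is still far cheaper than a 1x price on 500 copies.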

Currently, we have expanded our optimization efforts from the system level to the global level.

We operate 13 data centers worldwide, covering the United States, Canada, Europe, and the Middle East. Based on the input and output characteristics of different regions, we deploy the most suitable models in each data center. In some cases, certain centers do not deploy specific models at all but instead rely on dynamic scheduling from other regions.

We don't aim for single-point optimization but rather global load balancing.
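The scheduling idea described here, serving a model locally where it is deployed and otherwise rerouting to another region, can be sketched roughly as follows. The datacenter names, model names, and load figures are all hypothetical, not Groq's actual routing logic:

```python
# Minimal sketch of global load balancing across regional data centers.
# All names and load numbers are hypothetical illustrations.

DATACENTERS = {
    "us-east":  {"models": {"kimi", "llama"}, "load": 0.62},
    "eu-west":  {"models": {"llama"},         "load": 0.35},
    "me-south": {"models": {"kimi"},          "load": 0.80},
}

def route(model, preferred_region):
    """Serve from the preferred region if it hosts the model; otherwise
    fall back to the least-loaded region that does (dynamic scheduling)."""
    local = DATACENTERS.get(preferred_region, {})
    if model in local.get("models", set()):
        return preferred_region
    candidates = [dc for dc, info in DATACENTERS.items()
                  if model in info["models"]]
    if not candidates:
        raise LookupError(f"model {model!r} not deployed anywhere")
    return min(candidates, key=lambda dc: DATACENTERS[dc]["load"])

print(route("llama", "eu-west"))  # "eu-west": served locally
print(route("kimi", "eu-west"))   # "us-east": rerouted to least-loaded host
```

A real scheduler would also weigh network latency and capacity headroom, but the shape is the same: global placement first, then per-request fallback.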

The core issue Groq is addressing now is how to meet the enormous demand.

Harry: Let me ask a more philosophical question: If you were completely unafraid of failure right now, what would you do?

Jonathan: To put it another way, what risks have we not yet taken?

One obvious choice is to double our supply-chain orders. Our supply-chain turnaround is six months, already faster than almost any other company's.

Harry: So, how large is your current supply-demand gap?

Jonathan: Just last week, a client approached us saying they needed five times our current computing power. We can't simply say doubling capacity will be enough; capacity has to grow to a scale that actually meets demand.

So, our risk decision is: Should we double the speed of capacity expansion?

We just completed a round of financing: we initially planned to raise $300 million but ended up raising $750 million, two and a half times the plan, with the round oversubscribed at four times the target.

We could have raised even more, but I was cautious about dilution and also considered the investors' interests. If we were willing to accept more dilution, we could have built more computing centers outright.

Moreover, we have a key advantage: our 'cost per token' is much lower than that of our competitors at the same inference speed.

This means we have stronger price competitiveness, and this is what truly adds value for customers.

Harry: So you're saying customers are that sensitive to price?

Jonathan: It's not about sensitivity or being frugal. Rather, if you halve the price, they can buy twice the computing power. Every dollar invested directly improves the output quality of the model.

Harry: Do you plan to go public in the future?

Jonathan: All our energy right now is focused on execution.

Going public is a different game, and the only core issue we need to solve now is whether we can meet the market's huge demand for computing resources.

Harry: What do you make of Cerebras initially planning to go public and then abandoning it?

Jonathan: Didn’t they just officially announce they’re not going public? That already says it all (laughs).

A rapid-fire Q&A session you can't miss

Harry: Let's play a quick round of rapid-fire Q&A. What is the biggest misconception people have about NVIDIA right now?

Jonathan: Thinking that their software is a moat. The lock-in effect of CUDA is entirely a myth.

Harry: Really?

Jonathan: Maybe on the training side, but not at all on the inference side. We already have 2.2 million registered developers using Groq. NVIDIA says they have 6 million CUDA users? We'll see.

Harry: If you were starting Groq today, with NVIDIA already valued at four trillion and the AI boom in full swing, would you still choose to make chips?

Jonathan: No, I wouldn’t. That ship has sailed. It’s too late to start making chips now.

Harry: But there are still many chip startups just getting started, and they’ve raised substantial funding. Do you think they’re too late as well?

Jonathan: Too late. The reason I ventured into chips was that I worked on TPUs at Google, and together with friends from Google Brain we produced the strongest ResNet-50 classification results, beating all models at the time.

I could have gone into algorithms or even AI focused on formal logic. But I chose chips because they have a temporal moat.

Venture capital firms often ask me, 'Can others replicate your model?'

I said: Of course they can. But they are always three years behind us.

From design to mass production, even with perfect execution, it takes three years. And I’ve already completed three chip projects, all of which are now in production or preparing for mass production.

Moreover, we achieved A0 silicon (first-time-right). Do you know how many chips globally succeed on the first try? Only 14%.

Harry: So the probability of failure for each tape-out is 86%. Did you also allocate a budget for potential rework when developing the V2 chip?

Jonathan: Of course. We initially planned for the possibility of needing a second attempt. Surprisingly, however, it succeeded on the first try. That was completely unexpected.

That’s why I say 'the chip development cycle is three years'—and that’s under the assumption that everything goes smoothly.

NVIDIA also takes about three to four years to develop a generation, but they work on multiple product lines simultaneously.

Meanwhile, Groq has entered a 'one generation per year' rhythm: V3 followed within a year after V2, and V4 will come one year later.
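The 86% figure quoted earlier follows directly from the 14% first-pass rate. As a hedged aside (this calculation is mine, not from the interview), here is what that rate would imply if every tape-out succeeded independently, which is pessimistic since a respin fixes known bugs:

```python
# Illustrative only: implications of the 14% first-pass ("A0") success rate.
# The independence assumption below is deliberately pessimistic: in practice
# a respin targets known bugs, so real respin counts are much lower.

p_first_pass = 0.14

# Probability of needing at least one respin, i.e. failing the first try.
p_respin = 1 - p_first_pass  # 0.86, the 86% quoted in the interview

# Expected number of tape-outs under the independence assumption
# (geometric distribution): 1 / p, roughly 7 attempts.
expected_tapeouts = 1 / p_first_pass

print(round(expected_tapeouts, 1), round(p_respin, 2))
```

Against that baseline, budgeting for exactly one respin, as Jonathan describes for V2, is a bet that engineering discipline beats the industry-wide odds.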

Harry: What’s your take on Larry Ellison and Oracle’s 'second rise'?

Jonathan: Very smart business judgment combined with extremely fast execution.

Many people are now hesitating: Is AI overheating? Should we continue to invest? Oracle's choice is: go all in, no hesitation.

This is the way to win. What you're seeing now as a so-called 'greedy market' is actually just a few quick actors making big money.

Harry: If I were an investor, where should I be greedy and where should I be fearful?

Jonathan: Do you know Hamilton Helmer’s '7 Powers'? As long as you can see the moat, you should be greedy.

The reality is that most companies don’t have a moat, especially in the early stages.

What you need to do is: predict who will be able to build a moat.

So we should give these projects, where the 'moat hasn't been built yet,' a new name: pre-mo, short for pre-moat.

Harry: Over the past year, on what issues have you changed your mind?

Jonathan: Strictly speaking, it’s not a change, but rather a continuous focus. Our threshold for saying 'yes' is getting higher and higher, which has made our operations more efficient.

I used to think 'keeping options open' was important, but now I believe focus is the most crucial.

Of course, if we hadn't widely experimented in the early days, we wouldn't be where we are today. But now, we only pursue the path most likely to succeed.

Harry: Do you think Elon Musk can make Grok and xAI a success?

Jonathan: I believe he can, but their final form might differ from the others'.

Every time a new track emerges, people assume they are competitors, but that’s not necessarily true. Anthropic is doing exceptionally well; they focus on code generation and have excelled in it.

xAI's strategy is to integrate chatbots into social scenarios, blending seamlessly with social networks. I wouldn't use their models for coding or scientific research; they do have a code model, but they lack distribution channels for code.

The market will naturally differentiate. Just like the current 'Big Seven,' each company has overlapping businesses but different core directions. If you can't create differentiation, you will be eliminated.

Harry: What's your take on Google, Microsoft, and Amazon? If you had to pick one to buy and one to sell, which would they be?

Jonathan: That depends on your time horizon.

In the short term, Microsoft might face some adjustments, given its somewhat delicate relationship with OpenAI at the moment. However, in the long run, they remain strong and will be fine.

Harry: Do you think this is a substantial blow to Microsoft?

Jonathan: No. There will be short-term impacts, but in the long run, it won't matter.

They have equity in OpenAI and can also use Anthropic's tools, which means they are 'hedging their bets.'

Moreover, they have already invested a significant amount of computing power. Even if OpenAI switches to another computing power supplier in the future, the computing power Microsoft possesses is itself as valuable as gold.

As for Amazon, I think they lack the DNA for AI.

You didn’t mention Meta just now, but Meta and Google have always had a strong AI culture.

Microsoft made up for it with OpenAI, while Amazon hasn’t caught up yet. However, at least they have the infrastructure.

Harry: One last question. I’d like to end on a positive note: What are you most looking forward to in the next five to seven years?

Jonathan: What I’m most looking forward to is, ironically, something that many people are afraid of.

I’m talking about the impact brought by AI.

We can draw an analogy with Galileo.

Hundreds of years ago, Galileo popularized the telescope. At that time, when people first used the telescope to observe the universe, they realized it was far vaster than imagined, making humanity appear incredibly insignificant. That moment was terrifying.

However, as time passed, we gradually came to understand that although we are small, the universe is even more magnificent and beautiful.

Today, large language models (LLMs) are like 'telescopes for the mind.' They make us feel that our intelligence is minuscule because they expand the boundaries of the concept of 'intelligence.'

But I believe that 100 years from now, we will realize that the world of intelligence is far more expansive than we ever imagined. And that will be a beautiful thing.


Editor/Joryn
