For better or worse, it seems BIS' philosophy has always been to allow the flow of inference chips into China. They have never targeted memory bandwidth despite the obvious implications around inference performance. Even December’s HBM-focused update specifically exempts memory chips "affixed to a logic integrated circuit."
Steelmanning their approach a bit... inference will likely constitute 90%+ of future AI lifecycle compute demand. Letting this market go to Chinese domestic chip makers would allow these firms to channel that revenue towards R&D spend on training chips. Blocking this reinvestment widens the gap between Nvidia/AMD and Chinese indigenous hardware companies. Depriving Chinese labs of access to SOTA “training HW” hinders frontier model development without stymieing diffusion.
This all falls apart as frontier model development shifts towards RL-heavy pipelines. Today, leading labs are likely reallocating compute away from pre-training towards RL-heavy post-training approaches. Here, inference chips become more useful (harvesting samples, reward modeling, etc.).
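To make that concrete, here is a minimal sketch of one common RL-adjacent harvesting pattern, best-of-n rejection sampling: generate many candidate responses with the policy model, score them with a reward model, and keep only the best pair for later training. The `sample_responses` and `reward` functions below are placeholders standing in for real model calls, not any lab's actual pipeline; the point is that everything inside the loop is inference.

```python
import random

# Placeholders for the two inference-heavy components: a policy model that
# generates candidate responses and a reward model that scores them. In a
# real pipeline both run on inference-class accelerators.
def sample_responses(prompt: str, n: int) -> list[str]:
    return [f"{prompt} -> candidate {i}" for i in range(n)]  # stand-in for model.generate()

def reward(prompt: str, response: str) -> float:
    return random.random()  # stand-in for a reward-model forward pass

def harvest(prompts: list[str], n_per_prompt: int = 8) -> list[tuple[str, str]]:
    """Best-of-n rejection sampling: keep the highest-reward response per prompt.

    Compute profile: n_per_prompt generations plus n_per_prompt reward scores
    per prompt, all of it inference, before a single gradient step is taken.
    """
    dataset = []
    for prompt in prompts:
        candidates = sample_responses(prompt, n_per_prompt)
        best = max(candidates, key=lambda r: reward(prompt, r))
        dataset.append((prompt, best))
    return dataset

if __name__ == "__main__":
    print(harvest(["Explain why LLM decoding is memory-bandwidth bound."]))
```

The asymmetry is the whole point: the gradient updates that consume this data are comparatively cheap, while the sampling and scoring that produce it scale with however much inference hardware you can get.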
Are inference efficiency gains in China transferable to the West? Or are those improvements in inference not replicable by western firms?
They are readily transferable.
Really great analysis on multiple fronts. The one topic I was hoping would be addressed is rented inference. If inference matters so much and Chinese research labs just need inference outputs to make better models, doesn't GPU access anywhere, rented or local, matter? Local GPUs are likely more cost-effective and give them Xi's desired self-reliance, but the models could still get better if they're pulling inference from anywhere. That was an undercurrent of DeepSeek's success, distilling GPT-4, and it opens the same can of worms about what exactly the US success criterion is. Is it about reduced training capacity? Is it about access to frontier or near-frontier models? Is it about ensuring a persistent lead in model capability? Or is it about dependence on the West?
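For what it's worth, the mechanics of "pulling inference from anywhere" are trivially simple, which is part of why they're hard to police. A rough sketch, assuming an OpenAI-compatible endpoint and the official Python client (the model name, prompts, and output file are illustrative placeholders, and this is not a claim about how DeepSeek actually did it):

```python
# pip install openai
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def collect_pairs(prompts, model="gpt-4o", out_path="distill_data.jsonl"):
    """Query a hosted model and save (prompt, completion) pairs for later fine-tuning."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            pair = {"prompt": prompt, "completion": resp.choices[0].message.content}
            f.write(json.dumps(pair) + "\n")

collect_pairs(["Walk through a proof of the Cauchy-Schwarz inequality."])
```

No chip crosses a border here, just tokens, which is exactly the gap the current controls don't address.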
I’ve personally become more and more skeptical about whether export controls make sense at all, but this is certainly correct.
I can do inference for quite beefy models on old Nvidia GPUs at home… or salivate over the 512GB Mac Studio that just came out. Memory bound is totally right.
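To put rough numbers on the memory-bound point: at batch size 1, every decoded token has to stream roughly the full set of weights from memory, so achievable tokens/sec is capped at bandwidth divided by model size. The bandwidth and quantization figures below are my own ballpark assumptions (roughly 936 GB/s for an RTX 3090, roughly 819 GB/s for the new Mac Studio), and the estimate ignores KV-cache traffic and other overhead, so treat it as an upper bound.

```python
# Back-of-envelope decode throughput for batch-size-1 generation:
# tokens/sec <= memory_bandwidth / bytes_of_weights_streamed_per_token.
# All hardware and model figures below are illustrative assumptions.

def decode_tokens_per_sec(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    weight_gb = params_billion * bytes_per_param  # GB of weights read per generated token
    return bandwidth_gb_s / weight_gb

configs = [
    ("RTX 3090 (~936 GB/s), 13B model @ 4-bit", 13, 0.5, 936),
    ("512GB Mac Studio (~819 GB/s), 70B model @ 4-bit", 70, 0.5, 819),
    ("512GB Mac Studio (~819 GB/s), 405B model @ 4-bit", 405, 0.5, 819),
]

for name, params, bpp, bw in configs:
    print(f"{name}: ~{decode_tokens_per_sec(params, bpp, bw):.0f} tok/s upper bound")
```

FLOPS barely enter the picture at this batch size, which is why untouched memory bandwidth matters so much for inference.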
Perhaps Nvidia should be placed entirely under ITAR, as optics were for decades (see Itek and the Corona satellites).