For better or worse, it seems BIS' philosophy has always been to allow the flow of inference chips into China. They have never targeted memory bandwidth despite the obvious implications around inference performance. Even December’s HBM-focused update specifically exempts memory chips "affixed to a logic integrated circuit."
Steelmanning their approach a bit... inference will likely constitute 90%+ of future AI lifecycle compute demand. Letting this market go to Chinese domestic chip makers would allow these firms to channel that revenue towards R&D spend on training chips. Blocking this reinvestment widens the gap between Nvidia/AMD and Chinese indigenous hardware companies. Depriving Chinese labs of access to SOTA “training HW” hinders frontier model development without stymieing diffusion.
This all falls apart as frontier model development shifts towards RL-heavy pipelines. Today, leading labs are likely reallocating compute away from pre-training towards RL-heavy post-training approaches. Here, inference chips become more useful (harvesting samples, reward modeling, etc.).
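To make that concrete, here is a minimal sketch of one common RL-adjacent harvesting pattern, best-of-n rejection sampling: generate many candidate responses with the policy model, score them with a reward model, and keep only the best pair for later training. The `sample_responses` and `reward` functions below are placeholders standing in for real model calls, not any lab's actual pipeline; the point is that everything inside the loop is inference.

```python
import random

# Placeholders for the two inference-heavy components: a policy model that
# generates candidate responses and a reward model that scores them. In a
# real pipeline both run on inference-class accelerators.
def sample_responses(prompt: str, n: int) -> list[str]:
    return [f"{prompt} -> candidate {i}" for i in range(n)]  # stand-in for model.generate()

def reward(prompt: str, response: str) -> float:
    return random.random()  # stand-in for a reward-model forward pass

def harvest(prompts: list[str], n_per_prompt: int = 8) -> list[tuple[str, str]]:
    """Best-of-n rejection sampling: keep the highest-reward response per prompt.

    Compute profile: n_per_prompt generations plus n_per_prompt reward scores
    per prompt, all of it inference, before a single gradient step is taken.
    """
    dataset = []
    for prompt in prompts:
        candidates = sample_responses(prompt, n_per_prompt)
        best = max(candidates, key=lambda r: reward(prompt, r))
        dataset.append((prompt, best))
    return dataset

if __name__ == "__main__":
    print(harvest(["Explain why LLM decoding is memory-bandwidth bound."]))
```

The asymmetry is the whole point: the gradient updates that consume this data are comparatively cheap, while the sampling and scoring that produce it scale with however much inference hardware you can get.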
Are inference efficiency gains in China transferable to the West? Or are those improvements in inference not replicable by western firms?
They are readily transferable.
Really great analysis on multiple fronts. The one topic I was hoping would be addressed is rented inference. If inference matters so much and Chinese research labs just need inference outputs to make better models, doesn't GPU access anywhere, rented or local, matter? Local GPUs are likely more cost-effective and give them Xi's desired self-reliance, but the models could still get better if they're pulling inference from anywhere. That was an undercurrent of DeepSeek's success, distilling GPT-4, and it opens the same can of worms about what exactly the US success criterion is. Is it about reduced training capacity? Is it about access to frontier or near-frontier models? Is it about ensuring a persistent lead in model capability? Or is it about dependence on the West?
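For what it's worth, the mechanics of "pulling inference from anywhere" are trivially simple, which is part of why they're hard to police. A rough sketch, assuming an OpenAI-compatible endpoint and the official Python client (the model name, prompts, and output file are illustrative placeholders, and this is not a claim about how DeepSeek actually did it):

```python
# pip install openai
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def collect_pairs(prompts, model="gpt-4o", out_path="distill_data.jsonl"):
    """Query a hosted model and save (prompt, completion) pairs for later fine-tuning."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            pair = {"prompt": prompt, "completion": resp.choices[0].message.content}
            f.write(json.dumps(pair) + "\n")

collect_pairs(["Walk through a proof of the Cauchy-Schwarz inequality."])
```

No chip crosses a border here, just tokens, which is exactly the gap the current controls don't address.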
I’ve personally become more and more skeptical about whether export controls make sense at all, but this is certainly correct.
I can do inference for quite beefy models on old Nvidia GPUs at home… or salivate over the 512GB Mac Studio that just came out. Memory bound is totally right.
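To put rough numbers on the memory-bound point: at batch size 1, every decoded token has to stream roughly the full set of weights from memory, so achievable tokens/sec is capped at bandwidth divided by model size. The bandwidth and quantization figures below are my own ballpark assumptions (roughly 936 GB/s for an RTX 3090, roughly 819 GB/s for the new Mac Studio), and the estimate ignores KV-cache traffic and other overhead, so treat it as an upper bound.

```python
# Back-of-envelope decode throughput for batch-size-1 generation:
# tokens/sec <= memory_bandwidth / bytes_of_weights_streamed_per_token.
# All hardware and model figures below are illustrative assumptions.

def decode_tokens_per_sec(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    weight_gb = params_billion * bytes_per_param  # GB of weights read per generated token
    return bandwidth_gb_s / weight_gb

configs = [
    ("RTX 3090 (~936 GB/s), 13B model @ 4-bit", 13, 0.5, 936),
    ("512GB Mac Studio (~819 GB/s), 70B model @ 4-bit", 70, 0.5, 819),
    ("512GB Mac Studio (~819 GB/s), 405B model @ 4-bit", 405, 0.5, 819),
]

for name, params, bpp, bw in configs:
    print(f"{name}: ~{decode_tokens_per_sec(params, bpp, bw):.0f} tok/s upper bound")
```

FLOPS barely enter the picture at this batch size, which is why untouched memory bandwidth matters so much for inference.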
Perhaps Nvidia should be placed entirely under ITAR, as optics were for decades (see Itek and the Corona satellites).