The established order of AI chip usage, once almost entirely U.S.-based, is changing. China’s immense progress in open-weight AI development is now being met with rapid domestic AI chip development. In the past few months, inference for highly performant open-weight AI models in China has begun to be powered by chips such as Huawei’s Ascend and Cambricon, and some models are beginning to be trained on domestic chips.
There are two major implications, for policymakers and for AI researchers and developers respectively: U.S. export controls correlate with expedited Chinese chip production, and chip scarcity in China likely incentivized many of the innovations that are now open-sourced and shaping global AI development.
China’s chip development correlates strongly with tightening U.S. export controls. Under uncertain chip access, Chinese companies have innovated in both chip production and algorithmic advances for compute efficiency in models. Out of necessity, decreased reliance on NVIDIA has led to domestic full-stack AI deployments, as seen with Alibaba.
Compute limitations likely incentivized advances in architecture, infrastructure, and training. Innovations in compute efficiency from open-weight leaders include DeepSeek’s introduction of Multi-head Latent Attention (MLA) and Group Relative Policy Optimization (GRPO). A culture of openness encouraged knowledge sharing, and gains in compute efficiency contributed to lower inference costs, evolving the AI economy.
Domestic silicon’s proven sufficiency has sparked demand, and models are starting to be optimized for domestic chips. In parallel, software platforms are shifting as alternatives to NVIDIA’s CUDA emerge and challenge NVIDIA at every layer; synergy between AI developers and chip vendors is creating a new, fast-evolving software ecosystem.
The shifting global compute landscape will continue to shape open source, training, deployment, and the overall AI ecosystem.
The State of Global Compute
Utility of and demand for advanced AI chips have followed an upward trajectory and are expected to continue to increase. Over the past few years, NVIDIA chips maintained dominance. Recently, new players have been garnering attention. China has had long-term plans for domestic production, aiming for self-sufficiency with large monetary and infrastructural investments. Now, the next generation of Chinese open-weight AI models is beginning to be powered by Chinese chips.
Broader trends worldwide are intensifying, with both the U.S. and China citing national security in chip and rare earth resource restrictions. As U.S. export controls tightened, the rollout of Chinese-produced chips seemingly accelerated. The rise of China’s domestic chip industry is fundamentally changing norms and expectations for global AI training and deployment, with more models being optimized for Chinese hardware and compute-efficient open-weight models picking up in adoption. In the past few months, Chinese-produced chips have already begun to power inference for popular models and are starting to power training runs.
The changes can affect everything from techniques used in training, to optimizing for both compute efficiency and specific hardware, to lower inference costs, to the recent open source boom. This could shift both U.S. trade policy and China’s approach to global deployment, moving the future of AI advancement from an American-centered global ecosystem to one where China is at the center.
The Beginning of a Rewiring
China’s domestic chip production had been in progress for years before the modern AI boom. One of the most notable advanced chips, Huawei’s Ascend, initially launched in 2018 but expanded in deployment starting in 2024 and increasingly throughout 2025. Other notable chips include Cambricon Technologies’ accelerators and Baidu’s Kunlun.
In 2022, the Biden administration established export controls on advanced AI chips, a move targeting China’s access to high-end GPUs. The strategy was intended to curb the supply of high-end NVIDIA GPUs, stalling China’s AI progress. Yet what began as a blockade has paradoxically become a catalyst. The intent to build a wall instead laid the foundation for a burgeoning industry.
Chinese AI labs, initially spurred by a fear of being cut off, have responded with a surge of innovation, producing both world-class open-weight models like Qwen, DeepSeek, GLM, and Kimi, and domestic chips that are increasingly powering both training and inference for those models. There is a growing relationship between chip makers and open source, as the ability to run open-weight models locally also generates mutually beneficial feedback. This is leading to, for example, more Ascend-optimized models.
China’s advancements in both open source and compute are shifting the global landscape. Martin Casado, partner at a16z, noted that a significant share of U.S. startups are now building on open-weight Chinese models, and a recent analysis shows Chinese open-weight models leading in popularity on LMArena.
The vacuum created by the restrictions has ignited a full-stack domestic effort in China, transforming once-sidelined local chipmakers into critical national assets and fostering intense collaboration between chipmakers and researchers to build a viable non-NVIDIA ecosystem. This is no longer a hypothetical scenario; with giants like Baidu and Ant Group successfully training foundation models on domestic hardware, a parallel AI infrastructure is rapidly materializing, directly challenging NVIDIA’s greatest advantage: its developer-centric software ecosystem.
See the Appendix for a detailed timeline of chip controls and their effects on hardware development and deployment.
The Response: Powering Chinese AI
The 2022 ban, coinciding with the global shockwave of ChatGPT, triggered a panic across China’s tech landscape. The safe default of abundant NVIDIA compute was gone. Claims of smuggled NVIDIA chips arose. Still, the ban had destroyed the research community’s trust; faced with the prospect of being left permanently behind, researchers began to innovate out of necessity. What emerged was a new, pragmatic philosophy in which a “non-NVIDIA first” approach became rational, not merely ideological.
How China’s Compute Landscape Catalyzed the Cambrian Explosion of Open Models
Chinese labs took a different path, focusing on architectural efficiency and open collaboration. Open source, once a niche interest, became the new norm, a practical choice for rapidly accelerating progress through shared knowledge. This paradigm allows organizations to leverage existing, high-quality pre-trained models as a foundation for specialized applications through post-training, dramatically reducing the compute burden. A prime example is the DeepSeek R1 model, which required less than $300,000 for post-training on its V3 architecture, lowering the barrier for companies to develop sophisticated models. While that figure excludes the full base model, the cost reduction for the reasoning model is substantial. Algorithmic advances that improve memory use, such as Multi-head Latent Attention (MLA) in DeepSeek’s V3 model, likely incentivized by compute limitations, are a big part of January 2025’s “DeepSeek moment”.
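For intuition on the memory side, here is a minimal sketch of MLA’s core compression trick in PyTorch. All dimensions are hypothetical, and RoPE handling and the attention computation itself are omitted; see DeepSeek’s technical reports for the full design. The key point is that only a small latent vector is cached per token, not the full per-head keys and values.

```python
import torch
import torch.nn as nn

class LatentKVSketch(nn.Module):
    """Minimal sketch of MLA-style low-rank KV compression (hypothetical dims)."""
    def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # reconstruct keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # reconstruct values

    def forward(self, hidden):                 # hidden: (batch, seq, d_model)
        latent = self.down(hidden)             # (batch, seq, d_latent) -- only this is cached
        k = self.up_k(latent)                  # keys recomputed on the fly
        v = self.up_v(latent)                  # values recomputed on the fly
        return latent, k, v

# Per token, a naive cache stores 2 * n_heads * d_head = 8192 values;
# the latent cache stores d_latent = 512, a 16x reduction at these sizes.
```

A smaller KV cache is what lets long-context inference fit on less, or less capable, hardware.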
That moment also catalyzed a larger movement among Chinese companies, including those that were closed-source, to upend their strategies and invest in compute-efficient open-weight models. These models’ lower costs could result from many variables and are also influenced by efficiency; as Chinese companies lowered compute and inference costs, they passed those savings on to users, further evolving the overall AI economy.
- DeepSeek’s (Open) Weight: Alongside the high performance and low cost that created waves in early 2025, DeepSeek’s pioneering role as an openly compute-efficient frontier lab is a big part of what has made the company and its models mainstays. These advances can likely be attributed to innovating in a compute-scarce environment. Funded by investor Wenfeng Liang with a “pure pursuit of open source and AGI,” DeepSeek became the most-followed organization on Hugging Face. Its highly detailed technical papers, including a groundbreaking _Nature_-published study on its R1 model, set a new standard for scientific communication. While its open weights are a bigger draw than its API, in 2024 DeepSeek slashed its API prices to 1/30th of OpenAI’s, triggering a price war. In 2025, DeepSeek-OCR further proved the lab’s prowess in compute efficiency, and with the release of DeepSeek-V3.2-Exp it passed a further 50%+ discount on to users. Notably, the DeepSeek-V3.2-Exp model was also released with day-zero support for deployment on Chinese chips (Huawei’s Ascend and Cambricon). This release also marks an emphasis on CUDA alternatives and exemplifies a full-stack hardware and software AI infrastructure in deployment.
- Qwen’s Ecosystem Dominance: Alibaba is on a path to control a full stack of high-performance models and in-house designed chips, reducing reliance on NVIDIA. The company’s Qwen family became a primary resource for global open-source research. Its permissive Apache 2.0 license enabled commercial use, a barrier for comparable models that often used more restrictive custom licenses, leading to over 100,000 derivative models on Hugging Face. Alibaba recently unveiled improved chips for better inference, with its PPU being integrated into domestic infrastructure projects.
- An Industry-Wide Tidal Wave of Low-Cost, High Efficiency: More open-weight models were released boasting SotA performance at significantly lower prices. Zhipu AI returned with its GLM-4.5 and 4.6 open-weight releases, with both quickly reaching the top of Hugging Face’s trending list and 4.6 becoming the top-performing open-weight model on LMArena. GLM’s API pricing has continually dropped, with cost-effectiveness extending to a $3/month plan positioned as an alternative to Claude Code at 1/5 the price. While the pricing decisions are not fully transparent, efficiency likely plays a strong role.
- Seeds of Training Fully on Domestic Chips: While many upcoming chips are designed primarily for inference, more models are beginning to be trained on domestic chips. Ant Group pioneered training its Ling model on complex heterogeneous clusters of NVIDIA, Ascend, and Cambricon chips. Baidu successfully conducted continuous pre-training on a cluster of over 5,000 domestic Kunlun P800 accelerators, producing its Qianfan VL model.
Advances in Compute-Constrained Environments Pushing the Technical Frontier
The innovation was not confined to model weights alone; it went deep into the software and hardware stack.
- Architectural Exploration: Grassroots independent researchers such as Peng Bo have championed Linear Attention as a possible successor to the Transformer. This approach, sometimes dubbed the “revenge of the RNN” and seen in models like RWKV, has been scaled into industrial-grade models like MiniMax M1 and Qwen-Next by Chinese labs willing to bet on high-risk, high-reward research. Meanwhile, DeepSeek has taken a different path by iterating on the original Transformer architecture, introducing innovations like Multi-head Latent Attention (MLA) and, with its V3.2 model, DeepSeek Sparse Attention (DSA), designed to significantly reduce computational costs during inference without sacrificing performance while also accelerating Reinforcement Learning (RL) exploration through faster rollouts. Proprietary frontier models’ architectures are not public and are therefore difficult to compare. (A minimal sketch of linear attention follows after this list.)
- Open Infrastructure: In a radical departure from corporate secrecy, labs shared their deepest engineering secrets. The Kimi team’s work on the Mooncake serving system formalized prefill/decode disaggregation (also sketched after this list). StepFun’s Step3 enhanced this with Attention-FFN Disaggregation (AFD). Baidu published detailed technical reports on overcoming engineering challenges in its Ernie 4 training, while ByteDance’s Volcengine contributed verl, an open-source library that puts production-grade RL training tools into the community’s hands. What was once proprietary know-how became community knowledge, fueling a self-iterating flywheel of progress.
- Training breakthroughs: DeepSeek’s DeepSeekMath paper introduced a novel reinforcement learning (RL) methodology, Group Relative Policy Optimization (GRPO), that significantly reduces compute costs compared to similar prior methods such as Proximal Policy Optimization (PPO), while stabilizing training and even improving accuracy. GRPO has since been featured in a DeepLearning.AI course, built on by Meta’s researchers in their Code World Model, and lauded by OpenAI research lead Jerry Tworek as having “in a big way accelerated RL research program of most US research labs.” (A sketch of GRPO’s group-relative baseline closes out the sketches after this list.)
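First, as referenced in the Architectural Exploration bullet, a minimal sketch of the linear attention idea in its non-causal, kernel-feature-map form: replacing softmax with a positive feature map lets attention be computed as φ(Q)(φ(K)ᵀV), which is linear rather than quadratic in sequence length. The feature map and shapes here are illustrative; production designs like RWKV and MiniMax M1 differ substantially.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Non-causal linear attention sketch.
    q, k: (batch, seq, d)   v: (batch, seq, d_v)
    Cost is O(seq * d * d_v) instead of softmax attention's O(seq^2 * d)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1          # positive feature map
    kv = torch.einsum("bnd,bne->bde", k, v)    # sum over positions: phi(k_n) v_n^T
    z = k.sum(dim=1)                           # normalizer term
    num = torch.einsum("bnd,bde->bne", q, kv)
    den = torch.einsum("bnd,bd->bn", q, z).unsqueeze(-1) + 1e-6
    return num / den

out = linear_attention(torch.randn(2, 1024, 64),
                       torch.randn(2, 1024, 64),
                       torch.randn(2, 1024, 64))
print(out.shape)  # torch.Size([2, 1024, 64])
```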
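Second, the prefill/decode disaggregation that Mooncake formalizes builds on a split every LLM serving stack already makes. The sketch below shows the two phases with Hugging Face transformers, using GPT-2 as a stand-in model: prefill is one compute-bound pass that builds the KV cache, and decode is a series of memory-bound single-token steps that consume it. In disaggregated serving, the two phases run on separate machine pools, with the cache shipped between them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
inputs = tok("Domestic chips now power", return_tensors="pt")

with torch.no_grad():
    # Prefill: one compute-bound pass over the full prompt builds the KV cache.
    out = model(**inputs, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)

    # Decode: memory-bound single-token steps that only consume the cache.
    generated = [next_id]
    for _ in range(8):
        out = model(input_ids=next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```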
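Finally, a sketch of GRPO’s central trick as described in the DeepSeekMath paper: rather than training a separate value network as PPO does, GRPO samples a group of responses per prompt and uses the group’s own reward statistics as the baseline. The clipped surrogate below follows PPO’s form; the KL penalty against a reference policy that GRPO also applies is omitted for brevity.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, group_size), one scalar reward per sampled
    response. The group mean replaces PPO's learned value baseline."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)   # z-scored within each group

def clipped_policy_loss(logp_new, logp_old, adv, eps=0.2):
    """PPO-style clipped surrogate, applied with group-relative advantages."""
    ratio = torch.exp(logp_new - logp_old)
    return -torch.min(ratio * adv, ratio.clamp(1 - eps, 1 + eps) * adv).mean()

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))  # high-reward responses get positive advantage
```

Dropping the value network removes a second trained model from the RL loop, a large part of the compute savings the paper reports.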
With all this work aggregated, models like DeepSeek R1, Kimi K2, Qwen, and GLM-4.6 now frequently appear near the top of public leaderboards like LMSYS’s Chatbot Arena, alongside U.S. models. Innovation under constraints resulted in leaps.
The Aftermath: Hardware, Software and Soft Power
When AI models are trained and deployed, they are often optimized for certain types of chips. More than the hardware itself, NVIDIA’s software universe has been a reliable friend to the global AI ecosystem.
The deep-learning revolution, sparked by AlexNet’s 2012 victory on NVIDIA GPUs, created a symbiotic relationship. NVIDIA’s Compute Unified Device Architecture (CUDA), cuDNN, and Collective Communications Library (NCCL) have long formed the bedrock of AI research. An entire ecosystem, including popular frameworks like PyTorch and Hugging Face Transformers, was heavily optimized for CUDA. An entire generation of developers grew up inside this ecosystem, creating enormous switching costs.
A software ecosystem once reluctant to switch from existing platforms is now exploring alternatives, which could be the first step away from U.S. reliance. The software side has evolved with the rise of new chips; developers are optimizing for and deploying their newest models on new parallel platforms.
From Sufficient to Demanded
Prior to 2022, domestic chips from companies like Cambricon and Huawei (Ascend) were rarely taken seriously. They were catapulted to the center of the domestic AI ecosystem in 2025, when SiliconFlow first demonstrated DeepSeek’s R1 model running seamlessly on Huawei’s Ascend cloud a couple of weeks after the R1 release. This created a domino effect, sparking a market-wide race to serve domestic models faster and better on domestic chips. Fueled by the entire ecosystem, not DeepSeek alone, the Ascend’s support matrix quickly expanded. This proved domestic silicon was sufficient and ignited massive demand. Notably, Huawei’s Ascend had day-zero integration with the release of DeepSeek V3.2, a level of collaboration previously unimaginable.
Domestic Synergy
Researchers began co-developing with domestic chip vendors, providing direct input and solving problems collaboratively. This synergy creates a development ecosystem tailored for Large Language Models (LLMs) that evolves much faster than NVIDIA’s CUDA.
A new generation of younger researchers, trained in this multi-vendor world, emerged without the old bias that domestic hardware is inferior to NVIDIA’s chips. This collaborative approach has already resulted in adoption: the documentation for the DeepSeek-V3.1 model notes that its new FP8 precision format explicitly aims “for next-gen domestic chips,” a clear example of hardware-aware model co-design. Its successor, DeepSeek-V3.2, took this principle further by baking in TileLang-based kernels designed for portability across multiple hardware vendors.
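As a rough illustration of why a precision format is a co-design lever, the sketch below casts a tensor to PyTorch’s float8_e4m3fn storage type, one byte per value versus four for float32 and two for FP16. DeepSeek’s exact FP8 variant and its scaling scheme differ; this shows only the storage effect.

```python
import torch

x = torch.randn(4, 8)                    # float32: 4 bytes per value
x_fp8 = x.to(torch.float8_e4m3fn)        # fp8: 1 byte per value
roundtrip = x_fp8.to(torch.float32)      # dequantize for comparison

print(x.element_size(), x_fp8.element_size())   # 4 1
print((x - roundtrip).abs().max())              # the quantization error paid for it
```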
A New Software Landscape
The CUDA ecosystem is now being challenged at every layer. Open-source projects like FlagGems from BAAI and TileLang are creating backend-neutral alternatives to CUDA and cuDNN. Communication stacks like the Huawei Collective Communication Library (HCCL), among others, are providing robust substitutes for NCCL. The ecosystem is substantially different from three years ago, which may reverberate globally for years to come.
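What backend neutrality looks like at the application layer: a small sketch of device selection that runs unchanged on NVIDIA or Ascend hardware, assuming Huawei’s torch_npu adapter (which registers an “npu” device type with PyTorch) for the Ascend path. Treat it as illustrative rather than a supported recipe.

```python
import torch

try:
    import torch_npu  # noqa: F401 -- Huawei's Ascend plugin, if installed
    HAS_NPU = torch.npu.is_available()
except ImportError:
    HAS_NPU = False

def pick_device() -> torch.device:
    """Prefer CUDA, fall back to Ascend NPU, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if HAS_NPU:
        return torch.device("npu")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(2, 3, device=device)   # identical model code on either stack
print(f"running on {x.device}")
```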
Looking Ahead
Adaptations to geopolitical negotiations, resource limitations, and cultural preferences have led to leaps in both China’s development of highly performant AI and its now-competitive domestic chips. U.S. policy has changed across administrations, from prohibition to a revenue-sharing model, while China responds with a mix of industrial policy and international trade law. Researchers and developers have innovated and adapted. The effects on open source, training, and deployment point to shifts in software dependencies, compute-efficiency innovations that shape development globally, and a self-sufficient Chinese AI ecosystem.
China’s domestic AI ecosystem is accelerating, with companies like Moore Threads, MetaX, and Biren racing toward IPOs. Cambricon, once struggling, has seen its valuation soar. Whether this new chip ecosystem expands globally remains to be seen.
The future of the global chip ecosystem, and therefore the future of AI progress, has become a key item for upcoming leadership talks. The question is no longer whether China can build its own ecosystem, but how far it will go.
Acknowledgements
Thanks to Adina Yakefu, Nathan Lambert, Matt Sheehan, and Scott Singer for their feedback on earlier drafts. Any errors remain the authors’ responsibility.
Appendix: A Timeline of Chip Usage and Controls
Before 2022, U.S. restrictions were targeted toward specific supercomputing entities. Policy then evolved as regulators and industry adapted.
- The Initial Moves (October 2022):
  - Chips such as Ascend are nascent while NVIDIA dominates the global and Chinese markets.
  - The Commerce Department’s Bureau of Industry and Security (BIS) released its “advanced computing” controls to address U.S. national security and foreign policy concerns. The rule established a compute threshold with an interconnect-bandwidth trigger, immediately cutting off China’s access to NVIDIA’s flagship A100 and H100 GPUs. China promptly filed a WTO dispute (DS615), arguing the measures were discriminatory trade barriers.
- The Adjustment Era (Late 2022–2023):
  - NVIDIA’s 95% share of the market in China began to drop quickly.
  - NVIDIA began to develop compliant variants for the Chinese market. The A800 (November 2022) and H800 (March 2023) were created with reduced chip-to-chip bandwidth to satisfy regulatory requirements and serve as alternatives to the A100 and H100. The immensely popular consumer-grade RTX 4090 was also restricted, prompting the creation of a China-specific RTX 4090D.
- Closing Gaps (Late 2023–2024):
  - Performance of Chinese domestic chips slowly improves.
  - BIS comprehensively upgraded the framework, removing interconnect bandwidth as a key test and introducing new metrics: Total Processing Performance (TPP) and performance density. This was a direct, successful strike against the A800/H800. Debates expanded to export controls on the H20 and even on model weights.
- Shifting the Narrative (2025):
  - Adoption of Ascend, Cambricon, and Kunlun sharply increases following January’s “DeepSeek moment”.
  - Also in January, the Biden Administration established its AI Diffusion Rule, imposing further restrictions on both chips and select model weights amid security and smuggling concerns. NVIDIA had already designed a compliant chip for this regime, the H20. Leveraging NVIDIA’s growing presence in political spheres, CEO Jensen Huang began publicly making the case for the strategic importance of selling U.S. chips worldwide. The U.S. then imposed a licensing requirement on the H20 in April 2025, leading NVIDIA to take a $5.5 billion charge and effectively halting sales, before rescinding the AI Diffusion Rule in May 2025.
- The Compromise (August 2025):
  - Alibaba announces a new chip for inference.
  - After intense negotiations, the Commerce Department began issuing licenses for the H20 under an unprecedented 15% revenue-sharing arrangement. But by the time the H20 was unbanned, the market had already begun to change.
- China’s Response (Late 2025):
  - Day-zero deployment on Ascend and Cambricon begins for new DeepSeek models.
  - As the U.S. shifted to a revenue-sharing model, Beijing responded. Chinese regulators reportedly instructed firms to cancel NVIDIA orders, steering demand toward domestic accelerators under a “secure supply at home” narrative. This was followed by an anti-discrimination investigation into U.S. measures and an anti-dumping probe into U.S. analog ICs, centering chips in future leadership talks.
