This post examines how the open source AI landscape has shifted across competition, geography, technical trends, and emerging communities over the past year. We primarily examine community activity on Hugging Face across many kinds of metrics to present a holistic view of the ecosystem.
This post builds on an earlier analysis conducted mid-2025, available here, which examined what the Hugging Face community is building. We recommend reading additional perspectives on the open source ecosystem in and outside of Hugging Face from the Data Provenance Initiative, Interconnects, OpenRouter and a16z, and MIT and the Linux Foundation. Because the Hugging Face ecosystem is distributed, analyses are a mixture of Hugging Face and community members' work, all of which is credited accordingly.
Activity in the open source AI ecosystem has grown rapidly, with the number of users, public models, and public datasets all nearly doubling. In 2025, Hugging Face grew to 11 million users, more than 2 million public models, and over 500,000 public datasets. This growth signals more than increased interest in open source; it reflects a shift toward active participation, with users increasingly creating derivative artifacts such as fine-tuned models, adapters, benchmarks, and applications rather than only consuming pre-trained systems.
Data from Hugging Face | Hugging Face’s two million models and counting: Graph and story by AI World
The ecosystem remains highly concentrated. Roughly half of the models on Hugging Face have fewer than 200 total downloads, while the top 200 most downloaded models, or 0.01% of models, account for 49.6% of all downloads.
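Concentration statistics like these fall out of a single pass over per-model download counts. A minimal sketch, using synthetic Pareto-distributed counts rather than real Hub data:

```python
# Sketch: measuring download concentration with synthetic, heavy-tailed
# download counts. These are illustrative numbers, not real Hub data.
import random

random.seed(0)
# Pareto-like distribution: a few models get huge download counts.
downloads = sorted((int(random.paretovariate(1.2)) for _ in range(100_000)),
                   reverse=True)

total = sum(downloads)
top_200_share = sum(downloads[:200]) / total
share_under_200 = sum(1 for d in downloads if d < 200) / len(downloads)

print(f"top 200 models: {top_200_share:.1%} of all downloads")
print(f"models with <200 downloads: {share_under_200:.1%}")
```

With real data, swapping the synthetic list for actual per-model counts yields both headline statistics from the same sorted pass.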
Specialized communities form around particular domains, languages, or problem areas, and often show sustained engagement and reuse even when their overall download counts are modest. Open source AI is best understood as a set of overlapping sub-ecosystems rather than a single uniform market.
Open Source in Competition
More companies, both large and small, are building on open source. Over 30% of the Fortune 500 now maintain verified accounts on Hugging Face. Startups frequently use open models as default components: Thinking Machines built its Tinker offering entirely on open weights, while popular IDEs such as VS Code and Cursor support both open and closed models. Established American companies such as Airbnb have increased their engagement with the open ecosystem, and Hugging Face has seen more legacy companies upgrading their organizational subscriptions over the course of 2025.
Big Tech companies are frequently creating new repositories on the Hugging Face Hub; visualized side-by-side, the strong increase in repository growth shows investment over time. NVIDIA has emerged as the strongest contributor.
Data from Hugging Face | Big Tech Is All-In On Open-Source AI, Graph and story by AI World
Studies of open software more broadly suggest that the downstream value created by open artifacts far exceeds the cost of producing them. Similar dynamics are emerging in AI, where open models are reused, adapted, and specialized across thousands of downstream applications. Organizations that rely exclusively on closed systems often incur higher costs and face reduced flexibility in deployment and customization.
The Geography of Open Source
All-time downloads over the past four years show clear frontrunner regions in model popularity. The U.S. and China have historically been the top contributors, with the UK, Germany, and France as secondary contributors in popularity. Models developed by individual users or distributed organizations with no clear geographic base account for about half of all platform downloads.

Data from Hugging Face | Graph and Research from Longpre et al. “Economies of Open Intelligence: Tracing Power & Participation within the Model Ecosystem”
The geographic composition of the open source ecosystem has fundamentally changed. Hugging Face data shows China surpassing the U.S. in both monthly and overall downloads. In the past year, Chinese models quickly grew to account for the plurality (41%) of downloads.
Data and Graph from Hugging Face
Industry's share of overall development fell from around 70% before 2022 to roughly 37% in 2025. Meanwhile, independent or unaffiliated developers rose from 17% to 39% of all downloads over the same period, at times accounting for more than half of total usage. These are individuals and small collectives focused on quantizing, adapting, and redistributing base models. Such intermediaries now steer a meaningful portion of what typical users can run and how innovations spread through the ecosystem.

Data from Hugging Face | Graph and Research from Longpre et al. “Economies of Open Intelligence: Tracing Power & Participation within the Model Ecosystem”
Different regions contribute in different ways. The US and Western Europe have historically dominated through large industry labs (Google, Meta, OpenAI, Stability AI), while China has increasingly led on both releases and adoption. France, Germany, and the UK continue to contribute through research organizations, national AI initiatives, and specialized model families. Ecosystems supporting a wide range of contributors and organizational forms tend to produce more widely adopted artifacts.
Countries, Organizations, and Individual Users
Popular models from startups were more geographically widespread, with France and South Korea standing out as competitive countries. Notably, the fourth most popular entity for developing new trending models was individual users, not organizations. Creating competitive models at the individual level is more accessible than ever before.

Data and Graph from Hugging Face
Between the U.S. and China
Of the models newly created in 2025, the majority of trending models were either developed in China or derived from a model developed in China. The most popular models were developed by large organizations, predominantly from the U.S. and China. For more on the Chinese AI ecosystem, read our three-part series reflecting on the changes in the year since the "DeepSeek Moment": part one on strategic changes, part two on architectural changes, and part three on organizations and the future.
In 2025, China's AI ecosystem steered heavily into open source, following the viral release of DeepSeek's R1 model in January. The number of competitive Chinese organizations releasing models and the number of repositories on Hugging Face skyrocketed. Baidu went from zero releases on the Hub in 2024 to over 100 in 2025. ByteDance and Tencent each increased releases by eight to nine times. Organizations that had previously favored closed approaches, including Baidu and MiniMax, shifted decisively toward open releases.
Data and Graph from Hugging Face
A similar number of popular U.S. organizations has consistently contributed a higher volume of repositories over time. Meta and its former Facebook research organization account for a significant proportion of open releases, as does Google to a lesser extent.
Data and Graph from Hugging Face
Viewed side by side, the steep upward trajectory of repository growth among popular Chinese organizations emerges as a key strategic difference.
Data and Graph from Hugging Face
Global Open Source and Sovereignty
Open source AI is increasingly tied to questions of sovereignty. Open weight models allow governments and public institutions to fine-tune systems on local data under national legal frameworks. Models that can be deployed on domestic hardware reduce reliance on foreign-controlled cloud infrastructure. Transparency around model architecture, training processes, and evaluation supports regulatory review and public accountability. Read more about the open source approach to sovereignty here.
At the national level, governments are taking action. South Korea's National Sovereign AI Initiative, launched in mid-2025, named LG AI Research, SK Telecom, Naver Cloud, NC AI, and Upstage as national champions to produce competitive domestic models. Three models from South Korea trended concurrently on the Hugging Face Hub in February 2026. In March 2026, South Korea and U.S. startup Reflection AI announced a data center partnership, also bringing frontier open weight models to South Korea.
Switzerland’s Swiss AI initiative and various EU-funded projects reflect similar priorities. The UK’s principle of “public money, public code” has influenced several government-backed AI initiatives.
Hugging Face Trending Page February 2026
These investments in open-source and open weight AI are already paying dividends for countries with thriving AI training ecosystems of their own. Models and datasets tend to be most used in the regions where they are developed, with developers often turning to the models that best represent their languages and reflect similar technical and application requirements.
Data and Graph from Hugging Face
Model Popularity
The most liked models on the Hub reflect community attention, whether the ability to return to or reference a model or its general popularity. While this metric doesn't always reflect usage, attention accumulated over time can signal interest. In a single year, the most liked models went from predominantly U.S.-developed, led by Meta's Llama family, to a global mix with China's DeepSeek-R1 at the top.
Data and Graphic from Hugging Face
Papers and Scientific Contributions
While the value of scientific contributions can be measured in many ways, the upvote feature on the Hub shows that papers from large AI organizations are widely appreciated by community members. Notably, the most upvoted papers come from large organizations, mostly from the U.S. and China. The majority of the top organizations are Chinese Big Tech companies, with ByteDance sharing a high volume of high-impact papers.
Space by Hugging Face | PaperVerse Explorer
Of Hugging Face's Daily Papers, a collection of papers curated by Hugging Face's AK, those that reference model and dataset creation, showing the most open source adoption, are generally diverse. Prominent takeaways include medical papers being influential, while Big Tech's influence is sparse.
Data from Hugging Face | Graphic and story by AI World
Derivative Models
How our community members choose to build on models, whether via fine-tuning, merging, or other methods, reflects model popularity and value. Alibaba as an organization has more derivative models than Google and Meta combined, with the Qwen family alone constituting more than 113,000 derivative models. When including all models that tag Qwen, that number balloons to over 200,000.
Data and Graph from Hugging Face
Adoption and Accessibility
Model development has increasingly emphasized accessibility alongside scale. Smaller models are downloaded and deployed at far higher rates than very large systems, reflecting practical constraints around cost, latency, and hardware availability.
This small-model dominance occurs partly because far more models are released at that size. But even after normalizing for this, data from the ATOM Project's Relative Adoption Metric shows that the median top-10 models in the 1-9B parameter range are downloaded only about 4x more than models above 100B. Automated systems and CI pipelines further inflate small-model download counts, but the trend toward smaller, deployable models is real.
Data from Hugging Face | Graph and Article by ATOM
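The normalization idea behind this kind of relative-adoption comparison can be sketched briefly; the per-bucket figures below are invented for illustration and are not ATOM's data.

```python
# Sketch: comparing median downloads of the top models per size bucket,
# rather than raw totals, to control for release volume.
# All numbers are illustrative, not ATOM's actual data.
from statistics import median

# Downloads of the top-10 models in each parameter-size bucket (synthetic).
buckets = {
    "1-9B":  [9_000_000, 7_500_000, 6_000_000, 5_200_000, 4_800_000,
              4_100_000, 3_900_000, 3_400_000, 3_100_000, 2_800_000],
    ">100B": [2_400_000, 1_900_000, 1_300_000, 1_100_000, 950_000,
              900_000, 780_000, 640_000, 600_000, 550_000],
}

medians = {size: median(top10) for size, top10 in buckets.items()}
ratio = medians["1-9B"] / medians[">100B"]
print(f"median top-10 downloads, 1-9B vs >100B: {ratio:.1f}x")
```

Comparing medians of equally sized top-k slices keeps a bucket with thousands of releases from dominating one with only a handful.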
Engagement with open models tends to peak almost immediately after release, then slow; mean engagement duration is roughly six weeks. Continuous improvement and frequent updates have become critical for maintaining relevance. DeepSeek's successive releases (V3, R1, V3.2) kept it competitive even as challengers emerged. Organizations that stagnate in development tend to lose share quickly to those with frequent updates or domain-specific fine-tunes.
Data from Hugging Face | Graph and Research from Choksi et al. “The Transient and Wondrous Lifetime of Open Models”
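If one treats post-release engagement as exponential decay, a roughly six-week mean duration pins down the rest of the curve. A toy calculation under that assumption, not the cited paper's fitted model:

```python
# Toy model: post-release engagement as exponential decay with a 6-week
# mean duration (an assumption for illustration, not the paper's model).
import math

mean_weeks = 6.0
lam = 1 / mean_weeks            # decay rate; mean of Exp(lam) is 1/lam
half_life = math.log(2) / lam   # weeks until engagement halves
after_12 = math.exp(-lam * 12)  # fraction of initial engagement at week 12

print(f"half-life: {half_life:.1f} weeks")
print(f"engagement remaining after 12 weeks: {after_12:.0%}")
```

Under this assumption, engagement halves in about four weeks and falls to roughly a seventh of its peak by week twelve, which is consistent with the rapid post-release slowdown described above.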
The mean size of downloaded open models rose from 827M parameters in 2023 to 20.8B in 2025, driven largely by quantization and mixture-of-experts architectures. The median, however, increased only marginally, from 326M to 406M parameters. This divergence indicates that high-end LLM users are pulling up the mean while underlying small-model usage remains stable.
Data from Hugging Face | Graph and Research from Longpre et al. “Economies of Open Intelligence: Tracing Power & Participation within the Model Ecosystem”
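The mean/median divergence is a textbook heavy-tail effect. A toy example with made-up numbers shows how a small share of very-large-model downloads moves the mean without touching the median:

```python
# Toy example: a few downloads of very large models pull the mean
# parameter count far above the median. Sizes in billions; synthetic data.
from statistics import mean, median

# Suppose 95% of downloads hit small models and 5% hit very large MoE models.
sizes_b = [0.4] * 95 + [400.0] * 5

print(f"mean:   {mean(sizes_b):.1f}B parameters")
print(f"median: {median(sizes_b):.1f}B parameters")
```

Here just 5% of downloads at 400B parameters lift the mean above 20B while the median sits at 0.4B, mirroring the shape (though not the exact values) of the trend reported above.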
Performance differences between frontier models and smaller systems often narrow rapidly through fine-tuning and task-specific adaptation. On the Hub, models with hundreds of millions of parameters support search, tagging, and document processing workflows, while models in the single-digit billions are widely used for coding, reasoning, and multimodal tasks. As a result, most major model developers now release families of models spanning a range of sizes. The rise of capable small models shifts autonomy closer to the edge, reducing dependency on centralized cloud providers.
Compute, Hardware, and Open Source
Open source AI development is closely linked to hardware trends. Most models are optimized for NVIDIA GPUs, but support for AMD hardware continues to expand. Stability AI model collections now optimize for both NVIDIA and AMD platforms. Libraries increasingly target both, and tooling has improved to make cross-hardware deployment more straightforward. In 2025, Hugging Face launched the Kernel Hub to load and run kernels optimized for NVIDIA and AMD GPUs.
In parallel, Chinese open models are being released with explicit support for domestically developed chips. Alibaba has invested in inference-focused chip architectures designed to fill Chinese data centers with hardware capable of running open source models locally.
While access to compute remains a core necessity for the development and deployment of AI models, open-source and open-weight models are helping break away from an ecosystem where compute is the be-all and end-all, with more and more models at every level of performance pushing efficiency to 10x to 1000x lower costs than flagship models from the largest developers.
Data and Graphic from Hugging Face
Still, the question of infrastructure investment for open source remains urgent. Public funding for data centers capable of training and serving open models has become a growing policy discussion, particularly in Europe and the UK. The gap between the compute resources available to large closed-model companies and those accessible to the open source community continues to shape what is possible in open development.
Sub-Communities: Robotics
Robotics has emerged as one of the fastest-growing sub-communities on Hugging Face. The numbers are striking: robotics datasets grew from 1,145 in 2024 to 26,991 in 2025, climbing from rank 44 to the single largest dataset category on the Hub in just three years. For comparison, text generation, the second-largest category, had only around 5,000 datasets in 2025.

Data from Hugging Face | Graph and Story by AI World
Community-contributed datasets span everything from household manipulation tasks to autonomous driving. The largest multimodal dataset for spatial intelligence, Learning to Drive (L2D), was released through a LeRobot collaboration with Yaak. Datasets like RoboMIND, with over 107,000 real-world trajectories across 479 distinct tasks and multiple robot embodiments, provide the kind of scale and diversity needed for training generalizable robotic policies.
Hugging Face's acquisition of Pollen Robotics opened open source robot sales to both industry and academic labs, as well as everyday hobbyists. LeRobot, Hugging Face's open source robotics library that provides models, datasets, and tools for real-world robotics in PyTorch, covering imitation learning, reinforcement learning, and vision-language-action models, experienced rapid growth: over the past year, its GitHub repository stars nearly tripled.
Data from GitHub | Graphic from star-history.com
Sub-Communities: AI for Science
Scientific research has become another particularly active area. Open models and datasets are increasingly used for protein folding, molecular dynamics, drug discovery, and scientific data analysis. All frontier AI companies now have dedicated science teams, though much current focus remains on literature discovery rather than direct experimentation.
Space by Hugging Face | Science Release Heatmap
Community-led projects have formed around shared research goals, often involving hundreds of contributors working across institutions and disciplines. These efforts highlight the role of open source as a mechanism for coordinating large-scale, interdisciplinary work that would be difficult to organize through traditional academic or corporate structures alone.
Looking Forward
The open source AI ecosystem continues to evolve through a mixture of global participation, technical specialization, and institutional adoption. Several trends are likely to define the next phase.
The geographic rebalancing of power is accelerating. Western organizations increasingly seek commercially deployable alternatives to Chinese models, creating urgency around efforts like OpenAI's GPT-OSS, AI2's OLMo, and Google's Gemma to provide competitive open options from US and European developers. Whether these efforts can match the adoption momentum of Qwen and DeepSeek will be a defining question of 2026.
The growth of sub-communities in robotics and science suggests that open source AI is expanding beyond language and image generation into physical and experimental domains. The infrastructure, norms, and coordination mechanisms developed around text and image models are being adapted for new modalities and use cases.
For researchers, developers, companies, and governments, open source remains a foundational layer for building, evaluating, and governing AI systems. With increasing agent deployments, open source and its interoperability will be key for agents to thrive. Its trajectory over the past year makes one thing clear: the open source ecosystem is where much of the practical work of AI development, adaptation, and deployment takes place, and its influence on the broader AI landscape continues to grow.
Thanks to the Hugging Face community for continuing to build the foundation of the AI ecosystem 🤗