The Golden Age of Open Source in AI Is Coming to an End
A (biased) history of open sourcing AI libraries and models
Open Sourcing Decisions
A Sea Change in Open Source
Turning Tides in Open Source AI
The Way Forward for Open Source AI

At the same time as TensorFlow’s rise, foreshadowing what was yet to come in open source AI, enterprise software went through an open source licensing crisis. Mostly because of AWS, which had mastered the craft of taking open source infrastructure projects and building commercial services around them, many open source projects exchanged their permissive licenses for “Copyleft” or “ShareAlike” (SA) alternatives.

Not all open source is created equal. Permissive licenses (like Apache 2.0 or MIT) allow anyone to take an open source project and build a commercial service around it. “Copyleft” licenses (like GPL), just like Creative Commons’ “ShareAlike” terms, are one way to protect against this. They’re sometimes called a “poison pill”, because they require any derivative product to be licensed the same way. If AWS launched a service based on an open source project with a “Copyleft” license, the AWS service itself would have to be open sourced under the same license.

So, partly in response to competing cloud services, the corporate creators and maintainers of open source projects like MongoDB and Redis switched their licenses to less permissive alternatives. This led to a painful but entertaining back-and-forth between AWS and those companies on the principles and merits of open source, which has since calmed down a bit.

Note that this change in licensing had a deceptive effect on the open source ecosystem: there are still plenty of new open source projects being announced, but the licensing implications for what can and can’t be done with those projects are more complicated than most people realize.

At this point you should be asking yourself: If the corporate maintainers of open source infrastructure projects realized that others were reaping more of the commercial benefits than they were, shouldn’t the same be happening with AI? Isn’t this an even bigger deal for open source AI models, which hold the aggregate value of the compute and data that went into creating them? The answers are: Yes and yes.

Although there appears to be a Robin Hood-esque movement around open source AI, the data is pointing in a different direction. Large corporations like Microsoft are changing the licensing of some of their most popular models from permissive to non-commercial (NC) licenses, and Meta has started to use non-commercial licenses for all of their recent open source projects (MMS, ImageBind, DINOv2 are all CC-BY-NC 4.0 and LLaMA is GPL 3.0). Even popular projects from universities like Stanford’s Alpaca are only licensed for non-commercial use (inherited from the non-permissive attributes of the dataset they used). Entire companies change their business models in order to protect their IP and rid themselves of the obligation to open source as part of their mission. Remember when a small non-profit called OpenAI transformed itself into a capped-profit? Notice that GPT-2 was open sourced, but GPT-3.5 and GPT-4 weren’t?

More generally speaking, the trend towards less permissive licenses in AI, although opaque, is noticeable. Below is an analysis of model licenses on Hugging Face. The share of permissive licenses (like Apache, MIT, or BSD) has been in persistent decline since mid-2022, while non-permissive licenses (like GPL) and restrictive licenses (like OpenRAIL) are becoming more common.

Source: Analysis by author
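For readers who want to run a similar tally themselves, here is a minimal sketch of how license shares per period could be computed. It is not the code behind the chart above. The `LICENSE_GROUPS` mapping, the function names, and the `(period, license)` record format are all assumptions made for illustration, and the sample rows are hypothetical placeholders rather than real Hugging Face data.

```python
from collections import Counter, defaultdict

# Assumed grouping of license identifiers. The article names only a few
# examples per group, so the rest of this mapping is an illustrative guess.
LICENSE_GROUPS = {
    "apache-2.0": "permissive",
    "mit": "permissive",
    "bsd-3-clause": "permissive",
    "gpl-3.0": "non-permissive",
    "cc-by-nc-4.0": "non-permissive",
    "openrail": "restrictive",
}


def license_group(license_id):
    """Map a license identifier to a group; unknown or missing ids become 'other'."""
    if not license_id:
        return "other"
    return LICENSE_GROUPS.get(license_id.strip().lower(), "other")


def share_by_period(records):
    """Return {period: {group: share}}; the shares within a period sum to 1.0."""
    counts = defaultdict(Counter)
    for period, license_id in records:
        counts[period][license_group(license_id)] += 1
    shares = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        shares[period] = {group: n / total for group, n in counter.items()}
    return shares


# Hypothetical placeholder rows (period, license), NOT real measurements.
sample = [
    ("2022-Q1", "apache-2.0"),
    ("2022-Q1", "mit"),
    ("2022-Q1", "gpl-3.0"),
    ("2022-Q1", "MIT"),
    ("2023-Q1", "apache-2.0"),
    ("2023-Q1", "openrail"),
    ("2023-Q1", "cc-by-nc-4.0"),
    ("2023-Q1", None),
]

for period, groups in sorted(share_by_period(sample).items()):
    print(period, {g: round(s, 2) for g, s in sorted(groups.items())})
```

With real data, the records would come from the license field in each model's metadata. Models with no declared license land in the "other" group here, which is itself worth tracking.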

To make things worse, the recent frenzy around large language models (LLMs) has further muddied the waters. Hugging Face maintains an “Open LLM Leaderboard” which aims to highlight “the real progress that’s being made by the open-source community”. To be fair, all of the models on the board are indeed open source. However, a closer look reveals that almost none are licensed for commercial use*.

Source: Analysis by author

*Between the writing of this post and its publication, the license for the Falcon models changed to the permissive Apache 2.0 license. The general observation is still valid.

If anything, the Open LLM Leaderboard highlights that innovation from big tech (LLaMA was open sourced by Meta with a non-commercial license) dominates all other open source efforts. The bigger problem is that the models derived from it are not as forthcoming about their licenses. Almost none declare their license explicitly, and you have to do your own research to find out that the models and data they’re based on don’t allow for commercial use.
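That research amounts to walking a model's lineage and checking every upstream license. Here is a minimal sketch of the idea, assuming a hand-built catalog. The catalog format, the `COMMERCIAL_OK` set, the function name, and every entry below are hypothetical and chosen only for illustration. It is not legal advice.

```python
# Hypothetical set of license ids treated as allowing commercial use.
# This is an assumption for illustration, not a legal determination.
COMMERCIAL_OK = {"apache-2.0", "mit", "bsd-3-clause"}


def blocking_ancestors(name, catalog, _seen=None):
    """Return the sorted names of `name` and its upstream artifacts whose
    license is missing or not in COMMERCIAL_OK."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against cycles and repeated parents
        return []
    seen.add(name)
    entry = catalog[name]
    blockers = set()
    # An undeclared license (None) is treated as a blocker, to be conservative.
    if entry.get("license") not in COMMERCIAL_OK:
        blockers.add(name)
    for parent in entry.get("based_on", []):
        blockers.update(blocking_ancestors(parent, catalog, seen))
    return sorted(blockers)


# Hypothetical catalog: each artifact lists its license and what it builds on.
catalog = {
    "base-model": {"license": "cc-by-nc-4.0"},
    "instruction-data": {"license": "cc-by-nc-4.0"},
    "finetuned-model": {
        "license": None,  # license not declared by its authors
        "based_on": ["base-model", "instruction-data"],
    },
    "clean-data": {"license": "mit"},
    "clean-model": {"license": "apache-2.0", "based_on": ["clean-data"]},
}

print(blocking_ancestors("finetuned-model", catalog))
print(blocking_ancestors("clean-model", catalog))
```

An empty result means nothing in the recorded lineage blocks commercial use. The check is only as good as the catalog, which is exactly the problem when derivative models leave their license undeclared.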

There is a lot of virtue-signaling in the community, mostly by well-meaning entrepreneurs and VCs who hope for a future that is not dominated by OpenAI, Google, and a handful of others. It is not obvious why AI models should be open sourced: they represent hard-earned intellectual property that companies develop over years, spending billions on compute, data acquisition, and talent. Companies would be defrauding their shareholders if they just gave everything away for free.


The trend towards non-permissive licenses in open source AI seems clear. Yet the overwhelming volume of news fails to point out that the cumulative benefit of this work accrues almost entirely to academics and hobbyists. Investors and executives alike should be more aware of the implications and exercise more care. I have a strong feeling that most startups in the emerging LLM cottage industry are building on top of non-commercially licensed technology. If I could invest in an ETF for IP lawyers, I would.

My prediction is that value capture in AI (specifically for the latest generation of large generative models) will look similar to other innovations that require significant capital investment and an accumulation of specialized talent, like cloud computing platforms or operating systems. A few major players will emerge that provide the AI foundation to the rest of the ecosystem. There will still be ample room for a layer of startups on top of that foundation, but just as there are no open source projects dethroning AWS, I consider it highly unlikely that the open source community will produce a serious competitor to OpenAI’s GPT and whatever comes next.
