Not legal advice.
The EU AI Act, the world's first comprehensive law on artificial intelligence, has officially come into force, and it's set to affect the way we develop and use AI – including within the open source community. If you're an open source developer navigating this new landscape, you're probably wondering what it means for your projects. This guide breaks down the key points of the regulation with a focus on open source development, offering a clear introduction to the legislation and pointing you to tools that can help you prepare to comply with it.
Disclaimer: The information provided in this guide is for informational purposes only and should not be taken as any form of legal advice.
TL;DR: The AI Act may apply to open source AI systems and models, with specific rules depending on the type of model and how it's released. In general, obligations involve providing clear documentation, adding tools to disclose model information when deployed, and following existing copyright and privacy rules. Fortunately, many of these practices are already common in the open source landscape, and Hugging Face offers tools to help you prepare to comply, including tools that support opt-out processes and the redaction of personal data.
Check out model cards, dataset cards, Gradio watermarking, support for opt-out mechanisms and personal data redaction, licenses, and more!
The EU AI Act is a binding regulation that aims to foster responsible AI. To that end, it sets out rules that scale with the level of risk an AI system or model might pose, while aiming to preserve open research and support small and medium-sized enterprises (SMEs). As an open source developer, many aspects of your work won't be directly impacted – especially if you're already documenting your systems and keeping track of data sources. In general, there are straightforward steps you can take to prepare for compliance.
The regulation takes effect over the next two years and applies broadly, not just to those within the EU. If you're an open source developer outside the EU but your AI systems or models are offered to, or impact, people within the EU, they're covered by the Act.
🤗 Scope
The regulation works at different levels of the AI stack, meaning it sets different obligations depending on whether you are a provider (which includes developers), deployer, distributor, etc., and on whether you are working on an AI model or an AI system.
| Model | System |
|---|---|
| Only general purpose AI (GPAI) models are directly regulated. GPAI models are models trained on large amounts of data that show significant generality, can perform a wide range of tasks, and can be used in systems and applications. One example is a large language model (LLM). Modified or fine-tuned models also need to comply with obligations. | A system is anything that can infer from inputs. This typically takes the form of a traditional software stack that leverages or connects one or several AI models to a digital representation of the inputs. One example is a chatbot interacting with end users that leverages an LLM, or a Gradio app hosted on Hugging Face Spaces. |
In the AI Act, rules scale with the level of risk an AI system or model might pose. For all AI systems, the risk levels are:
- Unacceptable: systems that violate human rights, for example an AI system that scrapes facial images from the internet or CCTV footage. These systems are prohibited and cannot be placed on the market.
- High: systems that may adversely impact people's safety or fundamental rights, for example those dealing with critical infrastructure, essential services, or law enforcement. These systems must follow thorough compliance steps before being placed on the market.
- Limited: systems that interact directly with people and have the potential to create risks of impersonation, manipulation, or deception. These systems need to meet transparency requirements. Most generative AI models can be integrated into systems that fall into this category. As a model developer, your models will be easier and more likely to be integrated into AI systems if you already follow the requirements, such as by providing sufficient documentation (see the sketch after this list).
- Minimal: the vast majority of systems – those that don't pose the risks above. They only need to comply with existing laws and regulations; the AI Act adds no new obligations.
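On the documentation front, the huggingface_hub library offers a programmatic model card API. Below is a minimal sketch assuming huggingface_hub is installed; the repo id, license, and dataset name are hypothetical placeholders, and the fields your model actually needs depend on its use case.

```python
# A minimal sketch of documenting a model with huggingface_hub's ModelCard API.
# The repo id, license, and dataset identifier are illustrative placeholders.
from huggingface_hub import ModelCard, ModelCardData

card_data = ModelCardData(
    license="apache-2.0",
    language="en",
    datasets=["my-org/my-training-corpus"],  # document your data sources
)
card = ModelCard.from_template(
    card_data,
    model_id="my-org/my-model",
    model_description="Summary of the architecture, training data, and intended use.",
)
card.save("README.md")  # becomes the model card when pushed to the Hub
```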
For general purpose AI (GPAI) models, there's another risk category called systemic risk: GPAI models using substantial computing power, today defined as over 10^25 FLOPs for training, or that have high-impact capabilities. According to a Stanford study, as of August 2024 and based on estimates from Epoch, only eight models (Gemini 1.0 Ultra, Llama 3.1-405B, GPT-4, Mistral Large, Nemotron-4 340B, MegaScale, Inflection-2, Inflection-2.5) from seven developers (Google, Meta, OpenAI, Mistral, NVIDIA, ByteDance, Inflection) would meet the default systemic risk criterion of being trained using at least 10^25 FLOPs. Obligations vary depending on whether the models are open source or not.
🤗 How to prepare for compliance
Our focus in this short guide is on limited-risk AI systems and open source non-systemic-risk GPAI models, which should cover most of what's publicly available on the Hub. For other risk categories, be sure to check the further obligations that may apply.
For limited risk AI systems
Limited-risk AI systems interact directly with people (end users) and may create risks of impersonation, manipulation, or deception. Examples include a chatbot producing text or a text-to-image generator – tools that can also facilitate the creation of misinformation or deepfakes. The AI Act aims to tackle these risks by helping the end user understand that they're interacting with an AI system. Today, most GPAI models are not considered to present systemic risk. In the case of limited-risk AI systems, the obligations below apply whether or not they're open source.
Developers of limited-risk AI systems have to:
- Disclose to the user that they're interacting with an AI system, unless this is obvious. Keep in mind that end users may not have the same technical understanding as experts, so you should provide this information in a clear and thorough way.
- Mark synthetic content: AI-generated content (e.g., audio, images, videos, text) must be clearly marked as artificially generated or manipulated in a machine-readable format. Existing tools like Gradio's built-in watermarking features can help you meet these requirements (see the sketch below).
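As one illustration, here is a minimal sketch of visibly labeling AI-generated images in a Gradio app. The `generate_image` placeholder and the simple text overlay are assumptions for illustration – this is not Gradio's own watermarking utility, and a machine-readable mark (e.g., embedded metadata or an invisible watermark) may also be needed to satisfy the Act.

```python
# A minimal sketch: visibly label AI-generated images before showing them.
# `generate_image` is a hypothetical stand-in for a real text-to-image model;
# a visible label alone may not satisfy the machine-readable requirement.
import gradio as gr
from PIL import Image, ImageDraw

def generate_image(prompt: str) -> Image.Image:
    # Placeholder for a real model call.
    return Image.new("RGB", (512, 512), color="gray")

def mark_as_synthetic(image: Image.Image) -> Image.Image:
    # Overlay a visible label so end users know the content is AI-generated.
    draw = ImageDraw.Draw(image)
    draw.text((10, image.height - 24), "AI-generated", fill="white")
    return image

def pipeline(prompt: str) -> Image.Image:
    return mark_as_synthetic(generate_image(prompt))

demo = gr.Interface(fn=pipeline, inputs="text", outputs="image")

if __name__ == "__main__":
    demo.launch()
```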
Note that you may also be a 'deployer' of an AI system, not just a developer. Deployers of AI systems are people or companies using an AI system in their professional capacity. In that case, you also need to comply with the following:
- For emotion recognition and biometric systems: deployers must inform individuals about the use of these systems and process personal data in accordance with the relevant regulations.
- Disclosure of deepfakes and AI-generated content: deployers must disclose when AI-generated content is used. When the content is part of a creative work, the obligation is to disclose that generated or manipulated content exists in a way that doesn't spoil the experience.
The information above must be provided in clear language, at the latest at the time of the user's first interaction with, or exposure to, the AI system.
The AI Office, responsible for implementing the AI Act, will help create codes of practice with guidelines for detecting and labeling artificially generated content. These codes are currently being written with industry and civil society participation and are expected to be published by May 2025. The obligations will be enforced starting August 2026.
For open source non-systemic risk GPAI models
The following obligations apply if you are developing open source GPAI models, e.g. LLMs, that don't present systemic risk. Open source for the AI Act means "software and data, including models, released under a free and open source license that allows them to be openly shared and where users can freely access, use, modify and redistribute them or modified versions thereof". Developers can choose from a list of open licenses on the Hub; check whether the chosen license fits the AI Act's open source definition (see the snippet below).
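As a quick aid, a repo's declared license can be read programmatically with huggingface_hub. This is a minimal sketch, and the repo id is a hypothetical placeholder.

```python
# A minimal sketch: read the license a Hub repo declares in its metadata.
# "my-org/my-model" is a hypothetical repo id.
from huggingface_hub import model_info

info = model_info("my-org/my-model")
license_tags = [tag for tag in (info.tags or []) if tag.startswith("license:")]
print(license_tags)  # e.g. ['license:apache-2.0']
```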
The obligations for non-systemic open source GPAI models are as follows:
- Draft and make available a sufficiently detailed summary of the content used to train the GPAI model, following a template provided by the AI Office.
- The level of detail expected in the summary is still under discussion, but it should be relatively comprehensive.
- Implement a policy to comply with EU law on copyright and related rights, notably to comply with opt-outs. Developers need to make sure they're authorized to use copyright-protected material, which can be obtained through the authorization of the rightsholder or when copyright exceptions and limitations apply. One of these exceptions is the Text and Data Mining (TDM) exception, a technique used extensively in this context for retrieving and analyzing content. However, the TDM exception generally doesn't apply when a rightsholder clearly expresses that they reserve the right to use their work for these purposes – this is known as an "opt-out." In establishing a policy to comply with the EU Copyright Directive, these opt-outs should be respected and should restrict or prevent use of the protected material. In other words, training on copyrighted material is not illegal if you respect the authors' decision to opt out of AI training.
- While there are still open questions about how opt-outs should be expressed technically, especially in machine-readable formats, respecting information expressed in robots.txt files for websites and leveraging tools like Spawning's API are a good start (see the sketch below).
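For instance, here is a minimal sketch of honoring robots.txt directives when collecting web data, using only the Python standard library. The crawler name is a hypothetical placeholder, and robots.txt is only one of several emerging opt-out signals.

```python
# A minimal sketch: honor a site's robots.txt before fetching a page for
# training data. "MyTrainingBot" is a hypothetical crawler name.
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def may_fetch(url: str, user_agent: str = "MyTrainingBot") -> bool:
    """Return True only if the site's robots.txt permits fetching this URL."""
    parts = urlsplit(url)
    parser = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()  # downloads and parses the robots.txt file
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(may_fetch("https://example.com/some/article"))
```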
The EU AI Act also ties into existing regulations on copyright and personal data, such as the Copyright Directive and the General Data Protection Regulation (GDPR). For this, look to Hugging Face-integrated tools that support better opt-out mechanisms and personal data redaction (one redaction sketch follows below), and stay up to date on recommendations from European and national bodies like the CNIL.
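On the redaction side, one off-the-shelf option (an illustrative choice on our part, not a tool named above) is Microsoft's open source Presidio library. The sketch below assumes presidio-analyzer, presidio-anonymizer, and the spaCy model they rely on are installed.

```python
# A minimal sketch: detect and redact personal data in text with Presidio.
# Presidio is one illustrative tool choice; requires presidio-analyzer,
# presidio-anonymizer, and a spaCy English model (e.g. en_core_web_lg).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact Jane Doe at jane.doe@example.com or +1 212 555 0123."

analyzer = AnalyzerEngine()
findings = analyzer.analyze(text=text, language="en")

anonymizer = AnonymizerEngine()
redacted = anonymizer.anonymize(text=text, analyzer_results=findings)
print(redacted.text)  # names, emails, and phone numbers replaced by tags
```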
Projects on Hugging Face have implemented ways of understanding and implementing opt-outs from training data, such as BigCode's Am I In The Stack app and the integration of a Spawning widget for datasets with image URLs. With these tools, creators can simply opt out of having their copyrighted material used for AI training. As opt-out processes are developed to help creators publicly signal that they don't want their content used for AI training, these tools can be quite effective in honoring those decisions.
Developers may rely on codes of practice (which are currently being developed and expected by May 2025) to demonstrate compliance with these obligations.
Other obligations apply if you make your work available in a way that doesn't meet the AI Act's criteria for being open source.
Also note that if a given GPAI model meets the conditions for posing systemic risk, its developers must notify the EU Commission. In the notification process, developers can argue that their model doesn't present systemic risks because of its specific characteristics. The Commission will review each argument and accept or reject the claim depending on whether it is sufficiently substantiated, considering the model's specific characteristics and capabilities. If the Commission rejects the developers' arguments, the GPAI model will be designated as posing systemic risk and will need to comply with further obligations, such as providing technical documentation on the model, including its training and testing process and the results of its evaluation.
Obligations for GPAI models will be enforced starting August 2025.
🤗 Get involved
Much of the EU AI Act's practical application is still being worked out through public consultations and working groups, whose outcomes will determine how the Act's provisions aimed at smoother compliance for SMEs and researchers are operationalized. If you're interested in shaping how this plays out, now is a great time to get involved!
```
@misc{eu_ai_act_for_oss_developers,
  author = {Bruna Trevelin and Lucie-Aimée Kaffee and Yacine Jernite},
  title = {Open Source Developers Guide to the EU AI Act},
  booktitle = {Hugging Face Blog},
  year = {2024},
  url = {},
  doi = {}
}
```
Thanks to Anna Tordjmann, Brigitte Tousignant, Chun Te Lee, Irene Solaiman, Clémentine Fourrier, Ann Huang, Benjamin Burtenshaw, and Florent Daudens for your feedback, comments, and suggestions.
