Introduction
The sudden, rapid advancement of LLM capabilities – such as writing fluent sentences and achieving increasingly high scores on benchmarks – has led AI developers and businesses alike to look toward what comes next: What game-changing technology is just on the horizon? One technology very recently taking off is "AI agents", systems that can take actions in the digital world aligned with a deployer's goals. Most of today's AI agents are built by incorporating large language models (LLMs) into larger systems that can perform multiple functions. A fundamental idea underlying this latest wave of technology is that computer programs need not operate as human-controlled tools confined to specialized tasks: They can now combine multiple tasks without human input.
This transition marks a fundamental shift to systems capable of making context-specific plans in non-deterministic environments. Many modern AI agents don't merely perform pre-defined actions, but are designed to analyze novel situations, develop relevant goals, and take previously undefined actions to achieve their objectives.
In this piece, we briefly overview what AI agents are and detail the ethical values at play, documenting tradeoffs in AI agent benefits and risks. We then suggest paths forward to bring about a future where AI agents are as useful as possible for society. For an introduction to the technical facets of agents, please see our recent developer blogpost. For an introduction to agents written before modern generative AI (that is still largely applicable today), please see Wooldridge and Jennings, 1995.
Our analysis reveals that risks to people increase with a system's level of autonomy: The more control a user cedes, the more risks arise from the system. Particularly concerning are risks to people's safety that arise from the same benefits that motivate AI agent development, such as freeing developers from having to predict all the actions a system might take. Further compounding the difficulty, some safety harms open the door to other types of harm – such as harms to privacy and security – and inappropriate trust in unsafe systems enables a snowball effect of yet further harms. As such, we recommend that fully autonomous AI agents not be developed. For instance, AI agents that can write and execute their own code, beyond constrained code options controlled by the developer, could be endowed with the ability to override all human control. In contrast, semi-autonomous AI agents may have benefits that outweigh risks, depending on the level of autonomy, the tasks available to the system, and the nature of individuals' control over it. We now turn to these topics in depth.
What’s an AI agent?
Overview
There is no clear consensus on what an "AI agent" is, but a commonality across recently introduced AI agents is that they are "agentic": that is, they act with some level of autonomy. Given the specification of a goal, they can decompose it into subtasks and execute each without direct human intervention. For instance, an ideal AI agent could respond to a high-level request such as "help me write better blogposts" by independently breaking the task down into retrieving writing on the web that is similar to your previous blog topics; creating documents with outlines for new blog posts; and providing initial writing within each. Recent work on AI agents has made possible software with a broader range of functionality and more flexibility in how it can be used than in the past, with new systems deployed for everything from organizing meetings (example1, example2, example3, example4) to creating personalized social media posts (example), without explicit instructions on how to do so.
All of the recently introduced AI agents we have surveyed for this article are built on machine learning models, and most specifically use large language models (LLMs) to drive their actions, which is a novel approach for computer software. Apart from being built on machine learning, today's AI agents share similarities with those of the past, and in some cases realize previous theoretical ideas of what agents would be like: acting with autonomy, demonstrating (perceived) social ability, and appropriately balancing reactive and proactive actions.
These characteristics come in gradations: Different AI agents have different levels of capability, and can work in isolation or in concert with other agents toward a goal. As such, AI agents may be said to be more or less autonomous (or agentic), and the extent to which something is an agent may be viewed on a continuous spectrum. This fluid notion of an AI agent has led to recent confusion and misunderstanding about what AI agents are, which we hope to bring some clarity to here. A table detailing the different levels of AI agents is provided below.
| Agentic Level | Description | Who Is in Control | What It's Called | Example Code |
|---|---|---|---|---|
| ☆☆☆☆ | Model has no impact on program flow | 👤 The developer controls all possible functions a system can do and when they are done. | Simple processor | print_llm_output(llm_response) |
| ★☆☆☆ | Model determines basic control flow | 👤 The developer controls all possible functions a system can do; the system controls when to do each. | Router | if llm_decision(): path_a() else: path_b() |
| ★★☆☆ | Model determines how a function is executed | 👤 💻 The developer controls all possible functions a system can do and when they are done; the system controls how they are done. | Tool call | run_function(llm_chosen_tool, llm_chosen_args) |
| ★★★☆ | Model controls iteration and program continuation | 💻 👤 The developer controls the high-level functions a system can do; the system controls which to do, when, and how. | Multi-step agent | while llm_should_continue(): execute_next_step() |
| ★★★★ | Model writes and executes new code | 💻 The developer defines the high-level functions a system can do; the system controls all possible functions and when they are done. | Fully autonomous agent | create_and_run_code(user_request) |
Table 1. One example of how systems using machine-learned models, such as LLMs, can be more or less agentic. Systems can also be combined in "multi-agent systems," where one agent workflow triggers another, or multiple agents work collectively toward a goal.
Adapted from the smolagents blog post, with changes tailored for this blog post.
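To make the middle of this spectrum concrete, below is a minimal, hypothetical sketch of a multi-step agent loop (★★★☆) in Python. The names (TOOLS, llm_decide, run_agent) are illustrative placeholders rather than any specific library's API; the point is that the developer fixes the available tools and a step budget, while the model decides which tool to call, with what input, and when to stop.

```python
# Illustrative sketch of a multi-step agent loop (level three stars).
# All names are hypothetical placeholders, not a real library's API.

# Developer-defined tools: the system cannot act outside this set.
TOOLS = {
    "search_web": lambda query: f"<results for {query!r}>",
    "draft_text": lambda topic: f"<draft about {topic!r}>",
}

def run_agent(goal: str, llm_decide, max_steps: int = 10) -> str:
    """The model chooses which tool to run and when to stop; the developer
    retains control through the fixed tool set and the step budget."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):  # human-set bound on autonomy
        # llm_decide stands in for an LLM call returning, e.g.,
        # {"done": False, "tool": "search_web", "input": "..."}
        decision = llm_decide(history)
        if decision["done"]:
            return decision["answer"]
        observation = TOOLS[decision["tool"]](decision["input"])
        history.append(observation)  # feed the result back to the model
    return "Stopped: step budget exhausted."
```

A fully autonomous agent (★★★★) would differ in replacing the fixed TOOLS dictionary with the ability to write and run arbitrary new code – precisely the level of autonomy we recommend against developing.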
From an ethics perspective, it is also useful to understand this continuum of autonomy in terms of how control is ceded by people and given to machines: The more autonomous the system, the more human control we cede.
Throughout this piece, we use some anthropomorphizing language to describe AI agents, consistent with the language currently used to describe them. As has also been noted in historic scholarship, describing AI agents using mentalistic language ordinarily applied to humans – such as having knowledge, beliefs, and intentions – can be a problem for appropriately informing users about system abilities. For better or worse, such language serves as an abstraction tool that glosses over more precise details of the technology. Understanding this is critical when grappling with the implications of what these systems are and the role they may play in people's lives: Using mentalistic language to describe AI agents does not entail that these systems have a mind.
The Spectra of AI Agents
AI agents vary along a number of interrelated dimensions:
- Autonomy: Recent "agents" can take at least one step without user input. The term "agent" is currently used to describe everything from single-step prompt-and-response systems (citation) to multi-step customer support systems (example).
- Proactivity: Related to autonomy is proactivity, which refers to the amount of goal-directed behavior a system can take without a user directly specifying the goal (citation). An example of a highly "proactive" AI agent is a system that monitors your refrigerator to determine what food you are running out of, and then purchases what you need for you, without your knowledge. Smart thermostats are proactive AI agents that are being increasingly adopted in people's homes, automatically adjusting temperature based on changes in the environment and patterns they learn about their users' behavior (example).
- Personification: An AI agent may be designed to be more or less like a specific person or group of people. Recent work in this area (example1, example2, example3) has focused on designing systems based on the Big Five personality traits – Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism – as a "psychological framework" (citation) for AI. At the far end of this spectrum would be "digital twins" (example non-agentic digital twin). We are not aware of any agentic digital twins at present. Why creating agentic digital twins is particularly problematic has recently been discussed by the ethics group at Salesforce, among others (example).
- Personalization: AI agents may use language or perform actions that are aligned to a user's individual needs – for instance, making investment recommendations based on current market patterns and the investments a user has made in the past.
- Tooling: AI agents also vary in the additional resources and tools they have access to. For instance, the initial wave of AI agents accessed search engines to answer queries, and further tooling has since been added to allow them to manipulate other tech products, like documents and spreadsheets (example1, example2).
- Versatility: Related to the above is how diverse the actions an agent can take are. This is a function of:
- Domain specificity: How many different domains an agent can operate in. For instance, just email, versus email alongside online calendars and documents.
- Task specificity: How many different types of tasks the agent can perform. For instance, scheduling a meeting by creating a calendar invite in participants' calendars (example), versus additionally sending reminder emails about the meeting and providing a summary of what was said to all participants when it's over (example).
- Modality specificity: How many different modalities an agent can operate in – text, speech, video, images, forms, code. Some of the most recent AI agents are created to be highly multimodal (example), and we predict that AI agent development will continue to expand multimodal functionality.
- Software specificity: How many different types of software the agent can interact with, and at what level of depth.
- Adaptability: Similar to versatility is the extent to which a system can update its action sequences based on new information or changes in context. This is also described as being "dynamic" and "context-aware".
- Action surfaces: The places where an agent can do things. Traditional chatbots are limited to a chat interface; chat-based agents may also be able to surf the web and access spreadsheets and documents (example), and may even be able to do such tasks by controlling items in your computer's graphical interface, such as by moving the mouse around (example1, example2). There have also been physical applications, such as early-stage agents embodied in robots (example).
- Request formats: A common theme across AI agents is that a user should be able to input a request for a task to be accomplished, without specifying fine-grained details on how to achieve it. This can be realized with low-code solutions (example), with human language in text, or with spoken human language (example). AI agents whose requests can be provided in human language are a natural progression from recent successes with LLM-based chatbots: A chat-based "AI agent" goes further than a chatbot in that it can operate outside of the chat application.
- Reactivity: This characteristic refers to how long it takes an AI agent to complete its action sequence: mere moments, or a much longer span of time. A forerunner of this effect can be seen in modern chatbots. For instance, ChatGPT responds in mere milliseconds, while Qwen QwQ takes several minutes, iterating through different steps labelled as "Reasoning".
- Number: Systems can be single-agent or multi-agent, meeting users' needs by working together, in sequence, or in parallel.
Risks, Benefits, and Uses: A Values-Based Analysis
To examine AI agents through an ethical lens, we break down their risks and benefits according to the different values espoused in recent AI agent research and marketing. These are not exhaustive, and they are in addition to the risks, harms, and benefits that have been documented for the technology AI agents are based on – such as LLMs. We intend this section to contribute to an understanding of how to develop AI agents, providing information on the benefits and risks of different development priorities. These values may also inform evaluation protocols (such as red-teaming).
Value: Accuracy
- 🙂 Potential Benefits: By grounding in trusted data, agents can be more accurate than when operating from pure model output alone. This can be done via rule-based approaches or machine learning approaches such as RAG (retrieval-augmented generation), and the time is ripe for novel contributions to ensuring accuracy (a minimal sketch of retrieval-based grounding follows this list).
- 😟 Risks: The backbone of modern AI agents is generative AI, which does not distinguish between real and unreal, fact and fiction. Large language models, for example, are designed to construct text that looks like fluent language – meaning they often produce content that sounds right but is very wrong. Applied within an AI agent, LLM output can lead to incorrect social media posts, investment decisions, meeting summaries, etc.
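As a sketch of what such grounding could look like in practice, the hypothetical snippet below retrieves passages from a curated store of trusted documents and instructs the model to answer only from them. The embed and generate callables are assumptions standing in for any embedding model and LLM; real RAG systems add chunking, vector indexes, and source citation on top of this pattern.

```python
# Bare-bones retrieval-augmented generation (RAG) sketch.
# `embed` and `generate` are hypothetical stand-ins for an embedding
# model and an LLM; `trusted_docs` is the curated data we ground in.
import numpy as np

def retrieve(query: str, trusted_docs: list[str], embed, k: int = 3) -> list[str]:
    """Return the k trusted documents most similar to the query."""
    q = embed(query)
    scores = []
    for doc in trusted_docs:
        d = embed(doc)
        scores.append(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
    top_k = np.argsort(scores)[::-1][:k]  # highest cosine similarity first
    return [trusted_docs[i] for i in top_k]

def grounded_answer(query: str, trusted_docs: list[str], embed, generate) -> str:
    """Constrain generation to retrieved context to reduce confabulation."""
    context = "\n\n".join(retrieve(query, trusted_docs, embed))
    prompt = (
        "Answer using ONLY the context below; if the answer is not there, "
        "say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```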
Value: Assistiveness
- 🙂 Potential Benefits: Agents are ideally assistive to user needs, supplementing (not supplanting) people. Ideally, they can help increase a user's speed in completing tasks and their efficiency in finishing multiple tasks concurrently. Assistive agents may also augment capabilities to minimize negative outcomes, such as an AI agent that helps a blind user navigate busy staircases. AI agents that are well-developed to be assistive could offer their users more freedom and opportunity, help to improve their users' positive impact within their organizations, or help users to increase their reach on public platforms.
- 😟 Risks: When agents replace people – such as when AI agents are used instead of people at work – this can create job loss and economic impacts that drive a further divide between the people creating technology and the people who have provided data for the technology (often without consent). Further, assistiveness that is poorly designed can lead to harms from overreliance or inappropriate trust.
Value: Consistency
One idea discussed for AI agents is that they can help with consistency, as they may be less affected than people by their surrounding environment. This can be good or bad. We are not aware of rigorous work on the nature of AI agent consistency, although related work has shown that the LLMs many AI agents are based on are highly inconsistent (citation1, citation2). Measuring AI agent consistency will require the development of new evaluation protocols, especially in sensitive domains.
- 🙂 Potential Benefits: AI agents are not "affected" by the world in the way that humans are, with inconsistencies caused by mood, hunger, sleep level, or biases in the perception of people (although AI agents do perpetuate biases based on the human content they were trained on). Multiple companies have highlighted consistency as a key advantage of AI agents (example1, example2).
- 😟 Risks: The generative component of many AI agents introduces inherent variability in outcomes, even across similar situations. This can affect speed and efficiency, as people must discover and address an AI agent's inappropriate inconsistencies. Inconsistencies that go unnoticed may create safety issues. Consistency may also not always be desirable, as it can come into tension with equity. Maintaining consistency across different deployments and chains of actions will likely require an AI agent to record and compare its different interactions – which brings with it risks of surveillance and privacy.
Value: Efficiency
- 🙂 Potential Benefits: A selling point of AI agents is that they can help people be more efficient – e.g., they can organize your documents for you, so you can focus on spending more time with your family or pursuing work you find rewarding.
- 😟 Risks: A potential drawback is that they may make people less efficient: Trying to identify and fix errors that agents introduce – which may involve a complex cascade of issues due to agents' ability to take multiple sequential steps – can be time-consuming, difficult, and stressful.
Value: Equity
AI agents may affect how equitable, fair, and inclusive situations are.
- 🙂 Potential Benefits: AI agents can potentially help "level the playing field". For instance, a meeting assistant might display how much time each person has had to speak. This could be used to promote more equal participation or to highlight imbalances across gender or location (example).
- 😟 Risks: The machine-learned models underlying modern AI agents are trained on human data; human data can be inequitable, unfair, exclusionary, and worse. Inequitable system outcomes may also emerge due to sample bias in data collection (for instance, overrepresenting some countries).
Value: Humanlikeness
- 🙂 Potential Benefits: Systems capable of generating human-like behavior offer the opportunity to run simulations of how different subpopulations might respond to different stimuli. This can be particularly useful in situations where direct human experimentation might cause harm, or where a large volume of simulations helps to better address the experimental question at hand. For instance, synthesizing human behavior could be used to predict dating compatibility, or to forecast economic changes and political shifts. Another potential benefit currently being researched is that humanlikeness may be useful for ease of communication and even companionship (example).
- 😟 Risks: This benefit can be a double-edged sword: Humanlikeness can lead users to anthropomorphize the system, which can have negative psychological effects such as overreliance (citation), inappropriate trust, dependence, and emotional entanglement, leading to anti-social behavior or self-harm (example). There is concern that AI agent social interaction may contribute to loneliness, but see citation1, citation2 for nuances that may be gleaned from social media use. The phenomenon of the uncanny valley adds another layer of complexity: As agents become more humanlike but fall short of perfect human simulation, they can trigger feelings of unease, revulsion, or cognitive dissonance in users.
Value: Interoperability
- 🙂 Potential Benefits: Systems that can operate with others provide more flexibility and options in what an AI agent can do.
- 😟 Risks: However, this can compromise safety and security: The more an agent can affect and be affected by systems outside of its more limited testing environment, the greater the risk of malicious code and unintended problematic actions. For instance, an agent that is connected to a bank account so that it can easily purchase items on someone's behalf would also be able to empty the bank account. Because of this concern, tech companies have refrained from releasing AI agents that can make purchases autonomously (citation).
Value: Privacy
- 🙂 Potential Benefits: AI agents may offer some privacy by keeping transactions and tasks wholly confidential, apart from what is monitorable by the AI agent provider.
- 😟 Risks: For agents to work according to a user's expectations, the user may have to provide detailed personal information, such as where they are going, who they are meeting with, and what they are doing. For the agent to be able to act on the user's behalf in a personalized way, it may also have access to applications and data sources that can be used to extract further private information (for instance, from contact lists, calendars, etc.). Users can easily give up control of their data – and private details about other people – in exchange for efficiency (and even more so if they trust the agent); if there is a privacy breach, the interconnectivity of different content brought together by the AI agent can make things worse. For instance, an AI agent with access to phone conversations and social media posting could share highly intimate information with the world.
Value: Relevance
- 🙂 Potential Benefits: One motivation for creating systems that are personalized to individual users is to help ensure that their output is particularly relevant and coherent for those users.
- 😟 Risks: However, this personalization can amplify existing biases and create new ones: As systems adapt to individual users, they risk reinforcing and deepening existing prejudices, creating confirmation bias through selective information retrieval, and establishing echo chambers that reify problematic viewpoints. The very mechanisms that make agents more relevant to users – their ability to learn from and adapt to user preferences – can inadvertently perpetuate and strengthen societal biases, making the challenge of balancing personalization with responsible AI development particularly difficult.
Value: Safety
- 🙂 Potential Benefits: Robotic AI agents may help save people from bodily harm, such as agents capable of defusing bombs, removing poisons, or operating in manufacturing or industrial settings that are hazardous environments for humans.
- 😟 Risks: The unpredictable nature of agent actions means that seemingly safe individual operations can combine in potentially harmful ways, creating new risks that are difficult to prevent. (This is similar to instrumental convergence and the paperclip maximizer problem.) It can also be unclear whether an AI agent might devise a process that overrides a given guardrail, or whether the way a guardrail is specified inadvertently creates further problems. Therefore, the drive to make agents more capable and efficient – through broader system access, more sophisticated action chains, and reduced human oversight – conflicts with safety considerations. Further, access to broad interfaces (for instance, GUIs, as discussed in "Action surfaces" above) and humanlike behavior gives agents the ability to perform actions just as a human user would, with the same level of control, without setting off any warning systems – such as manipulating or deleting files, impersonating users on social media, or using stored credit card information to make purchases from whatever ads pop up. Still further safety risks emerge from AI agents' ability to interact with multiple systems and the by-design lack of human oversight for each action they may take. AI agents may also collectively create unsafe outcomes.
Value: Scientific Progress
There’s currently debate about whether AI agents are a fundamental step forward in AI development in any respect, or a “rebranding” of technology that we’ve had for years – deep learning, heuristics, and pipeline systems. Re-introducing the term “agent” as an umbrella term for contemporary AI systems that share common traits of manufacturing operations with minimal user input is a useful method to succinctly check with recent AI applications. Nevertheless, the term carries with it connotations of freedom and agency that suggest a more fundamental change in AI technology has occurred.
All the listed values on this section are relevant for scientific progress; most of them are supplied with details of potential advantages in addition to risks.
Value: Security
- 🙂 Potential Benefits: Potential benefits are similar to those for Privacy.
- 😟 Risks: AI agents present serious security challenges due to their handling of often-sensitive data (customer and user information), combined with their safety risks, such as the ability to interact with multiple systems and the by-design lack of human oversight for each action they may take. They may share confidential information, even when their goals were set by users acting in good faith. Malicious actors could also potentially hijack or manipulate agents to gain unauthorized access to connected systems, steal sensitive information, or conduct automated attacks at scale. For instance, an agent with access to email systems could be exploited to share confidential data, or an agent integrated with home automation could be compromised to breach physical security.
Value: Speed
- On speed for users:
- 🙂 Potential Benefits: AI agents may help users get more tasks done more quickly, acting as an additional helping hand for tasks that need to be done.
- 😟 Risks: Yet they may also cause more work, due to issues in their actions (see Efficiency).
- On speed of systems:
- As with most systems, getting a result quickly can come at the expense of other desirable properties (such as accuracy, quality, or low cost). If history sheds light on what will happen next, it may be the case in the future that slower systems will provide better results overall.
Value: Sustainability
- 🙂 Potential Benefits: AI agents may theoretically help address issues relevant to climate change, such as forecasting the growth of wildfires or flooding in urban areas alongside the analysis of traffic patterns, then suggesting optimal routes and modes of transportation in real time. A future self-driving AI agent might make such routing decisions directly, and could coordinate with other systems for relevant updates.
- 😟 Risks: Currently, the machine learning models AI agents are based on bring with them negative environmental impacts, such as carbon emissions (citation) and the use of drinking water (citation). Bigger is not always better (example), and efficient hardware and low-carbon data centers can help reduce these impacts.
Value: Trust
- 🙂 Potential Benefits: We are not aware of any benefits of AI agents relevant to trust. Systems should be built to be worthy of our trust, meaning that they are shown to be safe, secure, reliable, etc.
- 😟 Risks: Inappropriate trust opens people up to manipulation, as well as to the other risks detailed under Efficiency, Humanlikeness, and Truthfulness. A further risk stems from LLMs' tendency to create false information (called "hallucinations" or "confabulations"): A system that is correct the majority of the time is more likely to be inappropriately trusted when it is wrong.
Value: Truthfulness
- 🙂 Potential Benefits: We are not aware of any benefits of AI agents relevant to truthfulness.
- 😟 Risks: The deep learning technology AI agents are based on is well known to be a source of false information (citation), which can take forms such as deepfakes or misinformation. AI agents can be used to further entrench such false information, such as by gathering up-to-date information and posting it on several platforms. This means AI agents can be used to give a false sense of what is true and what is false, to manipulate people's beliefs, and to widen the impact of non-consensual intimate content. False information propagated by AI agents, personalized for specific people, can also be used to scam them.
AI Agents at HF
At Hugging Face, we have begun introducing the ability for people to build and use AI agents in a number of ways, grounded in the values discussed above.
Recommendations & What Comes Next
The current state of the art in AI "agents" points toward several clear directions:
- Rigorous evaluation protocols for agents need to be designed. An automated benchmark can be informed by the different dimensions of AI agents listed above; a sociotechnical evaluation can be informed by the values.
- The effects of AI agents need to be better understood. Individual, organizational, economic, and environmental effects of AI agents need to be tracked and analyzed in order to inform how they should be further developed (or not). This would include analyses of the effects of AI agents on well-being, social cohesion, job opportunities, access to resources, and contributions to climate change.
- Ripple effects need to be better understood. As agents deployed by one user interact with agents from other users, and they perform actions based on one another's outputs, it is currently unclear how their ability to fulfill users' goals will be affected.
- Transparency and disclosure need to be improved. In order to achieve the positive effects of the values listed above, and to minimize their negative effects, it must be clear to people when they are talking to an agent and how autonomous it is. Clear disclosure of AI agent interactions requires more than simple notifications – it demands an approach combining technical, design, and psychological considerations. Even when users are explicitly aware they are interacting with an AI agent, they may still anthropomorphize it or develop unwarranted trust. This challenge calls for transparency mechanisms that operate on multiple levels: clear visual and interface cues that persist throughout interactions, carefully crafted conversation patterns that regularly reinforce the agent's artificial nature, and honest disclosure of the agent's capabilities and limitations in context.
- Open source can make a positive difference. The open source movement could serve as a counterbalance to the concentration of AI agent development in the hands of a few powerful organizations. In line with the broader discussion on the values of openness, by democratizing access to agent architectures and evaluation protocols, open initiatives can enable broader participation in shaping how these systems are developed and deployed. This collaborative approach not only accelerates scientific progress through collective improvement, but also helps establish community-driven standards for safety and trust. When agent development happens in the open, it becomes harder for any single entity to compromise on important values like privacy and truthfulness for commercial gain. The transparency inherent in open development also creates natural accountability, as the community can verify agent behavior and ensure that development remains aligned with the public interest rather than narrow corporate objectives. This openness is particularly important as agents become more sophisticated and their societal impact grows.
- Developers are likely to create more agentic "base models". This is clearly foreseeable based on current trends and research patterns, not a recommendation we are providing relevant to ethics. Current agent technology uses a combination of newer and older techniques in computer science – near-term future research will likely attempt to train agent models as one monolithic general model, a sort of multimodal model++: trained to perform actions jointly with learning to model text, images, etc.
We thank Bruna Trevelin, Orion Penner, and Aymeric Roucher for contributions to this piece.

