This study is part of a growing body of research warning about the risks of deploying AI agents in real-world financial decision-making. Earlier this month, a group of researchers from multiple universities argued that LLM agents should be evaluated primarily on the basis of their risk profiles, not just their peak performance. Current benchmarks, they say, emphasize accuracy and return-based metrics, which measure how well an agent can perform at its best but overlook how safely it can fail. Their research also found that even top-performing models are more likely to break down under adversarial conditions.
The team suggests that in the context of real-world payments, even a tiny weakness, such as a 1% failure rate, could expose the system to systemic risks. They recommend that AI agents be “stress tested” before being put into real-world use.
Hancheng Cao, an incoming assistant professor at Emory University, notes that the price negotiation study has limitations. “The experiments were conducted in simulated environments that may not fully capture the complexity of real-world negotiations or user behavior,” says Cao.
Pei says researchers and industry practitioners are experimenting with a variety of strategies to reduce these risks. These include refining the prompts given to AI agents, enabling agents to use external tools or code to make better decisions, coordinating multiple models to double-check one another’s work, and fine-tuning models on domain-specific financial data, all of which have shown promise in improving performance.
Many prominent AI shopping tools are currently limited to product recommendation. In April, for instance, Amazon launched “Buy for Me,” an AI agent that helps customers find and buy products from other brands’ sites if Amazon doesn’t sell them directly.
While price negotiation is rare in consumer e-commerce, it’s more common in business-to-business transactions. Alibaba.com has rolled out a sourcing assistant called Accio, built on its open-source Qwen models, that helps businesses find suppliers and research products. The company says it has no plans so far to automate price bargaining, citing the high risk involved.
That may be a wise move. For now, Pei advises consumers to treat AI shopping assistants as helpful tools, not stand-ins for humans in decision-making.
“I don’t think we’re fully ready to delegate our decisions to AI shopping agents,” he says. “So maybe just use it as an information tool, not a negotiator.”