Travel agents help supply end-to-end logistics, like transportation, lodging, and meals, for businesspeople, vacationers, and everybody in between. For those trying to make their own arrangements, large language models (LLMs) seem like they would be a strong tool for the job, given their ability to interact iteratively in natural language, apply some commonsense reasoning, collect information, and call on other tools to help with the task at hand. However, recent work has found that state-of-the-art LLMs struggle with complex logistical and mathematical reasoning, as well as with multi-constraint problems like trip planning, where they have been found to produce viable solutions 4 percent of the time or less, even with additional tools and application programming interfaces (APIs).
So a research team from MIT and the MIT-IBM Watson AI Lab reframed the problem to see whether they could increase the success rate of LLM solutions for complex problems. "We believe a lot of these planning problems are naturally a combinatorial optimization problem," where you need to satisfy several constraints in a certifiable way, says Chuchu Fan, associate professor in the MIT Department of Aeronautics and Astronautics (AeroAstro) and the Laboratory for Information and Decision Systems (LIDS). She is also a researcher in the MIT-IBM Watson AI Lab. Her team applies machine learning, control theory, and formal methods to develop safe and verifiable control systems for robotics, autonomous systems, controllers, and human-machine interactions.
Noting the transferable nature of their work for travel planning, the group sought to create a user-friendly framework that can act as an AI travel agent to help develop realistic, logical, and complete travel plans. To achieve this, the researchers combined common LLMs with algorithms and a complete satisfiability solver. Solvers are mathematical tools that rigorously check whether criteria can be met, and how, but they require complex computer programming to use. That makes them natural companions to LLMs for problems like these, where users want help planning in a timely manner, without the need for programming knowledge or research into travel options. Further, if a user's constraint can't be met, the new technique can identify and articulate where the issue lies and propose alternative measures to the user, who can then choose to accept, reject, or modify them until a valid plan is formulated, if one exists.
"Different complexities of travel planning are something everyone will have to deal with at some point. There are different needs, requirements, constraints, and real-world information that you can collect," says Fan. "Our idea is not to ask LLMs to propose a travel plan. Instead, an LLM here is acting as a translator to translate this natural language description of the problem into a problem that a solver can handle [and then provide that to the user]."
Co-authoring a paper on the work with Fan are Yang Zhang of the MIT-IBM Watson AI Lab, AeroAstro graduate student Yilun Hao, and Yongchao Chen, a graduate student in MIT LIDS and at Harvard University. This work was recently presented at the Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics.
Breaking down the solver
Math tends to be domain-specific. In natural language processing, for example, LLMs perform regressions to predict the next token, a.k.a. "word," in a sequence to analyze or create a document. This works well for generalizing diverse human inputs. LLMs alone, however, would not work for formal verification applications, like in aerospace or cybersecurity, where circuit connections and constraint tasks must be complete and proven, otherwise loopholes and vulnerabilities can sneak by and cause critical safety issues. Here, solvers excel, but they need fixed-format inputs and struggle with unsatisfiable queries. A hybrid technique, however, provides an opportunity to develop solutions for complex problems, like trip planning, in a way that's intuitive for everyday people.
"The solver is really the key here, because when we develop these algorithms, we know exactly how the problem is being solved as an optimization problem," says Fan. Specifically, the research group used a solver based on satisfiability modulo theories (SMT), which determines whether a formula can be satisfied. "With this particular solver, it's not just doing optimization. It's doing reasoning over a lot of different algorithms there to understand whether the planning problem is possible or not to solve. That's a pretty significant thing in travel planning. It's not a very traditional mathematical optimization problem, because people come up with all these limitations, constraints, restrictions," notes Fan.
Translation in action
The "travel agent" works in four steps that can be repeated as needed. The researchers used GPT-4, Claude-3, or Mistral-Large as the method's LLM. First, the LLM parses a user's requested travel plan prompt into planning steps, noting preferences for budget, hotels, transportation, destinations, attractions, restaurants, and trip duration in days, as well as any other user prescriptions. Those steps are then converted into executable Python code (with a natural language annotation for each of the constraints), which calls APIs like CitySearch, FlightSearch, etc. to collect data, and the SMT solver to begin executing the steps laid out in the constraint satisfaction problem. If a valid and complete solution can be found, the solver outputs the result to the LLM, which then provides a coherent itinerary to the user.
If one or more constraints can't be met, the framework begins looking for an alternative. The solver outputs code identifying the conflicting constraints (with their corresponding annotations), which the LLM then provides to the user along with a potential remedy. The user can then decide how to proceed, until a solution (or the maximum number of iterations) is reached.
Generalizable and robust planning
The researchers tested their method using the aforementioned LLMs against other baselines: GPT-4 by itself, OpenAI o1-preview by itself, GPT-4 with a tool to collect information, and a search algorithm that optimizes for total cost. Using the TravelPlanner dataset, which includes data for viable plans, the team looked at multiple performance metrics: how frequently a method could deliver a solution, whether the solution satisfied commonsense criteria like not visiting two cities in one day, the method's ability to meet one or more constraints, and a final pass rate indicating that it could meet all constraints. The new technique generally achieved over a 90 percent pass rate, compared to 10 percent or lower for the baselines. The team also explored adding a JSON representation within the query step, which further made it easier for the method to provide solutions, with 84.4-98.9 percent pass rates.
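The article doesn't reproduce the JSON format the team added, so the snippet below is purely hypothetical: one way a parsed query's intermediate representation might look, with all field names invented for illustration:

```python
# Hypothetical JSON intermediate representation of a parsed travel query.
# The actual schema used in the paper is not given in the article.
import json

parsed_query = {
    "origin": "Boston",
    "duration_days": 5,
    "budget_usd": 1800,
    "constraints": [
        {"type": "hotel_rating", "min_stars": 4},
        {"type": "cities_per_day", "max": 1},
    ],
}

spec = json.dumps(parsed_query, indent=2)
print(spec)
```

A structured intermediate form like this is easier for downstream code generation to consume than free-form planning steps, which is consistent with the reported pass-rate gains.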
The MIT-IBM team posed additional challenges for their method. They looked at how important each component of their solution was, such as removing human feedback or the solver, and how that affected plan adjustments to unsatisfiable queries within 10 or 20 iterations, using a new dataset they created called UnsatChristmas, which includes unseen constraints, and a modified version of TravelPlanner. On average, the MIT-IBM group's framework achieved 78.6 and 85 percent success, which rose to 81.6 and 91.7 percent with additional plan-modification rounds. The researchers also analyzed how well it handled new, unseen constraints and paraphrased query-step and step-code prompts. In both cases, it performed very well, especially with an 86.7 percent pass rate for the paraphrasing trial.
Lastly, the MIT-IBM researchers applied their framework to other domains, with tasks like block picking, task allocation, the traveling salesman problem, and warehouse operation. Here, the method must select numbered, colored blocks and maximize its score; optimize robot task assignment for different scenarios; plan trips that minimize distance traveled; and complete and optimize robot tasks.
"I think this is a very strong and innovative framework that can save a lot of time for humans, and also, it's a very novel combination of the LLM and the solver," says Hao.
This work was funded, in part, by the Office of Naval Research and the MIT-IBM Watson AI Lab.