Enhancing LLM decision-making: integrating Language Agent Tree Search with GPT-4o for superior problem-solving
Large Language Models (LLMs) have demonstrated exceptional abilities in natural language tasks that involve complex reasoning. As a result, these models have evolved to operate as agents capable of planning, strategising, and solving complex problems. Challenges persist, however, in making decisions under uncertainty, where outcomes are not deterministic, or when adaptive decision-making is required in changing environments, especially in multi-step scenarios where each step influences the next. We need more advanced capabilities…
That is where GPT-4o's advanced reasoning capabilities and Language Agent Tree Search (LATS) (1*) come together to address these challenges. LATS incorporates a dynamic, tree-based search methodology that enhances the reasoning capabilities of GPT-4o. By integrating Monte Carlo Tree Search (MCTS) with LLMs, LATS unifies reasoning, acting, and planning, creating a more deliberate and adaptive problem-solving framework. This powerful combination allows for improved decision-making and more robust handling of complex tasks, setting a new standard for the deployment of language models as autonomous agents.
Is “search” the missing piece in GenAI problem solving?
Computational problem solving can be broadly defined as "search through a combinatorial problem space", represented as a tree. Depth-First Search (DFS) and Breadth-First Search (BFS) are fundamental methods for exploring such solution spaces. A notable example of the power of deep search is AlphaGo's "Move 37," which showed how novel, human-surpassing solutions can emerge from extensive exploration.
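To make the "search through a tree" framing concrete, below is a minimal sketch of DFS and BFS over a toy solution tree. The tree, the node names, and the goal test are illustrative assumptions, not part of LATS itself.

```python
from collections import deque

# A toy combinatorial problem space: each node maps to its child nodes.
tree = {
    "start": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [], "A2": [], "B1": [],
}

def dfs(start, goal):
    """Depth-first: follow one path to the bottom before backtracking."""
    stack = [start]
    while stack:
        current = stack.pop()
        if current == goal:
            return current
        stack.extend(reversed(tree.get(current, [])))
    return None

def bfs(start, goal):
    """Breadth-first: explore every node at one depth before going deeper."""
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return current
        queue.extend(tree.get(current, []))
    return None

print(dfs("start", "B1"))  # B1
print(bfs("start", "B1"))  # B1
```

Both methods visit every path systematically, which is exactly the exhaustiveness that becomes a cost problem in large solution spaces.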
Unlike traditional methods that follow predefined paths, LLMs can dynamically generate new branches within the solution space by predicting potential outcomes, strategies, or actions based on context. This capability allows LLMs not only to navigate but also to expand the problem space, making them exceptionally powerful in situations where the problem structure is not fully known, is constantly evolving, or is highly complex.
Inference-time Reasoning with Meta Generation Algorithms (MGA)
Scaling compute during training is widely recognised for its ability to improve model performance. The benefits of scaling compute during inference, however, remain under-explored. MGAs offer a novel approach by amplifying computational resources during inference…
Unlike traditional token-level generation methods, meta-generation algorithms employ higher-order control structures such as planning, loops with multiple model calls, self-reflection, task decomposition, and dynamic conditioning. These mechanisms allow the model to execute tasks end-to-end, mimicking the higher-level cognitive processes often referred to as System-2 thinking.
Consequently, one way meta-generation algorithms can enhance LLM reasoning is by integrating search into the generation process. During inference, MGAs dynamically explore a broader solution space, allowing the model to reason through potential outcomes and adapt strategies in real time. By generating multiple paths and evaluating their viability, meta-generation algorithms enable LLMs to simulate the deeper, more complex reasoning of traditional search methods. This approach not only expands the model's ability to generate novel insights but also improves decision-making in scenarios with incomplete or evolving information.
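As a hedged illustration of this idea, the sketch below shows a simple meta-generation loop: one model call per candidate solution, a second self-evaluation call to score each candidate, and a final selection of the best. The prompts, the 0-10 rating scheme, and the helper names are assumptions for demonstration; any chat-completion endpoint could be substituted.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_candidates(problem: str, n: int = 3) -> list[str]:
    """One model call per candidate: sample diverse solution paths."""
    candidates = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": f"Propose one solution strategy for: {problem}"}],
            temperature=1.0,
        )
        candidates.append(response.choices[0].message.content)
    return candidates

def score_candidate(problem: str, candidate: str) -> float:
    """Self-reflection call: ask the model to rate a candidate from 0 to 10."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": (
            f"Problem: {problem}\nCandidate: {candidate}\n"
            "Rate this candidate from 0 to 10. Reply with the number only.")}],
        temperature=0.0,
    )
    return float(response.choices[0].message.content.strip())

problem = "Allocate a $10,000 portfolio in the current macro climate"
best = max(generate_candidates(problem), key=lambda c: score_candidate(problem, c))
print(best)
```

Even this naive generate-and-score loop is a form of search over the solution space; the tree-based techniques below structure that search far more deliberately.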
Techniques such as Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Chain-of-Thought (CoT) are employed to navigate combinatorial solution spaces efficiently.
- ToT (2*) enables hierarchical decision-making by structuring potential outcomes as tree branches, facilitating exploration of multiple paths.
- GoT (6*) maps complex relationships between ideas, allowing the model to dynamically adjust and optimize its reasoning path.
- CoT (5*) provides step-by-step reasoning that links sequential thoughts, improving the coherence and depth of the generation.
In the Tree of Thoughts (ToT) approach, traditional methods like Depth-First Search (DFS) or Breadth-First Search (BFS) can navigate the tree, but they are computationally expensive because they explore every possible path systematically and exhaustively.
Monte Carlo Tree Search (MCTS) improves on this by simulating outcomes for different actions and updating the tree based on those simulations. It uses a "selection" process that picks decision nodes via a strategy balancing exploration (trying new paths) and exploitation (choosing known good paths). This is guided by a formula called the Upper Confidence Bound (UCB).
The UCB formula has two key parts (a minimal code version follows the list):
- Exploitation Term: The average reward observed for a node so far, estimated through simulations. It represents the known value of choosing that path.
- Exploration Term: A bonus that shrinks the more a node is visited. If a path has been heavily explored, this term decays, so the algorithm may shift to a less-explored path even if it initially seems less promising.
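In code, the conventional UCB1 form of these two terms (assuming rewards are tracked as a running total per node) might look like this:

```python
import math

def ucb1(total_reward: float, visits: int, parent_visits: int,
         c: float = 1.41) -> float:
    """UCB1 = exploitation (average reward) + exploration (visit-count bonus)."""
    if visits == 0:
        return float("inf")  # unvisited nodes are always tried first
    exploitation = total_reward / visits  # known value of this path
    exploration = c * math.sqrt(math.log(parent_visits) / visits)  # decays with revisits
    return exploitation + exploration
```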
By selecting nodes using UCB, simulating outcomes (rewards) with LLMs, and back-propagating the rewards up the tree, MCTS effectively balances exploring new strategies against exploiting known successful ones.
The exploration term of the UCB formula decreases as you explore deeper along a given path. This decay can lead the selection algorithm to switch to a different branch of the decision tree, even if that branch has a lower immediate reward, because the exploration term stays higher where a path is less explored.
Node selection with UCB, reward calculation with LLM simulations, and backpropagation are the essence of MCTS.
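Putting these three ingredients together, here is a compact sketch of the MCTS loop in the LATS style. The llm_propose_children and llm_simulate_reward functions are placeholders standing in for the GPT-4o calls described later; their prompts and return values are assumptions for illustration.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total_reward = [], 0, 0.0

def ucb1(node, c=1.41):
    if node.visits == 0:
        return float("inf")  # always try unvisited nodes first
    return (node.total_reward / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def llm_propose_children(state):
    # Placeholder for a GPT-4o call that proposes child strategies.
    return [f"{state} / option {i}" for i in range(2)]

def llm_simulate_reward(state):
    # Placeholder for a GPT-4o call that scores a strategy.
    return random.uniform(0, 10)

def mcts_iteration(root):
    # 1. Selection: descend by UCB1 until we reach a leaf.
    node = root
    while node.children:
        node = max(node.children, key=ucb1)
    # 2. Expansion: let the LLM generate children for an already-visited leaf.
    if node.visits > 0:
        node.children = [Node(s, parent=node)
                         for s in llm_propose_children(node.state)]
        node = node.children[0]
    # 3. Simulation: the LLM estimates the reward of this strategy.
    reward = llm_simulate_reward(node.state)
    # 4. Backpropagation: push the reward up to the root.
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

root = Node("Invest $10,000")
for _ in range(10):
    mcts_iteration(root)
best = max(root.children, key=lambda n: n.total_reward / max(n.visits, 1))
print(best.state)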
An Implementation — Financial Decision Making…
For the sake of demonstration, we will use LATS to solve the challenging problem of devising an optimal investment strategy in today's macroeconomic climate. We will give the LLM the macroeconomic status using the "IMF World Economic Outlook Report" as context, simply summarising the document; RAG is not used. Below is an example of how LATS searches through the solution space…
Iteration 1:
1. Selection: We start at the root node, and since this is the first LATS iteration, we select all initial decision nodes generated by the LLM (nodes A, B, and C) and simulate their outcomes.
2. Simulation & Backpropagation: Next, the LLM "simulates" each strategy based on the context it has and assigns the following "rewards" (investment returns) to each "node":
- Strategy A: $5,000
- Strategy B: $7,000
- Strategy C: $4,000
3. Expansion: Based on the selection, Strategy B has the highest UCB1 value (since all nodes are at the same depth, their exploration terms are equal), so we expand only Strategy B by simulating its child nodes. The worked numbers below show why.
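To see why Strategy B wins this selection, here is the UCB1 arithmetic with the rewards above (using raw dollar rewards and C = 1.41 purely for illustration; real implementations typically normalise rewards):

```python
import math

C = 1.41
parent_visits = 3  # A, B and C were each simulated once
for name, reward in [("A", 5000), ("B", 7000), ("C", 4000)]:
    visits = 1
    exploration = C * math.sqrt(math.log(parent_visits) / visits)
    print(name, round(reward / visits + exploration, 2))
# A 5001.48, B 7001.48, C 4001.48: the exploration bonus is identical
# at equal depth, so Strategy B's higher reward wins the selection.
```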
Iteration 2:
1. Selection: Since the B1 and B2 strategies have not yet been simulated, their UCB scores are tied, so both nodes will be simulated.
2. Simulate Both Nodes:
- Simulate B1: The LLM predicts a return of $8,500 for B1.
- Simulate B2: The LLM predicts a return of $7,500 for B2.
3. Backpropagation:
After each simulation, the results are back-propagated up the tree, updating the values of the parent nodes. This step ensures that the impact of the new information is reflected throughout the tree.
Updating Strategy B's Value: Strategy B must now reflect the outcomes of B1 and B2. A common approach is to average the rewards of its children, so Strategy B's value becomes (8,500 + 7,500) / 2 = $8,000, as sketched in code below.
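Here is a minimal sketch of this backpropagation step, matching the averaging scheme used in the walkthrough (the node fields and the SimpleNamespace stand-in are illustrative):

```python
from types import SimpleNamespace

def backpropagate(node, reward):
    # Walk from the simulated node up to the root, keeping each
    # ancestor's value as the running mean of rewards seen in its subtree.
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node.value = node.total_reward / node.visits
        node = node.parent

# Strategy B example: the two child simulations average to $8,000.
strategy_b = SimpleNamespace(visits=0, total_reward=0.0, value=0.0, parent=None)
for child_reward in (8500, 7500):
    backpropagate(strategy_b, child_reward)
print(strategy_b.value)  # 8000.0
```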
4. Recalculate UCB Scores:
After backpropagation, the UCB scores of all nodes in the tree are recalculated. The recalculation uses the updated values (average rewards) and visit counts, ensuring that each node's UCB1 score accurately reflects both its potential reward and how much it has been explored.
UCB(s) = V̄(s) + C · √( ln N(parent(s)) / N(s) )
where V̄(s) is the average reward of node s (the exploitation term), N(s) is the number of visits to s, and the square-root term is the exploration bonus that decays as s is revisited.
Note again that the exploration term decreases for all nodes on a path that is continuously explored deeper.
5. Next selection & simulation:
B1 is selected for further expansion (since it has the higher reward) into child nodes:
- B1a: "Invest in AI companies"
- B1b: "Invest in green tech"
6. Backpropagation:
B1 reward updated as (9200 + 6800) / 2 = 8000
B reward updated as (8000 + 7500) / 2 = 7750
7. UCB Calculation:
Following backpropagation, the UCB values of all nodes are recalculated. Assume that, due to the decaying exploration factor, B2 now has a higher UCB score than both B1a and B1b. This can occur when B1 has been extensively explored, reducing the exploration term of its children. Instead of continuing to expand B1's children, the algorithm shifts back to explore B2, which has become more attractive thanks to its unexplored potential, i.e. a higher exploration bonus.
This example illustrates how MCTS can dynamically adjust its search path based on new information, ensuring that the algorithm remains efficient and focused on the most promising strategies as it progresses.
An Implementation with Azure OpenAI GPT-4o
Next, we will build a "financial advisor" using GPT-4o, implementing LATS. (Please refer to the GitHub repo here for the code.)
(For an accurate evaluation, I am using the IMF World Economic Outlook report from July 2024 as the LLM context for simulations, i.e. for generating child nodes and for assigning rewards to decision nodes…)
Here is how the code runs…
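Below is a hedged sketch of the core simulation call against Azure OpenAI. The environment variables, API version, deployment name, and prompt wording are assumptions for illustration; the actual code lives in the repo.

```python
import os
from openai import AzureOpenAI

# Assumed environment variables and API version; adapt to your Azure setup.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

# Illustrative stand-in for the summarised IMF report used as context.
IMF_CONTEXT = "Summary of the IMF World Economic Outlook, July 2024 ..."

def simulate_reward(strategy: str) -> float:
    """Ask GPT-4o to assign a dollar reward to an investment strategy."""
    response = client.chat.completions.create(
        model="gpt-4o",  # your Azure deployment name for GPT-4o
        messages=[
            {"role": "system", "content": f"Macroeconomic context: {IMF_CONTEXT}"},
            {"role": "user", "content": (
                f"Estimate the expected one-year return of this strategy: {strategy}. "
                "Reply with a single number in dollars, nothing else.")},
        ],
        temperature=0.0,
    )
    raw = response.choices[0].message.content.strip()
    return float(raw.replace("$", "").replace(",", ""))
```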
The code leverages the graphviz library to visually represent the decision tree generated during the execution of the investment strategy simulations. The decision tree is too wide to fit into a single picture, so I have added snippets of how the tree looks below. You can find a sample decision tree in the GitHub repo here…
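For reference, here is a minimal sketch of how such a tree can be rendered with the graphviz library; the node labels and file name are illustrative, taken from the walkthrough above.

```python
from graphviz import Digraph

# Illustrative fragment of the decision tree from the walkthrough.
dot = Digraph(comment="LATS decision tree")
dot.node("B", "Strategy B: avg reward $7,750")
dot.node("B1", "B1: avg reward $8,000")
dot.node("B1a", "Invest in AI companies: $9,200")
dot.node("B1b", "Invest in green tech: $6,800")
dot.edge("B", "B1")
dot.edge("B1", "B1a")
dot.edge("B1", "B1b")
dot.render("decision_tree", format="png", cleanup=True)  # writes decision_tree.png
```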
Below is the optimal strategy inferred by LATS…
Optimal Strategy Summary: The optimal investment strategy is structured around several key steps influenced by the IMF report. Here is a concise summary of each step and its significance:
1. **Diversification Across Geographies and Sectors:**
- **Geographic Diversification:** This involves spreading investments across regions to mitigate risk and tap into different growth potentials. Advanced economies like the U.S. remain essential due to their robust consumer spending and resilient labor market, but the portfolio should be weighted cautiously to manage risks. Concurrently, emerging markets in Asia, such as India and Vietnam, are highlighted for their higher growth potential, providing opportunities for greater returns.
- **Sector Diversification:** Incorporating investments in sectors like green energy and sustainability reflects the growing global emphasis on renewable energy and environmentally friendly technologies. This also aligns with regulatory changes and consumer preferences, creating future growth opportunities.
2. **Green Energy and Sustainability:**
- Investing in green energy demonstrates foresight into the global shift toward reducing carbon footprints and reliance on fossil fuels. This is critical given increased governmental support, such as subsidies and policy incentives, which is likely to propel growth within this sector.
3. **Fintech and E-Commerce:**
- Allocating capital to fintech and e-commerce companies capitalizes on the digital transformation accelerated by the global shift towards digital platforms. This sector is expected to grow due to increased adoption of online services and digital payment systems, presenting promising investment opportunities.
Conclusion:
By integrating LATS, we harness the reasoning capabilities of LLMs to simulate and evaluate potential strategies dynamically. This combination allows for the construction of decision trees that not only represent the logical progression of decisions but also adapt to changing contexts and insights provided by the LLM through simulations and reflections.
(Unless otherwise noted, all images are by the author)
References:
[1] Language Agent Tree Search: Unifying Reasoning, Acting, and Planning in Language Models by Zhou et al
[2] Tree of Thoughts: Deliberate Problem Solving with Large Language Models by Yao et al
[3] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey by Tula Masterman, Mason Sawtell, Sandi Besen, and Alex Chao
[4] From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models by Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, and Zaid Harchaoui
[5] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models by Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou
[6] Graph of Thoughts: Solving Elaborate Problems with Large Language Models by Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler