Introduction
When I began to study AI, one of the most fascinating ideas to me was that machines think like humans. But when taking a closer look at what AI and machine learning methods actually do, I was surprised by how big the gap is between what courses and books say about how humans think, i.e., human cognition, and what machines actually do. Examples of these gaps for me were: how a perceptron works, which is often described as "inspired by its biological counterpart," versus how real neurons work; how fuzzy logic tries to model human concepts of knowledge and inference versus how human inference actually seems to work; or how humans cluster a cloud of points by looking at it and drawing circles around point groups on a board, versus how algorithms like DBSCAN and k-means perform this task.
But now, LLMs like ChatGPT, Claude, and LLaMA are in the spotlight. They are built on billions or even trillions of these artificial neurons and on a mechanism that also plays a crucial part in cognition: attention (which is all you need, apparently). We've come a long way, and meanwhile Nobel Prizes have been awarded to honor the early giants of this field. LLMs are remarkably successful at summarizing articles, generating code, answering complex questions, and even being creative. A key ingredient is, no doubt about it, the right prompt. The more precisely you specify what you want from the model, the better the result. Prompt engineering has become an evolving field, and it has even become a specialized job for humans (though I personally doubt the long-term future of this role). Numerous prompting strategies have been proposed: famous ones are Chain-of-Thought (CoT) [2] and Tree-of-Thoughts (ToT) [3], which guide the language model's reasoning step by step, mainly by providing examples of successful problem-solving steps. But these steps are usually concrete examples and require an explicit design of a solution chain.
Other approaches try to optimize the prompting itself, for example with evolutionary algorithms (EAs) like PromptBreeder. Personally, I think EAs are always a good idea. Very recently, a research team from Apple showed that LLMs can easily be distracted from problem solving by variations of the prompt [4]. Since there are many good posts, also on TDS, about CoT and prompt design (like here recently), I see no need to recap them here in more detail.
What Is Cognitive Prompting?
Still, something is missing: there is obviously a gap to cognitive science. That got me thinking: can we help these models "think" more like humans, and how? What if they could be guided by what cognitive science calls cognitive operations? For example, approaching a problem by breaking it down step by step, filtering out unnecessary information, and recognizing patterns in the available information. That sounds a bit like what we do when solving difficult puzzles.
That's where cognitive prompting comes in. Imagine an AI that can not only answer your questions but also guide itself, and you as you read its output, through complex problem-solving processes by "thinking" in structured steps.
Imagine you're solving a math word problem. The first thing you do is probably to clarify your goal: What exactly do I need to figure out, what result do we expect? Then you break the problem into smaller steps; a promising way is to identify the relevant information and perhaps to notice patterns that guide your thoughts closer toward the desired solution. In this example, let's refer to these steps as goal clarification, decomposition, filtering, and pattern recognition. They are all examples of cognitive operations (COPs) that we perform instinctively (or that, in the best case, a teacher has taught us to follow).
But How Does This Actually Work?
Here's how the method works. We define a sequence of COPs and ask the LLM to follow that sequence. Figure 1 shows an example of what the prompt looks like. COPs that turn out to be important are:
- Goal Clarification: The model first restates the problem in a clear way: what exactly is it trying to solve, and what is the desired outcome?
- Decomposition: Next, break the problem into manageable chunks. Instead of getting overwhelmed by all the available information, the model should focus on solving smaller parts, one at a time.
- Filtering: Ask the model to filter out unnecessary details so it can concentrate on what really matters. This is often necessary to let the model put its attention on the truly relevant information.
- Pattern Recognition: Identify patterns to solve the problem efficiently. For example, if a problem involves repeated steps, ask the model to recognize the pattern and apply it.
- Integration: Finally, it makes sense to synthesize the insights of the previous steps, in particular the last COPs, and integrate them into the final answer.
These structured steps mimic the way humans solve problems: logically, step by step. There are many further cognitive operations, and the choice of which ones to use, in which order, and how to phrase them for the prompt definitely leaves room for further improvement. A minimal sketch of such a prompt is shown below.
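To make this concrete, here is a minimal Python sketch of how such a deterministic cognitive prompt could be assembled. The COP wording and the helper name `build_cognitive_prompt` are my own illustration; the exact prompt we used is the one shown in Figure 1.

```python
# A minimal sketch of a deterministic cognitive prompt (illustrative wording,
# not the exact Figure 1 prompt). Each COP is a (name, instruction) pair.
COPS = [
    ("Goal Clarification", "Restate the problem and the desired outcome in your own words."),
    ("Decomposition", "Break the problem into smaller, manageable sub-problems."),
    ("Filtering", "Identify the relevant information and ignore the rest."),
    ("Pattern Recognition", "Look for patterns or repeated structure you can exploit."),
    ("Integration", "Combine the insights from the previous steps into one final answer."),
]

def build_cognitive_prompt(problem: str) -> str:
    """Assemble a prompt that asks the model to work through the COPs in order."""
    steps = "\n".join(
        f"{i}. {name}: {instruction}"
        for i, (name, instruction) in enumerate(COPS, start=1)
    )
    return (
        "Solve the following problem by performing these cognitive operations "
        "one after the other, labeling each step in your answer:\n"
        f"{steps}\n\n"
        f"Problem: {problem}"
    )

print(build_cognitive_prompt("A train travels 60 km in 45 minutes. What is its average speed in km/h?"))
```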
We have already extended the approach in the following way. Instead of following a static, deterministic order of COPs, we give the model the freedom to choose its own sequence of COPs from the provided list, which we call reflective and self-adaptive cognitive prompting. It turns out that this approach works quite well. Further below, we compare both variants on a benchmark problem set.
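A sketch of the self-adaptive variant, reusing the `COPS` list from above: the prompt hands the model the list of operations but leaves the order (and possible repetitions) to the model. Again, the wording is illustrative rather than the exact prompt from the paper.

```python
def build_adaptive_prompt(problem: str, cops=COPS) -> str:
    """Self-adaptive variant: the model chooses its own sequence of COPs."""
    cop_list = "\n".join(f"- {name}: {instruction}" for name, instruction in cops)
    return (
        "Solve the following problem. You may apply the cognitive operations "
        "listed below in any order you find useful, and you may repeat or skip "
        "operations. Before each step, state which operation you chose and why:\n"
        f"{cop_list}\n\n"
        f"Problem: {problem}"
    )
```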
What also seems to improve performance is adapting the COP descriptions to the specific problem domain. Figure 1, right, shows an example of a math-specific adaptation of the general COPs. They "unroll" into prompts like "Define each variable clearly" or "Solve the equations step by step".
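A possible mapping from the general COPs to math-specific instructions, in the spirit of Figure 1 (right); apart from the two quoted phrases, the wording is my own paraphrase.

```python
# Math-specific versions of the general COPs (illustrative paraphrase of
# Figure 1, right). They can be dropped into the prompt builders above.
MATH_COPS = [
    ("Goal Clarification", "State exactly which quantity the problem asks for."),
    ("Decomposition", "Define each variable clearly and write down the known relations."),
    ("Filtering", "Discard numbers and statements that do not affect the result."),
    ("Pattern Recognition", "Look for proportionality, recurring operations, or reusable sub-results."),
    ("Integration", "Solve the equations step by step and combine the partial results into the final answer."),
]
```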
In practice, it makes sense to instruct the model to return the final answer as a JSON string. Some LLMs don't deliver a solution directly but instead produce Python code to solve the problem. In our experimental evaluation, we were fair and ran that code, counting the answer as correct when the Python code returned the correct result.
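A simple way to pull such a JSON answer out of the model output could look like the sketch below; the key name `final_answer` is an assumption for this illustration, not a fixed convention.

```python
import json
import re

def extract_final_answer(model_output: str):
    """Return the value of a {"final_answer": ...} JSON object in the output,
    or None if no such object can be parsed. The key name is an assumption."""
    # Look for flat JSON objects and prefer the last one in the output.
    for candidate in reversed(re.findall(r"\{[^{}]*\}", model_output)):
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "final_answer" in data:
            return data["final_answer"]
    return None
```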
Example
Let's give a short example, asking LLaMA 3.1 70B to solve one of the 8.5k arithmetic problems from GSM8K [5]. Figure 2 shows the request.
Figure 3 shows the model's output, which leads to a correct answer. The model systematically follows the sequence of COPs, even providing a nice problem-solving explanation for human readers.
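For readers who want to try this themselves, here is a rough sketch of how a request like the one in Figure 2 could be sent to a LLaMA model served behind an OpenAI-compatible API. The endpoint, API key, and model name are placeholders, not the exact setup we used; it builds on the `build_cognitive_prompt` sketch from above.

```python
# Placeholder endpoint, key, and model name; any OpenAI-compatible server
# (e.g. a local vLLM or Ollama instance) hosting a LLaMA model would do.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def solve(problem: str) -> str:
    """Send a deterministic cognitive prompt and return the raw model output."""
    prompt = (
        build_cognitive_prompt(problem)  # sketch from above
        + '\n\nReturn the final answer on the last line as JSON, e.g. {"final_answer": 42}.'
    )
    response = client.chat.completions.create(
        model="llama-3.1-70b-instruct",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content
```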
How Does Cognitive Prompting Perform — Scientifically?
Now, let's get a bit more systematic by testing cognitive prompting on an established benchmark. We tested it on a set of math problems from the GSM8K [5] dataset, basically a collection of math questions you'd find in grade school. Again, we used Meta's LLaMA models to see if cognitive prompting could improve their problem-solving skills, applying LLaMA with 8 billion parameters and the much larger version with 70 billion parameters.
Figure 4 shows some results. The smaller model improved only slightly with deterministic cognitive prompting; perhaps it isn't large enough to handle the complexity of structured thinking. When it selects its own sequence of COPs, however, the gain in performance is significant.
Without cognitive prompting, the larger model scored about 87% on the math problems. When we added deterministic cognitive prompting (where the model followed a fixed sequence of cognitive steps), its score jumped to 89%. But when we allowed the model to adapt and choose the cognitive operations dynamically (self-adaptive prompting), the score shot up to 91%. Not bad for a machine that only gets quite general advice to reason like a human, without additional examples, right?
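For orientation, here is a rough sketch of how such an accuracy number can be computed over GSM8K, building on the `solve` and `extract_final_answer` sketches above. This is not our exact evaluation harness; it assumes the Hugging Face `datasets` package and relies on GSM8K's `#### <answer>` gold-answer format.

```python
from datasets import load_dataset

def gold_answer(answer_field: str) -> float:
    """GSM8K gold answers follow a '####' marker, e.g. '... #### 72'."""
    return float(answer_field.split("####")[-1].strip().replace(",", ""))

def evaluate(n_problems: int = 100) -> float:
    """Accuracy of the cognitive-prompting pipeline on the first GSM8K test problems."""
    test_set = load_dataset("gsm8k", "main", split="test").select(range(n_problems))
    correct = 0
    for item in test_set:
        output = solve(item["question"])            # model call sketched above
        prediction = extract_final_answer(output)   # JSON extraction sketched above
        try:
            if prediction is not None and float(str(prediction).replace(",", "")) == gold_answer(item["answer"]):
                correct += 1
        except ValueError:
            pass  # non-numeric prediction counts as wrong
    return correct / n_problems

print(f"Accuracy: {evaluate():.1%}")
```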
Why Does This Matter?
Cognitive prompting is a method that organizes these human-like cognitive operations into a structured process and uses them to help LLMs solve complex problems. In essence, it's like giving the model a structured "thinking strategy" to follow. While earlier approaches like CoT have been helpful, cognitive prompting offers even deeper reasoning layers by incorporating a variety of cognitive operations.
This has exciting implications beyond math problems! Think about areas like decision-making, logical reasoning, or even creativity: tasks that require more than just regurgitating facts or predicting the next word in a sentence. By teaching AI to think more like us, we open the door to models that can reason through problems in ways that are closer to human cognition.
Where Do We Go From Here?
The results are promising, but this is just the beginning. Cognitive prompting could be adapted to other domains, of course, and it can also be combined with other ideas from AI. As we explore more advanced versions of cognitive prompting, the next big challenge will be figuring out how to optimize it across different problem types. Who knows? Maybe one day we'll have AI that can tackle anything from math problems to moral dilemmas, all while thinking as logically and creatively as we do. Have fun trying out cognitive prompting on your own!
References
[1] O. Kramer and J. Baumann. Unlocking Structured Thinking in Language Models with Cognitive Prompting. Submission to ICLR 2025.
[2] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou. Chain-of-thought prompting elicits reasoning in large language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Neural Information Processing Systems (NeurIPS), volume 35, pages 24824–24837, 2022.
[3] S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y. Cao, and K. Narasimhan. Tree of Thoughts: Deliberate problem solving with large language models. In Neural Information Processing Systems (NeurIPS), volume 36, pages 11809–11822, 2023.
[4] I. Mirzadeh, K. Alizadeh, H. Shahrokhi, O. Tuzel, S. Bengio, and M. Farajtabar. GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models. 2024.
[5] K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.