To learn from AI, your organization’s learning loops must evolve

  • Single-loop learning and the race for velocity
  • Double-loop learning and opportunity cost
  • Steering the loop with design critique: the most important customer benefit
  • Experiments and leading indicators
  • Triple-loop learning and the desirable future
  • Applying the design process to the third loop
  • Quadruple loop learning


A long time ago, the biggest obstacle to success was having the right groundbreaking idea. Today, the biggest obstacle is putting in the work to bring an idea to life. But not for much longer.

Companies have been trying to drive the cost of delivery down for years, embracing practices like offshoring development work and no-code tools. To an industry where the ability to deliver something to market is often a sufficient differentiator, GPT-4 looks like the ultimate solution to our problems. It’s now possible to imagine a future where AI tools make outputs nearly instantaneous.

But delivery is far from the only remaining bottleneck to creating customer value. To meaningfully benefit from AI, your organization’s decision-making humans must be able to keep up with it.

Three loops connect at one inception point. They are labeled: Is this the right problem to solve? Do the requirements describe the best solution to this problem? Does the implementation meet the requirements?
In the foreseeable future, AI-powered delivery tools will operate comfortably at the single-loop level. But human intellect will still be needed to steer those outputs toward valuable outcomes.

The only way to survive in an environment where a thousand GPT-generated apps demand the attention of your customers will be to fight quantity with quality. That will require learning how to figure out what quality means to your customer, and whether that definition of quality overlaps with business needs.

The productivity of teams in the age of AI will be measured by whether they stage the right experiments and by how quickly they can learn from the results. To stay competitive, product managers must familiarize themselves with applied organizational learning theory, otherwise known as design.

“The more efficient you are at doing the wrong thing, the wronger you become.” –Russell Ackoff

The effectiveness of a decision-making loop is defined by two variables. The more obvious one is velocity. Tighter feedback loops result in quicker progress toward the goal. But the other variable, the number of layers in your organizational learning loop, is likely to be the more impactful of the two.

Consider a typical product team that has adopted the recommended SAFe mechanisms in an effort to improve their velocity. Going through product increment planning lets the team define their work crisply and remove blockers ahead of time. This is the first learning loop: are we doing a good job of delivering the outputs we defined?

Imagine that this team acquired a new AI-powered tool that automatically wrote code. Instead of assigning tickets to developers, the team could just plug their Definition of Done directly into ChatGPT and get the necessary code written in seconds instead of days. Their velocity would be off the charts, and they would surely deliver all the required outputs on time. Sounds great, if the number of outputs was their ultimate goal.

If this team followed SAFe, their success would also be measured quarterly, via lagging indicators such as revenue or NPS. No matter how quickly they ship, even if they could deliver the next version instantaneously, this team would still have to wait three months to find out whether they did a good job and to determine the requirements for the next release.

The best they can do in the meantime is say, “hmm, that’s not right,” and do SAFe even harder or make some small tweaks to find a local maximum.

In other words, while this team was able to tighten their first loop, they’re still constrained by the next level of organizational learning. Without a way to define what to build (and, more importantly, what not to build), being able to deliver features at the speed of ideas becomes a curse rather than a blessing. No amount of AI enhancement to the tools this team uses for producing outputs would allow them to course-correct faster and thus achieve organizational outcomes more quickly.
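
To put rough numbers on this, here is a minimal sketch of the arithmetic, assuming a learning cycle is simply delivery time plus the wait for feedback. The function name and every figure other than the three-month wait are illustrative assumptions, not measurements.

```python
def learning_cycles_per_year(delivery_days: float, feedback_lag_days: float) -> float:
    """Number of complete deliver-then-learn cycles that fit into one year."""
    return 365 / (delivery_days + feedback_lag_days)


# Hypothetical: a two-week delivery cycle followed by a quarterly (90-day) wait
# for lagging indicators such as revenue or NPS.
human_team = learning_cycles_per_year(delivery_days=14, feedback_lag_days=90)

# Hypothetical: near-instant AI delivery, same quarterly wait.
ai_team = learning_cycles_per_year(delivery_days=0, feedback_lag_days=90)

# Hypothetical: near-instant delivery plus feedback that arrives within a week.
ai_team_fast_feedback = learning_cycles_per_year(delivery_days=0, feedback_lag_days=7)

print(f"{human_team:.1f} {ai_team:.1f} {ai_team_fast_feedback:.1f}")
```

Under these assumptions, eliminating delivery time entirely moves the team from about 3.5 to about 4.1 learning cycles a year, while shortening the wait for feedback to a week would allow about 52.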

“You can do anything, but not everything.” –David Allen

The second organizational learning loop is choosing the right way to reach the goal. The most famous double learning loop is OODA (Observe, Orient, Decide, Act). If single-loop learning represents simply observing (“are we done yet?”) and acting (“do more work”), the double loop adds two additional steps: did the work we already did get us closer to our goal? And should we change the kind of work we’re doing? Intuitively, adding extra steps to a process sounds like it will slow it down rather than speed it up, but don’t forget that slow is smooth, and smooth is fast.

Unfortunately for many product teams, the throughput of this decision loop is constrained by a scarce resource: user attention. To measure the effectiveness of something they’ve shipped, they have to wait for usage metrics to come in. If a team is measured on the number of monthly active users, it will always take one month before a meaningful uptick can be detected.

There is a line of thinking among proponents of LLM tools that GPT has sufficient reasoning skill to operate at this level of decision-making. If GPT can pass the bar exam, surely it can make business decisions, or at least tell us about the problems with our website. Then we’ll be able to generate the right product instantaneously, right?

Well, not so fast. Not only are attempts to emulate user research with LLMs doomed from the start, but your competitors are going to have access to the exact same insights as you. The human in the decision-making loop is there to stay because that human is the only edge any company is going to have over a competitor.

Unlike software, humans don’t scale elastically. While the number of AI queries you can run is constrained only by your AWS budget, you only have so many people (employees and users) to ask questions. Any question you choose to ask incurs opportunity cost: the price you pay is not having enough time to ask another question.

Fortunately, there is a discipline where opportunity cost constraints are a given. This discipline has developed a powerful mechanism to short-circuit second-loop learning and take action long before the trailing indicators have rolled in.

After 20 years of “thinking like a designer,” it’s time to learn how to critique like one.

The common conception of the design process oscillates between two phases: the designer doing design, and the designer testing their work with users. Critique is the crucial layer between these two activities. It exists to optimize the usefulness of insights gained from research by strengthening the rigor of the designer’s thinking.

A series of nested loops. From smallest to largest: Expertise: Does this decision make sense within my conceptual model? Critique: Does my conceptual model make sense within our scoping of the opportunity? Experiment: Does the scoping of the opportunity make sense within the user’s context of needs and goals? Position: Does fulfilling these needs create the future we want?
The design process consists of nested feedback loops. The faster inner loops seek the local maximum. The more expensive outer loops provide answers to the more valuable questions that find a global maximum.

Only poorly done critique hinges on “good taste.” Skilled critique refines not only a visual artifact, but the shared mental model which defines what “good” means in the context of the problem at hand. This mental model is what connects the dots between business goals like “increase monthly active users (MAU)” and Definition of Done requirements like “follows the product’s established interaction patterns.”

Design critique can be applied to the outputs of an LLM just as easily as to designs created by hand. To someone familiar with design critique, the outputs of GPT models look a lot like the work of a junior designer who can produce visuals but struggles to articulate why. They both make decisions because they’ve seen it done that way before, but they don’t know why doing it that way made sense in that context.

To guide design decisions toward desired outcomes, critique always begins with the question: “what was your goal?” What outcome were you hoping for, what problem were you solving, what opportunity were you pursuing? This is a critical question to ask, because when solving wicked problems the problem framing can never be taken for granted, and may evolve throughout the design process.

After framing their goal, a designer in critique will explain the primary user benefit their work was trying to provide; in other words, what missing capability was causing the problem. Different solution concepts are then compared to one another based on how well they deliver that primary benefit.

There are two common critique questions that designers ask to identify gaps in this thought process:

  • If the answer is unsatisfying, the solution concept may be poorly framed. The hypothesis for how it solves the problem is missing.
  • If the answer is “no,” your opportunity may have been framed as “users don’t have the feature we want to build.” The framing itself needs work.

“If you can’t judge the quality of the answer, asking is pointless.” –Amy Hoy

Every solution is an assumption, and even the best design critique can only refine that assumption. The designer’s adage “the user is not like me” has never been truer than when the designer is a machine with no lived experience of its own. This is why we need the third critical feedback loop of the design process: audience testing. And of the three loops, its throughput is the one most constrained by opportunity cost.

There is a reason that critiques focus on helping the designer narrow down the number of ideas: no matter how many you produce, there are only so many user eyeballs to evaluate them. Whether you have ten options from a human designer or ten thousand from GPT, only a few of them can undergo proper, high-quality testing (and doing bad testing instead is not the answer).

Jumping straight from setting an objective like “increase NPS” to testing potential features that might increase NPS is a poor experiment, because the overwhelming majority of results will be inconclusive. Was the execution itself bad? Was it providing an unnecessary capability? Or was the entire problem that the feature was solving a non-issue for customers to begin with?

Instead, designers frame a separate hypothesis for each of these questions, and test them in sequence. Using low-fidelity artifacts like scenario storyboards to cheaply evaluate the magnitude of several pain points will avoid any confounding factors and help align the team around the one most important problem to solve. Similarly, research into potential capabilities to solve the problem will identify one primary user benefit that would be valuable to deliver. And then it would be trivial to put the question to GPT: how do we provide this exact benefit to the user?

In specific context, the behavior people perform to reach goal has friction — a problem hypothesis. Overcoming the friction improves key result, leading to objective — a business impact hypothesis. Solution provides the capability they need to overcome the friction — a solution hypothesis.
Hypothesis-driven design is a framework for making underlying assumptions explicit and discrete from one another, and therefore testable.
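
One way to picture this framework is as an ordered chain of hypotheses, where testing stops at the first one the evidence fails to support. The sketch below is illustrative only: the `Hypothesis` class, the `first_unsupported` helper, the ordering, and the hard-coded test results are all assumptions standing in for real research.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    kind: str                  # "problem", "business impact", or "solution"
    statement: str             # the assumption, written out explicitly
    test: Callable[[], bool]   # returns True if the evidence supports it


def first_unsupported(hypotheses: list[Hypothesis]) -> Optional[Hypothesis]:
    """Test hypotheses in order and return the first one that fails, if any."""
    for hypothesis in hypotheses:
        if not hypothesis.test():
            return hypothesis
    return None


# Hypothetical chain; each lambda stands in for a real experiment result.
chain = [
    Hypothesis("problem", "Customers can't find their brand", lambda: True),
    Hypothesis("business impact", "Easier discovery increases size of order", lambda: True),
    Hypothesis("solution", "A peer social feed provides discovery", lambda: False),
]

failed = first_unsupported(chain)
print(failed.kind if failed else "all supported")
```

Because each assumption is tested separately, a failed result points at one specific link in the chain rather than leaving the team to guess whether the problem, the expected impact, or the solution was wrong.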

Of course, the integrity of this entire process relies on conclusive evidence (leading indicators) being available. Without it, we’re back to the team that waits three months for their NPS benchmark or one month to update their MAUs. These experiments are only as valuable as the accuracy of the team’s proxy metrics, which brings us to the third loop of organizational learning.

“The compass determines direction. The navigation determines the route. The route leads to the destination. In that order. The order is important.” –A.R. Moxon

In a world obsessed with making outputs easier to achieve, the most important question (whether the consequences of those outputs bring us closer to what we actually want) often falls by the wayside.

The tools to set great outcome-based goals already exist. Unfortunately, as feature teams applied these tools in an attempt to achieve agile transformation without reforming their strategy, they suffered drift. Because feature teams had no accountability for outcomes, they used these tools to measure their outputs instead.

But in a world where AI tools generate code with unlimited velocity, “outcomes over outputs” stops being aspirational and becomes existential. Managers will have to re-learn how to set measurable outcome goals (what John Cutler calls inputs to the North Star metric) and form a useful hypothesis for what opportunities the business should pursue to achieve those outcomes.

A pyramid with a North Star at the top: Total monthly items received on time. A line runs down multiple options to trace a course of action: the chosen input metric (size of order), the opportunity (customers can’t find their brand), the primary user benefit (discovery), and a potential solution (a peer social feed).
Tracing the provenance of decision-making down from the North Star metric to a possible solution that the responsible product team might test.
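
The same trace can be written down as a simple record, which makes it easy to spot a decision that has no stated rationale. This is a hypothetical sketch: the `DecisionTrace` class and its field names are invented for illustration, and the example values are taken from the figure above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionTrace:
    north_star: str
    input_metric: str
    opportunity: str
    primary_benefit: str
    solution: str

    def missing_links(self) -> list[str]:
        """Names of the levels left blank, in order from North Star to solution."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# A fully traced decision, using the values from the figure.
trace = DecisionTrace(
    north_star="Total monthly items received on time",
    input_metric="Size of order",
    opportunity="Customers can't find their brand",
    primary_benefit="Discovery",
    solution="A peer social feed",
)

# A fuzzy request: a solution with nothing connecting it to the North Star.
fuzzy = DecisionTrace(
    north_star="Total monthly items received on time",
    input_metric="",
    opportunity="",
    primary_benefit="",
    solution="A peer social feed",
)

print(trace.missing_links())
print(fuzzy.missing_links())
```

An empty list means every level between the North Star and the proposed solution has been articulated; any names returned are the gaps that someone still has to fill in.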

With a human delivery team, managers could get away with very fuzzy requirements, and rely on their reports to figure out the details. While not very helpful, feedback along the lines of “I’ll know it when I see it” was at least sufficient to get those teams thinking about how else the deliverable might work. The thought process of “what is it?” could be displaced from the stakeholder to the designer. Humans working on the output could fall back on their mental model of user needs to fill in the gaps, and when that mental model differed from the leader’s understanding, they could push back.

But this displacement is not possible with a statistical model, because statistics cannot reason. No matter how advanced tools like GPT become, the underlying technology will never be able to interpret the intent behind the prompt, or tell you that you are wrong to want that thing.

To articulate what they want their AI tool to produce, managers will need a crisp understanding of what outcomes they want to achieve. And the same process designers use to control the second loop can structure the thinking necessary for the third.

The same tools that help designers define a mental model around a single product or service can be applied to form a mental model at a higher level of abstraction: business strategy across multiple products. In the same way that design critique can connect the dots between the user goal and the right way to achieve it, it can help find the path between the business goal and the necessary inputs that will move us toward it.

Analogous to identifying a valuable problem to solve, a business leader must be able to set a valuable North Star: a leading indicator that is a proxy for a self-evidently valuable metric like retention or revenue. Determining this indicator is not a nice-to-have; it’s literally the first line of the job description. In the language of OKRs, this is the Objective: the thing we have decided we want to achieve.

Next come the input metrics to the North Star (in OKR terms: the Key Results). Together, moving these metrics in the right direction should roll up to accomplish that ultimate goal. The parallels to the primary user benefit should be obvious: we don’t necessarily know how we will accomplish these, but we know that we want to try because it’s the best path to the desired outcome.

And finally, there are the levers that we think we can move to achieve those results. In healthy organizations, top leadership has some theory of victory: a conception of some overlap between the levers their org can move and the ones that will result in positive impact on the input metrics. But just as designers listen to user feedback, executives should expect their product teams to have their own thoughts about the right levers to pull, and at all costs resist the urge to tell them how to pull those levers, which would be analogous to designers dictating goals to the user!

There is one more level of organizational learning to which a company may aspire. The fourth learning loop covers how a company learns to learn: how quickly it can ingest information about the current state of the world and re-generate the goals it sets for its third, second, and first loops.

A diagram showing how the learning loops cycle through their various steps to drive actions.
The anatomy of organizational learning loops, via Lee, Hwang, and Moon

M. Jae Moon

Recent developments in productizing AI, such as Microsoft enhancing Bing search with ChatGPT, point to a future in which these tools can function at the level of this learning loop. But as with the other loops, language models cannot help us make decisions that secure a unique advantage in the market. Only the integrity of our thinking, established via effective application of the design process, can do that.

