
When many enterprises weren’t even excited about agentic behaviors or infrastructure, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system.
This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it’s taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; larger large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision is critical.
With this hybrid strategy — combined with selective collaboration with OpenAI — Booking.com has seen accuracy double across key retrieval, ranking and customer-interaction tasks.
As Pranav Pathak, Booking.com’s AI product development lead, told VentureBeat in a new podcast: “Do you build it very, very specialized and bespoke and then have an army of 100 agents? Or do you keep it general enough and have five agents that are good at generalized tasks, but then you have to orchestrate a lot around them? That’s a balance that I feel we’re still trying to work out, as is the rest of the industry.”
Check out the new Beyond the Pilot podcast here, and continue reading for highlights.
Moving from guessing to deep personalization without being ‘creepy’
Recommendation systems are core to Booking.com’s customer-facing platforms; however, traditional recommendation tools have been less about recommendation and more about guessing, Pathak conceded. So, from the beginning, he and his team vowed to avoid generic tools: As he put it, the value and recommendations need to be based on customer context.
Booking.com’s initial pre-gen AI tooling for intent and topic detection was a small language model, what Pathak described as “the scale and size of BERT.” The model ingested the customer’s inputs around their problem to determine whether it could be solved through self-service or bumped to a human agent.
“We started with an architecture of ‘you have to call a tool if this is the intent you detect and this is how you’ve parsed the structure,’” Pathak explained. “That was very, very similar to the first few agentic architectures that came out in terms of reasoning and defining a tool call.”
His team has since built out that architecture to include an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or smaller, specialized language models. “We’ve been able to scale that system quite well because it was so close in architecture that, with a few tweaks, we now have a full agentic stack,” said Pathak.
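For readers who want a concrete picture of that pattern, here is a minimal, hypothetical Python sketch of an intent-routing orchestrator in the spirit Pathak describes; every function name, intent label and string in it is an illustrative stand-in rather than Booking.com’s actual code.

```python
# A minimal, illustrative sketch (not Booking.com's code) of the routing pattern
# described above: a classifier decides the intent, then the orchestrator either
# answers via retrieval, calls a structured API, or escalates to a human.
# Every function below is a hypothetical stub standing in for a real component.

def classify_intent(query: str) -> str:
    """Stand-in for the small, 'BERT-sized' topic/intent model."""
    q = query.lower()
    if "refund" in q or "cancel" in q:
        return "booking_change"
    if "policy" in q or "check-in" in q:
        return "policy_question"
    return "other"

def run_rag(query: str) -> str:
    return f"[RAG answer grounded in help-center docs for: {query}]"

def call_booking_api(query: str) -> str:
    return f"[structured API call handling: {query}]"

def escalate_to_human(query: str) -> str:
    return f"[escalated to a human agent: {query}]"

def handle_query(query: str) -> str:
    intent = classify_intent(query)
    if intent == "policy_question":
        return run_rag(query)           # grounded answer, no free-form guessing
    if intent == "booking_change":
        return call_booking_api(query)  # deterministic, tool-backed flow
    return escalate_to_human(query)     # anything else still goes to people

if __name__ == "__main__":
    print(handle_query("What is the check-in policy for my hotel?"))
```

The appeal of this shape is that the classifier stays small and cheap, while grounded retrieval, structured API calls and human escalation each remain their own well-defined path.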
As a result, Booking.com is seeing a 2X increase in topic detection accuracy, which in turn is freeing up human agents’ bandwidth by 1.5 to 1.7X. More topics, even complicated ones previously classified as ‘other’ and requiring escalation, are being automated.
Ultimately, this supports more self-service, freeing human agents to focus on customers with uniquely specific problems that the platform doesn’t have a dedicated tool flow for — say, a family that’s unable to access its hotel room at 2 a.m. when the front desk is closed.
That not only “really starts to compound,” but has a direct, long-term impact on customer retention, Pathak noted. “One of the things we’ve seen is, the better we are at customer service, the more loyal our customers are.”
Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website — an unrealistic amount for any human to sift through, Pathak pointed out. So, his team introduced a free-text box that users can type into to immediately receive tailored filters.
“That becomes such an important cue for personalization in terms of what you’re looking for in your own words rather than a clickstream,” said Pathak.
In turn, it cues Booking.com into what customers actually want. For instance, hot tubs: When filter personalization first rolled out, jacuzzis were one of the most popular requests. That wasn’t even a consideration previously; there wasn’t even a filter. Now that filter is live.
“I had no idea,” Pathak noted. “I had never looked for a hot tub in my room, honestly.”
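To make the idea tangible, here is a small, hypothetical sketch of how free text might be mapped onto structured filters; the filter names and the naive keyword matcher below are invented for illustration, and in practice a language model would do the mapping.

```python
# Illustrative sketch only: mapping a traveler's free-text request onto a
# structured filter set. The catalog entries and keyword matching are
# hypothetical; a real system would use a model, not substring checks.

FILTER_CATALOG = {
    "hot tub": "amenity:hot_tub",
    "jacuzzi": "amenity:hot_tub",
    "pet": "policy:pets_allowed",
    "pool": "amenity:pool",
    "breakfast": "meal:breakfast_included",
}

def filters_from_text(free_text: str) -> list[str]:
    """Return the structured filters implied by the user's own words."""
    text = free_text.lower()
    matched = {f for keyword, f in FILTER_CATALOG.items() if keyword in text}
    return sorted(matched)

print(filters_from_text("A quiet room with a jacuzzi and breakfast included"))
# ['amenity:hot_tub', 'meal:breakfast_included']
```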
When it comes to personalization, though, there’s a fine line; memory remains complicated, Pathak emphasized. While it’s important to have long-term memories and evolving threads with customers — retaining information like their typical budgets, preferred hotel star ratings or whether they need disability access — it must be on their terms and protective of their privacy.
Booking.com is incredibly mindful with memory, seeking consent so as not to be “creepy” when collecting customer information.
“Managing memory is much harder than actually building memory,” said Pathak. “The tech is out there; we’ve got the technical chops to build it. We want to make sure we don’t launch a memory object that doesn’t respect customer consent, that doesn’t feel very natural.”
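That principle translates naturally into a consent gate in front of any memory store. The sketch below is an assumption-laden illustration, not Booking.com’s implementation; the class and field names are made up.

```python
# Hedged sketch of a consent-gated memory object: preferences are only
# retained after the customer opts in, and can be wiped at any time.

from dataclasses import dataclass, field

@dataclass
class CustomerMemory:
    consented: bool = False
    preferences: dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> bool:
        """Store a preference only if the customer has opted in."""
        if not self.consented:
            return False          # no consent, nothing is retained
        self.preferences[key] = value
        return True

    def forget_all(self) -> None:
        """Let the customer wipe everything on demand."""
        self.preferences.clear()

memory = CustomerMemory()
memory.remember("budget", "mid-range")         # ignored: no consent yet
memory.consented = True
memory.remember("accessibility", "step-free")  # retained after opt-in
```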
Finding a balance of build versus buy
As agents mature, Booking.com is navigating a central question facing the entire industry: How narrow should agents become?
Instead of committing to either a swarm of highly specialized agents or a few generalized ones, the company aims for reversible decisions and avoids “one-way doors” that lock its architecture into long-term, costly paths. Pathak’s strategy: Generalize where possible, specialize where necessary and keep agent design flexible to help ensure resiliency.
Pathak and his team are “very mindful” of use cases, evaluating where to build more generalized, reusable agents or more task-specific ones. They strive to use the smallest model possible, with the highest level of accuracy and output quality, for each use case. Whatever can be generalized is.
Latency is another important consideration. When factual accuracy and avoiding hallucinations are paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations dictate speed. (As Pathak noted: “Nobody’s patient.”)
“We’d, for instance, never use something as heavy as GPT-5 for just topic detection or for entity extraction,” he said.
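That “smallest model that does the job” rule can be expressed as a simple routing table. The sketch below is purely illustrative; the model names, task labels and latency budgets are invented, not Booking.com’s configuration.

```python
# Toy example of per-task model routing: lightweight tasks go to a small, fast
# model; accuracy-critical tasks go to a larger, slower one. All values invented.

TASK_ROUTES = {
    "topic_detection":   {"model": "small-classifier",    "latency_budget_ms": 50},
    "entity_extraction": {"model": "small-extractor",     "latency_budget_ms": 80},
    "policy_answer":     {"model": "large-reasoning-llm", "latency_budget_ms": 3000},
}

def pick_model(task: str) -> str:
    """Look up which model class should serve a given task."""
    route = TASK_ROUTES.get(task)
    if route is None:
        raise ValueError(f"No route configured for task: {task}")
    return route["model"]

assert pick_model("topic_detection") == "small-classifier"
assert pick_model("policy_answer") == "large-reasoning-llm"
```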
Booking.com takes a similarly elastic tack when it comes to monitoring and evaluations: If it’s general-purpose monitoring that someone else is better at building and has horizontal capability, they’ll buy it. But in instances where brand guidelines must be enforced, they’ll build their own evals.
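An in-house eval of that kind can start as little more than a deterministic check that runs before generated copy ships. The example below is a toy illustration with invented brand rules, not Booking.com’s evaluation suite.

```python
# Minimal illustration of a brand-guideline eval: flag banned claims and
# overly long copy. The rules are made up for the example.

def passes_brand_guidelines(text: str) -> tuple[bool, list[str]]:
    violations = []
    banned_claims = ["guaranteed lowest price", "100% refund always"]  # example rules
    for phrase in banned_claims:
        if phrase in text.lower():
            violations.append(f"banned claim: {phrase}")
    if len(text) > 400:
        violations.append("copy exceeds 400-character limit")
    return (not violations, violations)

ok, issues = passes_brand_guidelines("Book now for a guaranteed lowest price!")
print(ok, issues)  # False ['banned claim: guaranteed lowest price']
```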
Ultimately, Booking.com has leaned into being “super anticipatory,” agile and flexible. “At this point, with everything that’s happening with AI, we’re a little bit averse to walking through one-way doors,” said Pathak. “We want as many of our decisions to be reversible as possible. We don’t want to get locked into a decision that we cannot reverse two years from now.”
What other builders can learn from Booking.com’s AI journey
Booking.com’s AI journey can serve as an important blueprint for other enterprises.
Looking back, Pathak acknowledged that they started off with a “pretty complicated” tech stack. They’re in a good place with it now, “but we probably could have started with something much simpler and seen how customers interacted with it.”
Given that, he offered this valuable advice: If you’re just starting out with LLMs or agents, out-of-the-box APIs will do just fine. “There’s enough customization with APIs that you can already get a lot of leverage before you decide you want to go do more.”
On the other hand, if a use case requires customization not available through a standard API call, that makes a case for in-house tools.
Still, he emphasized: Don’t start with the complicated stuff. Tackle the “simplest, most painful problem you can find and the simplest, most obvious solution to that.”
Find the product-market fit, then investigate the ecosystems, he advised — but don’t just rip out old infrastructure because a new use case demands something specific (like moving an entire cloud strategy from AWS to Azure just to use the OpenAI endpoint).
Ultimately: “Don’t lock yourself in too early,” Pathak noted. “Don’t make decisions that are one-way doors until you are very confident that that’s the solution you want to go with.”
