
Dynamic Pricing with Contextual Bandits: Learning by Doing


Adding context to your dynamic pricing problem can increase opportunities as well as challenges

Photo by Artem Beliaikin on Unsplash

In my previous article, I conducted a thorough evaluation of the most popular strategies for tackling the dynamic pricing problem using simple Multi-armed Bandits. If you’ve come here from that piece, firstly, thank you. It’s by no means an easy read, and I really appreciate your enthusiasm for the topic. Secondly, brace yourself, as this new article promises to be even more demanding. However, if this is your introduction to the subject, I strongly advise starting with the previous article. There, I present foundational concepts that I’ll assume readers are familiar with in this discussion.

Anyway, a brief recap: the prior analysis aimed to simulate a dynamic pricing scenario. The main goal was to test various price points as quickly as possible and find the one yielding the highest cumulative reward. We explored four distinct algorithms: greedy, ε-greedy, Thompson Sampling, and UCB1, detailing the strengths and weaknesses of each. Although the methodology employed in that article is theoretically sound, it rests on oversimplifications that don’t hold up in more complex, real-world situations. The most problematic of these simplifications is the assumption that the underlying process is stationary, meaning the optimal price stays constant regardless of the external environment. This is clearly not the case. Consider, for instance, fluctuations in demand during holiday seasons, sudden shifts in competitor pricing, or changes in raw material costs.

To address this issue, Contextual Bandits come into play. Contextual Bandits are an extension of the Multi-armed Bandit problem in which the decision-making agent not only receives a reward for each action (or “arm”) but also has access to context, i.e., environment-related information, before selecting an arm. The context can be any piece of information that might influence the outcome, such as customer demographics or external market conditions.
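To make the notion of “context” concrete, here is one way such information could be encoded as a feature vector before each pricing decision. The specific features (a demand index, a competitor price ratio, a holiday flag) are illustrative assumptions on my part, not taken from the article:

```python
import numpy as np

# A hypothetical context vector observed before setting a price.
# The feature choice is illustrative: any signal that may shift
# the optimal price can be included here.
context = np.array([
    0.72,  # normalized demand index (e.g., recent site traffic)
    1.05,  # competitor price relative to our baseline price
    1.0,   # holiday flag (1 = holiday season, 0 = otherwise)
])
```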

Here’s how they work: before deciding which arm to pull (or, in our case, which price to set), the agent observes the current…
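The article is truncated here, but the general interaction loop it is describing can still be sketched: observe the context, select a price, collect the reward, and update the model for the chosen arm. Below is a minimal, self-contained illustration using a LinUCB-style agent (one linear reward model per price arm). The price grid, feature set, and toy demand model are all hypothetical, chosen only to make the loop runnable; this is a sketch of the technique, not the author’s implementation:

```python
import numpy as np

class LinUCBArm:
    """Linear UCB estimator for a single price arm (Li et al., 2010)."""
    def __init__(self, n_features, alpha=1.0):
        self.alpha = alpha           # exploration strength
        self.A = np.eye(n_features)  # ridge term plus sum of x x^T
        self.b = np.zeros(n_features)  # sum of reward * x

    def ucb(self, x):
        """Upper confidence bound on the expected reward for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b       # ridge-regression coefficient estimate
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Hypothetical price grid -- not from the article.
prices = [9.99, 12.99, 15.99, 19.99]
n_features = 3  # demand index, competitor price ratio, holiday flag
arms = [LinUCBArm(n_features) for _ in prices]
rng = np.random.default_rng(42)

for t in range(1000):
    # 1. Observe the current context (simulated here).
    context = np.array([
        rng.uniform(0.0, 1.0),          # normalized demand index
        rng.uniform(0.5, 1.5),          # competitor price ratio
        float(rng.random() < 0.1),      # holiday flag
    ])

    # 2. Pick the price whose upper confidence bound is highest
    #    given this context.
    chosen = max(range(len(prices)), key=lambda i: arms[i].ucb(context))

    # 3. Observe the reward. Toy demand model: higher demand and
    #    holidays raise purchase probability; higher prices lower it.
    buy_prob = np.clip(0.9 * context[0] + 0.2 * context[2]
                       - 0.03 * prices[chosen], 0.0, 1.0)
    reward = prices[chosen] * float(rng.random() < buy_prob)

    # 4. Update only the chosen arm's linear model.
    arms[chosen].update(context, reward)
```

The key difference from the context-free bandits of the previous article is step 2: the ranking of the arms now depends on the observed feature vector, so the agent can learn, for example, a different optimal price for holiday periods than for regular days.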
