User Action Sequence Modeling: From Attention to Transformers and Beyond


The quest to LLM-ify recommender systems


User action sequences are among the most powerful inputs in recommender systems: your next click, read, watch, play, or purchase is likely at least somewhat related to what you’ve clicked on, read, watched, played, or purchased minutes, hours, days, months, or even years ago.

Historically, the status quo for modeling such user engagement sequences has been pooling: for instance, a classic 2016 YouTube paper describes a system that takes the 50 most recently watched videos, looks up their embeddings in an embedding table, and pools them into a single feature vector with sum pooling. To save memory, the embedding table for these sequence videos is shared with the embedding table for the candidate videos themselves.

YouTube’s recommender system sum-pools the sequence of watched videos for a user. (Covington et al., 2016)
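To make the idea concrete, here is a minimal sketch of this sum-pooling step in NumPy. The table size, embedding dimension, and function names are illustrative assumptions, not the paper's actual implementation; the key points are that history items and candidates share one embedding table, and that the history is truncated to the 50 most recent items before pooling.

```python
import numpy as np

# Hypothetical shared embedding table: used both for the videos in a
# user's watch history and for the candidate videos being scored.
rng = np.random.default_rng(0)
num_videos, dim = 1000, 8
embedding_table = rng.normal(size=(num_videos, dim)).astype(np.float32)

def pool_watch_history(watched_ids, max_len=50):
    """Sum-pool the embeddings of the `max_len` most recent watched videos
    into a single fixed-size feature vector."""
    recent = watched_ids[-max_len:]             # keep only the most recent items
    return embedding_table[recent].sum(axis=0)  # (dim,) regardless of history length

# Example: a short watch history, oldest to newest.
user_history = [3, 17, 17, 256, 999]
feature = pool_watch_history(user_history)
```

Note that the output has the same shape no matter how long the history is, which is exactly what makes pooling convenient as a feature for a downstream ranking network, and also why it discards all ordering information.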

This simplistic approach corresponds roughly to a bag-of-words approach in the NLP domain: it works, but it’s far from ideal. Pooling takes into account neither the sequential nature of the inputs, nor the relevance of each item in the user history with respect to the candidate item we want to rank, nor any of the temporal information: an…
