Recommender System: Collaborative Filtering with Matrix Factorization

admin

April 26, 2023

Explanation of Recommendations through Matrix Factorization

Netflix is a popular online streaming platform that gives its subscribers a wide selection of films, documentaries, and TV shows. To enhance the user experience, Netflix has developed a sophisticated recommendation system that suggests movies based on your past viewing history, ratings, and preferences.

The recommender system uses complex algorithms that analyze vast amounts of information to predict what users are most likely to enjoy. With over 200 million subscribers worldwide, Netflix’s recommendation system is a key factor in its success and sets the standard for the streaming industry. Here is the source on how Netflix achieved 80% of stream time through personalization (link).

A recommender system is a type of machine learning system that uses information filtering to suggest products or content to users based on their preferences, interests, and behavior. These systems are widely used in e-commerce, online streaming, and other applications to help users discover new products and content that may interest them.

Recommender systems are trained to understand user and product preferences, past decisions, and characteristics using data collected from user-product interactions.

There are two main types of recommendation systems:

Content-based Filtering

The recommendation is based on user or item attributes, which serve as the input to the algorithm. The contents of this shared attribute space are then used to create user and item profiles.

For example, Spider-Man: No Way Home and Ant-Man and the Wasp: Quantumania have similar attributes, as both movies fall under the Action/Adventure genre. Not only that, both are part of Marvel. Therefore, if Alice watched the Spider-Man movie, a content-based recommendation system may recommend movies with similar attributes, such as action or Marvel movies.
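Here is a minimal sketch of the idea. It assumes each movie is described by a hand-made binary attribute vector. The attribute columns and values below are hypothetical, not MovieLens data. A content-based recommender can then rank movies by the cosine similarity of their attribute vectors to a movie the user already watched:

```python
import numpy as np

# Hypothetical binary attribute vectors: [action, adventure, marvel, comedy, romance]
movies = {
    "Spider-Man: No Way Home": np.array([1, 1, 1, 0, 0]),
    "Ant-Man and the Wasp: Quantumania": np.array([1, 1, 1, 1, 0]),
    "Notting Hill": np.array([0, 0, 0, 1, 1]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors (1 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_similar(watched, catalog):
    """Rank every other movie by attribute similarity to the watched one."""
    target = catalog[watched]
    scores = {
        title: cosine_similarity(target, vec)
        for title, vec in catalog.items()
        if title != watched
    }
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_similar("Spider-Man: No Way Home", movies))
```

Only the attributes drive the ranking here. No other user's behavior is involved, which is what separates content-based filtering from collaborative filtering.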

Collaborative Filtering

Recommendations are based on multiple users who have similar past interactions. The key idea of this approach is to leverage collaboration between users to produce new recommendations.

For example, suppose Alice and Bob have similar interests, particularly in movie genres. A collaborative filtering recommendation system may recommend to Alice items that Bob has watched before but that are new to Alice, since both of them have fairly similar preferences. The reverse is true for Bob as well.
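Here is a minimal user-based collaborative filtering sketch. It uses a tiny made-up rating matrix where 0 means "not watched". It finds the user most similar to Alice by cosine similarity of rating vectors, then recommends movies that neighbor rated but Alice has not seen. Treating unwatched cells as 0 in the similarity is a simplification. This nearest-neighbor approach is a simpler relative of the matrix factorization method this article focuses on, and it is shown only to illustrate the collaboration idea.

```python
import numpy as np

# Hypothetical ratings (rows: users, columns: movies); 0 means "not watched yet".
users = ["Alice", "Bob", "Carol"]
movies = ["m_1", "m_2", "m_3", "m_4"]
ratings = np.array([
    [5, 4, 0, 0],   # Alice
    [5, 5, 4, 0],   # Bob
    [1, 0, 0, 5],   # Carol
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two users' rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_from_neighbor(user_idx, R):
    """Recommend movies the most similar other user rated but this user has not watched."""
    sims = [(cosine(R[user_idx], R[j]), j) for j in range(len(R)) if j != user_idx]
    _, neighbor = max(sims)
    unseen = (R[user_idx] == 0) & (R[neighbor] > 0)
    return [movies[k] for k in np.flatnonzero(unseen)], users[neighbor]

recs, neighbor = recommend_from_neighbor(users.index("Alice"), ratings)
print(neighbor, recs)
```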

There is a wide range of recommender system model types, as shown in the figure below, but this article will focus on collaborative filtering (CF) with matrix factorization.

**Types of Recommender Systems** - Image illustrated by Author

Put simply, matrix factorization is a mathematical process that decomposes a large, complex matrix into the product of smaller matrices in a lower-dimensional space. Some of the most popular matrix factorization techniques used in recommender systems are Singular Value Decomposition (SVD), Non-negative Matrix Factorization (NMF), and Probabilistic Matrix Factorization (PMF).
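Here is a small NumPy sketch of truncated SVD on a made-up, fully observed rating matrix. Keeping only the k largest singular values yields a user factor matrix P and an item factor matrix Q whose product approximates the original matrix. Classical SVD like this needs every cell filled in. Real rating matrices are mostly empty, which is why recommender systems usually learn the factors from the observed ratings only.

```python
import numpy as np

# Small, fully observed toy rating matrix (users x movies), made up for illustration.
R = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

# Thin SVD: R = U @ diag(s) @ Vt, with singular values s sorted in descending order
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                      # keep only the k largest singular values (latent factors)
P = U[:, :k] * s[:k]       # user factors scaled by the singular values, shape (4, k)
Q = Vt[:k, :].T            # item factors, shape (4, k)

R_approx = P @ Q.T         # best rank-k approximation of R (least-squares sense)
rmse = np.sqrt(np.mean((R - R_approx) ** 2))
print(np.round(R_approx, 2))
print("Reconstruction RMSE:", rmse)
```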

The following illustrates how matrix factorization predicts user-movie ratings.

Stage 1: Matrix factorization randomly initializes the values of the factor matrices, and the number of latent factors (K) is set. In this example, we set K = 5.

User Matrix (green box) represents the association between each user and the features
Item Matrix (orange box) represents the association between each item and the features

Here, as an example, we create 5 features (K = 5) to represent the characteristics of movie m_1: comedy as 2.10, horror as 0.88, action as 0.04, parental guidance as 0.02, and family-friendly as 0.04. The same applies to the user matrix, which represents the characteristics of each user, such as preferred actors or directors, favorite production studios, and many more. In practice, the learned factors carry no built-in labels; names like these are only an interpretive aid.
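Here is a sketch of Stage 1 in NumPy, assuming hypothetical sizes of 3 users and 4 movies. Both matrices are filled with random values in [0, 1), and each row holds K = 5 latent factors. The seed is fixed only so the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded for reproducibility
n_users, n_items, K = 3, 4, 5        # hypothetical sizes; K = 5 latent factors as in the text

P = rng.random((n_users, K))         # user matrix: one K-dimensional row per user
Q = rng.random((n_items, K))         # item matrix: one K-dimensional row per movie

print("P shape:", P.shape)
print("Q shape:", Q.shape)
print("Initial factors of m_1:", np.round(Q[0], 2))
```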

Stage 2: The predicted rating is calculated as the dot product of the user matrix and the item matrix.

where R is the true rating, P is the user matrix, Q is the item matrix, and R’ is the resulting predicted rating.

In more formal mathematical notation, the predicted rating R’ can be represented by the following equation:
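A standard way to write this assumes P holds one K-dimensional row per user and Q one K-dimensional row per movie. The indices u (user), i (movie), and k (factor) are my own labels.

```latex
R'_{ui} = \sum_{k=1}^{K} P_{uk}\, Q_{ik} = \mathbf{p}_u \cdot \mathbf{q}_i,
\qquad
R' = P\, Q^{\top}
```

The full matrix form P Qᵀ is the same product of nP and the transpose of nQ used in the from-scratch implementation.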

Stage 3: The squared error is used to measure the difference between the true rating and the predicted rating.
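Here is the squared error written out for a single user u and movie i, then summed into a loss over the set 𝒪 of observed ratings only. Restricting the sum to observed ratings is an assumption consistent with the sparse data used here, so missing ratings contribute nothing:

```latex
e_{ui}^{2} = \left(R_{ui} - R'_{ui}\right)^{2}
           = \Big(R_{ui} - \sum_{k=1}^{K} P_{uk}\, Q_{ik}\Big)^{2},
\qquad
\mathcal{L} = \sum_{(u,i)\in\mathcal{O}} e_{ui}^{2}
```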

Once we have these steps in place, we can optimize our parameters using stochastic gradient descent.

At each iteration, the optimizer computes the match between each movie and each user by taking the dot product of their factor vectors, then compares it to the actual rating that the user gave the movie. It then computes the derivative of the error and updates the weights by multiplying that derivative by the learning rate α. As we repeat this process over and over, the loss decreases, leading to better recommendations.
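Differentiating the squared error with respect to one user factor and one item factor gives the gradients. Stepping against them with learning rate α gives the update rules. This is plain, unregularized SGD, and both updates use the values from before the step. Many implementations fold the factor 2 into α or add an L2 penalty; neither is shown here.

```latex
\begin{aligned}
e_{ui} &= R_{ui} - \sum_{k=1}^{K} P_{uk}\, Q_{ik} \\
\frac{\partial e_{ui}^{2}}{\partial P_{uk}} &= -2\, e_{ui}\, Q_{ik},
  \qquad
\frac{\partial e_{ui}^{2}}{\partial Q_{ik}} = -2\, e_{ui}\, P_{uk} \\
P_{uk} &\leftarrow P_{uk} + 2\alpha\, e_{ui}\, Q_{ik},
  \qquad
Q_{ik} \leftarrow Q_{ik} + 2\alpha\, e_{ui}\, P_{uk}
\end{aligned}
```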

One matrix factorization model that has been widely used in recommendation systems is Singular Value Decomposition (SVD). SVD has broad applications, including image compression and noise reduction in signal processing. It is also commonly employed in recommender systems, where it is well suited to addressing the sparsity inherent in large user-item matrices.

This article also provides an overview of an SVD implementation using the Surprise package.

So let’s get our hands dirty with the implementation!!

Implementation Contents

Data Import
Data Pre-Processing
Implementation #1: Matrix Factorization in Python from Scratch
Implementation #2: Matrix Factorization with Surprise Package

The complete notebook for the matrix factorization implementation is available here.

Since we’re developing a recommendation system like Netflix’s but don’t have access to their big data, we’re going to use a great dataset from MovieLens for this exercise [1], with permission. You can read their README files for the usage license and other details. This dataset comprises a large collection of movies, users, and the users’ past ratings.

After extracting the zip file, you will find 4 CSV files, as follows:

**Snapshot of the data** - Image by Author

Btw, collaborative filtering suffers from the user cold-start problem. The cold-start problem refers to a situation in which a system or algorithm cannot make accurate predictions or recommendations for new users, items, or entities that have no prior information. This occurs when little or no historical data is available for the new users or items, making it difficult for the system to learn their preferences or characteristics.

The cold-start problem is a common challenge in recommendation systems, where the system needs to offer personalized recommendations to users with limited or no interaction history.

At this stage, we select users who have interacted with at least 2,000 movies and movies that have been rated by at least 1,000 users. This is a good way to reduce the size of the data, and of course it leaves fewer null values. Besides, my RAM could never handle a massive table.
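Here is a pandas sketch of this filtering step. It assumes ratings.csv has been loaded into a DataFrame with userId, movieId, and rating columns; the tiny DataFrame below stands in for it. The thresholds are scaled down from 2,000 and 1,000 so the toy data still has rows left. The sketch filters once in each direction, so a few surviving users or movies may fall just below a threshold once the other filter has been applied.

```python
import pandas as pd

# Hypothetical stand-in for MovieLens ratings.csv (columns: userId, movieId, rating).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "movieId": [10, 20, 30, 10, 20, 10, 20, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 4.5, 2.0, 3.0, 4.0, 4.5, 1.0],
})

# The article uses 2000 (per user) and 1000 (per movie); toy thresholds here.
MIN_USER_RATINGS = 3
MIN_MOVIE_RATINGS = 2

user_counts = ratings["userId"].value_counts()    # number of ratings per user
movie_counts = ratings["movieId"].value_counts()  # number of ratings per movie

active_users = user_counts[user_counts >= MIN_USER_RATINGS].index
popular_movies = movie_counts[movie_counts >= MIN_MOVIE_RATINGS].index

filtered = ratings[
    ratings["userId"].isin(active_users) & ratings["movieId"].isin(popular_movies)
]
print(filtered)
```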

**My RAM condition** -Source: KC Green’s 2013 webcomic

Actually, you can also use the small subset of 100k ratings provided by MovieLens. I just wanted to make the most of my computer resources while keeping null data to a minimum.

**Data output after data pre-processing** - Image by Author

As is customary, we divide the data into two groups, a training set and a test set, using the train_test_split method.
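Here is a minimal sketch of the split with scikit-learn's train_test_split. It assumes an 80/20 ratio and a fixed random_state, since the article does not state its exact settings, and the DataFrame is made up. Because rows are split at random, a test row can involve a user or movie that never appears in training. That is the cold-start case described earlier.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical filtered ratings in long format.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 20, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 4.5, 2.0, 3.0, 4.0, 4.5, 1.0, 3.5],
})

# Hold out 20% of the rating rows for testing; random_state makes the split reproducible.
train_df, test_df = train_test_split(ratings, test_size=0.2, random_state=42)
print(len(train_df), "training rows,", len(test_df), "test rows")
```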

While the data we need is all there, it isn’t presented in a way that is easy for humans to grasp. So I have created a table that presents the same data in a format that is easier for humans to understand.
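One way to build such a human-readable table is pandas pivot_table, assuming the training ratings sit in a long-format DataFrame. It turns rows of (userId, movieId, rating) into a user x movie matrix with NaN for unrated movies. The data below is made up.

```python
import pandas as pd

# Hypothetical training ratings in long format.
train_df = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 3.5, 4.5, 2.0, 5.0],
})

# Reshape into a user x movie matrix; unrated cells become NaN.
user_item = train_df.pivot_table(index="userId", columns="movieId", values="rating")
print(user_item)
```

The NaN cells are exactly the ratings matrix factorization will try to fill in.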

Here is the Python snippet for implementing matrix factorization with gradient descent. The matrix_factorization function returns 2 matrices: nP (user matrix) and nQ (item matrix).

Then, we fit the model on the training dataset; here I set n_factor K = 5. After that, predictions can be computed by multiplying nP by the transpose of nQ using the dot product, as illustrated in the code snippet below.
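Here is a minimal sketch of such a matrix_factorization function. It assumes NumPy, a user x movie matrix R with NaN for missing ratings, and plain SGD over the observed entries. The parameter names steps, alpha, beta, and seed, and the small L2 penalty beta, are my own choices rather than the author's (set beta=0 for the unregularized squared error). The 5x4 matrix is made up.

```python
import numpy as np

def matrix_factorization(R, K=5, steps=3000, alpha=0.002, beta=0.02, seed=0):
    """Factorize R (users x items, NaN = missing) into nP (users x K) and nQ (items x K).

    Plain SGD over observed entries; beta is an assumed L2 penalty strength.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, K))
    Q = rng.random((n_items, K))
    observed = np.argwhere(~np.isnan(R))        # (u, i) index pairs of known ratings

    for _ in range(steps):                      # one step = one pass over observed ratings
        rng.shuffle(observed)                   # visit ratings in random order each pass
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]         # e_ui = r_ui - p_u . q_i
            p_u = P[u].copy()                   # keep the pre-update user vector for Q's step
            P[u] += alpha * (2 * err * Q[i] - beta * P[u])
            Q[i] += alpha * (2 * err * p_u - beta * Q[i])
    return P, Q

# Toy user x movie ratings; NaN marks movies a user has not rated.
R = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, 4],
    [np.nan, 1, 5, 4],
])

nP, nQ = matrix_factorization(R, K=5)
nR = np.dot(nP, nQ.T)      # predicted rating for every user-movie pair
print(np.round(nR, 2))
```

On the observed cells, nR should land close to R. The NaN cells receive predicted ratings, which are what the recommender actually uses.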

As a result, here is the final prediction that matrix_factorization produces:

**New predicted ratings on the train set** - Image from Author

Prediction on the Test Set

The following snippet uses the learned nP (user matrix) and nQ (movie matrix) to make predictions on the test set.
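Here is a sketch of that lookup. It assumes the rows of nP follow the user order and the rows of nQ follow the movie order of the training user-item table. Random stand-ins are used for the factors, and user_pos and movie_pos are hypothetical helper names of my own. Test rows whose user or movie never appeared in training have no factor vector, so they are dropped before predicting.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-ins for the learned factors (in the article these come from matrix_factorization).
user_ids = [1, 2, 3]          # row order of nP, e.g. the user-item table's index
movie_ids = [10, 20, 30, 40]  # row order of nQ, e.g. the user-item table's columns
K = 5
nP = rng.random((len(user_ids), K))
nQ = rng.random((len(movie_ids), K))

# Map raw IDs to row positions in nP / nQ.
user_pos = {uid: idx for idx, uid in enumerate(user_ids)}
movie_pos = {mid: idx for idx, mid in enumerate(movie_ids)}

test_df = pd.DataFrame({
    "userId":  [1, 2, 3, 99],     # user 99 never appeared in training (cold start)
    "movieId": [30, 10, 40, 20],
    "rating":  [4.0, 3.0, 2.5, 5.0],
})

# Keep only rows whose user and movie both have learned factors.
known = test_df["userId"].isin(list(user_pos)) & test_df["movieId"].isin(list(movie_pos))
test_known = test_df[known].copy()

u_idx = test_known["userId"].map(user_pos).to_numpy()
m_idx = test_known["movieId"].map(movie_pos).to_numpy()
test_known["pred_rating"] = np.sum(nP[u_idx] * nQ[m_idx], axis=1)  # row-wise dot products
print(test_known)
```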

**The rating and pred_rating output of the test set** - Image from Author

Evaluating The Prediction Performance

There are numerous evaluation metrics for recommender systems, such as Precision@K, Recall@K, MAP@K, and the list goes on. For this exercise, I’ll use a basic accuracy metric, RMSE (root mean squared error). I will probably cover the other evaluation metrics in greater detail in the next article.
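RMSE is the square root of the mean squared difference between true and predicted ratings. Here is a minimal NumPy helper; the example numbers are made up, not the article's results.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up example: errors of 0.5, 0.0 and 1.0 stars.
print(rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```

Because errors are squared before averaging, one large miss raises RMSE more than several small ones.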

As a result, the RMSE on the test set is 0.829, which is pretty decent even before any hyperparameter tuning. We can certainly tune several parameters, such as the learning rate, n_factor, and the number of epochs, for better results.

In this segment, we opted for the Surprise package, a Python library for building and evaluating recommender systems. It provides a simple, easy-to-use interface for loading and processing datasets, as well as for implementing and evaluating different recommendation algorithms.

Data Import and Model Training

Top-N recommendation generator

For UserId 231832, here is the top 10 movie recommendation list:

m_912, m_260, m_1198, m_110, m_60069, m_1172, m_919, m_2324, m_1204, m_3095

**Top 10 recommendation output** - Image by Author

The use of matrix factorization in modern entertainment platforms like Netflix helps them understand user preferences. This information is then used to recommend the most relevant item/product/movie to the end user.

Here’s a summary of the matrix factorization illustration I created, in case I ever need to explain it to my grandkids someday...

[1] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872