Recommendation System
Building a recommendation system for users
Recommendation systems are used in many products, for example:
The Amazon homepage recommends personalized products that we might be interested in.
The Pinterest feed is full of pins that we might like based on trends and our historical browsing.
Netflix shows movie recommendations based on our taste, trending movies, etc.
1. Problem Statement
Problem statement
Display media (movie/show) recommendations for a Netflix user. Your task is to make recommendations in such a manner that the chance of the user watching them is maximized.
Scope of the problem
Define the scope of the problem
The total number of subscribers on the platform as of 2019 is 163.5 million
There are 53 million international daily active users
==> Given a user and context (time, location, and season), predict the probability of engagement for each movie and order movies using that score.
Type of user feedback

Explicit feedback
A user provides an explicit assessment of a recommendation. In our case, it would be a star rating, e.g., a user rates the movie four out of five stars.
==> The recommendation problem will be viewed as a rating prediction problem
Implicit feedback
Implicit feedback is extracted from a user’s interaction with the recommended media. Most often, it is binary in nature. For instance, a user watched a movie (1), or they did not watch the movie (0).
==> The recommendation problem is viewed as a ranking problem
2. Metrics

Online metrics
Engagement rate
A user may click a movie but not find it interesting enough to finish watching it, so engagement rate alone can be misleading
Videos watched
The average number of recommended videos that a user watched for a significant amount of time (e.g., more than 5 minutes)
May miss the overall user satisfaction with the recommended content
Session watch time
The overall time a user spends watching content based on recommendations in a session
==> Measures whether a user is able to find meaningful recommendations in a session
Offline metrics
Mean Average Precision (mAP@N)
==> mAP@N measures how relevant the top recommendations are.
Precision
Assuming
System recommended N = 5 movies
the user watched 3 movies from these recommendations and ignored the other 2
There are only m=10 movies that are actually relevant to the user
The average precision at N (AP@N) averages the precision P(k) at each position k ≤ N where the recommendation was relevant (watched)
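A concrete sketch of that computation in Python. The positions of the watched movies (1, 3, and 4) are assumed for illustration, and AP@N is normalized here by m, the total number of relevant movies; some variants divide by min(m, N) instead.

```python
from typing import List

def precision_at_k(rels: List[int], k: int) -> float:
    """Fraction of the top-k recommendations that were relevant (watched)."""
    return sum(rels[:k]) / k

def average_precision_at_n(rels: List[int], n: int, m: int) -> float:
    """Average of P(k) over the positions k <= n holding a relevant item,
    normalized by m, the total number of movies relevant to the user."""
    hits = [precision_at_k(rels, k) for k in range(1, n + 1) if rels[k - 1] == 1]
    return sum(hits) / m

# Hypothetical ordering: the 3 watched movies sit at positions 1, 3, and 4.
rels = [1, 0, 1, 1, 0]                            # N = 5 recommendations, 1 = watched
print(average_precision_at_n(rels, n=5, m=10))    # (1/1 + 2/3 + 3/4) / 10 ≈ 0.242
```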
Mean Average Recall (mAR@N)
==> mAR@N measures how well the recommender recalls all the items with positive feedback, in its top recommendations
Recall@N is the fraction of all m relevant movies that appear in the top N recommendations, e.g., 3/10 = 0.3 in the example above
F1 score combines the two into one offline metric: F1 = 2 * (mAP@N * mAR@N) / (mAP@N + mAR@N)
Offline metric for optimizing ratings
If the interviewer asks you to optimize the recommendation system for getting the ratings right, use root mean squared error (RMSE) between the predicted and actual ratings
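For reference, with ŷ_i the predicted rating and y_i the actual rating over N rated examples:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$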
3. Architectural Components

There are mainly two stages for the recommendation task
Stage 1: Candidate generation
A simpler mechanism to sift through the entire corpus for possible recommendations
It focuses on higher recall, i.e., gathering movies that might interest the user from all perspectives
Stage 2: Ranking of generated candidates
Score the candidate movies that are generated by Stage 1
It focuses on higher precision, i.e., getting the ranking of the top k recommendations right
Training data generation
The user's engagement with the recommendations on their Netflix homepage helps generate training data for both the candidate generation component and the ranker component.
4. Feature Engineering
The main actors that the features describe are the user, the media (movie/show), and the context

The features are

User-based features
Age
Gender
Language
Country
Average_session_time
last_genre_watched
User-history-features
user_actor_histogram
user_genre_histogram
user_language_histogram
Context-based features
season-of-the-year
upcoming-holiday
days-to-upcoming-holiday
time-of-day
day-of-week
device: tablet, TV, cell phone
Media-based features
public-platform-rating: IMDB, Netflix rating
revenue
time-passed-since-release-date
time-on-platform
media-watch-history
last-12-hours
last-24-hours
genre
movie-duration
content-set-time-period
content-tags
show-season-number
country-of-origin
release-country
release-year
release-type: DVD, broadcast, streaming
maturity-rating
Media-user cross features
user-genre historical interaction features
user-genre-historical-interaction-3months
user-genre-historical-interaction-1year
user-and-movie-embedding-similarity
user-actor
user-director
user-language-match
user-age-match
Sparse features
movie-id
title-of-media
synopsis
original-title
distributor
creator
original-language
director
first-release-year
music-composer
actors
5. Candidate Generation
The purpose of candidate generation is to select the top k (let's say one-thousand) movies that you would want to consider showing as recommendations to the end-user.
Collaborative filtering
There are two methods to perform collaborative filtering
Nearest neighborhood
Given an n*m matrix representing the watch history (feedback) of n users over m movies
For a user i, we need to predict their feedback for all movies they have not seen
First, based on KNN, we pick the top k users most similar to user i
Then, the user's feedback for an unseen movie can be predicted as the weighted average of the feedback from those top k similar users, where each weight is that user's similarity to user u_i
==> The issue is that this approach is computationally expensive
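A minimal sketch of user-based nearest-neighborhood collaborative filtering, assuming a toy feedback matrix and cosine similarity as the user-similarity measure:

```python
import numpy as np

# Toy feedback matrix: rows = users, columns = movies (0 = unseen / no feedback).
feedback = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [1, 0, 5, 4, 0],
    [0, 1, 4, 5, 3],
], dtype=float)

def predict_user_feedback(feedback: np.ndarray, user: int, movie: int, k: int = 2) -> float:
    """Predict a user's feedback for a movie as the similarity-weighted average
    of the feedback given by the k most similar users who rated that movie."""
    norms = np.linalg.norm(feedback, axis=1) + 1e-9
    sims = feedback @ feedback[user] / (norms * norms[user])   # cosine similarity to every user
    sims[user] = -np.inf                                       # exclude the user themselves
    candidates = np.where(feedback[:, movie] > 0)[0]           # neighbors who rated the movie
    top = candidates[np.argsort(sims[candidates])[-k:]]        # top-k most similar neighbors
    weights = sims[top]
    return float(weights @ feedback[top, movie] / (weights.sum() + 1e-9))

print(predict_user_feedback(feedback, user=0, movie=2))  # predicted score for an unseen movie
```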
Matrix factorization
Matrix factorization decomposes the highly sparse n*m user-movie feedback matrix into two lower-dimensional matrices:
User profile matrix (n x M)
Media profile matrix (m x M)
The dimension M is the number of latent factors.

The matrix factorization process:
Create the user profile and movie profile matrices
Generate good candidates for movie recommendations by predicting user feedback for unseen movies.
The prediction can be made simply by computing the dot product of the user vector and the movie vector
To train the latent factor matrices for users and media, we can follow the steps:
Initialize the user and movie vectors randomly
For each observed user-movie feedback value f_{ij}, predict the feedback as the dot product of u_i and m_j
The difference between f_{ij} and u_i*m_j is used as the gradient-descent error signal to update the two matrices
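A minimal sketch of this training loop in NumPy, assuming a toy set of observed feedback triples and hypothetical hyperparameters (learning rate, regularization, epoch count):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_factors = 4, 5, 3

# Observed (user, movie, feedback) triples; everything else is unseen.
observations = [(0, 0, 5.0), (0, 1, 4.0), (1, 1, 5.0), (2, 3, 4.0), (3, 4, 3.0)]

# Randomly initialized latent factor matrices.
U = 0.1 * rng.standard_normal((n_users, n_factors))    # user profile matrix (n x M)
V = 0.1 * rng.standard_normal((n_movies, n_factors))   # media profile matrix (m x M)

lr, reg = 0.05, 0.01
for _ in range(200):                       # epochs
    for i, j, f in observations:
        err = f - U[i] @ V[j]              # difference between actual and predicted feedback
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

# Predicted feedback for every user-movie pair; unseen movies with high scores are candidates.
predictions = U @ V.T
print(np.round(predictions, 2))
```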
Content-based filtering
Content-based filtering allows us to make recommendations to users based on the characteristics or attributes of the media they have already interacted with.
Feature preparation
The idea is to represent the media attributes (e.g., tags, genre, description) as TF-IDF feature vectors


Recommending media to the user
Similarity with historical interactions
Recommend media that are similar (by TF-IDF feature similarity) to the movies the user has watched in the past
Similarity between media and user profiles
Based on the user's interactions (watched, ignored) with a list of movies, we can build a user profile vector (e.g., an average of the TF-IDF vectors of the movies they watched)
Then, we measure the similarity of the user profile vector to each candidate movie's vector
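A minimal content-based filtering sketch using scikit-learn's TF-IDF vectorizer; the media tag strings are made up, and the user profile is simply the average of the watched movies' vectors:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Hypothetical media attribute strings (tags, genre, synopsis keywords).
media = {
    "movie_a": "comedy family road-trip feel-good",
    "movie_b": "dark thriller crime detective",
    "movie_c": "comedy romance wedding feel-good",
    "movie_d": "sci-fi space thriller",
}
titles = list(media)
tfidf = TfidfVectorizer().fit_transform(media.values())   # one TF-IDF vector per movie

# User profile: average of the TF-IDF vectors of the movies the user watched.
watched = ["movie_a"]
profile = np.asarray(tfidf[[titles.index(t) for t in watched]].mean(axis=0))

# Rank unseen movies by cosine similarity to the user profile.
scores = cosine_similarity(profile, tfidf).ravel()
ranking = sorted((t for t in titles if t not in watched),
                 key=lambda t: scores[titles.index(t)], reverse=True)
print(ranking)   # movie_c (shared comedy / feel-good tags) should come first
```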

Embedding-based similarity
We can use deep learning to generate latent vectors/embeddings to represent both movies and users. Then, we can use KNN to find the movies that are good candidates for a user to watch
Embedding generation

Two towers model
Left tower takes the media-only features; its last layer outputs the media embedding
Right tower takes the user-only features; its last layer outputs the user embedding
The loss function minimizes the distance between the dot product of the user and media embeddings and the actual feedback label
Note that the user embedding and the media embedding must have the same length so that their dot product is defined
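A minimal two-tower sketch in PyTorch. The feature dimensions, hidden sizes, and batch are hypothetical, and implicit feedback (watched / not watched) is used as the label:

```python
import torch
import torch.nn as nn

class TwoTowerModel(nn.Module):
    """Each tower maps its own features to a shared embedding space; the dot
    product of the two embeddings is the predicted engagement logit."""
    def __init__(self, user_dim: int, media_dim: int, emb_dim: int = 32):
        super().__init__()
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
        self.media_tower = nn.Sequential(
            nn.Linear(media_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, user_x, media_x):
        u = self.user_tower(user_x)          # user embedding
        m = self.media_tower(media_x)        # media embedding (same dimension as u)
        return (u * m).sum(dim=-1)           # dot product -> engagement logit

# Hypothetical batch: 8 examples with 20 user features and 30 media features.
model = TwoTowerModel(user_dim=20, media_dim=30)
user_x, media_x = torch.randn(8, 20), torch.randn(8, 30)
labels = torch.randint(0, 2, (8,)).float()   # implicit feedback: watched (1) / not (0)

loss = nn.BCEWithLogitsLoss()(model(user_x, media_x), labels)
loss.backward()                              # trains both towers jointly
```

After training, the media embeddings can be indexed so that, for a given user embedding, approximate nearest-neighbor search returns candidate movies.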
Techniques Pros and Cons
Collaborative filtering
Pros:
Does not require domain knowledge to create user and media profiles
May capture interaction patterns that content-based filtering would neglect
Cons:
Cold start problem: a new movie or user with no history cannot be recommended
Computationally expensive
Neural network:
Pros:
Greater capacity to represent user and movie features
Cons:
Cold start problem: a new movie or user has too few feedback instances to train the neural network on
Content-based filtering:
Pros:
It requires only some initial input from the user about their preferences to start generating candidates; it does not depend on other users' interaction data
A new media profile can be built immediately, since its description and attributes are provided manually
6. Training Data Generation
Generating training examples
Label user actions based on how much of a particular movie the user watched
Positive: if a user watched more than 80% of the movie
Negative: if a user watched less than 10% of the movie
Uncertain: if a user watched between 10% and 80% of the movie. We do not know why the user stopped watching partway through, so we leave these examples out to avoid misinterpreting them.
Balancing the positive and negative samples
We can randomly downsample the over-represented class (typically the negative examples)
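A small sketch of such downsampling, assuming the negatives outnumber the positives in the label array y:

```python
import numpy as np

def downsample_negatives(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Randomly drop negative examples until positives and negatives are balanced."""
    rng = np.random.default_rng(seed)
    pos_idx = np.where(y == 1)[0]
    neg_idx = np.where(y == 0)[0]
    keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    keep = np.concatenate([pos_idx, keep_neg])
    rng.shuffle(keep)
    return X[keep], y[keep]
```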

Weighting training examples
Since Netflix wants to increase the time a user spends on the platform, assign higher weights to training examples where the user spent more watch time

Train test split
There are two factors:
User behavior differs between weekdays and weekends ==> the training data should span whole weeks
We predict the user's behavior based on their previous behavior ==> the data should be split by time (train on earlier interactions, test on later ones)
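A small sketch of a time-based split, assuming the interactions live in a pandas DataFrame with a timestamp column:

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, time_col: str = "timestamp", test_frac: float = 0.2):
    """Split so that every test interaction happens after every training interaction."""
    df = df.sort_values(time_col)
    cutoff = int(len(df) * (1 - test_frac))
    return df.iloc[:cutoff], df.iloc[cutoff:]
```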

7. Ranking
Models for ranking
Logistic regression or random forest

Deep NN with sparse and dense features
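A sketch of such a ranker in PyTorch: sparse IDs (here, hypothetical movie-id and genre vocabularies) go through embedding layers and are concatenated with dense features before a small MLP outputs the engagement probability used for ranking:

```python
import torch
import torch.nn as nn

class DeepRanker(nn.Module):
    """Embeds sparse categorical features, concatenates them with dense
    features, and predicts the probability of engagement."""
    def __init__(self, n_movies: int, n_genres: int, dense_dim: int, emb_dim: int = 16):
        super().__init__()
        self.movie_emb = nn.Embedding(n_movies, emb_dim)
        self.genre_emb = nn.Embedding(n_genres, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim + dense_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, movie_id, genre_id, dense_x):
        x = torch.cat([self.movie_emb(movie_id), self.genre_emb(genre_id), dense_x], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # engagement probability per candidate

# Hypothetical batch of 4 candidate movies for one user.
model = DeepRanker(n_movies=1000, n_genres=20, dense_dim=10)
scores = model(torch.tensor([3, 7, 42, 9]), torch.tensor([1, 4, 1, 2]), torch.randn(4, 10))
print(scores.argsort(descending=True))   # candidate order after ranking
```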


Re-ranking
Re-ranking is done for various reasons, such as bringing diversity to the recommendations. Consider a scenario where all the top ten recommended movies are comedy. You might decide to keep only two of each genre in the top ten recommendations. This way, you would have five different genres for the user in the top recommendations.
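A small sketch of this kind of genre-cap re-ranking, assuming the ranked candidates arrive as (title, genre) pairs, best first:

```python
from collections import Counter

def rerank_with_genre_cap(ranked, max_per_genre: int = 2, top_n: int = 10):
    """Walk the ranked list in order and keep at most `max_per_genre` movies of
    each genre, so the final top-N list stays diverse."""
    counts, result = Counter(), []
    for title, genre in ranked:               # ranked: list of (title, genre), best first
        if counts[genre] < max_per_genre:
            result.append(title)
            counts[genre] += 1
        if len(result) == top_n:
            break
    return result
```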