Recommendation System

Building a recommendation system that suggests media to users.

Recommendation systems appear in many products, for example:

  • The Amazon homepage recommends personalized products that we might be interested in.

  • The Pinterest feed is full of pins that we might like based on trends and our historical browsing.

  • Netflix shows movie recommendations based on our taste, trending movies, etc.

1. Problem Statement

Problem statement

Display media (movie/show) recommendations for a Netflix user. Your task is to make recommendations in such a manner that the chance of the user watching them is maximized.

Scope of the problem

Define the scope of the problem

  • The total number of subscribers on the platform as of 2019 is 163.5 million

  • There are 53 million international daily active users

==> Given a user and context (time, location, and season), predict the probability of engagement for each movie and order movies using that score.

Type of user feedback

Explicit feedback is provided directly by the user, whereas implicit feedback is inferred from the user's actions
  • Explicit feedback

    • A user provides an explicit assessment of a recommendation. In our case, it would be a star rating, e.g., a user rates the movie four out of five stars.

    ==> The recommendation problem will be viewed as a rating prediction problem

  • Implicit feedback

    • Implicit feedback is extracted from a user’s interaction with the recommended media. Most often, it is binary in nature. For instance, a user watched a movie (1), or they did not watch the movie (0).

    ==> The recommendation problem is viewed as a ranking problem

2. Metrics

Offline evaluation --> best model --> Online evaluation

Online metrics

  • Engagement rate

    • $engagement\;rate = \frac{sessions\;with\;clicks}{total\;number\;of\;sessions}$

    • A user may click on a movie but not find it interesting enough to finish watching it

  • Videos watched

    • The average number of videos that a user watched for a significant amount of time (e.g., more than 5 minutes)

    • May miss the overall user satisfaction with the recommended content

  • Session watch time

    • The overall time a user spends watching content based on recommendations in a session

      ==> Measures whether a user is able to find meaningful recommendations in a session (see the sketch below)
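
A minimal sketch of how these online metrics could be computed from logged sessions; the session log schema (`clicks`, `watch_seconds`) is an illustrative assumption.

```python
# Hypothetical session log: each session records its clicks and the total
# watch time driven by recommendations.
sessions = [
    {"clicks": 2, "watch_seconds": 3600},
    {"clicks": 0, "watch_seconds": 0},
    {"clicks": 1, "watch_seconds": 1200},
]

# Engagement rate = sessions with at least one click / total number of sessions
engagement_rate = sum(1 for s in sessions if s["clicks"] > 0) / len(sessions)

# Session watch time = average time spent watching recommended content per session
avg_session_watch_time = sum(s["watch_seconds"] for s in sessions) / len(sessions)

print(engagement_rate, avg_session_watch_time)
```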

Offline metrics

  • Mean Average Precision (mAP@N)

    ==> mAP@N measures how relevant the top recommendations are.

    • Precision $P = \frac{relevant\;recommendations\;\#}{total\;recommendations\;\#}$

    • Assuming

      • System recommended N = 5 movies

      • the user watched 3 movies from these recommendations and ignored the other 2

      • There are only m=10 movies that are actually relevant to the user

    • The average precision $AP@N = \frac{1}{m}\sum^{N}_{k=1}(P(k)\;if\;k^{th}\;item\;was\;relevant) = \frac{1}{m}\sum^{N}_{k=1}P(k)*rel(k)$

  • Mean Average Recall (mAR@N)

    ==> mAR@N measures how well the recommender recalls all the items with positive feedback in its top N recommendations

    • Recall $r = \frac{relevant\;recommendations\;\#}{all\;possible\;relevant\;\#}$

  • F1 score

    • $F1\;score = 2\times\frac{mAR\times mAP}{mAR + mAP}$

  • Offline metric for optimizing ratings

    • If the interviewer wants you to optimize the recommendation system for getting the ratings right, we need to use root mean squared error (RMSE); a sketch of these offline metrics follows this list

    • $RMSE = \sqrt{\frac{1}{N}\sum^N_{i=1}(\hat{y_i}-y_i)^2}$
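
A minimal sketch of the offline metrics above; the recommended/relevant movie ids and the predicted/true ratings are illustrative assumptions.

```python
import math

def ap_at_n(recommended, relevant, m=None):
    """AP@N = (1/m) * sum over the top-N positions of P(k) * rel(k)."""
    m = m or len(relevant)
    hits, score = 0, 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            score += hits / k          # P(k), counted only when item k is relevant
    return score / m

def recall_at_n(recommended, relevant):
    """r = relevant recommendations / all possible relevant items."""
    return len(set(recommended) & set(relevant)) / len(relevant)

def rmse(y_pred, y_true):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true))

# Example from the text: N = 5 recommendations, 3 of them watched, m = 10 relevant movies
recommended = ["m1", "m2", "m3", "m4", "m5"]
relevant = {"m1", "m3", "m5", "m7", "m8", "m9", "m10", "m11", "m12", "m13"}
print(ap_at_n(recommended, relevant, m=10), recall_at_n(recommended, relevant))
print(rmse([4.2, 3.1], [5.0, 3.0]))
```

mAP@N and mAR@N are simply these per-user values averaged over all users.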

3. Architectural Components

There are two main stages in the recommendation task:

  • Stage 1: Candidate generation

    • A simpler mechanism to sift through the entire corpus for possible recommendations

    • It focuses on higher recall, meaning it gathers movies that might interest the user from all perspectives

  • Stage 2: Ranking of generated candidates

    • Score the candidate movies that are generated by Stage 1

    • It focuses on higher precision, i.e., on correctly ranking the top k recommendations

Training data generation

The user’s engagement with the recommendations on their Netflix homepage will help to generate training data for both the ranker component and the candidate generation component.

4. Feature Engineering

The main actors for feature engineering are the user, the media (movie/show), and the context. The features are grouped accordingly (a sketch for the histogram-style features follows this list):

  • User-based features

    • Age

    • Gender

    • Language

    • Country

    • Average_session_time

    • last_genre_watched

    • User-history-features

      • user_actor_histogram

      • user_genre_histogram

      • user_language_histogram

  • Context-based features

    • season-of-the-year

    • upcoming-holiday

    • days-to-upcoming-holiday

    • time-of-day

    • day-of-week

    • device: tablet, TV, cell phone

  • Media-based features

    • public-platform-rating: IMDB, Netflix rating

    • revenue

    • time-passed-since-release-date

    • time-on-platform

    • media-watch-history

      • last-12-hours

      • last-24-hours

    • genre

    • movie-duration

    • content-set-time-period

    • content-tags

    • show-season-number

    • country-of-origin

    • release-country

    • release-year

    • release-type: DVD, broadcast, streaming

    • maturity-rating

  • Media-user cross features

    • user-genre-historical-interaction

      • 3months

      • 1year

    • user-and-movie-embedding-similarity

    • user-actor

    • user-director

    • user-language-match

    • user-age-match

    • Sparse features

      • movie-id

      • title-of-media

      • synopsis

      • original-title

      • distributor

      • creator

      • original-language

      • director

      • first-release-year

      • music-composer

      • actors
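
A minimal sketch of how a histogram-style user feature such as user_genre_histogram might be computed from watch history; the history schema below is an illustrative assumption.

```python
from collections import Counter

def user_genre_histogram(watch_history):
    """Fraction of the user's watched titles that fall into each genre."""
    counts = Counter(item["genre"] for item in watch_history)
    total = sum(counts.values())
    return {genre: count / total for genre, count in counts.items()}

# Hypothetical watch history for one user
history = [
    {"movie_id": "m1", "genre": "comedy"},
    {"movie_id": "m2", "genre": "comedy"},
    {"movie_id": "m3", "genre": "drama"},
]
print(user_genre_histogram(history))  # approximately {'comedy': 0.67, 'drama': 0.33}
```

The same pattern applies to user_actor_histogram and user_language_histogram.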

5. Candidate Generation

The purpose of candidate generation is to select the top k (let's say one-thousand) movies that you would want to consider showing as recommendations to the end-user.

Collaborative filtering

There are two methods to perform collaborative filtering

Nearest neighborhood

  • Given an n×m matrix representing the watch history of n users over m movies

  • For a user i, we need to predict their feedback for all movies they have not seen

    • First, using KNN, we pick the top k users most similar to user i

    • Then, the user's feedback for an unseen movie can be predicted by taking a weighted average of the feedback from the top k similar users, where each weight is that user's similarity to user u_i (see the sketch below)

==> The issue is that this approach is computationally expensive
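
A minimal sketch of the nearest-neighborhood approach using cosine similarity over the user-movie feedback matrix; the tiny binary matrix and k value are illustrative assumptions.

```python
import numpy as np

def predict_feedback(feedback, user_i, movie_j, k=2):
    """Predict user_i's feedback for movie_j as the similarity-weighted
    average of the top-k most similar users' feedback for that movie."""
    # Cosine similarity between user_i and every other user
    norms = np.linalg.norm(feedback, axis=1) * np.linalg.norm(feedback[user_i]) + 1e-9
    sims = feedback @ feedback[user_i] / norms
    sims[user_i] = -np.inf                      # exclude the user themselves
    neighbors = np.argsort(sims)[-k:]           # indices of the top-k similar users
    weights = sims[neighbors]
    return float(weights @ feedback[neighbors, movie_j] / (weights.sum() + 1e-9))

# n=4 users x m=4 movies binary watch matrix (1 = watched, 0 = not watched)
feedback = np.array([
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
], dtype=float)
print(predict_feedback(feedback, user_i=0, movie_j=3))
```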

Matrix factorization

Matrix factorization decomposes the highly sparse feedback matrix into two lower-dimensional matrices:

  • User profile matrix ($n\times M$)

  • Media profile matrix ($M\times m$)

The dimension M is the number of latent factors.

The matrix factorization process:

  • Create the user profile and movie profile matrices

  • Generate good candidates for movie recommendations by predicting user feedback for unseen movies.

    • The prediction can be made simply by computing the dot product of the user vector and the movie vector

To train the latent factor matrices for users and media, we can follow the steps:

  • Initialize the user and movie vectors randomly

  • For each observed user-movie feedback value f_{ij}, predict the feedback by taking the dot product of u_i and m_j

  • The difference between f_{ij} and u_i·m_j is used as the error for gradient-descent updates of both matrices (see the sketch below)
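
A minimal sketch of this training loop with stochastic gradient descent on the squared error; the matrix sizes, learning rate, and toy feedback entries are illustrative assumptions.

```python
import numpy as np

n_users, n_movies, n_factors = 4, 5, 3
rng = np.random.default_rng(0)

# Step 1: initialize the user and movie latent vectors randomly
U = rng.normal(scale=0.1, size=(n_users, n_factors))   # one length-M vector per user
V = rng.normal(scale=0.1, size=(n_movies, n_factors))  # one length-M vector per movie

# Observed feedback entries (user_id, movie_id, feedback), e.g. a binary watch signal
observed = [(0, 1, 1.0), (0, 3, 0.0), (1, 1, 1.0), (2, 4, 1.0), (3, 0, 0.0)]

lr = 0.05
for epoch in range(200):
    for i, j, f_ij in observed:
        pred = U[i] @ V[j]          # Step 2: predicted feedback = u_i . m_j
        err = f_ij - pred           # Step 3: error drives the gradient updates
        grad_u = err * V[j]
        grad_v = err * U[i]
        U[i] += lr * grad_u
        V[j] += lr * grad_v

# Candidate scoring: predicted feedback for an unseen user-movie pair
print(U[0] @ V[4])
```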

Content-based filtering

Content-based filtering allows us to make recommendations to users based on the characteristics or attributes of the media they have already interacted with.

Feature preparation

The idea is to represent the media attributes as TF-IDF features

Recommending media to the user

  • Similarity with historical interactions

    • Based on the similarity to the movies that the user has watched in the past

  • Similarity between media and user profiles

    • Based on the user's interactions (watched, ignored) with a list of movies, we can build a user profile vector

    • Then, we can measure the similarity of the user profile vector to each candidate movie's vector (see the sketch below)
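
A minimal sketch of content-based filtering with TF-IDF, assuming scikit-learn is available; the media descriptions and watch history are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical media attribute text (tags, genre, synopsis keywords)
media = {
    "m1": "comedy romance feel-good new-york",
    "m2": "dark thriller crime detective",
    "m3": "romantic comedy wedding friends",
    "m4": "space sci-fi thriller astronaut",
}
ids = list(media)
tfidf = TfidfVectorizer().fit_transform(media.values())   # one TF-IDF vector per movie

# User profile vector = average TF-IDF vector of the movies the user watched
watched = ["m1", "m3"]
profile = np.asarray(tfidf[[ids.index(m) for m in watched]].mean(axis=0))

# Rank unseen movies by similarity between the user profile and each movie vector
scores = cosine_similarity(profile, tfidf).ravel()
candidates = sorted((m for m in ids if m not in watched),
                    key=lambda m: scores[ids.index(m)], reverse=True)
print(candidates)   # movies most similar to the user's comedy/romance history first
```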

Embedding-based similarity

We can use deep learning to generate latent vectors/embeddings to represent both movies and users. Then, we can use KNN to find the movies that are good candidates for a user to watch

Embedding generation

  • Two-tower model

    • The left tower takes media-only features, and its last layer produces the media embedding

    • The right tower takes user-only features, and its last layer produces the user embedding

  • The loss function minimizes the difference between the dot product of u and m and the actual feedback label

  • Note that the user vector and the movie vector have the same length (see the sketch below)
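
A minimal sketch of a two-tower model, assuming PyTorch is available; the feature dimensions, layer sizes, and the use of a binary cross-entropy loss over the dot product are illustrative choices.

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, user_dim, media_dim, emb_dim=32):
        super().__init__()
        # Media tower: media-only features -> media embedding
        self.media_tower = nn.Sequential(nn.Linear(media_dim, 64), nn.ReLU(),
                                         nn.Linear(64, emb_dim))
        # User tower: user-only features -> user embedding (same length as media embedding)
        self.user_tower = nn.Sequential(nn.Linear(user_dim, 64), nn.ReLU(),
                                        nn.Linear(64, emb_dim))

    def forward(self, user_x, media_x):
        u = self.user_tower(user_x)
        m = self.media_tower(media_x)
        return (u * m).sum(dim=-1)              # dot product of the two embeddings

model = TwoTower(user_dim=20, media_dim=30)
loss_fn = nn.BCEWithLogitsLoss()                # compare dot product against binary feedback
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

user_x = torch.randn(8, 20)                     # batch of user feature vectors
media_x = torch.randn(8, 30)                    # batch of media feature vectors
labels = torch.randint(0, 2, (8,)).float()      # watched (1) / not watched (0)

loss = loss_fn(model(user_x, media_x), labels)
loss.backward()
opt.step()
print(float(loss))
```

At serving time, the media embeddings can be precomputed and the user embedding matched against them with KNN to generate candidates.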

Techniques Pros and Cons

  • Collaborative filtering

    • Pros:

      • Does not require domain knowledge to create user and media profiles

      • May capture data features that can be neglected in content-based filtering

    • Cons:

      • Cold start problem: a new movie or user with no history cannot be handled well

      • Computationally expensive

  • Neural network:

    • Pros:

      • Greater ability to represent user and movie features

    • Cons:

      • Cold start problem: a new movie or user has few instances of feedback for training the neural network

  • Content-based filtering:

    • Pros:

      • It only requires some initial input from the user regarding their preferences to start generating candidates

      • The profile for a new media item can be built immediately, since its description is provided manually

6. Training Data Generation

Generating training examples

Label user actions based on the fraction of a particular movie the user watched:

  • Positive: if a user watched more than 80% of the movie

  • Negative: if a user watched less than 10% of the movie

  • Uncertain: if a user watched between 10% and 80% of the movie. We do not know why the user stopped watching in the middle, so we discard these examples to avoid misinterpretation (a labeling sketch follows this list).
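
A minimal sketch of generating labels from watch percentage, following the thresholds above; the interaction record format is an illustrative assumption.

```python
def label_interaction(watched_seconds, movie_duration_seconds):
    """Return 1 (positive), 0 (negative), or None (uncertain, dropped)."""
    fraction = watched_seconds / movie_duration_seconds
    if fraction > 0.8:
        return 1          # positive: watched more than 80% of the movie
    if fraction < 0.1:
        return 0          # negative: watched less than 10% of the movie
    return None           # uncertain: discard to avoid misinterpretation

# Hypothetical (watched_seconds, movie_duration_seconds) interactions
interactions = [(5400, 6000), (120, 6000), (2500, 6000)]
labels = [label_interaction(w, d) for w, d in interactions]
examples = [(x, y) for x, y in zip(interactions, labels) if y is not None]
print(labels, examples)
```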

Balancing the positive and negative samples

We can randomly downsample the majority (negative) class

Weighting training examples

Since Netflix wants to increase the time a user spends on the platform ==> assign higher weights to samples with longer watch time (see the sketch below)
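
A minimal sketch of randomly downsampling negatives to balance the classes and weighting positives by watch time; the example data and weighting scheme are illustrative.

```python
import random

random.seed(0)
# (features, label, watch_minutes) tuples; features elided for brevity
positives = [("feat_p%d" % i, 1, random.randint(30, 120)) for i in range(100)]
negatives = [("feat_n%d" % i, 0, 0) for i in range(900)]

# Randomly downsample the negative class to match the positive class size
negatives_sampled = random.sample(negatives, len(positives))
train = positives + negatives_sampled

# Weight positive samples by watch time so longer watches count more in training
max_watch = max(w for _, _, w in positives)
weights = [w / max_watch if label == 1 else 1.0 for _, label, w in train]
print(len(train), weights[:3])
```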

Train test split

There are two factors:

  • User behavior is different on weekdays and weekends ==> we should include whole weeks of data

  • We predict the user's behavior based on their previous behavior ==> the data should be split by time, training on earlier data and testing on later data (see the sketch below)
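
A minimal sketch of a time-based split over whole weeks of interaction data; the timestamped records and the split date are illustrative assumptions.

```python
from datetime import datetime

# Hypothetical interaction records spanning several full weeks
records = [
    {"user": "u1", "movie": "m1", "label": 1, "ts": datetime(2019, 6, 3)},
    {"user": "u2", "movie": "m4", "label": 0, "ts": datetime(2019, 6, 12)},
    {"user": "u1", "movie": "m7", "label": 1, "ts": datetime(2019, 6, 20)},
    {"user": "u3", "movie": "m2", "label": 1, "ts": datetime(2019, 6, 26)},
]

# Split by time: train on the earlier weeks, test on the following week(s)
split_date = datetime(2019, 6, 17)
train = [r for r in records if r["ts"] < split_date]
test = [r for r in records if r["ts"] >= split_date]
print(len(train), len(test))
```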

7. Ranking

Models for ranking

  • Logistic regression or random forest

  • Deep NN with sparse and dense features

    • Train a deep neural network with embedding (sparse) features and other dense features
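
A minimal sketch of the simpler ranking option (logistic regression), assuming scikit-learn is available; the feature matrix and labels are randomly generated placeholders. Candidates from the candidate generation stage are ordered by predicted engagement probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Training data: one row of user/media/context/cross features per (user, movie) example
X_train = rng.normal(size=(500, 12))
y_train = rng.integers(0, 2, size=500)               # watched (1) / not watched (0)

ranker = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score and rank the candidate movies produced by candidate generation
X_candidates = rng.normal(size=(1000, 12))
scores = ranker.predict_proba(X_candidates)[:, 1]    # probability of engagement
top_k = np.argsort(scores)[::-1][:10]                # indices of the top 10 candidates
print(top_k)
```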

Re-ranking

Re-ranking is done for various reasons, such as bringing diversity to the recommendations. Consider a scenario where all the top ten recommended movies are comedies. You might decide to keep only two of each genre in the top ten recommendations. This way, you would have five different genres for the user in the top recommendations.
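
A minimal sketch of this genre-diversity re-ranking, keeping at most two movies per genre in the top ten; the scored candidate list is illustrative.

```python
def rerank_for_diversity(ranked, top_n=10, max_per_genre=2):
    """Walk the ranked list and keep at most max_per_genre movies per genre."""
    per_genre, reranked = {}, []
    for movie in ranked:                      # ranked best-first by the ranker score
        genre = movie["genre"]
        if per_genre.get(genre, 0) < max_per_genre:
            reranked.append(movie)
            per_genre[genre] = per_genre.get(genre, 0) + 1
        if len(reranked) == top_n:
            break
    return reranked

ranked = [{"id": i, "genre": g} for i, g in enumerate(
    ["comedy"] * 6 + ["drama", "thriller", "comedy", "sci-fi", "drama", "action"])]
print([m["genre"] for m in rerank_for_diversity(ranked)])
```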
