Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018p58pd125
Title: Predicting Netflix Movie Ratings using a Topic Modeling Algorithm
Authors: Zhu, Michael
Advisors: Arora, Sanjeev
Contributors: Singer, Amit
Department: Mathematics
Class Year: 2014
Abstract: Latent factor models and matrix factorization algorithms were some of the most successful stand-alone algorithms used for predicting movie ratings in the Netflix Prize. To address the sparsity in the movie rating training set, many matrix factorization algorithms train only on the observed ratings and use regularization to avoid overfitting. Topic modeling algorithms must also be able to handle high sparsity. Given a collection of documents, the purpose of topic modeling is to discover the high-level thematic structure that best explains the collection of documents as a whole. In the same way, we might hope that given a collection of movie ratings, we can uncover the high-level movie genres that best explain the collection of movie ratings as a whole. Mathematically, topic modeling can be interpreted as recovering the first factor in a matrix factorization, subject to some constraints. By this view, perhaps a topic modeling algorithm can be the first step in a matrix factorization algorithm that predicts Netflix movie ratings. In this thesis, we develop a three-step algorithm for predicting movie ratings using a matrix factorization of the form M = AW: first we obtain a collection of genres using a topic modeling algorithm, then we generate a suitable A matrix from the collection of genres, and finally we use the A matrix to get the W matrix.
Extent: 23 pages
URI: http://arks.princeton.edu/ark:/88435/dsp018p58pd125
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Mathematics, 1934-2023
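
The abstract's last step, using the A matrix to get the W matrix in M = AW, can be illustrated with a small sketch. The abstract does not say how the thesis performs this step, so the code below is only one generic possibility: per-user ridge regression fitted on the observed ratings alone, in the spirit of the abstract's remark that many matrix factorization algorithms train only on observed ratings and use regularization. The function names (`solve_linear_system`, `fit_user_weights`, `predict`), the regularization parameter `lam`, and the toy data are all hypothetical and do not come from the thesis.

```python
from typing import Dict, List

Matrix = List[List[float]]


def solve_linear_system(G: Matrix, b: List[float]) -> List[float]:
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    # Augmented matrix [G | b], copied so the inputs are not modified.
    aug = [list(G[i]) + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * k
    for i in reversed(range(k)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - s) / aug[i][i]
    return x


def fit_user_weights(A: Matrix, observed: Dict[int, float], lam: float) -> List[float]:
    """Assumed approach (not from the thesis): ridge regression for one user.

    A[m] is the genre vector of movie m; observed maps movie index -> rating.
    Solves (A_o^T A_o + lam * I) w = A_o^T r, where A_o holds only the rows
    of A for movies this user actually rated.
    """
    k = len(A[0])
    G = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for m, rating in observed.items():
        for i in range(k):
            b[i] += A[m][i] * rating
            for j in range(k):
                G[i][j] += A[m][i] * A[m][j]
    return solve_linear_system(G, b)


def predict(A: Matrix, w: List[float], movie: int) -> float:
    """Predicted rating: the (movie, user) entry of the product A W."""
    return sum(a * x for a, x in zip(A[movie], w))


# Toy data: three movies, two genres; the user has rated movies 0 and 1 only.
A_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
w_toy = fit_user_weights(A_toy, {0: 4.0, 1: 2.0}, lam=1.0)
prediction_for_unrated_movie = predict(A_toy, w_toy, 2)
```

Each user's column of W is fitted independently, so unobserved entries of M never enter the objective. With `lam > 0` the linear system is positive definite, so the elimination does not encounter a zero pivot.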

Files in This Item:
File                     Size       Format
Michael Zhu thesis.pdf   607.5 kB   Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.