Movie Recommender


About this algo:

- I created this algo using a 'User Based Collaborative Filtering' mechanism on a dataset of 100,000 movie reviews.

- This method relies on users' reviews rather than on knowledge about the underlying content. I applied a basic correlation score to a manipulated table of user ratings.

- The underlying 'limitation' of the math model: the same users must have rated both movies in a pair for a correlation score to be calculated between them.

- This results in associations between commonly rated movies, based on their correlation scores.
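
To make the idea above concrete, here is a minimal sketch of a correlation-based recommender in pandas. It is not the original source code: the toy ratings, the column names (`user_id`, `title`, `rating`), the `similar_movies` helper and the `min_common_users` threshold are all assumptions made up for illustration.

```python
import pandas as pd

# Toy ratings invented for illustration (not the 100,000-review dataset).
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "title":   ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "D"],
    "rating":  [5, 4, 1, 4, 5, 2, 1, 2, 5, 3, 4],
})


def similar_movies(ratings, target, min_common_users=2):
    """Rank other movies by Pearson correlation with `target` (hypothetical helper)."""
    # One row per user, one column per movie; unrated cells become NaN.
    table = ratings.pivot_table(index="user_id", columns="title", values="rating")

    # Correlate every movie column with the target column.
    # Only users who rated both movies contribute to each score.
    scores = table.corrwith(table[target])

    # Count, for each movie, how many users rated it AND the target.
    common = table[table[target].notna()].count()

    result = pd.DataFrame({"correlation": scores, "common_users": common})
    result = result.drop(index=target)

    # Drop pairs with too few shared raters (the 'limitation' noted above)
    # and pairs where no correlation could be computed at all.
    result = result[result["common_users"] >= min_common_users].dropna()
    return result.sort_values("correlation", ascending=False)


print(similar_movies(ratings, "A"))
```

In this toy data, movie "D" shares only one rater with "A", so no correlation can be computed for that pair and it drops out of the result. On a real dataset the threshold would likely need to be much higher than 2 to avoid spurious matches between movies with very few shared raters.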

Use cases of the underlying principle:

- Media: Recommending similar movies to users as done on Netflix/Amazon Prime/HBO/Apple TV
- Music Recommender: Spotify-like algorithm to suggest similar songs based on users' picks!
- TV Series Recommendations: Very similar to movie recommendations, based on similar users' interests
- Restaurant Recommendations: Apps like Yelp, Uber Eats, Lieferando, Zomato, Grubhub, Grab and Foodpanda rely heavily on users' reviews to suggest restaurants to other users
- Travel Recommendations: These could go hand in hand with the above, especially when travellers leave reviews for non-touristy spots. Additionally, a text mining algo to classify sentiment could fit here. Note: I have coded a similar text mining algo right here:
- News and Article Recommendations: This can be used to compute a correlation score against the same user's past reading history, rather than relying on other similar users.
- Instagram Feed Recommendations: Similar principle to the news articles correlation
- Job Matching in HR: Suggesting similar job applicants based on candidates who were successfully shortlisted

Dataset: What does the underlying data look like?

- This is a base level dataset of customer transactions and all the items purchased by them.
- It contains columns for the transaction id, the items bought, the time of day, and whether the purchase was made on a weekday or a weekend.
- Data Training: The whole dataset was used to produce an answer; there was no split between training and testing data as such.