Sunday, 14 April 2024

Building Recommendation Systems FAST with Turi Create by Apple

This time we're going to go over how you can use the new Turi Create deep learning library by Apple to create a movie recommendation system for iOS. In fact, this is my very first code pattern that I've developed, and you can find out all about that in the description below; I'm going to have a link to it. Code patterns are essentially building blocks of code for you as developers to build your own applications. But they're not just building blocks: they are real applications that you can modify, customize, and even combine however you like.

Before I get into the code pattern and what the code actually is, let's begin by talking a little bit more about Turi Create and how the entire system works. The use case is to create a movie recommendation system. I'm sure we all have experience going through a bunch of movies and trying to find new movies to watch, and occasionally it becomes frustrating because there are so many options; you get confused and you don't know what to watch. Sometimes you watch a movie that you think you wouldn't like, but you might actually end up liking it. So how do you know what you should watch before you watch it? This is where the recommender system comes in. But in order to do so you need a lot of data, you need a lot of computing power to process that data, and you need to build the algorithms behind that processing. This is where Turi Create comes in.

You see, a little while ago, in fact just a few months ago, there used to be a deep learning library called GraphLab Create. It's an absolutely exceptional deep learning library, and Apple bought out the company. Apple then, at WWDC, open sourced GraphLab Create as a new open-source version called Turi Create. The differences between GraphLab Create and Turi Create are really only skin-deep: the back end is completely the same, and you're still using the same SFrames. The best part is that it's meant for performance, and I'll get to that in just a moment, but the best part is that
it's really meant not just for machine learning developers but for app developers. What does that mean? Well, this is not machine learning with an algorithm-focused design; it's machine learning with a task-focused design. What that means is that instead of telling the machine, "Here's the algorithm, and here's the data I want you to train the algorithm on," with Turi Create you say, "All right, the task is recommender systems, and this is the data. Now go choose a model that would work best with the data that I've provided." Turi Create might have to go through a bunch of different options to find the model that works best for your needs, and if you need to, you can specify an algorithm that Turi Create already has support for.

Then you can export these models to Core ML. With the new Core ML 2.0 you have support for, for example, model quantization: if you've got huge models, you can scale down the number of bits that each and every weight is encoded with, and you can create models that are, for example, eight times smaller, which allows you to create smaller applications that take up less space on a user's device and take less time to download. Of course it also supports custom models, and with custom models you can implement things like recommender systems and even merge multiple models into one model with individual connecting custom layers. It's really interesting what Core ML 2 has to offer, but that's a separate video altogether.

Back to Turi Create. The entire point of Turi Create is to be able to process huge datasets, and by huge I mean in the billions of data points, with millions of features in each data point category. The best part is that you can do this mostly on a MacBook Pro, and if you have, again, billions of data points, you can do this on an iMac Pro. You do not need to go to the cloud; you can do everything locally on the computers you probably already have, which is why this is so fascinating. Now back to movie recommendation. You see, the
MovieLens dataset has 26 million rows. These rows consist of a user ID, the ID of the movie that the user has watched, and a rating on a scale of one to five, with decimals allowed, so you can give, for example, a 3.5-star rating. Essentially it's a bunch of ratings from different users, 26 million ratings to be exact, that they've gathered over the years. Almost every few months they update this database, so you have more and more movies and movie IDs, and more and more IMDb and TMDb IDs associated with those movies. You can link them over to an API to get covers, descriptions, cast, whatever you need, and from there you can train movie recommender systems. When you bring these two together, you're able to analyze 26 million rows' worth of movie recommendation data almost instantly, in under a minute, on a MacBook Pro, and I'm talking about the model before the late 2016 model.

So this is really interesting. Let's take a look at how exactly the application will work. To begin, we need to start off with a dataset. As I mentioned, in this case we're going to use the MovieLens dataset, which has, again, 26 million different rows; it's great for training. MovieLens is going to be fed into a library called Turi Create, which of course is the open-source version of GraphLab Create, open sourced by Apple. On the other end we're going to have an iOS application, and a user interacting with this iOS application. But there needs to be a way for this iOS application and the Turi Create application to be linked, so there are two steps that go in between them. We have Flask, which is essentially a small HTTP server written in Python, and you've got a little bit of ngrok in the middle over here. What's happening is that Turi Create is sending its data over to Flask, and Flask is sending what data it has to Turi Create. And now
Flask and iOS need to be able to communicate, so that iOS can call this REST API. In order to do that, I set up a little ngrok endpoint that basically acts as a bridge between iOS and Flask. It's really that simple in terms of communication.

Now wait, I thought I told you that we could export Core ML format models and actually implement them in iOS? That's true, you can do that: you can implement these Turi Create models with Core ML on iOS. But there's a little bit of a problem with that right now. First of all, that's only supported on the newer versions of Apple's operating systems: iOS 12, macOS Mojave, watchOS 5, and tvOS 12. The second problem is that there are currently a few bugs with how exactly you implement Core ML models on the phone. For example, working with different kinds of architectures, like the simulator's x86 and an actual phone's ARM64, doesn't exactly work because of bitcode not being enabled in the frameworks. However, by September, when the stable releases of the new iOS, macOS, and other operating systems are out, that's when you're going to see me translating this code pattern to work natively on iOS, without needing any of this back-end communication between iOS and Turi Create running on the IBM Cloud.

So that is how this code pattern works. We've got the user communicating with the iOS application, telling the iOS app exactly what they like. The way the iOS app enables them to do that is by using the MovieLens database and showing a kind of graphical user interface for MovieLens. Essentially we just import the MovieLens database into a UITableView, then use the IMDb API to fetch, for example, artwork, real titles, and descriptions, and then store those ratings locally. When you ask it to recommend new movies for you,
it's going to go ahead and call this whole REST API. That is how this code pattern works. Now let's head over to the coding part, where I'm going to show you how exactly you can implement all of this. Of course, if you'd like to take a look at the code pattern itself, you can take a look at the link down in the description below.

All right, welcome back to the code part. Now I'm going to show you how you can create this code pattern. As you can see, this over here is the code pattern on GitHub itself. If you'd like, you can click the link in the description below; it's on the IBM GitHub page. You can clone down the git repo, and once you've cloned it you can follow these steps, or follow this YouTube tutorial, to run all the steps and create this application. What I'm going to do is show you how you can do that step by step.

Once you've cloned the repo locally into the Turi Create movie recommender folder, you're going to start off in the local model training folder. This is where you're actually going to train your model. Before you can train your model you need data, so go back to the root directory, which is the actual git repo, and run the setup.sh script. What this is going to do is download the MovieLens latest zip file. This is the latest file, so it will keep changing: every time the MovieLens database updates with some more users or some more items, this will update too. So it's not calling a specific version of MovieLens; in fact that kind of feature is not readily available from MovieLens unless you were to host it yourself. As soon as the MovieLens database is done downloading, the script is going to unzip it, take a bunch of the different CSV files that MovieLens provides, and move whichever ones are useful to their respective directories. For example, there
are some index-style documents that are useful for the iOS application, so that users can search for their favorite movies, rate them, and get recommendations. But there are some that are going to be useful for training, like which users are associated with which movies and with which rating, and that's what's going to be moved into its respective directory. Now I'm just going to speed up the clip until it's done downloading this file.

All right, as you can see, it's done downloading the file. It's now unzipping, and then it'll just move the correct CSVs to the correct places. As you can see, it's done setting up. Now we can head back over to the local model training folder. Again, this is written in Python 2.7, so it's not compatible with Python 3 just yet, but if you'd like to make it compatible, please feel free to contribute to the GitHub repo. Go ahead and run your Jupyter notebook; this code pattern assumes that you already have Jupyter Notebook set up. I'm going to quit Safari so I can open this up in Google Chrome. I'm going to take this token link that's provided to me here, go into the notebook, and just like that you can see the notebook.

As you can see, the notebook will guide you step by step through what's happening at each line of code. In fact, Turi Create is exceptionally simple to use. All you do is import it, read a CSV into an SFrame, delete the columns that aren't relevant to your data (you can print it out just to debug), and then create your model. In this case I'm creating an item similarity recommender. I provide the user ratings, and I tell it that the user ID is the user ID column, the item ID is the movie ID column, and the target that we're trying to predict, the target that we're basing this off of, is the rating. Then I just save the model as movie rec. Now, I would like to note that Turi Create models are special: they're not saved as files, they're
saved as folders. What this means is that you have a folder called movie rec, which has a bunch of different files that Turi Create will use. Then I can go ahead and run all these cells. As you can see, it's going to import Turi Create; I'm currently running the second beta of version 5. It's going to read the CSV practically instantly, and we've got a very impressive number of lines here: 26,024,289 lines read in 7.28 seconds on a MacBook Pro, which is very impressive. I then delete the timestamp, print out the ratings, and create the model. As you can see, it's going through: in total we have 270,896 users and 45,115 items, and it prepared the data in less than seventeen seconds. Now it's going to go ahead and train the model. As you can see, it was able to get some bare-bones statistics in less than a second, in fact just over half a second, and now it's going to go through the whole dataset. I can guarantee that within 60 seconds it'll be done going through the whole dataset. I'm going to speed this up for you, since I don't want to have you watch paint dry.

All right, as you can see, I was off by just thirty seconds here, but again, a very, very short amount of time. In fact, in ninety-two seconds I was able to get a recommender system off of 26 million rows, 270,000 users, and 45,000 items rated. I was able to get a very accurate movie recommendation system, as you'll see in just a moment, and I was able to export it over here. If you'd like to export to Core ML, you can find out how to do so on the Turi Create GitHub page.

Now let's head back over to the terminal and over to the next important folder. This is the server-side prediction API folder. This is where
you're going to host your Flask server. But in order to do that you need to get a model in here, so you just find where exactly the local model training folder is and move the movie rec folder, which is actually the model, to this current directory. Once it's in here, you just run your Flask app. The way you're going to do that is by setting the FLASK_APP environment variable to backend.py and then doing a quick flask run. There we go: Flask is indeed running. Now, in order for your iOS application to access Flask, go ahead and run an ngrok instance. In this case I've registered a reserved subdomain with ngrok called movierecommend. There we go; now if you go to movierecommend.ngrok.io, it's going to point to my localhost:5000.

Then I can head over to my iOS front end, and this is where the magic happens. I'm going to go into the movie recommender folder, which is the actual Xcode project folder. The first thing I'm going to do is run pod install to install my CocoaPods, which will be all my dependencies: Alamofire and Alamofire-SwiftyJSON for doing the networking communication, CSV.swift to index my data, and of course SwiftyJSON to deal with JSON. There we go. Now I can open up my movie recommender .xcworkspace (oh, I misspelled "open" there), and then all you need to do is wait a second and it should open up Xcode for me.

Now, I would like you to know that I am indeed running macOS Mojave and Xcode 10 along with iOS 12. However, this is meant to work with iOS 11 and is really only backwards compatible here; it's not really meant to work with iOS 12. If it were, I would be using the export to Core ML feature. However, that's going to be in a separate branch very soon on the GitHub repository, and in September, once Apple releases their golden master version, it will be merged into master. All right, as you can see, it's going to go ahead and launch this
movie recommender application onto an iPhone X simulator. As this is launching, I'd like you to note one thing. As you can see inside of Xcode, that setup script has moved the CSVs into the correct location. But at the same time, you have to realize that we're dealing with approximately 45,844 different movies here, across two different files. All these data points need to be joined and processed, which means that it might take a little bit of time to get started, because you need to index all of this data and make it so the iOS application can actually use it. So the first run of this application will take around five minutes to index all the data. Of course it'll probably take less than five minutes; I'm just saying that in the worst case it'll take around five minutes. However, every run after that should take approximately two seconds maximum to load, because by then the data will have been indexed.

Now let's head back over to the iOS application. As you can see, I've already filled in a few of my favorite movies: Home Alone, Home Alone 3, Home Alone 4, Matilda, Despicable Me, and Despicable Me 2. If I wanted to, I could add some more movies, and I'd search for the movies over here. All I do is type in the movie's name, say Brave, and search it up. It'll use a Levenshtein string distance algorithm and search all of those rows in a multi-threaded fashion, almost instantly; it really just takes three seconds at most. I can rate the movie, for example five stars, hit clear, and it brings me back to my current favorites and shows me the ratings for each of these movies. Now all I need to do is click on "What should I watch?", and if everything works just right, it should tell me what I should watch next. There we go. These are my recommendations over here,
but the funny part is that I've already watched a few of these movies and I just hadn't put them in the favorites list. For example, Frozen, I'd rate that a good five stars. Wreck-It Ralph, I'd rate that a good five stars. I haven't watched Tangled yet, but I'll definitely do that soon. Monsters University, I'll give that a four. How to Train Your Dragon, I haven't really watched that. Megamind, five. And there we go. Now I can hit clear, and as you can see, it's added these movies to my favorites. By complete coincidence I'd actually watched these, and the model is able to recommend these movies because I'd like them, which is really amazing if you think about it; it was able to read my mind, quite literally. Then I just click on "What should I watch?" and it gives me even more personalized recommendations. Then I can just continuously rate, and after a few rounds of rating, the model will have a good idea as to what kinds of movies you like and what kinds of movies you don't like, and you can just expand it from there and make it even bigger.

The entire architecture of the application is actually relatively simple. If you open up Main.storyboard, you can see we've got a few main views; in fact we've really only got one main view. We've got the navigation controller to the left here, and then we've got just one view over here that does all of the work. It's called the favorites search view controller, and it'll show you your favorites by default, but it'll also help you search for different movies, and at the same time it's going to help you get recommendations and show them in this view. Of course there are a few prototype cells that I've put into this table view; in fact I've really only put one. It will show the cover or the poster of the movie over to the left here, it shows the title, and it'll show the year that the movie was released. And of course it can either provide you the
functionality to give a rating, or change this label into the rating that you've already given if you're on the My Favorites page.

Then you've got a few .swift files that orchestrate all of the work. For example, in the favorites search view controller you've got code that decides what's happening right now: are you searching, are you taking a look at your favorites, or are you getting recommendations? Depending on what's happening, it's going to change the content in the table view: for example, should you have the option to rate, should you see the ratings that you've already provided, and so on. This is also done via the movie table view cell, which is where I make API calls to the TMDb API; TMDb is The Movie Database. Then in the movie handler over here, I basically have this class to deal with the API that does the movie recommendation. Of course, in September this class will be gone because you won't need it anymore, or at least some sections of this class will be gone, because everything will be done locally. There are a few, however, that will stay. For example, this function over here uses a multi-threaded approach along with a Levenshtein string edit distance algorithm to search through all those movies and find the ones that you're looking for, and then of course there's the recommend function over here.

We've also got a few quick extensions to some classes. For example, with this extension on Array you can chunk arrays into numerous different arrays. There's also a synchronized array; there will be a link to the synchronized array down in the description below, by a great blogger online who's done some great work with the synchronized array. Really, it's amazing: it allows you to have arrays that are synchronized across
multiple threads. And of course there are some quick string extensions: one for the Levenshtein distance, and one to encode strings in a certain encoding. We've also got the two CSV files, and of course AppDelegate.swift, which is really not doing much work here, at least not customized work.

That is a quick overview of the code and how exactly you can build a movie recommender system with this code pattern. I really do hope you enjoyed this tutorial. Again, if you'd like to build a system for yourself, I definitely recommend you head over to this GitHub repository; it's on the IBM GitHub page, and there will be a link to it down in the description below. That's what I had for this tutorial today. Again, I really do hope you enjoyed it, and if you did, please do make sure to leave a like down below and share it with anyone you believe could benefit from it, like your friends or family. If you really do like this content and you want to see more of it, please do consider subscribing to the channel, as it really does help out a lot, and of course turn on notifications by clicking the bell icon if you'd like to be notified whenever I release a new video. If you have any comments, suggestions, or feedback, I'd love to hear it down in the comment section below.
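
The notebook steps described in the transcript (import Turi Create, read the ratings CSV into an SFrame, drop the timestamp column, create an item similarity recommender, save the model as a folder) might look roughly like the sketch below. This is not the code pattern's actual notebook. It assumes the `turicreate` package is installed and that the ratings file uses the MovieLens column names `userId`, `movieId`, `rating`, and `timestamp`. The file path, the `movie_rec` folder name, and the helper function names are placeholders chosen for illustration.

```python
def train_movie_recommender(ratings_csv="ratings.csv", model_dir="movie_rec"):
    """Train and save an item similarity recommender (paths are placeholders)."""
    import turicreate as tc  # imported lazily; requires the turicreate package

    # Read the MovieLens ratings into an SFrame.
    ratings = tc.SFrame.read_csv(ratings_csv)

    # The timestamp is not needed for this recommender, so drop it if present.
    if "timestamp" in ratings.column_names():
        ratings = ratings.remove_column("timestamp")

    # Tell Turi Create which columns hold the user, the item, and the target.
    model = tc.recommender.item_similarity_recommender.create(
        ratings, user_id="userId", item_id="movieId", target="rating"
    )

    # Turi Create writes the model out as a folder, not a single file.
    model.save(model_dir)
    return model


def new_user_columns(rated_movies):
    """Convert {movie_id: rating} into column lists suitable for an SFrame."""
    movie_ids = sorted(rated_movies)
    return {
        "movieId": movie_ids,
        "rating": [float(rated_movies[m]) for m in movie_ids],
    }


def recommend_for_new_user(model_dir, rated_movies, k=10):
    """Load a saved model and recommend k movies from a user's own ratings."""
    import turicreate as tc

    model = tc.load_model(model_dir)
    observed = tc.SFrame(new_user_columns(rated_movies))
    # Recommends from ad hoc interactions, without retraining the model.
    return model.recommend_from_interactions(observed, k=k)
```

A server such as the Flask app described above could call something like `recommend_for_new_user("movie_rec", {1: 5.0, 2: 3.5})` with the ratings sent by the iOS app, though how the real back end is wired up is not shown in the transcript.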

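The title search described in the transcript combines three ideas: a Levenshtein edit distance, splitting the movie list into chunks, and scoring the chunks on multiple threads. The app does this in Swift; the Python 3 sketch below only illustrates the idea, and every function name, parameter, and default in it is an assumption rather than the app's code. Note that in CPython, threads will not speed up CPU-bound scoring like this; the structure simply mirrors the approach described.

```python
from concurrent.futures import ThreadPoolExecutor


def levenshtein(a, b):
    """Edit distance between two strings via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def search_titles(query, titles, limit=5, chunk_size=1000, workers=4):
    """Return the `limit` titles closest to `query` by edit distance."""
    query = query.lower()

    def score_chunk(chunk):
        return [(levenshtein(query, title.lower()), title) for title in chunk]

    # Score each chunk on a worker thread, then merge the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(score_chunk, chunked(titles, chunk_size))
        scored = [pair for part in parts for pair in part]

    scored.sort()
    return [title for _, title in scored[:limit]]
```

For example, `search_titles("brave", ["Braveheart", "Brave", "Frozen", "Matilda"], limit=1)` ranks the exact match "Brave" first. Plain edit distance can rank unrelated short titles above longer titles that contain the query, so a real search would likely need extra handling for partial matches.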