
Tuesday, 10 November 2020

Perceptrons & Gradient Descent — Learn [AI] Deep Learning from Scratch

Welcome back to the Learn Deep Learning from Scratch series. Today we're taking a deep dive into the very fundamentals of machine learning, and we're covering the basic building block of deep neural networks: the perceptron.

As I mentioned in the introduction to this series, to understand machine learning and deep learning effectively, it's important to have at least a little understanding of the mathematics behind neural networks. I'm going to simplify this math as much as possible, so that you only need the parts that are really relevant to deep learning. Specifically, we'll cover what the derivative of a function is, and how you can use derivatives to optimize a function. I know that sounds complex; I'll show you what it means in just a moment.

Before we start, if you enjoy this kind of content, please consider subscribing to the channel and liking the tutorial, as it really does help out a lot. Feel free to put your questions in the comments below, and I'll respond as soon as possible.

Before we dive into the mathematics, let's look at the basic building block of deep neural networks: the perceptron. As I mentioned before, machine learning is all about taking an input, call it x, and predicting an output, call it y, based on that input. This could be all kinds of things: x could be pictures of dogs and cats, and y the label "is this a dog or a cat?"; x could be historical stock prices, and y future stock prices. Basically, there's some kind of causal link between x and y, and you're trying to use mathematics to figure out what that link is, so that you can take some value of x and predict what y should be.

The deep learning systems that actually do this are built from basic building blocks called perceptrons. The diagram of a perceptron is pretty simple: it's literally just two circles and an arrow. The circles we'll call nodes or neurons (the terms are interchangeable), and the arrow connects the circle on the left to the one on the right. The node on the left is the input node, or input neuron, and the node on the right is the output neuron. We feed our input x into the input neuron, and through that connection we want to predict a value at the output neuron. The way we figure out what the value of the output neuron should be, which is the prediction of our network, is by using w, the weight associated with the connection between the input i and the output o.

These perceptrons can model all kinds of tasks. Say we wanted to build a simple perceptron that takes in a number and outputs its negation: feed it 4 and it gives us -4, feed it 1 and it gives us -1, feed it -0.5 and it gives us 0.5. Whatever we input, it gives us the negated input as output.
The way we determine the value of the output neuron is actually pretty simple: it's a mathematical function that takes the input from the input neuron as well as the weight between input and output, and multiplies them. That's it. If we use this rule, you can immediately tell, with basic math, that the weight in this case should be -1, because multiplying any input by -1 gives the negated input as output: 3 becomes -3, and -4 becomes 4.

The thing about this function is that while it works great, it's never really obvious what the value of the weight should be. In this case it was very easy to tell: we want to negate the input, so we multiply by -1. But what if you had a more complex task, like actually predicting stock prices, or determining whether an image contains a cat or a dog? Then you don't know what to multiply by; that's something you want to leave up to the computer to decide. This is where slightly more advanced mathematics comes in.

To dive into that math, let's look at a graph of a pretty simple function. Think of this function as a one-dimensional set of hills, to the left and to the right, with a valley in the middle, and imagine a ball on one of those hills. In the real world, if we applied the laws of physics, the ball would roll down into the little basin in the middle, because gravity wants the ball to come down. Now bring the ball back up for a moment and freeze it in time. There's one thing I want you to notice: as the ball falls, it has velocity. Velocity really tells us two things: the direction in which the ball is moving (is it moving to the left or to the right?) and the magnitude of that movement, its speed. In some places the hill can be pretty flat, in others pretty steep. Keep in mind that this isn't a 100% accurate representation of velocity; rather, it's an analogy for derivatives. Believe it or not, even though it might sound a lot scarier, the derivative of a function is really just telling you the "velocity" at a certain point.

Let's go back to the ball example. Say we calculate the derivative of this function at the ball's position, which, in our analogy, is the velocity of the ball, and plot it so it's a little easier to imagine what it means. In this specific case we get an arrow pointing to the left: the derivative is telling us that in order to reduce the value of y, we need to move left on the x-axis. To move the ball down, we've got to move it left. The other white line you see is called the tangent line, and it tells us the magnitude of that movement: how steep the slope is at that specific point. That steepness tells us how fast the ball would actually be moving, which is, again, the analogy to velocity.
Now, if I animate the ball rolling down, you can see that as it descends and the steepness of the function decreases, the tangent line gets flatter and flatter, until at the very bottom it's completely flat. The arrow happens to point left there only because that's where it was last; at the exact middle it could point either way, since there's no movement happening. If we instead move the ball over to the left side of the valley, we get the opposite effect: to move the ball down, it needs to move right on the x-axis, so the arrow points right, and the white tangent line shifts accordingly to show the slope of the function at that point, how fast the function is changing there. Again, the analogy is to velocity: it's the speed of change of the function.

Let's move the ball back for a moment and convert this nicer visual interpretation of a derivative into an actual value. In this specific case the arrow points left, which means the derivative is negative; so we already know the value of the derivative of this function at this point is a negative number. Next, we can see the slope, and if we convert that slope into a number, we get, in this specific case, 1.37. So the picture you were seeing really just means that at this specific point the derivative is -1.37.

One thing I will note: what I've been saying here is a slightly inaccurate representation of the derivative. In reality this number is not negative but positive. To see why, let's plot the derivative of the whole function. When the function is steeper, you see a peak in the derivative function, because when the function itself is changing more quickly, the derivative takes a higher value; remember, it's telling us the speed, in our velocity analogy. However, the derivative doesn't tell us where to go to decrease the value of the function, to move the ball down. The derivative really tells us where to go to move the ball up; that's called ascent. What we want in the world of deep learning is descent, which is why I showed you the analogy with the negated value. In reality, the derivative at that point is positive, so keep that in mind: in machine learning and deep learning we usually (really, always) negate our derivatives to move the ball down, for reasons you'll understand in a later episode, whereas the actual derivative tells us how to move the value up.

Now that you understand this basic concept of derivatives, let me show you one more example of a function and its derivative, just to help you wrap your head around the concept a little more. Here I've plotted a function as the blue line: sigmoid (a slightly modified version, but based on the sigmoid function), and the red line is its derivative. Toward the ends the function tapers off and gets flatter, but toward the middle, where x equals 0, it's growing at its quickest rate, meaning the smallest change in x produces the largest change in the function's value. The derivative tells us exactly this: it peaks at x = 0, right where the function is steepest. I hope this visualization helps nail in the idea that the derivative tells us not only in which direction the function is increasing in value, toward the left or toward the right, but also just how fast it's increasing.
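If you'd like to check this numerically, here is a small sketch in plain Python (my own code, not from the video, and using the standard sigmoid rather than the video's slightly modified version) that estimates the derivative with a central difference and shows it peaking at x = 0:

```python
import math

def sigmoid(x):
    # The standard logistic function; the video plots a "slightly modified"
    # version, but the standard one shows the same behaviour.
    return 1.0 / (1.0 + math.exp(-x))

def derivative(f, x, h=1e-5):
    # Central-difference estimate of the slope of f at x.
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    print(f"x = {x:>4}  sigmoid = {sigmoid(x):.3f}  slope = {derivative(sigmoid, x):.3f}")
# The slope peaks at x = 0 (0.25) and tapers toward 0 at the ends,
# exactly like the red curve described above.
```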
And that's all the math you need to understand to get into the world of deep learning. Of course, there's a little more to it. You're probably wondering: sure, derivatives are great, but why in the world would we ever use them? Let me show you: it's because the derivative of a function can help us do what's known as optimization.

For example, say we've got a dot on this function at x = 3. As we can see, the dot is pretty high up, and we want to move it down as far as possible. So what should the value of x be? Just by looking at the graph, we know x should be 0. But that's because this function is one-dimensional; we can easily scroll through the graph and figure out what the value should be. The issue is that neural networks aren't one-dimensional. The real deep neural networks we're going to build have many hundreds, thousands, or millions of dimensions. We can't even visualize that, let alone explore the space by hand. We therefore need an automated, mathematical way to move that point down, and I'm going to show you how to do exactly that.

Our function starts at x = 3, and before we can understand how to move the dot down with math, we need something known as the update function. The update function, which I'll label u, takes three arguments, x, y, and z, and returns simply x - y * z. You're probably wondering what the purpose of this function is; what it enables us to do is actually really powerful. It takes x, the input to a certain function; y, the derivative of that function at x; and z, something known as a learning rate (I'll talk about what that is in a moment). The update function returns a new value of x: a new value we can feed into the function to reduce its output, to make it smaller, that is, to move the dot down the slope. It does this by taking the actual input to the function and subtracting the derivative value from it. Remember, the derivative tells us where to move, and how much to move, in order to increase the output of the function. So if we go in the opposite direction of the derivative, we decrease the value of the function and move the ball down. By subtracting the derivative value from the input, the new input produces a lower output.
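Written as code, the update function is a one-liner. Here is a minimal sketch in plain Python (the names are mine; the video defines the function only on a slide):

```python
def update(x, dfdx, lr):
    # u(x, y, z) = x - y * z: the current input, minus the derivative at x
    # scaled by the learning rate. Subtracting steps *against* the
    # derivative, which is what moves the dot downhill.
    return x - dfdx * lr
```

You'll see it in action with concrete numbers in a moment.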
Now here's the thing, though: at certain points the derivative can be really steep. If you remember, at x = 3 the derivative is very steep, which means the derivative is a large value, and when you subtract that entire value from the input to the function, you usually end up overshooting: the ball goes way too far. So what we actually do is multiply the derivative by a learning rate to scale it down. In this case we'll use a learning rate of 0.5, and you'll see it in a moment. So basically: we take the actual derivative value, negate it, scale it, and combine it with the input to the function. This gives us a new input which should theoretically (really, we know it will) give us a smaller output. Let's take a look.

Let me plot out some of the actual values we get. At x = 3, the output of the function is 3.65. That's a really large value, and we want to bring it down to 0. If we take the derivative of the function at x = 3, we get 1.21: it's positive, meaning "go right to increase the value of the function", and the magnitude, the speed of that change, is 1.21 units. We want to negate that, because we want to decrease the output of the function, and we want to halve it too, using our learning rate. So we feed 3, 1.21, and 0.5 (the input to the function, the derivative at that point, and the learning rate, respectively) into the update function, and it gives us 2.39. That's the new input to the function, meaning the point has moved from x = 3 to x = 2.39. We just moved the dot down.

Let's redo that bit of math. Feeding 2.39 into the function, the output is 2.68. Great: we went from 3.65 to 2.68. The derivative at this point is 1.9, so the magnitude has actually increased as we moved down; the function is steeper here. Feeding in the previous input 2.39, the derivative 1.9, and of course our learning rate 0.5, we get a new input of 1.44. As you can see, we're trending toward x = 0, but let's see if the trend holds up. Move the ball down to x = 1.44 (a pretty big jump) and redo the math once more: at 1.44 we're in the zeros, with an output of 0.89 and a derivative of 1.56, which is still pretty steep. Feed that into the update function, and it says the new x should be 0.66. Moving the dot to 0.66, we're getting into pretty flat territory, and if we calculate the slope once more, it has dropped from about 1.5 to just 0.46. The slope keeps decreasing, because the function gets flatter and flatter as we approach x = 0. If we ran tens or hundreds of iterations of this optimization procedure, we would eventually reach x = 0, but I don't want to put you through the effort of watching paint dry as this crawls toward x = 0, so I've just done a few more iterations.
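Here is the whole procedure as a loop, a minimal sketch in plain Python. The video never gives the formula of the plotted function, so this sketch uses a simple stand-in bowl, f(x) = x²/2, with its minimum at x = 0 (my assumption); the printed numbers therefore differ from the ones above, but the procedure is identical:

```python
def f(x):
    # Stand-in for the plotted function (its formula isn't given in the
    # video): a simple bowl whose minimum sits at x = 0.
    return 0.5 * x * x

def derivative(f, x, h=1e-5):
    # Central-difference estimate of the slope of f at x.
    return (f(x + h) - f(x - h)) / (2 * h)

def update(x, dfdx, lr):
    # Step against the derivative, scaled by the learning rate.
    return x - dfdx * lr

x, lr = 3.0, 0.5
for step in range(6):
    print(f"step {step}: x = {x:.2f}, f(x) = {f(x):.2f}")
    x = update(x, derivative(f, x), lr)
# For this bowl, f'(x) = x, so each step gives x - 0.5 * x and
# x halves every iteration: 3.00, 1.50, 0.75, 0.38, 0.19, 0.09, ...
```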
As you can see, we get pretty close to the output of the function being an actual zero: in this case it brings us to 0.01, which is incredibly close to what we want, and x, our input to the function, is 0.23.

Now you might be wondering: sure, but what's the point of all this? Why would I ever want to figure out how to reduce the value of a mathematical function based on its input parameters? Well, let me tell you why: think back to our perceptron. In the perceptron's case, we knew what the input and the output should be, but we did not know what the weight should be to produce that correct output. By using this exact mathematical procedure, you can automatically figure out the value of that weight, and the procedure is known as gradient descent. "Gradient" is just a fancy word for derivative; specifically, it's the derivative of the function with respect to all of its input parameters, and we'll cover exactly what that means in a future episode. You don't need to worry about it right now. All you need to know is that by finding the derivative of the function, which in this case is the perceptron combined with what's known as an error function, you can figure out what the weight needs to be. (If you'd like a preview, a rough sketch in plain Python follows at the end of this post.)

In the next episode, I'll show you how to implement that error function, the perceptron, and the gradient descent in Python using the TensorFlow library. It's going to be really fun, and I'll show you how to implement this optimization using real Python code.

Thank you very much for joining today; I hope you enjoyed it. If you did, please leave a like, subscribe to the channel, as it really does help out a lot, and turn on notifications so you know when I release the next episode in this series. As mentioned, we'll cover how to implement the concepts you learned today in Python and how to use them to build a basic perceptron; then we'll do some really fun flower classification using more advanced multi-layer perceptron neural networks. Once again, no need to worry about what all that means just yet; it sounds fancy, but it's really not. I'll see you in the next episode. Goodbye!
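As promised above, here is the preview: a rough sketch in plain Python (not the TensorFlow version the next episode will build; the squared-error choice, the hand-derived gradient, the learning rate, and all the names here are my own assumptions for illustration) that uses gradient descent to learn the negation perceptron's weight:

```python
def predict(x, w):
    # The perceptron from the start of this episode: output = input * weight.
    return x * w

def train(pairs, w=0.0, lr=0.02, steps=50):
    # Minimise the squared error E(w) = (predict(x, w) - y)^2 with
    # gradient descent. The derivative dE/dw = 2 * x * (x*w - y) is
    # derived by hand here; a library like TensorFlow can do this
    # differentiation automatically.
    for _ in range(steps):
        for x, y in pairs:
            grad = 2 * x * (predict(x, w) - y)
            w = w - lr * grad  # the same update rule as before
    return w

# Input/output pairs for the negation task; the learned weight
# should come out very close to -1.
pairs = [(4.0, -4.0), (1.0, -1.0), (-0.5, 0.5)]
print(train(pairs))  # ≈ -1.0
```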
