AI 101 - Class Recordings
Recording Class 4
Video Transcription
We're going to get started on our homework review. If you want to, just follow along or pull out your homework.

Question 1: given the following diagram, find the accuracy. Here we're given what's called a confusion matrix; let's head over to the key. I'll see if I can get a laser pointer, but I believe you can see my cursor. On the x-axis we have our predicted class: what our model predicted, either positive or negative. For this example, let's say we're trying to figure out whether a photo is of a cat or something else. A positive predicted class means our model thought the input picture was a cat, and a negative predicted class means it did not think it was a cat. On the y-axis we have our actual class. This is what our label tells us, our ground truth about the pictures. A positive actual class means yes, the photo is of a cat; negative means it is not.

Accuracy is the total number of correct predictions our model made over the total number of samples. As circled here, when the actual class is positive and the predicted class is positive, the model got it right: yes, this picture is of a cat, and the actual class is a cat. This is called a true positive. It goes in our numerator, added to our true negatives, where the actual class is negative and the predicted class is negative. When the actual class is positive and the predicted class is negative, that's a false negative: the model thought a photo of a cat was not a cat. When the actual class is negative and the predicted class is positive, that's a false positive: the model thought a photo that is not a cat was a cat. All of the values, true positives, true negatives, false positives, and false negatives, go in our denominator; that's how we calculate the total number of samples. We can see our model is 82 percent accurate, so there is room for some improvement, but that's how you calculate accuracy.

Moving on to question 2: take a look at the four points and the best fit line. The equation for the best fit line is y = 1.4x + 0.2, and there's a hint that to get the exact predicted value you have to plug and chug into this equation. The points are actual data points, and the line is our model's way of finding the predicted value; we are asked to calculate the mean squared error. Moving to the key, I apologize that the "predicted" and "actual" column headers are swapped; just know that the calculated values are the predicted values and the whole numbers are the actual values. We have the equation of our line again; our actual values come right after the x values, and our predicted values are calculated just as the hint said, by plugging each x value into the y = mx + b equation and taking the output. Then we use mean squared error: 1 over n, where n is the total number of samples, times the sum over every term of (actual minus predicted) squared.

Know that while we did make a mistake with actual and predicted in our key here, our output, the mean squared error, is still the same because of the rules of exponents: since we're just calculating the difference and then squaring it, any sign goes away, so the output is unchanged. We take our actual value minus our predicted value for every term, square it, add all of those up, and then multiply by 1 over n, or in this case, since we have four samples, 1 over 4, and that is our mean squared error.
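For reference, here is a minimal Python sketch of the two calculations just described. The confusion-matrix counts and three of the four data points are placeholders, since the exact slide values aren't spoken in the recording; only the point (1, 2) with prediction 1.6 is recoverable from the walkthrough.

```python
# Minimal sketch of the two homework calculations, with placeholder values
# where the slide numbers are not spelled out in the recording.

def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) over all samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mean_squared_error(points, m=1.4, b=0.2):
    """MSE of y = m*x + b against (x, actual) pairs: (1/n) * sum((actual - predicted)^2)."""
    errors = [(y_actual - (m * x + b)) ** 2 for x, y_actual in points]
    return sum(errors) / len(errors)

# Hypothetical confusion-matrix counts (the real counts are on the slide, not in the transcript).
print(accuracy(tp=41, tn=41, fp=9, fn=9))   # 0.82 -> 82 percent

# (1, 2) comes from the walkthrough (predicted 1.6 vs. actual 2); the other three points are placeholders.
print(mean_squared_error([(1, 2), (2, 3), (3, 4), (4, 6)]))
```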
Moving on to question 3: label each of the following problems as regression or classification problems. The way I like to think about regression versus classification is to ask yourself: is there a finite number of possible outputs? If the answer is yes, it's a classification problem. If the answer is no, it's a regression problem. I'll go more in depth with these examples.

For 3a, determine the sentiment of a Yelp review for a given restaurant. How many sentiments can we have? Generally positive, neutral, or negative. There could be some variation, but it's a finite number, so this is a classification problem.

For 3b, predict the energy consumption of a building based on historical usage data and weather conditions. Is there a finite number of amounts of energy you can consume? I could consume zero kilowatt-hours, Rose could consume maybe 20, and Sophia could consume 37.68. There's an infinite number of possibilities for the amount of energy consumed, not that the energy itself is infinite, but the number of possible values is, so this is a regression problem.

It's a similar situation for 3c, predict the amount of rainfall for a given day, given the rainfall for the previous day, humidity, wind speed, and atmospheric pressure. The amount of rainfall on a given day also has an infinite number of possible values: one day it could rain 0.2 inches, another day 13.6 inches. This is also a regression problem. Checking that against our key: classification, regression, regression.

Does anyone have any questions about the homework? Someone asked why 3a is classification. The sentiment of a Yelp review is pretty finite: like I said, it's generally within three possibilities, and even if we added options like exemplary or slightly dissatisfied, it's still a finite set, so that's why it's classification.

Sure, we can go over question 2 again. Take a look at the four points and the best fit line. Remember, we're calculating the mean squared error. The line gives our predicted values: if we're trying to predict the y value at x equals 2, it corresponds to this point up here, which looks like it's about 3. The points are our actual outputs. And again, I'm sorry if this is confusing, our actual and predicted columns are swapped in the key. The whole numbers are our actual values, and the calculated values, the ones you get by plugging into the equation of the line, are our predicted values.
This is just how we find the y value when x is whatever number it is. If we plug 2 into our equation, y = 1.4x + 0.2, we get a number close to 3; I haven't done the exact math, so it might not be exactly 3, but the line is just our visual representation of this equation. To calculate the mean squared error, we multiply 1 over n, the total number of samples, so 1, 2, 3, 4, which gives 1 over 4, by this sigma symbol, which means the sum over all the samples. We're going to have four of these (actual minus predicted) squared terms. Strictly it should be written 2 minus 1.6, but that doesn't affect our output because of the rules of squaring: we're finding the difference between the values and then squaring it, so the sign doesn't matter. 1.6 minus 2 is negative 0.4, and 2 minus 1.6 is positive 0.4; squaring 0.4 and squaring negative 0.4 both give 0.16, because squaring a negative number gives a positive number, just like squaring a positive number does. That's why in this case it doesn't matter. Then we add all of these values up and multiply by 1 over 4. Does that make sense?

Why 82 percent and not 0.82? Sure, that's an excellent question. We failed to write it out here, but generally when finding accuracy we're looking for a percentage, so we just multiply our answer by 100.

If that's all, we can move on to today's lecture. Sorry, I'm going to take a minute to get that set up, so get whatever you need ready for the lecture. Sorry everyone, I'm trying to give remote-control access to our other instructor, Coven; it'll just take a minute. Okay, we'll get started. I wasn't able to do that since it looks like Coven just left the meeting, but I'll keep an eye on it and might pause a little later.

Neural networks: this is the topic of our lecture workshop number 4, really exciting stuff. First, what are neural networks? They have the ability to model complex non-linear relationships between input and output data. Neural networks are a mathematical attempt to map the function of our brain. The brain is made up of on the order of 10^11 neurons, and a neuron is this picture on the right. That's a lot of neurons. A neural network is composed of many simulated neurons: each dot you see in our graph representation of a neural network is called a neuron.

A bit of basic neuroanatomy, which you won't be tested on; it's just to better understand a neural network, and it's an oversimplified version. In the neuron picture on the right, the blue part is the cell body; this is where the computation is done. The red part is the axon; this is where information gets propagated. The branches on the input end are called dendrites, and the green branches at the end of the axon are the axon terminals, which connect to other neurons at synapses. In our neural network, we attempt to make a mathematical model based on the anatomy of a neuron: each of these circles is a computational unit, and the lines are like the axons and synapses, the connections between neurons where information gets passed on to other neurons.
Where are neural networks used? Speech recognition, so classifying and recognizing spoken words, natural language processing, financial forecasting, medical diagnosis, and additional applications. I apologize, guys, can you please not annotate my screen? It's a little distracting. Thank you. Any questions here?

Now, moving on to our vocab; I already touched on this a little earlier. Neurons are the basic computational units in a neural network. Each of the circles you see in this graph representation is called a neuron. Neural networks are composed of interconnected nodes, or neurons, organized in layers; the circles are also called nodes, and you can use the terms interchangeably. The first layer is the input layer; it receives the initial data, our feature vector. Remember, in Intro to ML, workshop 2, we learned how to make these feature vectors, and that's what gets fed into these neurons. The layers between the input and output layers, however many there are, are called hidden layers, and they process the information. They're called hidden because they're not exposed directly to the input or output. Lastly, we have the output layer, which produces the final predicted results. The number of nodes in your output layer is the number of possible outputs your model can have.

More vocab: weights. These are the strengths of the connections between neurons. In the neural network graph, weights are represented by the lines connecting the neurons; the actual values are not shown. A weight is just a number, and we're going to talk more about weights and how to compute with them later on. Last but not least, we have bias. A bias is an additional parameter in each neuron that adjusts its output; the actual values are not shown here either. It's just another value that helps the model predict the right output in the end.

Going back, weights are very important, so I'll do a quick example. Say we're classifying whether something in a dataset is a dog or a cat. The first input feature could be the color of the animal, and the second could be the shape of its tail. The weights are randomly initialized at first: since we don't know which features are important for classifying cat versus dog, we start them at random values so the network can later figure out what is or isn't important. Eventually the network will assign a weight to each input, which might mean that color is a very important differentiating factor, or that tail shape is. I would think the tail is the more important differentiating factor, since the tail shapes of dogs and cats are pretty different, while the colors of dogs and cats are generally pretty similar. But the neural network will eventually decide which feature is better at classifying dog versus cat, and that will show up in the weights.

Neural networks are a type of machine learning model, inspired by the structure and function of the human brain, that can solve both regression and classification tasks. Both of the task types you learned about last meeting can be done by a neural network.
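As a small illustration of the "randomly initialized weights" idea, here is a sketch of how one layer's parameters might be stored. The layer sizes (2 inputs, 3 hidden nodes) are assumed from the diagram drawn in the lecture.

```python
import numpy as np

# Minimal sketch of randomly initialized parameters for a 2-input, 3-node hidden
# layer (sizes assumed from the lecture diagram). Training would adjust these.
rng = np.random.default_rng(seed=42)

weights = rng.normal(size=(2, 3))  # weights[i, j]: strength of the connection from input i to hidden node j
biases = np.zeros(3)               # one bias per hidden node, often started at zero

print(weights)
print(biases)
```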
Neural networks learn by running multiple passes of data through training: weights and biases are adjusted so that the network can accurately predict the output from the input. We're going to talk more about how the training computations work; by the end of this meeting you should know how to do them yourselves, so please ask questions if anything is unclear by then and we can get it figured out.

From here, Coven is going to explain the math behind neural networks. Let me see if he's able to rejoin. It looks like not, so I'll keep an eye on that and keep moving forward.

Going into the math of a basic neural network: if our input feature vector is (0, 2), what is the predicted output? Remember, each line you see is a weight, and those weights are listed on the side here. Weight 1,1 means it goes from input neuron 1 to hidden-layer neuron 1; that format is shown on the right. The bias is just an added numerical value, an additional parameter added on top of the weighted inputs.

For the first node, you multiply each weight by its input: 0 is our x1 and 2 is our x2. What goes into hidden-layer node 1 is x1 times weight 1,1 plus x2 times weight 2,1, and then we add the bias b1, which is 0. Here's the math: 0 times 3 is 0, 2 times 1 is 2, plus 0, so that equals 2.

For the second node we do the same thing, still multiplying weights by inputs, just with the weights for the second node's lines. The input is still (0, 2): 0 times 2 is 0, plus 2 times negative 1 is negative 2, plus the bias of 4, so that's 2 again. Remember, here the bias is 4, which is why we're adding 4 instead of 0.

Is this all making sense? I know it's a little confusing with all the math. For the third node, it's weight 1,3 times x1 plus weight 2,3 times x2, plus bias b3, which in this case is 1, so this equals 1.

Now I'm handing this off to Coven, who will go into more depth. Thank you so much, Naomi. We can go a little bit more in depth into this. Can I get remote access? Perfect, let's go to the next slide. The way these node values are calculated is that you take w1,3 times x1, plus w2,3 times x2, plus the third bias. Plugging in the values from the key, with x1 being 0 and x2 being 2 for this input, and doing that calculation, which I'm sure you guys are very adept at, gives 1. That is the output for our third node. Given that, we saw the first node's output was 2, the second node's output was 2 as well, and the third node's output was 1. So the predicted output, written as a vector, is (2, 2, 1).
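Here is a sketch of that forward pass in Python. The weight and bias values are reconstructed from the spoken walkthrough (the third node's two weights are partly ambiguous in the recording, so treat that column as an assumption); this is not a copy of the slide.

```python
import numpy as np

# Forward pass for the 2-input, 3-node hidden layer described above.
# Values reconstructed from the spoken walkthrough; the third column is assumed.
W = np.array([[3.0, 2.0, -2.0],    # row i = input x_i, column j = hidden node j
              [1.0, -1.0, 0.0]])
b = np.array([0.0, 4.0, 1.0])

def forward(x):
    """Each hidden node j computes w1,j*x1 + w2,j*x2 + b_j."""
    return x @ W + b

print(forward(np.array([0.0, 2.0])))   # [2. 2. 1.], matching the walkthrough
# The same helper can be used to check the practice inputs that follow.
```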
Now let's do a quick question. The task is to write each entry of the output vector separated by a comma; for example, "negative 1, 1" would represent the vector shown above. We want you guys to do this problem. We've given you the weights and biases on the side, and then I'll go over the explanation. Please put your answers in the chat, and then we can go over them. I'll give you two minutes.

Okay, is that enough time for everybody, or does someone need more? That was a minute; we can go for another minute if we need it. Some people are saying they're not understanding, so we'll go over it in the solution. A lot of you sent a chat saying you don't know how to solve this, so that's what we'll go over on the next slide.

The predicted output for this input feature is (negative 2, 1, 3), and some of you got that; if you did, good job, but I'll explain it now. Can I annotate on this? I think I can. The way it's calculated is weight 1,1 times x1, plus weight 2,1 times x2, plus the bias term. Using this formula... sorry, I initially grabbed the wrong value for x1. X1 is actually negative 1, and this is x2; you can see that's our input. Let me quickly erase and redo this. The formula I mentioned is still correct; I just used the wrong value for x1.

I apologize, guys, let me try to figure this out. The screen share disappeared. Can you still see the screen? Okay, I can see it now; sorry, I clicked off Zoom for a second. Sorry for the mishap.

Okay, so the predicted output when the input feature is (negative 1, 1) starts with negative 2 for the first node. You take weight 1,1, which is 3, times x1, which is negative 1, this value right here. Then you add weight 2,1 times x2, which is 1, and we want to be really careful about our negatives here. Then plus the bias term b1, which is 0. Actually, sorry guys, good catch, Arjun: a few of you are pointing out in the chat that the weight I used there was wrong. Wait, isn't weight 2,1 equal to 1? Yeah, yeah, that's right. I see many of you saying that in the chat, and I agree, so we can go over this value once again; please correct me if I'm putting the wrong values in.
So, our w1,1 is 3, and our x1 is negative 1, so we put the value in like this. Then we take w2,1, which is 1 — I accidentally used w2,2 last time — and multiply it by x2, which is 1. Then we add the bias-1 term, which is 0. Doing this, we get negative 3 plus 1, which equals negative 2, and that is how we get this first value. Does that make sense for everybody? Please let me know in the chat.

Okay, I'm getting some mixed signals, some yeses and some noes. What I can do is move on and calculate the next node, and if you still have questions about this one, we can go over it again as well. Let me erase this.

Okay, perfect. Now we use the formula for the second node, which is w1,2 times x1, plus w2,2 times x2, plus the bias-2 term. Our w1,2 is 2, and our x1 value is negative 1; our w2,2 is negative 1, and our x2 value is positive 1; and our bias, we can see, is 4. This adds up to negative 3 plus 4, which equals 1, and that's how we get this value. Okay, now I'm getting a few "got it"s in the chat.

So would the answer be 1 times negative 1, plus negative 1 times 1, plus one half? No, there's no one half, because we're using our bias-2 term. Sorry, this should be a B; it's our bias-2 term, and that's why we're adding the 4 here. Last time we added the bias-1 term, which is 0, but here we add the bias-2 term, which is 4. That's how we get the value of 1 for the second node. Does that make sense? Yeah, okay, perfect.

Okay, I've cleared all the drawings, so let's move on to the next few slides. You repeat the same process for the third node as well, and you get that the predicted output, the full output vector, is (negative 2, 1, 3).

Okay, so now that I've done a walkthrough, can we do another practice, guys? This time the input is different, but our weights and biases are the same as last time, so you should be familiar with them. I'll give you maybe three minutes, because this does take a little while, and we can see where you guys are at.

"I don't understand how to get the B." Yeah, so the B value is what you are given here: for the first node you use B1, which is 0; for the second node you use B2, which is 4; and for the third node you use B3, which is 1. I see a lot of you getting the correct answer, but a few of you are only giving me one number. Remember, it's a three-dimensional output. We can go over that after the three minutes have passed. And I think we can move on now, because a lot of you got the correct answer. The answer was (10, 9, negative 5), and I think a lot of you got this.
So I definitely think this is good, and that you guys understand this topic pretty well. If you do need some more practice, we will upload this recording and you can go from there.

Okay, now let's talk about activation functions. An activation function is a function that transforms a neuron's input into its output. Examples are the identity function, which means no modification, just y = x; the ReLU, which stands for rectified linear unit; and the sigmoid function. The ReLU and the sigmoid are terms you should know for your actual test. The ReLU is just the max of 0 and x, and you should know that equation. We don't need you to know the sigmoid equation, since that's not as important; we just want you to be able to describe its shape.

All right, I skipped a slide. Okay, so now let's figure out how to do the neural-net calculation. The first step is to calculate the intermediate output by multiplying the weights with the inputs and adding the biases; that's what we did in the last two slides, and we did two practice problems on it. Then we apply the activation function. If no activation function is given, that means it's the identity function, y = x. So after we calculate with the weights and biases, as you can see on the right-hand part of the slide, we apply the ReLU function, which is an activation function, as we learned on the last slide.

Okay, so now let's go over some more vocab: the loss function. What is a loss function? A loss function measures the difference between the predicted and actual output; basically, it tells you how well your model is working. For example, if I'm trying to predict stock prices and my model predicts a price of 2,000 when the actual price is 100, that would mean my model is very, very bad, right? It isn't able to predict accurate prices, and you would have a very high loss in that case.

One example of a loss is the hinge loss. The idea is that you want the sign of the predicted value and the actual value to be the same. Here is the equation: y is the predicted value and t is the actual value, and the hinge loss is the max of 0 and 1 minus t times y. So if the predicted value of our neural network is negative 5 and the actual value is 3, the hinge loss is the max of 0 and 1 minus 3 times negative 5, which is the max of 0 and 16, and obviously 16 is the max, so the hinge loss is 16.
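For reference, here is a small sketch of the two functions just defined, ReLU and the hinge loss max(0, 1 − t·y), checked against the worked numbers from the slide.

```python
# ReLU and hinge loss as described above, with the worked example from the slide.

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def hinge_loss(y_pred, t_actual):
    """Hinge loss: max(0, 1 - t * y)."""
    return max(0.0, 1.0 - t_actual * y_pred)

print(relu(4))             # 4
print(relu(-4))            # 0
print(hinge_loss(-5, 3))   # max(0, 1 - 3*(-5)) = 16
```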
We're going to practice this on the next slide: calculate the hinge loss if the expected output is 3, given the input. This one is a little longer: you have to calculate the forward pass, then apply ReLU, then apply the hinge loss. We are combining everything we learned from today's lecture. A lot of you did the (10, 9, negative 5) problem; you do that same kind of calculation, then apply the ReLU function, which is the max of 0 and x, and once you've done that, you apply the hinge loss.

"Wait, what is t again?" t is the actual value and y is the predicted value. "But how do we know the real value?" Yeah, that's what we're supposed to calculate; sorry, one second. I can show you in the next question, and I see a lot of you aren't able to get this one, so I will explain it on the next slide.

Okay, perfect. Now we can calculate the hinge loss if the expected output is 3, given the input. First we use the process from last time to get the predicted output: the first weight times x1, plus the second weight times x2, plus the bias, which here works out to 6 minus 2 plus 0, so the intermediate output is 4. Let me annotate this: we apply the equation we learned last time, this weight and input make up this term, this weight and input make up that term, and then we have our bias, which is 0, which makes up this term.

Given that value, we have to apply the ReLU function. The ReLU of 4 is just the max of 0 and 4, and obviously 4 is the max, so that is how we get 4 right here. Then you go through the hinge loss: the slide tells you the expected output is 3, so you take the hinge-loss formula and calculate it. That's the max of 0 and 1 — the 1 is part of the formula — minus t, the expected output, times the predicted output, which is 4. That becomes the max of 0 and negative 11, and obviously 0 is greater than negative 11, so that is why our hinge loss is 0. Does that make sense to everybody? So calculating the hinge loss is a three-step process: first you calculate your predicted output, then you apply ReLU, then you apply the hinge loss.

Okay, that's a great question: where do we get the 3? First let me answer a few questions in the chat. How do you find the max? Whichever number is greater: here 4 was greater than 0, so the value is 4; there the options were 0 and negative 11, so 0 is our answer. Okay, perfect. So where do we get the 3? The 3 is the expected output. The steps I've bracketed are for finding the predicted output, but 3 is our actual output, and that is what we use in the loss step, where we calculate the hinge loss; the problem tells us the expected output is 3. Does that make sense for everybody? Okay, perfect.

Okay, that's a really good question too: is the hinge loss always 0? The hinge loss is not always 0, because it depends on your model's predicted output and the actual output. As your model trains, you want the loss to get smaller and smaller, because that means the difference between your predicted output and the actual output is shrinking; if your loss is smaller, that means your model is getting better and better.
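Here is a sketch of that three-step worked example end to end. The single node's weight, input, and bias values are reconstructed from the spoken walkthrough, so treat them as assumptions; the slide may show a larger network.

```python
# Three-step worked example: forward pass -> ReLU -> hinge loss.
# Weight/input/bias values assumed from the spoken walkthrough for one node.

def relu(x):
    return max(0.0, x)

def hinge_loss(y_pred, t_actual):
    return max(0.0, 1.0 - t_actual * y_pred)

# Step 1: forward pass for the node, w1*x1 + w2*x2 + b.
intermediate = 2 * 3 + (-2) * 1 + 0      # = 4

# Step 2: activation.
activated = relu(intermediate)           # = 4

# Step 3: hinge loss against the expected output of 3.
print(hinge_loss(activated, 3))          # max(0, 1 - 3*4) = max(0, -11) = 0
```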
Okay, so now let's move on to the next slide and go into the term backpropagation. You don't really have to know this super well for the competition, but you should have a basic understanding of it. Naomi was talking about how the model starts with random weights at the beginning, but those random weights might not always be the best; for example, color might not be as useful as tail shape, as Naomi was saying. So how does the model assign different weights? It does it through backpropagation. After the loss, for example the hinge loss, is calculated, the model goes back and updates the weights toward whichever features are most important. As this happens over and over, the weights are slowly updated, and therefore your model is able to get better and better.

Okay, so now let's talk about the full neural network pipeline. We're given our features, which are just our input data; they go into the neural network; after the neural network, we get a prediction; then, given the labels, we calculate the loss, for example the hinge loss; then we go into the process called backpropagation, which I covered on the previous slide; and then the process repeats. In a simple sense, backpropagation just makes your model better by updating the weights and biases.

So now let's go into a code example, and I think this is one of the questions on your homework, so you do want to take some notes on this. In the first few lines we are just importing some libraries to use for a machine learning model. Then we set the model to a Sequential model and add its layers: the input dimension is two, which you can see here; the first layer has three units with a ReLU activation, which is this value right here; the next layer has five units with another ReLU; and then the final Dense layer uses a sigmoid. This is what happens in the full neural network pipeline, and you will have a homework problem like this. If you do want to write this down, great, and we will also post the recording on Google Classroom so you can look back through this as well.
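A minimal sketch of the model described on that slide is below. The layer sizes (3 and 5 ReLU units, a single sigmoid output) and the exact imports are assumptions read off the spoken description, not a copy of the slide's code.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the Sequential model described in the lecture (sizes assumed).
model = keras.Sequential([
    keras.Input(shape=(2,)),               # two input features
    layers.Dense(3, activation='relu'),    # hidden layer: 3 nodes
    layers.Dense(5, activation='relu'),    # hidden layer: 5 nodes
    layers.Dense(1, activation='sigmoid'), # output layer: 1 node squashed to (0, 1)
])

model.summary()
```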
Okay, so now let's go into some of our final slides: some of the pros and cons of neural networks. On the pro side, they can learn complex nonlinear relationships in data, they're very adaptable to various tasks and domains, they're capable of handling large amounts of data, and they can automatically extract features from raw input. Some disadvantages include that they require large amounts of data for training, they can be very computationally expensive and time-consuming to train, they're not very interpretable, and they're very prone to overfitting.

Okay, and let's just go over overfitting; you just need a quick definition of this, nothing too serious. Overfitting is when the model starts memorizing the training data and cannot generalize well to data it has not been trained on. So I think Naomi can go over the homework now, or we can just show you what it is for next class. Yeah, thank you guys. Yeah, thank you, Coven.

For our homework, I can stop sharing and open the Google Classroom. Also, feel free to drop after 9:30, or we can go over one more example of computing a neural network, since I know that was a little tricky and it is the baseline for a lot of what you've done today, so if you're able to, you can stay for that.

If you want to learn more about using machine learning models such as neural networks, you can head over to this site to see some student projects that have been done; you can search for the name "neural networks" to see if any of those projects use them. The homework you should be able to find on the Google Classroom; it should be labeled bootcamp or workshop number 4. I can start the neural network computation example again now. Feel free to drop off if you have somewhere to be, or stay if you want one more quick run-through. Thank you to everyone who showed up, and I'll see you next week.

I'll get the whiteboard sharing going right now. Okay, can everyone see the whiteboard? Can you please not write on the whiteboard, because I'm about to do some computations there and it might distract some people. Okay, so let's compute a neural network again; let me just get an example open. Actually, maybe it'll just be easier to share what we were just doing. Okay, what can you guys see? This one? Perfect, so let's present this view. Sorry, guys. Coven, are you able to annotate? I don't think I can do it while I'm in the Google Slides. Okay, yeah, I can annotate. Perfect. So I'm just going over another example; this is the same example we went over, but one more time might help.

Okay, so to anyone still here, thank you so much for being here, we really appreciate it. So the formula is: we take w1,1, which is this value, times our x1, plus our w2,1 times our x2, plus our b1 value. Now we put all of these respective values back into the formula: x1 is 0, our weight is 3, so 3 times 0; plus our w2,1 value, which is 1, times 2; plus our bias term of 0. Doing this computation, 3 times 0 is 0, 1 times 2 is 2, plus 0 is equal to 2. So this is the value for our first node, and that is how it is calculated. Does that make sense to everybody? Can we just get a quick mention in the chat if that makes sense? Okay, I'm not seeing any objections, so that makes sense, perfect. And I know she was saying throughout the lesson that it was a little difficult to understand, so I'm really glad that you understand it now.

Okay, so now we can do the computation for our second node. Oh sorry, yeah, I can delete my drawings. For our second node we can use this formula right here; I can talk through it and also annotate it. We do w1,2 times our x1, plus our w2,2 times our x2, plus our second bias, because we're on the second node.
Then, given this formula, we substitute our values in again: w1,2 is 2 times our x1 value of 0, plus w2,2, which is negative 1, times 2, plus 4, which is our bias, and that equals 2. Does that make sense to everybody? Can we get a quick check in the chat? Okay, perfect, I just got a yes.

Okay, perfect, so now let's move on to the third node. Our third node is here, and basically it's the same type of computation, just a different formula: w1,3 times x1, plus w2,3 times x2, plus the b3 value, because we're now on the third node. You have to understand that we're iterating through all our weights and all our biases to calculate each node. Given this, x1 is 0, because that is our input, so the first product is 0, and the second product also comes out to 0, plus our third bias, which is 1, so this all equals 1 when you do the computation.

"How did I get w1,2?" For which value? Is this about the third node or the second node? Oh, the second node, okay. But before we go back to it, do you understand the third node? Yes? Okay, perfect, then we can definitely go back to the second one. So the way this one works is — sorry, the order is a little switched around here — if I write it out numerically, our w1,2 is 2. You always have to look at your key: we will always provide this sort of key, this box, and this table, and you just have to extract the values from them. So you take your w1,2 value from here, which is 2, and you take your x1 value, which is 0, so you do 2 times 0. Then you add your w2,2 value, which is negative 1, times your x2 value, which is 2. Then you add your second bias term, because we're on the second node, which is 4. When you do these computations, you get 2, and that is our value for the second node. Does that make sense now, Dia? Okay, perfect, I just got a yes from her too.

Okay, sounds good. Does that make sense to everybody? And do you guys want to go over the ReLU and the sigmoid stuff again, if we have time, or is that too much right now? Okay, yeah, some people want to go over that, so we can go over it. Let me clear all the drawings.

So what we're going to do here is take a step-by-step process, and this usually consists of three steps. In the first step, we do the calculation that we just did; then we apply the ReLU function; then we apply the hinge loss. For the first step, the way I would annotate it is: we take our w1,1 value times our x1 value, plus our w2,1 value times our x2 value, plus our bias term. Substituting these back in, this is 2 times 3, plus negative 2 times 1, plus our bias term, which is zero. Doing this computation gives us 4. So this is step number one, guys, and you've got your value.
Then step number two is applying ReLU. Your input for the ReLU is the output of your step one, so you do ReLU of 4, and the equation for ReLU is just the max of 0 and 4. The way this max function works is that you always have the zero in the first place and your other number in the second place, so ReLU of 4 becomes the max of 0 and 4, and you pick whichever number is higher on a numerical scale. That relates to what someone asked during the lesson: will the ReLU or the hinge loss always be zero? It's not always zero; it depends on the value. For example, if I had negative 4 here, the ReLU would be 0; but since we have 4, the higher number is 4, so our output is 4.

Then we go through the hinge loss. Like I mentioned before, the steps so far give the predicted value, the value the model predicts; they're not the actual value in our data. So now we want to find the hinge loss: the max of 0 and 1 minus our expected output, which is 3 — this is our actual value, which is why we use 3 here — multiplied by 4, which is our predicted output. Oh yeah, sorry, it should include the 1 minus; I forgot, it should be something like this. Good catch, Arjun. Given that, this computation simply boils down to the max of 0 and negative 11, and since negative 11 is smaller than 0, our max is 0, and that is why our hinge loss is 0.

So it's a three-step process. Let me use a different color to show you: this is step number one, this is step number two, and this is step number three. If a test question asks you to calculate the hinge loss, you have to do this three-step process, and for this example your final answer, which you would box, would be 0. Does that make sense to everybody?

Okay, so how did I get that the max of this value equals negative 11, and then 0? Okay, so the way this works is that the max function takes two numbers, here 0 and negative 11, and asks which number is bigger. It's going to be 0, right? Think of it on a number line: we have zero here, 10 over here, and negative 10 over here. Obviously our zero value is going to be greater than a negative number, so when you take the max of whatever values you have, it's always the higher value; that's why for the earlier value it was 4. Does that make sense? Okay, perfect.

Okay, so I got another question: "I don't get step one." I think we're running out of slide space here, but I can make this work. I don't want to clear all my drawings, so what I can do is erase some. Is everyone good with steps two and three, then? Okay, I'm going to erase steps two and three and we can go back to step one. Okay, perfect, let me erase all my drawings for this. So this is the way of doing the model's prediction.
So we're going to use this formula right here, and like I mentioned before, we will always give you this key and this table for a question. Given those values, we take our w1,1 — sorry, this is x1, let me quickly redo this — we do w1,1 times x1, plus w2,1 times x2, plus our b1 value. Plugging in, this is 2 times 3, plus negative 2 times 1, plus 0, and after doing this computation we get 4. Then you apply the ReLU and calculate the hinge loss in steps two and three.

So I think this is where we're going to have to end the meeting, guys. If you need any more extra help, please message me on Google Classroom; we can definitely figure something out. And once again, if you do struggle with your homework, just shoot me a message through Google Classroom and I'll definitely get back to you. Okay, thank you so much. Thank you, everyone. We will be going over all the homework problems next meeting, so please try to attempt them; that's the best way to learn. Just try, and you'll have some understanding; if not, we'll go over the homework next time and correct some things. And yeah, we'll see you guys next Sunday. Yep, thank you so much, guys. Bye.
Video Summary
The video transcript focuses on a homework review session, followed by a lecture on neural networks. The instructor first explains how to calculate accuracy using a confusion matrix, emphasizing true positives and negatives. The class progresses to discussing the mean squared error and how to approach a best-fit line using regression analysis. The instructor also clarifies the difference between regression and classification problems, illustrating with examples like Yelp reviews and predicting energy consumption.

The lecture pivots to neural networks, where the instructor illustrates their structure, inspired by the human brain, consisting of neurons organized in layers. Key terms such as weights, biases, and activation functions, particularly ReLU, are explained. The complex concept of backpropagation is briefly introduced as a method to optimize the network by adjusting weights and biases to reduce loss.

A practical coding example outlines constructing a neural network model using Python's Keras framework. The instructor highlights the importance of understanding neural networks' computational aspect by breaking down calculation steps, applying ReLU activation, and deriving hinge loss.

Pros and cons of neural networks are discussed; they can handle large data sets but require significant computational power and risk overfitting. The session concludes with Q&A, reiterating neural network computations and emphasizing hands-on practice through provided homework, with a promise of reviewing submitted work in the next class.
Keywords
homework review
neural networks
confusion matrix
mean squared error
regression analysis
classification problems
activation functions
backpropagation
Keras framework
ReLU
overfitting