AI 101 - Course & Competition - Grades 7-12 - Sund ...
Recording Class 3
Video Transcription
Okay, so let me just copy the table that we have from the homework assignment, right? It's just this table. Oh, is asking questions about ML on Google allowed? You can definitely, if you want to do some research or you want to spend more time learning some of the concepts or knowing what's going on, you're definitely allowed to Google. But I would say, for the homework assignments, it's stuff we covered in class. So try to just re-watch the lecture or do something like that, as opposed to, you know, Googling the answers or anything like that. But if you wanted a kind of second way of learning stuff, you're welcome to look at other things. But yeah, any sort of Googling will not be allowed during the competition. OK, so our first question is we want to do the Gini index when we split based on the browsing history attribute. So I'm just going to copy this part, because when we are looking at an individual attribute, right, we are only looking at that attribute and nothing else. And then I'm going to write down the Gini equations here. The Gini index is one minus the sum of the squared class proportions: Gini = 1 − Σ pᵢ². And if you want to do it for the split, then the equation for that is the sum, over the groups, of each group's weight times its Gini index. Awesome. Some people are saying they re-watch the lecture. That's a really great habit, even when you go to college and stuff. You are welcome to submit the homework any way you want. You can do it in a Google document, or you can do screenshots of the work. I know some of you don't show work, but I would love to see that work just so I know your thought process in case you go wrong anywhere. So if you can show the work, you can take a picture and upload it; totally anything is fine, whatever works for you. You don't necessarily have to type everything. And no worries if you didn't get a chance to do your homework. You know, you can always submit.
I got a bunch of submissions for not this week's homework but the past week's homework, the one before that, and that was awesome. You can submit any time. The idea is this is for you to learn, so that you know how to solve these problems for our competition. And sorry, my mouse is controlling my iPad right now instead of my computer. So let me try to move it as far away as possible. OK. Really annoying. Today's a really bad day for technical difficulties. OK. So what we're going to do first is split this attribute into its two values of browsing history: visited product and didn't visit. We need to take each case separately and calculate the Gini index for each of them. So we see how many will-buys and will-not-buys there are. For visited product, how many will buy and how many will not buy? You can just put it in the chat very quickly. And then I'm also going to put the totals: how many total there are of visited product and how many total of didn't visit. In the chat very quickly: how many of these visited-product rows are will buy, and will not buy? Yeah. Two will buy, right, and one will not buy. We can just look at these three rows, and there's two that will buy and one that will not buy, so a total of three. And for didn't visit, we look at the rest of them: two that will buy and three that will not buy, for a total of five. OK, so far so good. So then we just need to plug and chug into this equation here. First, we calculate the Gini index for visited product: 1 − [(2/3)² + (1/3)²] = 4/9. The three here is coming from the total, right? It's a proportion. So two divided by three and one divided by three; we square them, add them, subtract from one, and we get four over nine. For this part, you can definitely use a calculator.
Then we calculate the Gini index for didn't visit. We do something similar: 1 − [(2/5)² + (3/5)²] = 12/25, where the five is our total. For now, you can work with fractions or decimals; at the time of the competition, we will say which one you should work with, but for now either is OK. Then let me just copy this over to the next page, just so we have room. Now we need to calculate the Gini split, right? So we need to see what the weight of each group is. For the weight, all we need to look at is the totals. We have three total and five total, for a grand total of three plus five, which is eight. And so the weight for each individual group is three over eight and five over eight. Now, the Gini split is the summation of the weight times the Gini index. We calculated the Gini indexes already, so we just multiply: (3/8)(4/9) + (5/8)(12/25) = 7/15. And that's our Gini split. So far so good? Any questions? OK, then I will move on. So then we need to do the same thing with age and purchase decision. Again, we can copy down our equations here just for our ease. Definitely, if you're making a cheat sheet for our competition, then having these formulas would be helpful. Yeah, could I explain it again? Yeah, so which part? Is it just the Gini split one, where the weighting comes from? Okay, Gini split. So we calculate our Gini indexes, right? That's where we were at. Then, if we look back at our chart, three of the rows are visited product and five are didn't visit, so that's a total of eight rows. And so our weight is again a different type of proportion: three visited and five didn't visit, so the weights are three over eight and five over eight. And basically the equation for Gini split is you sum up the weight times the Gini index.
So the weight I'm just getting from this proportion, three over eight and five over eight, and the Gini index we just calculated. That's where I'm getting it from, for a total of seven over fifteen. So that's our full Gini split. Does that make sense? Or is a particular part unclear? Okay, cool. So now we're going to do the same thing for age and purchase decision. And now there's three categories, right? Young, middle-aged, and old. Again, we need to break down the will buy and will not buy; I'm just going to copy that over for each of these categories. Sorry, I'm running out of space here. Okay. So for young, how many of them will buy and how many will not buy? Yeah, all of them, right? All of them will buy; there's none that will not buy. All you need to do is look at the table and see all the rows where age is young and the purchase decision is will buy. For middle-aged and old, will buy, will not buy? Yeah, all are going to not buy, right? So we're going to have two here, two here, and zero for each. Okay, so far so good. So what is our Gini impurity for each of these? It's zero, right? I don't even need to put this into my equation. You can, and if you did, that's totally fine, but we want to save our time when there are so many mathematical calculations. The fact of the matter is that all young people are going to buy and all middle-aged and old people are not going to buy. There is no impurity. Impurity would mean that some will buy and some will not buy within a category, but it's consistent throughout: for a given age group, they all do the same thing. So the impurity here is zero, zero, and zero.
Now let's calculate the Gini split, even though it might already be obvious to you what it is, just so we have a little more practice with Gini split, since that was the new concept. What is the weight for young, middle-aged, and old? There are four young rows, two middle-aged rows, and two old rows. So what is our weight for young? We're calculating a proportion, right? How many young rows are there out of the full eight rows? Yeah, four over eight. And then, same thing for middle-aged and old, what are the proportions for them? Two over eight each. Perfect. Excellent. So then we just multiply according to our split equation with the Gini indexes we calculated, add it up, and that obviously gives us zero. If you have the exact same Gini index for each of the categories, the Gini split is going to be that exact same value, because this is nothing but eight over eight, or one, times zero. So zero. Okay, so we got that our Gini split was seven over fifteen for browsing history, and for age it's zero. So what should we split on? These are the Gini splits. We should split on age, right? Because that has the least impurity: zero is less than seven over fifteen, and we are always striving to make the Gini impurity smaller. And that makes sense, right? This was not a homework question, but I could also have asked you this: if you wanted to make a decision tree, how would you make it? Well, you could have something like this: the first root node would be whether the individual is young or not. If the individual is young, will they buy or will they not buy? They will buy, right? And if the individual is not young, hence they're old or middle-aged, they're not going to buy. And that's a super simple decision tree. Had we chosen browsing history instead, this decision tree would have had multiple levels, right?
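The hand calculations above can be sketched in Python. This is a minimal sketch, using exact fractions so the results match the 4/9, 12/25, and 7/15 worked out above; the count lists are just the will-buy / will-not-buy tallies from the table, and the function names are only illustrative:

```python
from fractions import Fraction

def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)

def gini_split(groups):
    """Weighted Gini: sum of (group size / total rows) * group's Gini."""
    n = sum(sum(g) for g in groups)
    return sum(Fraction(sum(g), n) * gini(g) for g in groups)

# Browsing history: visited product = [2 will buy, 1 will not],
#                   didn't visit    = [2 will buy, 3 will not]
print(gini([2, 1]))                   # 4/9
print(gini([2, 3]))                   # 12/25
print(gini_split([[2, 1], [2, 3]]))   # 7/15

# Age: young = [4, 0], middle-aged = [0, 2], old = [0, 2] -- all pure groups
print(gini_split([[4, 0], [0, 2], [0, 2]]))  # 0, so we split on age
```

Picking the attribute is then just taking the split with the smallest value, exactly as in the lecture.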
And we always want to go for the simplest sort of decision tree, right? So this one is very, very simple. Make sense? Yeah. Thumbs up? Perfect. Cool. OK, now for our next question: which columns should definitely not be considered as features? The situation is you're creating a machine learning model that wants to predict which customers will cancel their storage subscription. So you have this data set, and within the data set the columns are customer name, account creation date, customer email address, customer ID, and number of uploads in the past week. I'm just going to write each of these down, and then we can talk about them. Customer name: do we want to consider this as a feature? Yeah, I see a lot of no's, right? And I saw this in someone's assignment, I forgot who it was, I think it might have been Kitty, and I really loved the way she described it: these things should not be considered as features because they're only particular to a customer. Customer name is not really helpful in predicting which customers will cancel their storage subscription. It doesn't matter if the person's name is Bob or Emily or whatever, right? So that we definitely don't want to consider as a feature. Yeah, that's fine, Akhil. So account creation date: do we want to potentially consider that if we want to predict which customers will cancel their storage subscription? Yeah, it's sort of important, because a customer that just joined maybe in the last six months is usually more willing to cancel, as opposed to someone who's been there for 12 years. You might modify this column: instead of account creation date, you might use something like years since the account was first created, one year, two years, or something. But you still want to use this as a feature somehow, right? Customer email address? No, we don't, right?
That is, again, unique to a customer. It doesn't matter if the person's email is ilovecats at gmail.com; it has no bearing on this. So again, we're going to delete this. Customer ID: do we want it as a feature? No, right? Again, that is unique to the customer. It doesn't matter if it's 123XYZ or whatever their ID is. Number of uploads in the past week: do we want this as a feature? Yeah, exactly, we want this as a feature. If the person has done like 100 uploads last week, as opposed to zero, that person is less likely to cancel, because they are using the service constantly. And honestly, here's a pro tip: machine learning, once you get the fundamentals, is not that complicated. Anyone can do it, right? The code is just a matter of knowing how to use the template codes and putting it all together. So what distinguishes a mediocre machine learning engineer from a really great one? The selection of features. The really great one will think deeply about what features need to be selected. The actual models and stuff, there are a lot of Python libraries that kind of do that for us, and it's not that complicated; it's this part, this theoretical stuff, that is super important. And the fact that I saw mostly correct answers, that is awesome, right? You guys are definitely on the right track. OK, so that was question two. Now for question three. I know some of you wanted us to cover feature vectors a little bit more, so I'm going to do that, because that is, again, very important and crucial to machine learning. So we have the individual's name, right, Alice, Bob, Kara, and then grade in class, and then study habits. And while I'm writing this down, the first thing you guys should be thinking about, which you already have if you did the homework, is which of these are indeed things we should consider as features, right?
To create our feature vector, you should only consider features. So this is really a two-step question. Feel free to write in the chat which ones should be considered as features. This one was a little hard because I didn't actually say what our aim is, what our goal is, but I think you guys guessed what the goal of this problem was, if you were making a machine learning model. OK, interesting, I'm seeing a lot of different answers. A lot of you are close. And this is my bad a little bit, that I should have said what we were trying to predict. But before we go deeper, what do you think we're going to use this data set to predict? Whether an individual what? Passes the exam. Passes the exam, yeah. And I would make this clearer on the test, because in theory you can predict anything, right? So passes the exam: is that a feature? No, someone wrote it in the chat already. It's a label. So we're going to leave this separately; I'm going to cross this out. I actually don't want this in our feature vector, because this is what we're trying to predict. Make sense? If we're trying to predict something, that's not part of our feature vector. The way I would think about it, the feature vector is like all the different independent variables you're working with, and the label is like the dependent variable, right? So we want to predict if an individual passes the exam. Now, a lot of you got name. Name is also not a useful feature, because of everything we talked about in question two: it doesn't matter what the individual's name is. So now we're left with this table. Okay. So now, how do we construct it? First of all, I'm still going to write their names. I don't want them in the feature vector, but at least we'll know which feature vector corresponds to whom, right? So what we need to do is look at each row separately, if you will.
But first of all, grade in class: is it a number or is it a classification? Yeah, it's just a number, so that one shouldn't be too difficult. Now study habits: is it a classification or a number? It's a class. How many classes are there? There are three, right? Okay. What did we say last week: if you have more than two classes, how do you create this feature vector? What is that word called? One-hot vector, right? It's kind of a funny term, one-hot vector. So you'll need to create a one-hot vector for this. Now, for studied last night, is it a class or is it a number? It's a class, right? And how many classes are there? Just two, right? So this shouldn't be too bad; you can just use a Boolean, someone wrote that in the chat. Perfect. So now let's make the feature vector for Alice. What do I have in this first entry? This first one should represent grade, right? What do I want to put there? I saw different answers for this, and I think anything was acceptable since I didn't clarify. Yeah, you could have written 95. I personally prefer 0.95 in machine learning, and this is not something we covered, so you could have done anything. In machine learning, you want to use numbers as small as possible. The reason is that the bigger the numbers you use, the longer it takes to train, the longer the code runs, and you're sitting there for tens of hours; you don't want that. So 0.95 would be a better way to represent it, but if you had 95, I will accept that answer as well. Yeah, it was in the notes for the splitting Gini index with weight; it was this one. But you can look at the lecture notes. Yeah. Okay. Now for study habits. It's a one-hot vector, so let us first see how we want to structure our one-hot vector. We said there were three classes.
So we're going to structure our one-hot vector like this, and I'm just keeping this separately so we know what the ordering is here. I had a question of whether the order the features are placed in matters. Most of the time it doesn't, but for this one-hot vector we need to make sure it's consistent. I can change up the order, right? I can have cramming before exam as the first row, regular studying as the second row, and minimal studying as the third row. Or I could have the grade as the last row. It doesn't matter what order it is, but it has to be consistent. If for Alice you put the grade as the first row and for Kara you put it as the last row, your model is not going to work, because it's not going to understand what's going on. Okay? Does that make sense? So just consistency matters. It doesn't matter exactly what order you end up going with, but we're going to keep it simple and do it in the same ordering as our table here: regular studying, cramming before exam, and minimal studying. That's the order I'm going to go with for our one-hot vector. So for the study habits for Alice, what is that going to look like? What's going to be my column vector? Yeah: one, zero, zero. Does that make sense? It's one because it is regular studying, and it's not cramming before exam and not minimal studying, so those are zeros. And then the last thing is studied last night is a yes, so what do we want to keep there? One. All good. Does this make sense? Any questions? Avi, did you have a question? I see your hand raised, or is that from before? Yeah, I have a question. So the grade in class is a 0.95, and then the study habits, regular studying, would be a one. Where do you get that extra zero, zero, one? I was just confused. Oh yeah, yeah. So for study habits, because there are three classes and not just two, we need to treat it as a one-hot vector. Oh, yeah. Yeah, yeah, yeah.
So there are three classes here, and this whole three-entry block is actually just representing study habits. And we can do the same thing for Bob. I understand. Thank you. Awesome. Let's do the same thing for Bob and Kara; I'm making it smaller here so I have room. So for the grade, now that I have told you what Alice's looks like, what does Bob's feature vector look like? What's the first entry? Me? Oh, anyone, anyone. Yeah, 0.62. Yeah. Do you want to try the one-hot vector, the next part? Zero, one, zero, right? Yeah. And that's just for regular studying, cramming before exam, minimal studying; we're just following that, and because Bob crammed before the exam, that's why it's a one. Yeah. Awesome. And then the last one? It'd be one. Awesome. Perfect. Cool. Does everyone understand? Does it make sense for them? And then lastly, Kara's, you can just write it in the chat. Yeah, perfect: 0.92, then zero, zero, one, right, because it's minimal studying, and then our studied last night is zero. Awesome. Okay, perfect. Everyone got it? That's the homework. So now let's go on to today's material. Let me stop screen sharing. Okay, making Minoo a co-host. OK, perfect. And Minoo, feel free to jump in when your slides come; I already forgot which slide you started on. So let's just continue what we were talking about last time, and then we're going to move on to classification and regression. So, feature selection, right? What happens if all the features are not necessary to complete the ML task? For example, we're trying to determine the type of fruit. So: fruit name, fruit color, fruit diameter, where the fruit was found. Which of these don't really matter? Well, oops, sorry, why is this moving? The one that doesn't matter is where the fruit was found, right? Obviously, all fruit at the beginning comes from the farm.
Eventually it's at a grocery store; it doesn't matter where it was found. But if you want to determine the type of fruit, the fruit name is our label, and fruit color and fruit diameter are features. And so what you do is you just eliminate this column. You don't use it, right? Just because you have columns in your data set doesn't mean you use them. This whole process, and you did this in the homework assignment as well, is called feature selection. It's very important, because we don't want our model to learn patterns on unrelated features. That can lead to low accuracy: whether a fruit is coming from a grocery store, a farm, or anywhere like that is going to make our accuracy lower. It can also elongate training times, and machine learning models take insanely long to run, so you do not want to elongate that at all. And so, continuing with our previous example, you already created feature vectors for our homework assignment, right? Fruit color and fruit diameter are our only features. Fruit diameter we just keep here at the top: 16, 3, 2.5, right? For fruit color, we do green, red, blue: green is 1, 0, 0; red is 0, 1, 0; blue is 0, 0, 1. So that's our feature vector. You guys already know how to make one. But then how do you make your label? Well, if your label has more than two classes, then that also has to be a column vector. In our case, watermelon, apple, pear, if those are the three rows for our labels, then they will be 1, 0, 0; 0, 1, 0; and 0, 0, 1. Makes sense? If there are only two classes, then we could have just gone with 0 or 1. Everyone understand? It's the same thing; it's just that now we do it for our labels separately. Cool. Do you still have to use a one-hot vector? Yeah, that's a good question. For three or more options, you definitely need a one-hot vector. It's only if you have two options that you can just do an easy 0 or 1, and you don't need a vector.
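Putting the feature-vector and one-hot ideas together, here's a minimal Python sketch. The category orderings (regular / cramming / minimal, and watermelon / apple / pear) are the same ones used above, and the helper name `one_hot` is just illustrative:

```python
def one_hot(value, categories):
    """Return a one-hot list: 1 in the slot matching value, 0 elsewhere.
    The category ordering must stay consistent for every row."""
    return [1 if value == c else 0 for c in categories]

# Alice from the earlier example: grade 0.95, regular studying, studied last night
habits = ["regular", "cramming", "minimal"]
alice = [0.95] + one_hot("regular", habits) + [1]
print(alice)  # [0.95, 1, 0, 0, 1]

# Labels with three classes also become one-hot vectors
fruits = ["watermelon", "apple", "pear"]
print(one_hot("apple", fruits))  # [0, 1, 0]
```

Bob and Kara follow the same pattern, just with their own grade, habit, and studied-last-night values plugged in.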
And so then what we do is feed this into our model, and then you would just code up your model. What the model is, is the topic of discussion for today as well as all of our future sessions: what are these different types of models? Our model differs depending on what type of problem we're trying to solve, but this part is consistent no matter what the model is: you need your feature vectors, you need your label vectors, and you feed them into the model. And what the model is, essentially, is you're trying to find the variables, the parameters, of a mathematical function that can fit the inputted features to the labels. And by fit, all we mean is: how can we map our features to our labels? So far, so good. Does it make sense on a high level what we're trying to do with this AI/ML stuff? Thumbs up? Cool. Now, accuracy is very important, right? You want to see how accurate our model is. And accuracy is calculated the way you always have in school: it's just the total number of correct predictions over the total number of predictions. And so our first question for today is: let's find the accuracy. Just put it in the chat; we're not going to do a poll anymore, because that just takes too long. So just look at this chart. And I'm not really explaining what this chart is right now; I just want you to try to figure it out yourself. And definitely use a calculator for this; I don't want you calculating by hand. I've got one answer already. Let's see. And if you're confused, let me know and I can explain and give you a hint. OK, so we do have some people who don't understand. OK, so what this is, and it doesn't really matter... you know, Grand Priest Jeremy. I imagine your real name is Jeremy, but I'm not sure, so I'll just call you Grand Priest Jeremy. It's so funny you wrote you're confused, because this chart is actually called a confusion matrix.
Now, it doesn't matter for this class what it's called, but this is a confusion matrix. What it tells you is that you have the predicted class up on top and the actual class on the side, right? The predicted class columns, positive and negative, are what the model predicts: is the model predicting positive, or is it predicting negative? And then horizontally you also have positive and negative, and that's the actual class: is this example actually positive or negative? Another way to think about it: if you're doing an image recognition problem and you're trying to predict images of a cat or a dog, your predicted classes might be cat and dog, and your actual classes might be cat and dog. So now, the way to read the chart is you look horizontally and vertically. Oh, it's not a poll question, so just put it in the meeting chat; I saw some requests for a poll. You look at the positive for predicted, right? Can you guys see my cursor? Awesome, yeah. You see the positive for predicted and positive for actual. Does that mean our classifications match? Are we correct in this case? Yes, right? Now we look here: our model predicted positive, but the actual class was negative. Are we correct in this case? No. Let's look here: we predicted negative, but the actual class was positive. Are we correct? No. And when we predict negative and our actual class is negative, are we correct? Yeah. So now, does that help you calculate the accuracy, for those of you who were confused about our confusion matrix? The way to calculate the accuracy would be: you add up the positive-positive and negative-negative cells, so 1546 plus 1648, and then you divide that by the total number of predictions we have here. So, time for the answer. It is indeed C, which a lot of you did have. And this is the calculation: positive-positive and negative-negative.
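In code, that accuracy calculation is a one-liner. A quick sketch with the numbers from this confusion matrix (accuracy doesn't depend on which off-diagonal cell is which, so the false-positive/false-negative labels below are just one plausible reading of the matrix layout):

```python
# Confusion matrix cells
tp = 1546  # predicted positive, actually positive
fp = 286   # predicted positive, actually negative (assumed layout)
fn = 862   # predicted negative, actually positive (assumed layout)
tn = 1648  # predicted negative, actually negative

# Accuracy = correct predictions / total predictions
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 4))  # 0.7356
```

The diagonal cells are the correct predictions; everything else is an error, which is exactly the hand calculation above.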
Those are the times when we are correct, right? So, 1546 plus 1648, and then we divide by the sum of all these numbers, 1546 plus 286 plus 862 plus 1648, and that should give you C. Make sense to everyone now? Cool. Okay. Yeah. All right. Hi, everyone. I'm Minoo. I'm going to be teaching a portion of the class today. Today we're going to be learning about classification versus regression, and these are supervised learning algorithms, which should sound familiar from one of the first two weeks. Supervised learning is when your model is trained on a labeled dataset, and classification and regression are two subtypes of supervised learning. Can I switch slides? Yeah, I gave you access. Can you try? Oh, okay. Okay. So, classification is a process in machine learning where a model or function categorizes data into distinct classes based on its features, and, as we talked about, it's a type of supervised learning. The outputs of classification tasks are categorical, meaning they represent different groups, like spam or not spam. One type of classification is binary classification, where an instance is classified into one of two classes, again, binary, two. So we have something like the Cs or not the Cs. And then we also have multiclass classification, which is a kind of classification task in which an instance is classified into one of three or more classes. So binary classification would be an email categorized into spam or not spam, but a more advanced kind of classification is multiclass classification, where now an email can be categorized as spam, primary, social, or promotions. Next, we have multilabel classification, and this is a classification task where each input can be assigned multiple labels simultaneously, right? So before, where each input had just one label, it could be only spam or only primary, this kind of classification task allows multiple categories per instance.
So this is seen in text classification: if you're reading an article related to multiple topics, you can see at the bottom that it's tagged science, technology, or health. And it's also seen in the images below, right? This is a picture of a beach; it's also a picture of a sunset, and it could also be a picture of a vacation. Any questions about classification so far before I move on to regression? All right. So, regression is a type of supervised learning algorithm that predicts continuous values from given input data. Before, we had classification, which would only predict discrete labels or classes, but regression is used to predict continuous numerical values. So, for example, with classification we can predict whether a cell is cancerous or not, and with regression we can predict the size of the tumor. Some more examples are predicting house prices or stock prices. With these kinds of examples, there are no categories you can fall into; it's just a range of numerical values. So, does that make sense to everyone, what regression is? All right. Let's do some practice questions. There are going to be three questions, and you guys should identify whether each example is regression or classification. Yeah, and you can put your answers in the chat. Ms. Harper, can you see the answers? Because I don't think I can. Or do you see people answering? Oh, yeah, I do see a couple of answers. Do you guys have an option to write to all of the hosts and co-hosts as well? If you can, send through that. OK, yeah, I see it. Yeah. OK, yeah. So I see some answers trickling in. So let's just go over this. Predicting the number of units a store will sell next month based on past sales data, promotions, and seasonality. For these kinds of questions, you really don't even have to read the second part of the question; we can just look at the first part: predicting the number of units a store will sell.
That's a numerical value, which means it's going to be regression. Second question is determining whether a patient has a high or low risk of developing a certain disease. Again, the question is basically answering it for us, high or low risk. And then finally, estimate the time it will take to complete a project. Time is going to be regression. You can't classify time. So that's going to, again, be regression. So yeah, I think actually everyone got the same answer. So good job. Any questions on that, though? All right. Let's then do a couple more practice questions. Oh, sorry. Didn't see that. OK. Yeah, three more questions for you guys. OK, cool. So I see answers popping in. And it seems as though most of them are right, although the first one is a bit iffy. So let's look through that one. Predicting whether a customer will make a purchase within the next week based on their browsing history and previous interactions within the website. So again, if we just kind of focus on the first part of the question, predicting whether a customer will make a purchase, that's going to be classification, because either they make a purchase or they don't. We're not trying to see how big their purchase is. That would be a regression question. The question is just asking whether a customer will make a purchase. So this is going to be a classification example. The second question, estimating the fuel efficiency of a car based on its engine size, weight, and other specifications. It seems everyone got this right. It's going to be regression, because we're measuring fuel efficiency. So that's going to be a range of numerical values. And then finally, the third question, predicting the likelihood that a new product will be a commercial success based on market research, competitor analysis, and historical sales data. That's also going to be classification, because you're just predicting whether a new product will be a commercial success or not. 
So again, that's going to be success or no success, a classification task. All right, Ms. Harper will continue to talk more about regression now. Awesome, let me stop the remote control. OK, cool. So there are multiple different types of regression. One of them is linear regression, where we assume a linear relationship between the input features and the output. Someone asked a question about the previous slide really quick. Oh, yeah, go over it, for sure. So, something like a percentage being a numerical value, making it regression? The likelihood that a new product will be a commercial success based on market research, yeah, I see what you're saying. If you read it as a percentage, you're right: based on how you read the question, it could also be regression. I think what the student is pointing out is that you could say there's an 80% likelihood, or a 20% likelihood, that it would be a commercial success. So yeah, if you can justify your answer correctly, it can be a regression-type question; but if you frame it as either it's a success or it's not, it can also be a classification question. Both answers can be correct based on how you justify your explanation. Good point. And in fact, one of the techniques we use as ML engineers is that you can always convert a regression problem to classification, or classification to regression, and sometimes that makes life easier. So, back to linear regression: it assumes a linear relationship. We talked about linear and nonlinear a little bit, and granted, this is a little simplistic, because we have said a lot of stuff in life is nonlinear. But this can be useful, and it has its uses.
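To make the point above about converting between regression and classification concrete, here is a minimal Python sketch. The likelihood scores and the 0.5 cutoff are invented illustration values, not numbers from the lesson; the idea is just that a continuous "likelihood of commercial success" can be thresholded into discrete classes:

```python
# Hypothetical sketch: turning a regression-style output (a continuous
# likelihood score) into a classification label by thresholding.
# The scores and the 0.5 threshold are made-up illustration values.

def classify_success(likelihood, threshold=0.5):
    """Map a continuous likelihood (regression output) to a discrete class."""
    return "success" if likelihood >= threshold else "no success"

scores = [0.8, 0.2, 0.55]                      # imagined regression outputs
labels = [classify_success(s) for s in scores]
print(labels)  # ['success', 'no success', 'success']
```

Going the other way (classification to regression) usually means predicting the class probability itself as the continuous target.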
How many of you have learned about linear regression in any of your classes yet? Some of you? Yeah. So you may or may not have known it at the time, but this is an AI technique as well. It looks something like this, and our equation is very simple: some coefficient multiplied by x, plus some constant. By the way, are you guys familiar with Cartesian coordinates? If you've never seen this type of chart before, can you write to me in the chat? OK, so here's what I'm going to do: in a bit, I'll talk you through how to interpret these charts. It's pretty easy, and you'll get to see the material before you see it in math class, so when you do see it in math class, it'll be awesome. The more complex version of this is polynomial regression. This is what we use to model nonlinear relationships between the input features and the dependent variable, and it adds polynomial terms. You don't need to worry about this one; it's more complicated, so we're not going to focus on it in class. But you can see the equation is much more complicated: the linear regression was just the last two terms, and now you also have an x squared term and so on, which gives you these fancy curves. Again, don't worry about this; it's just to tell you that there are many, many different types of regression. In fact, there are also decision tree regressions, right? We talked about decision tree classification last week, but there's also decision tree regression. So for simple linear regression, what we're basically trying to do is estimate the linear relationship between one dependent variable and one independent variable. Just one independent, one dependent, that's it. And the goal is this: you have all these points, and those points come from your data set, and your data set basically has two columns.
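The two equations just described can be sketched in a few lines of Python. The coefficient values here are arbitrary placeholders, chosen only to show the shape of each formula (a straight line versus a curve with an x-squared term):

```python
# Sketch of the two regression equations from the slide.
# Coefficient values are arbitrary placeholders for illustration.

def linear(x, m=2.0, b=1.0):
    # Simple linear regression: y = m*x + b (coefficient times x, plus a constant)
    return m * x + b

def quadratic(x, a2=0.5, a1=2.0, a0=1.0):
    # Polynomial (degree-2) regression adds an x**2 term, bending the line into a curve
    return a2 * x**2 + a1 * x + a0

print(linear(3))     # 2.0*3 + 1.0 = 7.0
print(quadratic(3))  # 0.5*9 + 2.0*3 + 1.0 = 11.5
```

Higher-degree polynomials just keep adding terms (x cubed, x to the fourth, and so on), which is why those equations look so much longer.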
There's one feature and one label, essentially. When you plot these points, there's some math that goes in to find the best-fit line through the data points. What this best-fit line does is minimize the error between the actual values, which are where these red points fall, and the values predicted by the line of best fit. So basically, the idea is that the line we're trying to make should be very close to where these points lie. If you have a line like this, if you look at my cursor, that's not going to be a best-fit line, because the red points are very far from it. And the idea is that when you're trying to predict something you don't know yet, you're going to use this line to predict approximately what it's going to be. That's how linear regression works. So if we want to predict plant height based on the amount of water the plant gets daily, we can use linear regression. Maybe you look at this axis, which is called the x-axis if you're not familiar with this kind of chart, and this is called the y-axis, so horizontal and vertical. This would be like: hey, if you watered 1 milliliter, then the plant height is 3 inches, or if you watered 0.5 milliliters, the height is 4 inches, something like that. And then you use this best-fit line to predict: if I were to water this plant 2 milliliters, how much is my plant going to grow? Approximately, it's going to grow to about 5 inches, or wherever that lies on the line. Does that make sense overall, or are you confused? Thumbs up? Makes sense? We're using this line to predict. The red points are the actual data set, right? That's the data set we are given. We know that when we watered this plant with this many milliliters of water, this is what the plant height was. So we have a data set full of these points.
The red points are just a graph of those points, whereas the blue line is what we get once we run our AI model, our linear regression, and we can use that line to predict. The way to read this line is: you look at the x-axis, find 1 here, go up to where the line is at x equals 1, and then see what the y value is there. That's about 3.5, so you know that if you water 1 milliliter, the height will be approximately 3.5 inches. Does that make more sense? How do you calculate the linear regression that produces the line? We're not going to talk about that, because it involves a little more complex math, but that's a great question. What we are going to do, after the fact, is see how good our line is. So unfortunately we won't cover that. Let me move to the next slide, since we have six minutes left. You can also have multiple linear regression, which we're not going to focus on much in this class either, because it's more complicated. Essentially, you have multiple independent variables and one dependent variable. So if you wanted to predict a person's salary based on multiple things, like years of experience, education level, and age, multiple linear regression would let you do that. Instead of a line, you're looking at planes and so on. But the idea is still that we want this best-fit line; we're again going to just focus on simple linear regression. You want to find the best-fit line through the data points that minimizes the error between the actual values and the predicted values. The predicted value is where the line lies: if you look at 0.25 on the line, it's 2.7. Our actual value is where the red point actually falls, which is 4. Does that make sense so far? You have so many different points, right?
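The "more complex math" behind the best-fit line mentioned above is, for simple linear regression, the standard least-squares formula. Here is a plain-Python sketch using an invented watering dataset (the lesson's chart values are only approximate, so these numbers are made up to roughly match the example):

```python
# A minimal least-squares fit in plain Python. The watering dataset below
# (mL of water per day -> plant height in inches) is invented for illustration.

def fit_line(xs, ys):
    """Return slope m and intercept b of the best-fit line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Least-squares slope: covariance of x and y divided by variance of x
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
    b = y_mean - m * x_mean
    return m, b

water = [0.5, 1.0, 1.5, 2.0]   # feature: mL of water per day
height = [3.0, 3.5, 4.5, 5.0]  # label: plant height in inches

m, b = fit_line(water, height)
print(round(m * 2.0 + b, 2))   # predicted height at 2 mL, about 5.05 inches
```

Once you have `m` and `b`, predicting is exactly the "look up the line" step from the lecture: plug an x value into `m * x + b`.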
And some of these, like this one at 0.25, have a height of approximately 3. There's going to be variation, and that's fine. But what this best-fit line does is help you predict approximately what the height of the plant is going to be, or whatever you're trying to solve. Before I go on to this example, if you're not familiar with Cartesian coordinates, I just want to quickly cover that, and then we'll go to the example. So if you can see this, this is our Cartesian coordinate system. You have your x-axis, which is the horizontal axis, and your y-axis, the vertical axis. The center here is called the origin, and it means (0, 0): that's where x is 0 and y is 0. The first number is always x, and the second number is always y. So here, the point is (1, 0). What would this point be? I especially want the answer from someone who has never seen this kind of chart before. If this is (1, 0), then what is this point? (0, 1), exactly. This one would be (-1, 0), because x is negative 1 and y is 0, and this one would be (0, -1), where x is 0 and y is negative 1. So you just need to look at the grid lines, see what the x value is and what the y value is, and then write it as (x, y). This point, for example: counting 1, 2, 3, x is 3 and y is 2, so (3, 2). Here, x is negative 2 and y is 2, so (-2, 2). And this point over here: x is negative 3, y is negative 1, so (-3, -1). So far, so good? Does it make sense how to read these charts? OK, let me go back to our slides. I just want you to put this in the chat, and I won't give you a lot of time. I want you to look at this red point and tell me what the actual value is and what the predicted value is. The actual value is where the red point is.
Basically, when I'm talking about value, I'm only talking about the y value, not the x value. So the actual value is where this red point lies: what's the y value of that? The predicted value is where it lies on the line, so you need to look up here and see where it's landing. What is the y value here? Yep. Oh, thank you, Emilio, that's super helpful. Oh, is there a way you can delete that, Emilio, now that I'm going on to the answers? Thanks. So just as Emilio drew: the actual value is negative 1, right? That's where our y value is, negative 1. I see a lot of C's. Remember that up here and over here is positive, and down here and over here is negative, so it should be negative 1. And then over here, on the line that Emilio drew, y is 2. Does that make sense for everyone, especially for the people who said C? I want to see some yeses. If you're confused, let me know in the chat, or forever hold your peace, because you need to know this stuff for the homework. OK, I'm moving on then. Oh, you're confused: aren't the four quadrants positive, then negative? Just look at the grid lines. This is positive, this is positive, this is negative, this is negative. Emilio, do you mind drawing a line from here over to the y-axis, and drawing this one again as well? This one also. Yeah, and then this one also. I just need you to look at the y value, OK? So for the red point, the y value is negative 1. Then you go to the best-fit line, and the y value there is 2. Does that make sense? Just look at this: this is negative 2, this is negative 4. You don't need to remember anything; the numbers are there to help you out. Does that make more sense? OK, cool. Thank you so much, Emilio, that was super helpful. So once you have this best-fit line, you want to see what your error is, right?
And the way to calculate the error for regression is the mean squared error. That's 1 over n, where n is just the number of data points, multiplied by the summation of each actual value minus its predicted value, squared. So very quickly, I'm just going to show you this example, and then we'll be done. Imagine I'm not even giving you the x values, just the y values: these are the actual values, and these are the predicted values. There are 1, 2, 3, 4, 5, so five points in our data set. We're going to divide the whole thing by 5, because 1 over 5 is the 1-over-n part. And then for this part, you just do actual minus predicted, squared: (4 minus 5) squared, plus (5 minus 4) squared, plus (2 minus 4) squared, plus (3 minus 2) squared, plus (6 minus 6) squared. And that whole thing divided by 5 gives you 1.4. Does that make sense? That is the mean squared error for this best-fit line, an imaginary one; it doesn't correspond to the one from the previous example. Are we all clear? Does it make sense how we calculate mean squared error? This basically tells you how good your best-fit line is: if our mean squared error were 100, that would mean our best-fit line is no good. Where do we multiply the 1 over n? Yeah, multiplying by 1 over n is just the same thing as dividing by n, so dividing by 5 is the 1-over-n part, because there are five data points. Any other questions before I let you guys go for the day? All good? OK, folks. Is it always 5? No, it's just the number of data points, so if your data set is bigger, it will be different. If there are four points, then you divide by 4. Any other questions? OK. I will assign you guys homework on this, and we'll see you next week. Thank you, great work, guys. Yeah, each data point can have a different value; n is just the number of data points.
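The worked example above can be checked in a few lines of Python, using exactly the five actual/predicted pairs walked through in the lesson:

```python
# Mean squared error for the worked example from the lesson:
# MSE = (1/n) * sum of (actual - predicted)^2 over all n data points.

def mean_squared_error(actual, predicted):
    n = len(actual)  # n is just the number of data points
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

actual = [4, 5, 2, 3, 6]     # y values from the data set
predicted = [5, 4, 4, 2, 6]  # y values read off the best-fit line

# (1 + 1 + 4 + 1 + 0) / 5
print(mean_squared_error(actual, predicted))  # 1.4
```

Note that `n` comes from the length of the list, so with four data points the same function would divide by 4 automatically.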
Does that make more sense? It's the number of data points. OK, let me share my screen again. Each of these is one data point: 4, 5, 2, 3, 6, so counting 1, 2, 3, 4, 5, that's five data points. The size of this array is five, because there are five numbers in this list. Does that make more sense? If there were four numbers in this list, then the number of data points would be four. You just need to see how many numbers are in the list. Does that make sense? OK, cool. Awesome. Well, if nothing else, thank you all, and see you next week.
Video Summary
The video covers a lesson on machine learning, focusing on homework review and the broader principles of classification and regression. The speaker first discusses appropriate research methods for the homework assignment, encouraging students to revisit the lectures rather than Google answers for topics covered in class. The lesson then walks through the Gini index calculation for a decision tree split on attributes such as browsing history and age, illustrated with a decision tree, and touches on the importance of selecting significant features for machine learning models, with examples including features for predicting customer subscription cancellations and constructing feature vectors from a dataset on students' study habits. The lesson then turns to supervised learning algorithms, specifically classification and regression, with practice questions to differentiate the two. The session concludes by examining linear regression, including its simple and multiple forms, using best-fit lines to minimize prediction errors, and explaining the mean squared error calculation used to assess prediction accuracy.
Keywords
machine learning
classification
regression
Gini index
decision tree
feature selection
supervised learning
linear regression
mean squared error