AI 101 - Class Recordings
Recording Class 5
Video Transcription
Let's go over the homework right now, so please pull out your homework. Okay, so here is the first question from last time. Given the input 1, 1, we have to first calculate the predicted output. First of all, here's the graphical representation of the neural network we're working with. Remember, all these lines we see are weights, and they correspond with our numbers here. If we go over to the key really quickly, it's the same neural network, just with all the weights written in, so it's a little easier to picture.

To get the output of the first node in our hidden layer, we compute weight 1 times x1 plus weight 2 times x2 plus our bias, and then apply the activation function. So first we put in 1 times negative 1, since 1 is our node 1, so x1, plus our node 2 times its weight, plus our bias, which is 0, and that gives 2. Then, taking the ReLU activation function of this, we do the max of 0 and 2, which is 2. So that will be the output of the first node in our hidden layer. For the second node, it's the same process: our node 1 times its weight plus our node 2 times its weight, plus our bias, which in this case is 4, and that equals 6. Then we do ReLU, so the max of 0 and 6, which is 6. Now, to find the output of the whole neural network, we do the same process, but with our new node 1 and node 2, which we just found to be 2 and 6. So our new node 1, which is 2, times its weight, plus our node 2, which is 6, times its weight, plus our new bias, which is 1, and that equals negative 3. Notice that here we did not take the ReLU activation function at our last node.

Okay, how's everyone doing? I don't see any questions asking to go over it again, so I'll keep going. Now part B: if the actual output is 2, what is the hinge loss? If you remember, the hinge loss is the max of 0 and 1 minus t times y, where t is our actual output and y is our predicted output. What we found in part A is our predicted output, and we were given in part B that our actual output is 2. So we take the max of 0 and 1 minus 2 times negative 3; the second value works out to 7, so the max of 0 and 7 is 7.

Okay. Now we're going to move on to part C: fill in the question marks for the code which represents this neural network. This code is in Keras. In case you may not know, Python is the language used for machine learning, and there are what are called libraries in Python that make coding more efficient and simpler; Keras is one of these libraries. As you can see, this code is pretty simple to write out, yet it still does all the mathematics that we were doing in the earlier parts of question one. So, what would we put in these question marks if we're coding this neural network? The first question mark would be 2, since we have two nodes in our hidden layer, and our input dim would also be 2. And as you can see, our activation is ReLU. The second line is our output, so that's 1 because we have one node, and its activation is linear, because notice we didn't apply the ReLU activation function at our output. Now, just for understanding's sake, the reason the first line is our first hidden layer is because this is where we do our computations. We don't compute anything in the input layer; our input layer is given to us.
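To make the arithmetic concrete, here is a small Python sketch of the forward pass from part A and the hinge loss from part B. The weights for hidden node 1 are the ones read out in class (negative 1 and 3, bias 0); the remaining weights are hypothetical placeholders chosen only so the numbers match the values worked out above (6 and negative 3), since the transcript doesn't spell them out.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z)
    return np.maximum(0.0, z)

def hinge_loss(t, y):
    # max(0, 1 - t*y), where t is the actual output and y the predicted output
    return max(0.0, 1.0 - t * y)

x1, x2 = 1.0, 1.0                       # the input vector (1, 1)

# Hidden node 1: weights -1 and 3, bias 0 (read off the diagram in class).
h1 = relu(x1 * -1.0 + x2 * 3.0 + 0.0)   # max(0, 2) -> 2.0

# Hidden node 2: bias 4; the two weights here are placeholders that
# reproduce the pre-activation of 6 worked out in class.
h2 = relu(x1 * 1.0 + x2 * 1.0 + 4.0)    # max(0, 6) -> 6.0

# Output node: linear, so no ReLU; again the two weights are placeholders
# chosen to reproduce the -3 from class.
y = h1 * 1.0 + h2 * -1.0 + 1.0          # 2 - 6 + 1 -> -3.0

print(y)                                # -3.0
print(hinge_loss(2.0, y))               # max(0, 1 - 2*(-3)) -> 7.0
```

And for part C, a minimal sketch of the filled-in Keras model. The class code used the older `Dense(2, input_dim=2, activation='relu')` spelling; `keras.Input(shape=(2,))` is the equivalent modern way to declare the two inputs.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 2 inputs -> hidden layer of 2 ReLU nodes -> 1 linear output node.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="linear"),
])
```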
These two nodes just represent our input vector, which in this case is 1, 1. Then we start our first computation in the first hidden layer, which has two nodes, hence why the first question mark is 2. And how many nodes from the input are we putting in? That is also 2. Okay, how is everyone doing? Does that all make sense? If so, we can move on to our slides for today. Can we get some more input, maybe?

I see a few people saying they're a little confused on question one, so I'll go over that one quickly. Again, remember that all these weights and biases are written into our neural network up here, so it'll be easier for referencing. And remember our equation: x1 times weight 1 plus x2 times weight 2 plus our bias, which is 0 here. That's the output for our first node before we take the activation function. Just matching up the values, I'll do the first one as an example. Our x1 in this case is 1, since that is what our node 1 is, times our weight 1, which is negative 1, as you can see on this first green arrow, plus 1 again (this is our node 2) times our weight 2, which is 3 in this case, plus our bias, and that equals 2. Then we take the max of 0 and 2, which is 2. So this is the output of our first node. Is that making sense for everyone? Okay.

And then the third question still seems to have some confusion, so let's go over this piece of code. Think of it graphically, like the topology of our neural network here. We have two nodes coming in first; then in our first and only hidden layer we have two nodes again; and then one node in our output. That's pretty much all we need to complete this code. Our first line is for our first hidden layer, and we have two nodes, which is why the first question mark is 2. And for our input dim, we have two nodes coming in to our first hidden layer, so that would be 2 here. And our activation function is ReLU, if that makes sense to everyone. Okay, looks like that cleared some things up.

So now, moving on to our lecture for today. Sorry, I will stop sharing for a bit. Okay, let me know if you can control my screen when I start presenting. I'm going to request remote control; maybe that'll make it easier for you. Okay, perfect.

So, moving on to our slides for today: we're going to be talking about convolutional neural networks, or CNNs. We'll just reference them as CNNs in the slides today, since it's a lot faster to say. But notice how we still have the words "neural networks" here, so we're going to build on the knowledge from last lecture too. First of all, what are CNNs? CNNs are a type of neural network that was made specifically with the input of images in mind. One of the big challenges in machine learning was processing images, just because doing machine learning on an image is very complex. And CNNs were so revolutionary at the time they were invented because they didn't require something called feature engineering. Basically, what this means is that machine learning researchers used to have to input tons of data or information about a picture by hand, like the positions of lines or corners in the picture, or maybe some colors. And you know, you may have heard that an image is worth a thousand words.
So doing this was very difficult, and CNNs made it a lot simpler. There are more models out there today that can do similar things to CNNs, and CNNs can take other inputs too, not only pictures, but that is outside the scope of our class today; just something to keep in mind. The reason CNNs are so good at not requiring feature engineering is because they do their own feature extraction. Throughout the pipeline, they find patterns in the data, and they themselves can distinguish what makes a dog a dog and what makes a cat a cat. We'll talk a little more about this later on.

So where is image recognition used? It's used in autonomous vehicles, facial recognition, medical imaging, and more. First, let's do a little image classification activity: let's classify these images based on whether they have a cat or a dog. For us, this is really simple, but for a machine learning model, this is pretty complex. How does a computer know what a dog and a cat is, or how to distinguish them? Each image is composed of small dots called pixels. Pixels can have one channel, which is called grayscale, or they can have three channels, which would be RGB, each contributing to the final image color. RGB values range from 0 to 255. So each pixel here in our picture has a value associated with it, and this is going to be our numerical input to our machine learning model. Of course, we might do some things to these numbers later on, as you'll see. Thus, when a color image is passed into a CNN, it has the form W x H x C, where W is the width, H is the height, and C is the number of channels. A colored picture would have three channels, and like we said before, a grayscale picture would only have one channel.

Okay, now moving on to image preprocessing. First, we have image resizing, which makes the images the same dimensions. We also have image normalization: we can scale the pixel values in RGB, commonly to a range between 0 and 1. This helps the model converge faster in training and allows it to focus on relevant features. So remember the high numbers that RGB can go up to, between 0 and 255; these can be scaled to between 0 and 1.

Next, in data augmentation, we can artificially increase the data set by creating modified copies of images. This helps provide a larger and more diverse set of images to train on and can be used to address class imbalance. Class imbalance is basically this: if we're inputting images of cats and dogs into our CNN and we have a lot more pictures of dogs than cats, then we would want to do some data augmentation on the pictures of cats and create more pictures of cats to input into our CNN, so there's no bias towards dogs or anything like that. One thing you can do for data augmentation is adjust the color, such as brightness, contrast, or saturation. We can also do some geometric data augmentation, including flipping the photo, resizing it, scaling it, or maybe rotating it. And there are other effects we could apply, such as adding noise to the image or making it a little blurry. This all helps increase how robust our model will be.

And like regular neural networks, CNNs mimic how the human brain operates to recognize patterns and features in images, and they don't require as many weights as regular neural networks to process images. Okay, now moving on to the math behind CNNs; I'm going to let Minoo take it from here.
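As a quick aside, here is a minimal NumPy sketch of the normalization and a couple of the augmentations just described; the image here is a randomly generated stand-in, not real data.

```python
import numpy as np

# A stand-in 4x4 color image with integer values in [0, 255] (3 channels).
img = np.random.randint(0, 256, size=(4, 4, 3)).astype(np.float32)

# Normalization: scale pixel values from [0, 255] down to [0, 1].
img_norm = img / 255.0

# Geometric augmentations: horizontal and vertical flips of the image.
flipped_lr = np.fliplr(img_norm)
flipped_ud = np.flipud(img_norm)

# A color augmentation: increase brightness, then clip back into range.
brighter = np.clip(img_norm * 1.2, 0.0, 1.0)
```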
Hi, everyone. I'm just going to branch off of what Naomi talked about so far, and we're going to zoom into the network itself. Like Naomi mentioned, CNNs operate the way the human brain operates to recognize patterns and features in images. This is a branch of neural networks that's optimized specifically for images, and the layers they have also help to recognize patterns and features in those images. They have four main parts: convolutional layers, ReLU (which should be familiar from what you learned about neural networks), pooling layers, and fully connected layers.

Convolutional layers consist of multiple filters, also called kernels. These are the filters that detect specific patterns in the image data, such as edges, corners, or textures. A filter would look something like this on the right; this is a 3 by 3 filter. The way a filter works is that it slides over the entire input image (the sliding is called convolving), performs a dot product operation on the pixel values of the image, and produces a feature map. We're going to look at an operation and then come back to this slide so we can see it in more detail. The dot product operation looks something like this. Say you have an image with, let's say, these pixel values; this filter is going to slide over the image. Here you can see it's first over the first three pixel values of the image, and it's performing an operation: negative 1 times 1, plus 2 times negative 1, plus 1 times 0. So it's just multiplying the corresponding values in the filter and the image and then adding them together. Then it moves one to the right and does the same thing: now it's doing 2 times 1 plus 1 times negative 1 plus 3 times 0, and it keeps going for however long the image is.

If we go back to the previous slide, you can see it's doing the same thing here, right? We have a 3 by 3 filter. First it goes to the first 3 by 3 part of the image and multiplies. You can see everything else in the filter is 0 and only the middle is 1. So effectively it's doing 0.4 times 0, plus 1 times 0, plus 0.1 times 0, and so on; all of that is going to be 0, and the only value that adds anything to the feature map is the center value. That's 0.3, which is why 0.3 is the first value here. Then it slides one over, so now we have 1, 0.1, 1, 0.3, 0.2, 0.3, 0.2, 0.1, 0.5, and the only value that gets carried over to the feature map is 0.2, and you can see that again in the feature map on the right; that's the second value. So it convolves over the entire image and produces a feature map like the one you see here on the right.

Any questions about that so far before we move on to a practice question? Okay, so let's do a practice question on the dot product operation. Just continue the convolution: what number should go in the question mark over here? I'll give you guys a couple of minutes for that.
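Once you've tried the practice question yourself, here is a minimal NumPy sketch of the sliding dot product just described, using the pixel values and filter from the slide, so you can check your answer.

```python
import numpy as np

def convolve_1d(image, kernel):
    # Slide the kernel across the image, taking a dot product at each stop.
    k = len(kernel)
    return np.array([np.dot(image[i:i + k], kernel)
                     for i in range(len(image) - k + 1)])

image = np.array([-1, 2, 1, 3, 1])   # pixel values from the slide
kernel = np.array([1, -1, 0])        # the 3-element filter

print(convolve_1d(image, kernel))    # [-3  1 -2]
```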
Okay, so most people got it right, but there does seem to be a bit of confusion about which values go in. Basically, we're not just doing 1 times 0 because that's the last value and then putting in 0; that's not what's going on. We're doing an operation across the last three values in the image. If you can see here, we have lines between the filter and the image, so what we're actually doing is 1 times 1, plus 3 times negative 1, plus 1 times 0; we're not just doing 1 times 0, we're doing this entire operation. That's why we get negative 2. Most people got that right, so good job. Any questions on that? For anyone who answered negative 2, does it make sense now?

Okay, why is there a negative 3 and a 1 in the first boxes? Sorry, let me go back. That's because those are the first couple of operations, right? First the filter is actually going to be right under negative 1, 2, and 1. Maybe it's actually on the previous slide; let me go back there. Yeah, so this is the original question. The original image that we have is negative 1, 2, 1, 3, and 1, and the filter is 1, negative 1, 0, the same as the question you did. First it goes over the first three pixels in the image; that's negative 1, 2, and 1. If you perform a dot product operation over that, you get negative 3. Then you go to the next three values, and if you do a dot product over that, you get 1. So that's why you have negative 3 and 1 in the first two boxes, and what you figured out, by looking at the last three values, is what goes in the last box. Okay, perfect.

Okay, so here you can see a visual output of what the filter is actually doing. Earlier we talked about how the filter produces a feature map, and a feature map, like we said, detects some sort of pattern in the images. In earlier layers, filters identify colors, edges, and textures, and then in later layers, filters can start to identify more complex patterns. Here you can see there are circles and curves, whereas in the first couple of layers it's just edges, horizontal lines and whatnot. So this is a visual representation of what the filters are actually learning.

Okay, so now we're going to move on to ReLU. This is your activation function, and you should have heard of this in the neural networks lesson as well. What an activation function does is introduce non-linearity to the model, and that allows it to learn more complex relationships between features in the image. A ReLU activation function looks like this: any negative values get counted as 0, and anything greater than 0 stays whatever that value is. So, what would the formula for the ReLU activation function be? You can put your answers in the chat.
Let me put the picture of the graph up so it's easier for you to figure it out. If you had to put this graph in formula terms, what would the formula of ReLU be? On the right side of the graph, y equals x; so one part of it is y equals x, but you can see there's also another part saying y equals 0. So if you combine them... okay, yeah, most people are getting it right. The answer is going to be the max of 0 comma x. Going back to the graph over here: if I have any negative value, I just make it 0, and if I have any positive value, it stays whatever that positive value actually is. So if I get negative 3, it's not going to be negative 3 after the activation function is applied; it's going to be 0. But if I have positive 3, it's just going to be positive 3. So effectively, it's finding the max of 0 and x. Does that make sense to everyone? I know most people got that right. Okay, awesome.

So now we're going to move on to the pooling layer. So the pooling layer... oh, okay, let me go back; there are some questions about that. How do you know how big the filter is going to be? That's something you determine when you actually create the model; you get a choice over that.

Okay, let's talk more about the activation function and the formula, because there seem to be some questions about that. If we look at this graph over here, you can see that any negative value I could potentially get is just going to be converted to 0, whereas any positive value, anything greater than 0, is just going to stay whatever value it originally is. If we put that in terms of a formula, it's the max of 0 comma x. Because let's say your x is a negative value, like negative 3: 0 is greater than any negative value, so it just gets converted to 0. Whereas any value that's greater than 0, the formula lets it be that value. What does max mean? Max is the maximum, whatever number is bigger. Okay, does that make a bit more sense?

Okay, so now we're going to move on to the pooling layer. The pooling layer aims to extract the most significant features from the convolved matrix by applying operations such as max pooling. This reduces the dimensionality of the feature map and decreases the memory usage during training. From the previous layers, we get a feature map; let's say this is the feature map. What the pooling layer does is perform some sort of operation on this map so it only retains the most valuable information.

Oh, we have a question over here: why can't the filter be the size of the whole picture, then, if you can decide? So, there are many things to that, right? If the filter is the size of the picture, then there's just a lot of memory usage during training, because you're going to get feature maps the exact same size as the picture, so you're not really making anything more efficient for your model. And also, the whole idea of the filter being able to convolve over the image is what actually helps it detect patterns in the image.
Ms. Harvey, am I answering that correctly? Because I don't know for sure either. Yeah, no, that's perfect. Can you hear me, by the way? Yeah, we can hear you. Yeah, so the filter has to be small because some of these filters, like was already said, are trying to identify lines, right, lines or edges. So you want to see where the lines are everywhere on the image. And just like Minoo said, if you have a filter that's as big as your image, then it's really not doing anything much better than a regular neural network, which we talked about last week, and for images that's not the best thing. You want to use something like a convolutional neural network: you have these tiny filters, and these filters are nothing but weights. So you want these very small filters, very few weights, and that makes the process efficient. That's why we use convolutional neural networks for image recognition. Does that make sense, whoever asked the question? Yeah, okay, awesome. And that's a very good question as well.

Okay, so going back to pooling: it's going to help retain the most valuable information from your feature maps. We can apply a function called max pooling, and there are many other functions, like average pooling. But here, if we talk about max pooling, we're basically going to look at each 2 by 2 square. First, we take 5, 3, 17, and 9, and what max pooling does is find the maximum value in that 2 by 2 block. Out of 5, 3, 17, and 9, obviously the biggest number is 17, so that's going to be our first number in the matrix. Then we go to 1, 23, 8, and 42; the biggest number there is 42, so that's the second number. Then we slide down to 4, 2, 8, and 5; the biggest number there is 8, so that's the third number in our max pooling matrix. And finally, we have 22, 56, 9, and 77, and the biggest number there is 77. This way, we can assume that everything else is either redundant information or information that the model doesn't necessarily need to make its predictions.

So the biggest number is the most significant? Yes, that's one way to do it. There are also other ways; there's average pooling, where they take the average of the four values, so the average of 5, 3, 17, and 9. What happens if the starting layer doesn't have quadrants? Can you explain what you mean by quadrants? I'm not sure I understand. While they're doing that, we have one question that asks: so you take the biggest number because it's the most significant? It doesn't necessarily always mean that the biggest number is the most significant; it's just what we choose to do. We can also do average pooling; max pooling is just one way to do it. Oh, just like if it's not divisible into four pieces? Go ahead. Yeah, so, that's a great question, but for the purposes of this class, we're simplifying it, and we're always going to assume we have a 4 by 4 feature map and we're only going to take 2 by 2s. In real life, you are actually given the filter size, and you are also given a stride, which means you might skip over some squares. So instead of, if you see my hands, scanning through the entire thing, you scan one part, then you skip over from the green to the purple section, right?
And so there are all these other parameters that can be chosen, but for the purposes of this class, we're going to ignore them and always assume the 4 by 4 feature map. So yeah, that's a great question. Okay, thank you, Sarpia.

And then here we have two more questions. Does the pooling layer reduce the computing power needed to learn patterns? Yeah, that's exactly what it's doing; it's making things more efficient. And that's the whole idea of the convolutional neural network in itself: it makes it more efficient for us to learn patterns in images. Next question: so whenever we do max pooling, we always take the biggest number, and if we do a different pooling, then we take a different number? Yeah, so based on the pooling operation you choose, you're going to get a different number in your output. Right now we did max pooling, so we got 17 as the first number. But if we did average pooling, then we would be doing 5 plus 3 plus 17 plus 9, divided by 4. That number is not going to be 17; I'm not going to calculate it, but it's not going to be 17. So yeah, you're going to get a different number; it depends on the operation you choose. Any questions on that?

Okay, and what is the difference between max pooling and average pooling? They're just different ways of reducing the dimensionality of your feature map. Ms. Harpena, do you know if there's a difference in terms of what's more efficient or what's better, or is it just based on what you choose to do? It's based on what you choose to do. To be fair, for all of the CNNs I've ever worked with, I've only used max pooling. But it depends on what you're trying to do and why you would need average pooling. What does dimensionality mean? Dimensionality is basically like 4 by 4, 3 by 3, 2 by 2. When you're reducing dimensionality, you're reducing the number of values in your feature map; basically, you're making it smaller. Do you have to use a certain way for a certain instance? I can't think of a time when you would have to use max pooling or have to use average pooling. Ms. Harpena, do you know if anything like that exists? Yeah, I'm not sure off the top of my head. I'll take a look and give you guys a response on Google Classroom. Those are good questions. But like I said, I've only ever used max pooling for anything I've done.

Are there any other types of pooling? Yes, there are many other types of pooling. You can always do a search, and it will give you a full list you can explore. Again, max pooling is the most common; for the stuff I've worked on, I've only used max pooling too. But these are really good questions, and I definitely think you should explore them. I'm also going to answer in the Google Classroom whether there's ever a time when one is actually better than the other. What's a time when you would need to use average pooling? Again, like I said before, there's no specific instance I've ever run across where they say one is better than the other, or that you have to use this one. I can always look it up after class and see if there's ever an instance like that, but I can't imagine one, because with neural networks there's always free rein in what kind of activation function you choose and all of that. So normally you have pretty much whatever option you want; assume there's no time when you specifically have to use average pooling.
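Before moving on, here is a minimal NumPy sketch of the 2 by 2 pooling just walked through. The 4 by 4 layout is a reconstruction that reproduces the blocks named in class (5, 3, 17, 9 in the top-left block, and so on).

```python
import numpy as np

feature_map = np.array([[ 5,  3,  1, 23],
                        [17,  9,  8, 42],
                        [ 4,  2, 22, 56],
                        [ 8,  5,  9, 77]])

def pool_2x2(fm, op=np.max):
    # Apply `op` over non-overlapping 2x2 blocks of a 2D feature map.
    h, w = fm.shape
    out = np.empty((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = op(fm[i:i + 2, j:j + 2])
    return out

print(pool_2x2(feature_map))            # max pooling -> 17, 42, 8, 77
print(pool_2x2(feature_map, np.mean))   # average pooling of the same blocks
```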
But again, I'll check it and answer your question in more detail. Can I butt in here really quick? Yes, yeah, go ahead, Naomi. Sure. So activation functions, or, let's see, pooling functions, these are things you can choose; you can tune them, like hyperparameters. So you can try max pooling in your convolutional neural network, and then if your accuracy isn't what you want, you can change some hyperparameters; you could change max pooling to average pooling and see if that increases your accuracy. So these are all, like Minoo was saying, things you can change, and you can test them as well, for whatever output you prefer, or increased accuracy, things like that. That's a great point. Yeah, so just like Naomi was saying, machine learning is an empirical science, and what that means is you just experiment: try out different things and see what works out best for you. The stuff that we work on is super, super complicated. There's a lot of research being done into why things work the way they do, but no one fully knows yet, because it's a lot of math, lots of equations we really can't comprehend. So like Naomi said, you try out different things, see what works, and go with whatever gives you the highest accuracy. Any other questions on pooling? Okay, awesome.

All right, and then finally: how do you know the accuracy? Okay, let me get through the output layer first, because then I can relate more to your question. So again, this is similar to what you learned with neural networks: there are going to be fully connected layers, as you can see right over here, and then at the end you're going to have a number of nodes equal to the number of possible classifications in the last layer. So if we're branching off of Naomi's dog and cat example, in that final layer you're going to have two nodes, one mapping to dog and one mapping to cat.

So now, accuracy. This is a very good question, and it really depends on what kind of task you have and what you're trying to do. This is where the kinds of questions we had about pooling, like when you would use average pooling versus max pooling, apply even more, because there are many different ways you can calculate accuracy, and the way you calculate it often matters based on the kind of task you're trying to perform. With the dogs and cats example, all your data is basically already mapped to a single output. For this case, accuracy can be calculated the regular way: the number of predictions your model got right over the total number of predictions. Sometimes you're going to have cases where each image can have multiple outputs. If we go back a couple of classes, we could have one image that is of a sunset and is also of a vacation. In cases where each input can have multiple outputs, you can't use traditional accuracy, which means you have to start experimenting with other accuracy metrics. And there are other ones: there's something called Jaccard similarity, and there are also F1 scores.
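To make those metrics concrete, here is a small sketch of traditional accuracy and of Jaccard similarity for a multi-label example; the labels and values here are hypothetical, just for illustration.

```python
import numpy as np

# Traditional accuracy: correct predictions over total predictions.
y_true = np.array([1, 0, 1, 1, 0])   # actual classes, e.g. 1 = dog, 0 = cat
y_pred = np.array([1, 0, 0, 1, 0])   # the model's predicted classes
print(np.mean(y_true == y_pred))     # 4 of 5 correct -> 0.8

# Jaccard similarity for one multi-label image: size of the intersection
# of the predicted and actual label sets over the size of their union.
actual    = {"greek and latin", "sculptures"}
predicted = {"greek and latin", "history"}
print(len(actual & predicted) / len(actual | predicted))   # 1/3 ~ 0.33
```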
And so it just depends on what you're trying to do and what you're trying to accomplish; based on that, you can choose your accuracy metric. But traditional accuracy is the most common, and that's the one I just talked about: the classes the model predicted versus the actual classes. Okay, any questions on that? Okay, awesome.

So now we're going to look at an actual project example. This is a project that I did in the MetaPlus Machine Learning Research Bootcamp, which takes place every summer. Once you're done with this class, if you think you're interested in actually coding and creating your own projects, you can always look out for that. It's open for 6th through 12th graders, right, Ms. Harpia? 8th. Oh, 8th through 12th, and it's already open. So if any of you want to sign up and do some actual coding on your own projects, you can always sign up, and there are also opportunities to do research with universities. So this is a really cool thing for your resume and also just to build your experience.

So this is a project I did in the MetaPlus Machine Learning Bootcamp, and it also uses convolutional neural networks, so I thought it would be cool for you to see a real-life application of this. In the bootcamp, what my team and I were tasked to do is work with Williams College, which is a liberal arts school. The issue they had is that they had thousands of artworks, and in classes like literature or Asian studies, they like to use the artworks for research purposes, but there was no classification system for them. So a lot of the students had to manually find pictures related to their research interests, and that would take hours; it's very time-consuming. So we decided to think of a way to computationally organize these artworks using convolutional neural networks.

Here you can kind of see; oh, I can't even zoom in. Okay. Basically, it follows the same process that we learned in class. Oh, thank you, Naomi. Actually, if you can zoom in a bit more, you can see here: what we're trying to do is take, say, four images and group each one into the theme it's most related to. So you can see we have four random images, and once our model is applied, it categorized the first and third images as Greek and Latin, and the second and fourth images as related to history. Part of what we did for this project was use convolutional neural networks to find patterns in these images. And this is actually an example where we couldn't use traditional accuracy, as I talked about earlier, because we set it up so that each image could fall into multiple categories. Here you can see that the first image fell into Greek and Latin; let's say there's also another theme called sculptures or ceramics or statues, and it could also fall into that category. Each image could fall into multiple categories, meaning we couldn't use a traditional accuracy metric, so we had to experiment with multiple other metrics, and each metric gave us a different accuracy. If you can zoom in to the results section, Naomi, it's to the right. Yeah, so you can see in that table that we have a different accuracy based on the metric that we use.
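Pulling the lecture together, here is a minimal Keras sketch of a small CNN with the four parts covered today: convolution, ReLU, pooling, and fully connected layers. The image size and filter count are hypothetical, and the softmax on the output layer is a common choice rather than something covered in class.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution -> ReLU -> max pooling -> fully connected, ending with one
# node per class (dog, cat). Sizes here are assumptions for illustration.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),                 # 64x64 color image (W x H x C)
    layers.Conv2D(16, (3, 3), activation="relu"),   # 16 filters, each 3x3
    layers.MaxPooling2D(pool_size=(2, 2)),          # 2x2 max pooling
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),          # one node per class
])
model.summary()
```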
So when you're doing your own projects, you're going to have a lot of options for which accuracy metric to use, and you're going to get a lot of different values. You can see that for one of them, I had a 73% accuracy, and for that same model, if I used recall, I had 95.7%. So you have to use your judgment to see which accuracy metric makes the most sense for the kind of project you're doing. And so, yeah, if you're interested in coding your own projects in the future, definitely look into the MetaPlus Machine Learning Bootcamp; I'm also going to put a link in the Google Classroom so you can take a look at it. But any other questions about this kind of application of convolutional neural networks? Okay, perfect. Naomi, I think this was the last slide I had, and then there were two more slides relating to next week's material.

Sure. We can talk about decision tree regression. This is something we're going to touch on more next week; this is just a quick introduction. Decision trees are a different type of model from neural networks. There's decision tree classification, where each path from the root to a leaf node represents a sequence of decisions leading to a final class prediction. Then there's decision tree regression, which can capture non-linear relationships between input features and output variables, where the leaves are averages.

We can see this with a quick example for predicting the price of a house. If the house has fewer than 3 bedrooms and it has a backyard, then the price can be estimated as $60,000; if it doesn't have a backyard, it can be estimated as $30,000. Whereas if it has more than 3 bedrooms, then the price estimate is $150,000. Just using common sense, this makes sense in the real estate market, and decision tree regression in this case can predict the prices of houses based on their attributes, such as how many bedrooms and bathrooms they have, or maybe basement square footage, things like that. We'll start decision tree regression here and then continue next week.

Just something to think about before we go deeper into decision tree regression, talking about the mean: if Dr. Rue ate four leaves on Monday, five leaves on Tuesday, two leaves on Wednesday, three leaves on Thursday, and six leaves on Friday, how many leaves on average did he eat during the week? We can take the mean of this, and this is just important information to know before we start going deeper into decision tree regression. If you can, put in the chat the average number of leaves Dr. Rue ate. Getting a lot of fours. Yes, four leaves; good job, everyone.
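As a sketch of how that toy regression tree makes a prediction, here are its rules written as plain Python if/else statements, plus the mean from the warm-up. The slide doesn't say which branch exactly 3 bedrooms takes, so folding it into the higher-price branch is an assumption here.

```python
def predict_house_price(bedrooms, has_backyard):
    # The toy regression tree from the slide, written as if/else rules.
    # Each leaf value is the average price of the houses in that leaf.
    if bedrooms < 3:
        if has_backyard:
            return 60_000
        return 30_000
    return 150_000   # assumption: 3 or more bedrooms take this branch

print(predict_house_price(2, True))   # 60000

# The warm-up mean: leaves eaten Monday through Friday.
leaves = [4, 5, 2, 3, 6]
print(sum(leaves) / len(leaves))      # 4.0
```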
Now we're going to move on to homework. That's everything you need to know for the intro to decision tree regression; we'll talk more about it next week, and you'll have CNN material in your homework today. First of all, I see we have a little extra time. Does anyone have any questions about CNNs, since that is what your homework will be about? Does anyone have any questions for any of us, maybe about the MetaPlus machine learning bootcamp, or some opportunities? Okay.

So, are we supposed to upload the homework solution or just mark it as done? You are supposed to upload your homework solution. However, we go over it in class, so we wouldn't need to grade it per se, but uploading it to the Google Classroom is best.

And where can I practice more AI skills for free? You can do this on your own time. We don't go into the coding too much in this course, so I do recommend doing the MetaPlus machine learning bootcamp, where you can learn all the Python skills necessary and do research on your own. For free, there's a website called Kaggle. They have their own data sets you can use, and there are competitions there for whoever gets the best accuracy with their machine learning model. You can try things there, but I would say it's not just a place to get data from. To do machine learning, you'd definitely have to do a lot more homework to be able to code your own machine learning project, but it is doable.

Kind of branching off of that: if you want to learn the coding specifically, there are a lot of great free resources on YouTube, and you can always look up exactly what you're interested in, because there are a lot of branches of machine learning. So if you want to learn more about CNNs, you can always look up a CNN coding tutorial online. And if you have any specific questions, W3Schools has been a really great resource for me in the past, because for almost any piece of code I don't understand, I can search it up and it has a detailed breakdown of what the code is trying to do.

Where are you going to post the link to the bootcamp? We're going to post it in the Google Classroom so everyone can have access to the bootcamp syllabus and what they're going to teach as well. But yeah, there are a lot of great online resources that are free, so if you want to practice with those to build your experience, you can. Also, the bootcamp is going to be very fast-paced, so if you want some experience before that, you can always look into tutorials and use Google Colab to code.

Yeah, and just adding on to that: what I really liked about the machine learning bootcamp is that you can do research with universities. I had a separate project from Minoo's; I worked with the College of William and Mary, in collaboration with UCLA and some other universities. So it was really interesting applying what you learned in the bootcamp to real-world problems. Definitely recommended, and for machine learning in general, there are a lot of great resources out there online.

Okay, and if that is all, feel free to leave early. If you have any questions, put them in the chat or stay behind. But thank you, everyone. See you next week.
Video Summary
The video transcript details a lesson on neural networks, focusing on a homework task involving calculations with a simple neural network. The lesson guides students in computing outputs for nodes using weights, biases, and activation functions, particularly the Rectified Linear Unit (ReLU). The lesson then transitions into a discussion about hinge loss calculation, a method used to evaluate the accuracy of predicted outputs against actual outputs in machine learning models.

Subsequently, the transcript introduces Convolutional Neural Networks (CNNs), which are particularly effective for image recognition tasks due to their ability to autonomously perform feature extraction. Students learn about CNN components like convolutional layers, ReLU activation, and pooling layers, which help in reducing data dimensionality while preserving essential information. Pooling techniques such as max pooling are discussed, demonstrating how CNNs summarize important aspects of image data.

The session briefly introduces decision tree regression, a method for making predictions based on certain decision paths, and touches on data-related concepts like averages.

Finally, the session mentions opportunities for learning and practical application, such as a machine learning bootcamp that offers research experiences with universities, encouraging students to explore various free online resources to enhance their AI skills.
Keywords
neural networks
ReLU
hinge loss
Convolutional Neural Networks
feature extraction
convolutional layers
max pooling
decision tree regression
machine learning bootcamp