Do interrupt me if you need to, because it's not easy to keep an eye on the chat, right? So, assuming that I can still hear you: "weirdness" is a very pleasing label. Thank you, Ingrid. I do agree. Okay. So we're going to talk about neural networks, from a historical perspective.

First, let me give you some references to online resources that I think are the clearest for neural networks. This one is a visual resource, the online book Neural Networks and Deep Learning: it has a lot of widgets and apps where you can modify things and see how the neural network works, so I find it possibly the clearest, the most pedagogical one. This other one is a more strictly mathematical approach to neural networks: the book called Deep Learning, whose lead author is Goodfellow. Goodfellow is the main person behind generative adversarial networks, the deep fakes and all that. It's the deep learning book from MIT Press. Nothing in what I'm going to go through is taken verbatim from these references, but it should all be consistent with what I'm going to teach.

So let me go through a brief historical perspective. I think it's important to understand how neural networks came to be, because that really demystifies the complexity of what they are. The story is often started at 1958, when neural networks more or less appear in their modern form, but actually 1943 is where I start the story, when neural networks are first conceptualized. Today we're going to get through ADALINE and MADALINE, so through the sixties, and somewhere around there we'll start talking about deep learning. Then in the next lectures we'll specialize in time series analysis through neural networks, that is, neural network design for the time domain. We'll talk about Long Short-Term Memory, we'll talk about transformers, and we'll probably also talk about convolutional networks as they can be applied to time-domain analysis, even though they are not per se a time-domain tool. They learn connections between locations in what I'm going to call an image, so you can take your data, your time series, and cast it to become an image: they can learn relationships between what happens at different time steps as if those were different positions in an image. Right? But we're not there yet.

So, 1943 is when the original conceptualization of what I'm going to call the MP, or McCulloch-Pitts, neuron arises. This paper is by a neurophysiologist (I don't know what they used to call somebody who could study the brain in 1943) and a logician. Their hunch was that the brain works by relatively simple units. Now, disclaimer: I know nothing about the nervous system. What I know, I learned from neural networks and from hearing the brain explained through neural networks, so somebody correct me if needed. But the idea, the simplified schematic for how the brain works at the time this paper came to be, is that the brain itself is complex and can produce complex thoughts, can make complex connections between things.
But it really does that by very simple operations that, once concatenated, become complex. And the simple operations are, I think they call it "all-or-none" in this paper. What they mean by that is that they are simple logical operators: essentially what in computer language we refer to as Boolean operators, true or false. You put together an architecture of connections between true and false elements and you build a complex thought. That's the idea.

So, in their words: it is found that the behavior of every net can be described in terms of the all-or-none character of nervous activity, with the addition of more complicated logical means for nets containing circles (circles in the sense of loops); and it is also shown that many particular choices among possible neurophysiological assumptions are equivalent, in the sense that for every net behaving under one assumption there exists another net which behaves under another assumption and gives the same results, although perhaps not in the same time. So you can have a different neural path that leads to the same conclusion, the same output, the same conceptualization, although not in the same time. I think it's interesting that they bring that up: they were far from thinking about computational efficiency.

So here is how they abstracted these units, the neurons. These are the original graphics in the paper, and these are the graphics we still use now. The neuron is this triangle here, usually shown as a circle in neural network diagrams. The brain takes inputs from various sensors, the eyes, the ears, et cetera. The inputs come in, get processed, and then there is an output. So far so good: that's more or less like any algorithm we think about, input, process, output. As for the actual physiological structure of the brain (I don't know how much of this is dumbed down for people who think about computer science, but this is the standard plot we use in neural networks to show the actual neurons in the brain): the brain has dendrites, which are the way in which it gets the data, the inputs; a soma, which is the processing unit; and an axon and synapses, through which the message travels, connecting to other neurons. So you make a network. And again, this is our computer-scientist representation of that: input, processing, output.

At this stage, what they were conceptualizing is a logical unit. It's a yes-or-no answer; as they say, all-or-none, which for us means something that is either 1 or 0. And something that outputs either 1 or 0 is necessarily a classifier. I cannot do a regression if I do not have an output that is continuous, right? If my output is 1 or 0, then what I'm doing is classification. So at this stage we are only conceptualizing classification problems.

So let's assume that my neuron receives three pieces of input, three features from the data: x1, x2, and x3. I'm going to write the processing unit as the sum of those three values. Notice that the x's don't have to be integers; let's assume for now that they are, just to make it easy. And then I'm going to say that my processing rule is that the sum of those has to be at or above some threshold theta for my output to be 1; otherwise it's going to be 0: output 1 if x1 + x2 + x3 >= theta, else 0. This is a logical operator in the sense in which we refer to logic in computer science: it's a Boolean operator.
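To make the all-or-none unit concrete, here is a minimal sketch in Python; the function and variable names are mine, not from the paper or the slides:

```python
def mp_neuron(inputs, theta):
    """McCulloch-Pitts unit: output 1 if the sum of the
    (binary) inputs meets the threshold theta, else 0."""
    return 1 if sum(inputs) >= theta else 0
```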
So my question to you is: if I wanted to represent, through this equation, the AND operator, what should theta be? Since I have people in the class and people online, I'd like the people online to tell me the value of theta that makes an AND operator out of that equation, by putting it in the chat. I want an AND operator: I want this expression to end up being x1 AND x2 AND x3. I'm going to wait for some people to put their answer in the chat; some more answers, not just one person. You get participation scores in this class. Who guesses 1? Who guesses 2? Who guesses 7? Okay, whoever guessed 3: 3 is the right answer. Thank you online, and thank you for raising your hands. Right: all three inputs have to be 1 for the sum to get to 3 or larger, which means that if any of the x's is 0, the test does not pass. So all three have to be true, all three have to be 1, and their sum has to be greater than or equal to theta, with theta equal to 3 specifically. What if I want the OR operator? Everybody raise as many fingers as your answer; online, tell me, what should theta be for OR? One. That is correct: for the OR operator, theta is 1. We'll leave it at that.

Okay. So the idea is that the brain can do this, and maybe a computer can do it too. So they built an electrical system that is able to do the same operation: the McCulloch-Pitts neuron. This is the structure of the neuron, and it's very simple; in general we sometimes call something like this a perceptron, with some differences. The problem is that the only thing that I get to set in the MP neuron is the threshold. The only thing I have flexibility on is choosing what theta is; that's my only knob (it doesn't even rise to being a hyperparameter, that's as fancy as it gets). I have nothing else to tune.

So the next step is the perceptron, and we're at 1958, with Frank Rosenblatt. The perceptron adds weights to the inputs. By adding weights to the inputs, mathematically this is now going to look the same as before, but with w_i: the sum of w_i times x_i, compared to a threshold. What does this remind you of? Something times x equal to something: hold that thought, I'll ask you in a minute. So I added the weights. Now I'm going to add one other piece: a bias. So the condition becomes the sum of w_i times x_i, plus b, greater than or equal to something. What is that? It's ax + b. What do we call that? We call it a line. And if we decide what w and b are based on the data? Regression, right? So this turns the individual perceptron into sort of a linear regression problem. We're still producing a binary output, so it's really a linear classification: it sets a line between data points. You choose the w's and b, that is, you choose the slope and the intercept of the line that separates the data between class 1 and class 0. We've done this about a bajillion times.

There's a hand raised and I can't tell whose; can you speak up? Sorry, I have two screens here, so it's complicated. Is it Carlin? Can you speak up? "Sorry, that was left over from a previous answer." All right, no problem. Let me get my slides back.
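As a sketch of what we just did, here is the MP unit acting as AND or OR depending on theta, followed by a perceptron with weights and a bias. The names are mine and this is illustrative, not code from the slides:

```python
# Using the mp_neuron sketch above: theta selects the logical operator.
assert mp_neuron([1, 1, 1], theta=3) == 1   # AND: only all-ones passes
assert mp_neuron([0, 1, 1], theta=3) == 0
assert mp_neuron([0, 0, 1], theta=1) == 1   # OR: any single 1 passes
assert mp_neuron([0, 0, 0], theta=1) == 0

def perceptron(x, w, b):
    """Rosenblatt perceptron: a weighted sum plus a bias,
    thresholded at zero, i.e. a line separating two classes."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else 0
```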
Okay. So this is the perceptron: I added the weights and the biases, and I turned my neuron into linear regression. 1958. And this raised an enormous amount of interest, for reasons that frankly I don't get, because this is very simple. But the New York Times article that I posted here was advertising the creation of this particular model, a neural network: "Navy device learns by doing; psychologist shows embryo of computer designed to read and grow wiser. The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." There's quite a long way to go from that to actual AI, but this is where the excitement got moving.

So, the next step. With the weights and the biases we have a binary classifier: it can only give a yes/no answer. In addition to that, we have a problem when we try to learn things, because we have a sharp jump in the classification, and that doesn't really allow us to learn subtleties in the data. So the next step is ADALINE. It's 1960; Widrow and Hoff designed the ADALINE, which, in addition to what I said before, has a function f, which we're going to call an activation function, that wraps around the neuron. So the neuron's output doesn't have to be 0 or 1: we can have a smooth transition between 0 and 1, or between any other values. The key difference is that now the learning happens after this transition, so I can learn from a continuous value. I can learn subtly: if the difference between outputs is like 0.8 versus 0.9, I can change a lot and make a large adjustment, whereas if the difference between two sets of values for my weights and biases gives 0.2 versus 0.1, I adjust less. I can adjust in proportion to my error, whereas before the error was only 0 or 1, because the answer was only correct or wrong. So they moved the error estimate from before the threshold to after the weighted sum plus bias, wrapped into a function; and that is called ADALINE.

If I write my activation as a step function, that's the same as the previous perceptron, but generally I will have a smooth transition. This is one of the most common activation functions: it's called the sigmoid. The equation doesn't really give you a feeling of what it looks like, but it looks like this: an S-shaped activation function that takes inputs between minus infinity and plus infinity and outputs values between 0 and 1. Another very common one is the ReLU, the rectified linear unit: for positive values of the input, the output is a linear relation to it, but for negative values you throw everything away and set the output to zero. And then there's a bunch of combinations of those. We'll learn how you choose your activation function based on your data and architecture, and particularly based on the task.

And this is the learning scheme, which is very simple: I'm going to change each weight by something that is proportional to the learning rate, multiplied by the error. I just want to show you that it's a very simple update rule.

So what are we missing? We're missing the depth. We've only talked about one neuron so far, when we said that really the idea is that each neuron is simple, but the complexity comes from connecting and sharing information between the neurons.
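Here is a minimal sketch, assuming NumPy, of the two activation functions mentioned and of a Widrow-Hoff style update, where the weight change is proportional to the learning rate times the now-continuous error. The names and the learning rate value are mine:

```python
import numpy as np

def sigmoid(z):
    """S-shaped activation: maps (-inf, inf) smoothly onto (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: linear for positive inputs, zero otherwise."""
    return np.maximum(0.0, z)

def delta_rule_update(w, b, x, y_true, y_pred, lr=0.1):
    """Widrow-Hoff style update: adjust weights (arrays) in
    proportion to the continuous error, scaled by the learning rate."""
    err = y_true - y_pred
    return w + lr * err * x, b + lr * err
```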
So first of all, when they were thinking about this in 1943, 1958, and so on, they were not thinking of a deep neural network, but they were thinking of a network structure: a single-layer network structure. You have your input, of however many dimensions, and then you have a certain number of neurons; they all receive the input directly, and their outputs are combined directly into the output. This is a single-layer perceptron. I'll label this slide "multilayer perceptron", but we're not there yet. The layers here are technically three: there is an input layer, a hidden layer (that's the layer of neurons), and an output layer (that's the output). It's a funny way of counting. When I draw it like this, with arrows that go from all the inputs to all the neurons, and from all the neurons to the output, that's a fully connected neural network, and sometimes you'll hear "fully connected". This doesn't have to be true: I could send an input to only some of the neurons. Dropping some of the connectivity is actually a really good way to regularize your neural network, and it also simplifies it.

Okay. From the graphical annotations: all the inputs go into each one of the neurons, and each one of the neurons learns a set of weights that are independent of the others. By independent of the others I mean that each one of them performs its own linear regression. If these units don't have an activation function, if they only have weights and biases, the whole thing is essentially a matrix multiplication: not much to see here, it's essentially one linear map. But then, again, here comes ADALINE, and we now have the activation function. A network built out of ADALINE units like this is called a MADALINE, and this was 1961 or '62 when that architecture was created. Same as before, a single layer, but now you have the activation function after the weighted sum. Any questions so far? Okay.

So what I want to do now: I want you to do an exercise, and I have to think about the fact that we are a hybrid, in-person plus online, class. I want to talk to the groups individually about your projects, like I did last week; meanwhile, I want you to work on a little exercise while I'm not talking to you. So let's do that. Can the people in the room get on Zoom? How many of you have headphones, or cell phones? Those of you who are here, I'm going to ask to get on Zoom; those of you online, I'm going to ask to stay online. Let me wait until people have gotten online. The link is on the course Slack, MLTSA 2022; I did post it right on Slack. For those online on Zoom: is it clear what we're doing? I'm waiting until your colleagues are on Zoom so that you can work with your final project team. While I talk with one team at a time, you can do this exercise that is behind me with the rest of the team. All right, so let me describe what I want you to do.
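As a sketch of the point that, without activations, a fully connected layer is just a matrix multiplication, here is a forward pass through a toy single-hidden-layer network in NumPy; the shapes and names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected single-hidden-layer network: every input
# feeds every hidden neuron, every hidden neuron feeds the output.
x = rng.normal(size=3)                          # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 4 hidden neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 1 output neuron

hidden = 1 / (1 + np.exp(-(W1 @ x + b1)))       # weighted sum + sigmoid
output = 1 / (1 + np.exp(-(W2 @ hidden + b2)))
# Drop the sigmoids and the whole network collapses into one
# matrix multiplication, i.e. a single linear map.
```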
I want you to play around with this for a bit. I love this app. It's from TensorFlow, which is a platform for deep learning and also a Python module for deep learning. I think it really gives a sense of how things change based on the choices you make. The choices are: what activation function to put in; what input data to feed it, that is, how to pre-process your features; and then a few other things. You have a learning rate: that's how large a step you take based on the error, so a larger learning rate means a larger adjustment of the weights. You have regularization, but let's not worry about that for now. And you choose what kind of problem you're running: I would like you to focus on the classification problem, though if you get bored you can move on to the regression problem.

So here you have a set of example datasets. You can change the ratio of training to test data, and you can increase the noise in the data. Increasing the noise will make it look something like the plot on the left: you see that there is some mixture between the yellow and the blue points, whereas at zero noise the yellow and blue points are very well separated in their own regions. Here the target variable is the color, and the input features you give it are related to position: the default features are X1 and X2, the position of each point on the x-axis and on the y-axis. You are feeding a neural network: you can define how many neurons you want per layer, and you can increase or decrease the number of layers. The output layer has only one neuron, because it's a binary classification problem. You can also change the activation function.

I just want you to explore this. You start the optimization, and the neural network converges to a solution. You know that it converged to a good solution if all the blue points are in the blue region of the space; you saw how the space got colored when I started it. If all the blue points end up in the blue region of the space and all the yellow points in the yellow region, that means the classification is correct, and you can eyeball how correct it is. The other thing that I want you to look at is the test and the training loss; those are really important. Let me show you what I mean. Watch the top of the screen as I restart, and look at the training and test loss. It changes very rapidly initially, and then it slows down. That's a typical learning behavior: at the beginning you start from a random point, so you are very far from the bottom of the likelihood surface and you move rapidly; then you converge towards the minimum, more or less. We've seen this before.

What I want you to do is try at least this data configuration and this data configuration, which is not as nice, and try to get them both to converge, by changing primarily the activation function and the input features. You can also change the structure, the architecture, of the neural network. Make observations and write them down; open a Google Doc or something, because I will ask you to tell me what you think you learned out of this. What did you need to do to get the simple problem solved?
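For reference, here is a minimal sketch of roughly the kind of model the playground trains, written with TensorFlow's Keras API. The layer sizes, activations, and learning rate are illustrative choices of mine, not the playground's exact internals:

```python
import tensorflow as tf

# Two input features (the X1, X2 positions), a couple of small hidden
# layers, and one sigmoid output neuron for binary classification.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(4, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),
              loss="binary_crossentropy", metrics=["accuracy"])
# With training and test arrays in hand you would then call:
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100)
```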
How did the speed at which it got solved change based on different architectures? What happens if you add noise? And how did all of that change when the problem was much harder? Okay. Work with your teammates, online or in the room if they're in the room. Meanwhile, I will open breakout rooms and ask each team to talk about their project with me in a breakout room. Okay, so start working; I'll figure out the breakout rooms. Is everything clear? Do you want me to set you up with a breakout room? I'm going to make three breakout rooms, and then I'm going to visit them. Actually, I'm going to assign you manually; the breakout room number is the order in which you presented a couple of weeks ago. So the accelerometer project is first, the epilepsy project second, and the geospatial, remote sensing, project is third. All right, the rooms are open; you should see them. Oh, I have to assign you, sorry about that. Everybody with a project should be assigned now. For those auditing the class, who therefore don't have a project, I'm going to put you in the smallest group; you can play around with the app. Okay, thank you. Why don't I start talking to people in the reverse order.

Hello, breakout room one. If you can't hear me, turn on the volume on your phone or your computer. Some of you are muted, so speak up. Can you hear me? "I can, just barely." Got it. So, I don't really have a lot of concerns about your project; I think you're in good shape. The one thing that I wanted to address is the convolutional neural networks, recurrent neural networks, and the other methods that we haven't addressed in class yet. Yes, you can use these. Just be aware that the results are not necessarily super meaningful and super stable. So if you use these, what you're going to want to do is run a couple of different randomizations, with different seed values, and see that your result is stable to that, that it's not just a fluke of how you decided to split the data.

And then the other thing: convolutional neural networks are indeed a fine way to approach time series analysis, because of the connectivity; they learn connections, relationships, between pixels. But you have to decide in what shape you give them the data. In principle you can use a 1D convolutional layer on the sequence directly; that's the easiest way to do it. If instead you have a 3D data cube, like an RGB photograph or a JPEG file, you would use a 2D convolutional layer: the convolution is applied to each layer of the image, and the resulting feature map is a combination of the channels. Now, you have multiple sensors. You can decide, for example, to put the sensors one in every row of your "image", and that would be a two-dimensional object. Or you can decide to fold the time series for each sensor into a 2D plane, so that each sensor becomes a layer of the image, like a channel, and you have multiple channels for all the sensors. There is a way to do that too. Just make sure that you think about this decision.
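As a sketch of the first option, treating the recording as (time steps, sensors) and letting a 1D convolution learn relationships between nearby time steps the way a 2D convolution learns relationships between nearby pixels, here is an illustrative Keras model; the shapes and hyperparameters are placeholders of mine:

```python
import tensorflow as tf

n_steps, n_sensors = 128, 3          # illustrative window length / channels
model = tf.keras.Sequential([
    # Each filter slides along time, seeing all sensor channels at once.
    tf.keras.layers.Conv1D(16, kernel_size=5, activation="relu",
                           input_shape=(n_steps, n_sensors)),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. activity yes/no
])
```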
If you do any folding, that is not an innocuous choice, because you need to make sure that you don't break connections within the data. And because you don't know where activities begin and end, I would think that folding your time series might be very complicated, especially if you're looking for things like when the activity begins and ends, because that will land in a different location of the image for different recordings of the same activity. So if I were you, I would probably take each sensor, and each axis of each sensor. You could also go into spherical coordinates, which can seem like fun, but you have multiple sensors for the same activity. So you can take each sensor and make that a row, the y-axis of the image, and then use a convolution on that. Alternatively, you can just go to Long Short-Term Memory, or transformers, or some other architecture that is built for sequences.

One more thing: acronyms. In your document, you have to spell them out the first time each one appears. And make sure you verify that the figures you made state in the caption what data they were made with, because if your seed is randomly selected, the subset you show changes from run to run. Yeah. Okay. That's awesome. Who wants to go first?
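A minimal sketch of the seed-stability check suggested above, with stand-in data and a stand-in model; replace both with your own pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Repeat the random split with different seeds and confirm the score is
# stable, not a fluke of one particular randomization. X, y, and the
# logistic regression here are placeholders for your own data and model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                              # stand-in features
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # stand-in labels

scores = []
for seed in [0, 1, 2, 42, 123]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                          random_state=seed)
    scores.append(LogisticRegression().fit(Xtr, ytr).score(Xte, yte))
print(f"accuracy {np.mean(scores):.2f} +/- {np.std(scores):.2f}")
```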