Hello. Let me clear my screen and share the whole screen, because we will be swapping between coding and slides. So, the plan: let me do dynamic time warping and then see if we have time for nearest neighbors today. If not, we do nearest neighbors next time, and random forests next time as well. So remember that we have talked about distance metrics, and in particular about the Minkowski family of metrics. It is probably the most commonly used family for applications that involve numerical data, and for the most part we are still talking about processing raw time series data, so we are still largely processing numerical data. Although when we get to random forests we will talk about extracting features from the time series instead of using the raw data, and some of those features might be categorical variables, for which, like we said, there are going to be different distances. These slides have all been seen before; they are just here as a reminder. The Minkowski metric can be written in a compact form like this, and we are going to use that today. So let's pause and think about it for a second. The dimensions here are the k indices, one per feature in my dataset. If you had a one-dimensional time series, the sum just has one term. But imagine that you have a vector of time series properties, for example temperature, humidity, air pressure, and precipitation. Those are four related quantities that you can measure as time series, on the same timestamps, so that is a multidimensional time series. You always take the absolute value because you want these numbers to be positive: taking roots of negative numbers would give you imaginary values in some cases, which would be really bad, while absolute values are positive, so it does not matter whether the value of x_i or x_j is larger. So this is the distance between two objects, two observations, two time series: the differences raised to the p power, and p can be whatever my imagination allows among the real numbers that start at one and end at infinity. Most commonly it is going to be p = 2, the Euclidean distance. The one with p = 1 is the Manhattan distance, because it is the distance that goes by perpendicular lines from point A to point B; that is what it represents. And the Euclidean distance is the as-the-crow-flies distance; it is the same formula, but with p = 2 instead of 1. Okay? Good. So we are going to look at how we actually compute these distances, because we are going to work on a modification of this distance in order to implement dynamic time warping. Just a heads up that we will use this package. There are certainly other packages for distances, but this is the most common one and a lot of the other packages eventually rely on it: we are going to use scipy. There are going to be a couple of tricky things when we get to use it. The scipy.spatial package is the one that has the distances embedded in it. Don't be fooled by the name.
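To make the compact Minkowski form concrete, here is a minimal sketch; the four "weather" numbers below are made up for illustration, not taken from the slides:

```python
import numpy as np
from scipy.spatial import distance

def minkowski_distance(xi, xj, p=2):
    """Minkowski distance between two feature vectors: p=1 is Manhattan, p=2 is Euclidean."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    return (np.abs(xi - xj) ** p).sum() ** (1.0 / p)

# two toy observations: temperature, humidity, pressure, precipitation (made-up values)
a = np.array([20.0, 0.40, 1013.0, 0.0])
b = np.array([18.0, 0.70, 1009.0, 1.2])

print(minkowski_distance(a, b, p=1))      # Manhattan
print(minkowski_distance(a, b, p=2))      # Euclidean
print(distance.minkowski(a, b, p=2))      # scipy's built-in agrees
```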
scipy.spatial can calculate distances between things that are not spatial, so long as they are numbers; it is just that the context in which the package was written was spatial analysis. And there are a lot of things in this package, a lot of interesting distances that we have not talked about. Some of these are similarity metrics, and some are the ones that are good for categorical variables, for boolean data. You may wonder how you calculate whether things are different for categorical variables. For example, in something like this, the numbers that go in answer the question: does your observation have property A, B, or C? And the number that appears is going to be one or zero. You turn categories into numbers with a process that we will talk about more generally, called one-hot encoding: instead of using a single categorical variable with values A, B, and C, you create a variable a, a variable b, a variable c, and the value becomes "is A present for this observation", "is B present", "is C present". There are also a lot of numerical distances that we have not covered in class, like cityblock and the Minkowski one that we just talked about. When you have a problem that requires you to calculate distances, most likely you are going to be fine with Euclidean; in most cases you will get there. But different distances have different properties. For example, something that may be relevant for you: if I look at the distribution of my points, what does that distribution look like? Do I care whether it has a long tail with lots of outliers? Do I want to weight the distance between outliers the same as the distance between points near the center of the distribution? Some distances handle that better than others, including some variations within the Minkowski family of distances. Okay. So I said that the words of this module are "dynamic time warping", and of course we are going to talk about time series; "dynamic" refers to a specific kind of programming. We have talked about dynamic programming. The description is a bit abstract, but it goes something like this: break down the problem that you have to solve into subproblems that can be solved independently, calculate the solutions for those subproblems separately and independently, and then put them together. It seems like a fairly reasonable thing to do. The idea is that if you are going to use the value of the subproblems more than once, and you solve the problem this way and store those values, then all you have to do is look up the values instead of recalculating them. And looking up things is typically a much cheaper operation than recalculating them, except that indexing, figuring out which value to look up, might actually be quite complicated. Typically, and I think in the example that I will share you will see that it is not obvious for a small number of operations, but typically this will decrease the computational complexity of your operation, the number of floating point operations that you make, which is how you measure the computational complexity of your problem. The trade-off is that you are going to occupy a lot of memory, so you need access to a lot of memory.
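As a minimal sketch of the one-hot idea (the category column and the use of pandas here are my own illustration, not from the lecture notebook):

```python
import pandas as pd
from scipy.spatial import distance

# a hypothetical categorical column with values A, B, C
df = pd.DataFrame({"category": ["A", "B", "A", "C"]})

# one-hot encode: one 0/1 column per category instead of a single categorical column
onehot = pd.get_dummies(df["category"])
print(onehot)

# with 0/1 vectors, boolean similarity metrics apply, for example the Jaccard distance
u, v = onehot.iloc[0].values, onehot.iloc[1].values
print(distance.jaccard(u, v))
```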
A lot of times, for example on high performance computing systems — the predecessor of the DARWIN cluster, for instance — you had a ridiculously small amount of memory compared to the computational power, the number of flops, that was available. So the number of computations you can make is large, but dynamic programming can be impractically difficult there, because it strains the memory of the computer as opposed to straining the computational resources. Makes sense so far? Okay, so what is it actually that I am talking about? I will give you one example, and we are going to see how, if we solve this problem by writing code that works in a recursive way, which is in some sense the opposite of dynamic, versus writing it in a dynamic programming style, the computational cost goes up and down. The problem is calculating the Fibonacci sequence, which you may know is a very popular and famous sequence of numbers because it has some nice mathematical properties. I do not have the spiral on the slide; the spiral would have been much nicer. It is really important even in the graphic arts, because it provides a visual balance that is instinctively appealing to people: the sequence itself describes a logarithmic spiral, a figure that turns out to be very comforting for humans to look at. The idea is that I have a sequence of numbers, and at any point in the sequence I can tell you what is coming next if I know what came before. The sequence starts with 0, the next number is 1, and everything after that is the sum of the previous two numbers. So it is 0, 1, then 1 + 0 = 1, then 1 + 1 = 2, then 2 + 1 = 3, then 3 + 2 = 5, and it goes on to infinity, of course. So let me code it up. I encourage you to open a notebook and do it with me; it is going to be kind of a nice piece of code, I think. We are going to use only numpy for this: import numpy as np. And we are going to define two functions. We are going to define the Fibonacci function; this is going to be the one that recursively calculates the next Fibonacci number, and what I want to give it as an argument is the n value, so it returns the n-th Fibonacci number. The other one will be the dynamic programming version, which, remember, means we are going to go ahead and do as many calculations as we can, store them, and then look up what we need afterwards. So, like I said, the first two numbers are set arbitrarily, directly, to 0 and 1, because the Fibonacci sequence works on the previous two numbers. So I am going to have a different behavior if n, the number that I want to calculate, is smaller than two. If n is smaller than two, what I am going to return is n itself: if the number is 0, the zeroth element of the Fibonacci sequence is 0, and if the number is 1 it is 1. That is the rule for the start, and n is the argument of my function. Then, otherwise, what do I want to do? I want to return the sum of the previous two numbers. So look at what I am doing here: I am putting inside of this return statement a call to the function itself.
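Written out, the recursive version being described looks roughly like this (the function name fib is my choice):

```python
def fib(n):
    """n-th Fibonacci number, computed recursively."""
    if n < 2:                              # base cases: fib(0) = 0, fib(1) = 1
        return n
    return fib(n - 1) + fib(n - 2)         # the return calls the function itself, twice
```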
So the return of the function will call the function, which will call the function to get its return, which will call the function again, and so on, until n gets small enough and the recursion breaks out with my base-case return. Sorry, this should be fib of n minus 1 plus fib of n minus 2. So let's test it. Fib of 1 is 1, fib of 0 is 0, fib of 3 is 2. Yes, it is correct: the third number is 2, and the sequence grows the way it should. It did not take very long to calculate that, but the time will scale with the number that you put in. For 3 it is about 12 microseconds, for 10 it is about 32 microseconds, and it keeps going up by orders of magnitude; by the time you get to 35 or 40 it really starts blowing up. The computational cost grows roughly exponentially, so it is going to take a good while for the high-order call to complete. While it runs, let's think about how we would write it with dynamic programming. Dynamic programming means I am going to store a bunch of things, and if I am going to store things, I want to prepare a storage space for them ahead of time. So the first thing I am going to do is create a container where I can store all the calculations that I am going to do, up to my number n. I am going to set its size and fill it up with zeros, which is not the least computationally expensive operation that I could do; I could actually leave it as np.empty, which will not write anything into those cells, but that is very prone to problems, because when you go look at the cells later you may not realize that they were never filled in with your values. So we are just going to fill it up with zeros. We are going to make it n plus one long, because I want indices up to n, and we are going to make it integers, because the Fibonacci sequence is integer numbers and that will save some storage space: I can encode integers in a smaller container than floating point numbers. Now for the special cases that I handled with my condition before: the zeroth element I do not have to worry about, because it is already set to 0, and the element at index one I am going to set to 1. So far so good. Then, and this is not even that computationally efficient, I am going to set up a for loop, which in Python is probably the worst thing from the point of view of computational efficiency, but I will demonstrate that it is a lot more efficient than the recursive version anyway. I am going to loop over all numbers starting at 2, and I am going to use np.arange instead of range; honestly either would work here, I do not remember exactly why I preferred arange, and if it does not work we will find out. I am going to go all the way to n, so n plus one will be the end of my range. And I am going to fill in every cell with what it should contain: element i is equal to, essentially, the same thing that I did in the recursive version, element i minus 1 plus element i minus 2.
And then I am going to return element i of the result. Actually, just to be on the safe side, let me return the element at index minus one, the last element; that will give me a check on whether, for some reason, my loop did not go high enough — if so, the answer will come out wrong and I will notice. Okay? This makes sense. We set the argument to n, and I am sure that there are some typos, so let's try it. First of all, let's check that it is right. The dynamic programming version at 0... throws an error. Oh, yes, that is because with n equal to zero the size is wrong; I must not have tested it with zero earlier; let me fix the n plus one. So, the Fibonacci sequence at two is one, at three is two. Remember how long it took to calculate the previous one, the recursive one, at order 35 or 40? Around 45 seconds. To calculate it in a recursive fashion I needed tens of seconds; this one took about 39 milliseconds. Actually, let me save the timings so that we can compare them afterwards. Let's do 3, then 10. By the way, perhaps I glossed over this, and if you are not used to seeing %time: %time is what is called a magic command; you do not need to import a package to run it, it works directly inside IPython. %time reports the time that it takes for a statement to run, and the related %timeit will run the statement a bunch of times, because when you run it once there can be stochastic variations, for example from the load on the computer: if a lot of people are using the machine, you might end up waiting before your code actually runs. So it runs it a number of times and reports, say, ten microseconds for the call. It reports both the CPU time, the computation time that has been used, and the wall time, which is the elapsed time from when the statement started to when it ended. So the dynamic programming version at 10 takes a few tens of microseconds, and you see that it varies a little bit, but not tremendously. We are not going to redo the recursive version at 40, because it took about a minute, and now the dynamic programming version at 40 takes about 45 milliseconds. If we compare, at low order this actually takes longer than the recursive version; at order 10 we are comparable to the recursive method; but when I go to a higher order, the computational time here has hardly increased, whereas for the recursive one it increased exponentially. Does anybody have a sense of why it takes so much time at order three for this version? What am I spending time on? Allocating memory, creating objects — yes. The operation that dominates is the memory allocation, which says: computer, reserve this many bytes for whatever is going to be stored here. If I want to be a little bit more technical, the computer reserves a — potentially, and in most cases you will not be concerned with whether it is actually — consecutive set of bytes in memory, and it creates a pointer to those bytes that says: here is where the sequence starts, and each element of the sequence is separated by that many bits; if it is int64 or float64, by 64 bits. So that when I index, which is the other operation that I do, what I am doing is telling the computer: okay, take this variable, which starts at that memory location and whose element spacing you know, and go to the tenth element.
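A minimal sketch of the dynamic programming version being built here (the array and function names are my choice):

```python
import numpy as np

def fib_dp(n):
    """n-th Fibonacci number via dynamic programming: compute and store every value, then look it up."""
    fib_result = np.zeros(n + 1, dtype=int)     # container for fib(0)..fib(n), pre-filled with zeros
    if n > 0:
        fib_result[1] = 1                       # fib(0) = 0 is already there; fib(1) = 1
    for i in np.arange(2, n + 1):               # each entry is the sum of the two stored before it
        fib_result[i] = fib_result[i - 1] + fib_result[i - 2]
    return fib_result[-1]                       # the last element also checks that the loop went far enough
```

In a notebook you can then compare, for example, %time fib(30) against %time fib_dp(30): the recursive call blows up roughly exponentially with n, while the stored-array version barely changes.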
So all the computer has to do is take the initial memory location and then add 64 bits times the index, because these are 64-bit integers. That happens really fast, much faster than recalculating. Setting up that pointer, setting up that chunk of memory, takes some time, but a very short amount of time, comparable to a small number of floating point operations. Okay, that was dynamic programming. Any questions about dynamic programming as a concept, as an idea? Alright. So what does it have to do with distances? One thing that we talked about is the fact that if I have two time series and I want to compare them, in many cases I am going to want to compare their shapes. There may be some scaling that I am going to have to take care of, and I have here a reminder of that. But it is also possible that the time series will be essentially similar in shape but not quite aligned. In particular, some of the features — think about this high peak, this low one, or the rapid fall here — might be slightly out of phase or stretched slightly differently. Especially if we are talking about features that have a high gradient, a high slope: when you calculate the vertical distance, that may contribute an enormous amount to the distance calculation, because if the series drops by a lot in one time step, the difference from the previous point contributes all of that height to the cumulative distance collected from all the points. That may be undesirable, because the shape might be similar even if it is shifted by one or a few timestamps, or the slope is a little bit different but there is still a peak, and the peaks are similar. So far so good. Now, it is very hard computationally to find a distance that is robust to this; dynamic programming makes it possible, and that is why we talked about dynamic programming. Remember the distance between two points, defined with the Minkowski metric — for now just the Minkowski distance: the absolute value of the difference of the feature values; think of that, for example, as the x location of the point. And if I am in more than one dimension, then I sum over the features, as I told you earlier. If I have more than two points, it is convenient for me to think about how to calculate the pointwise distances in an efficient way, and we will do that with Python in a single call. What I mean by that is that I can look at the distance between each reciprocal pair of points — that is a pairwise distance. So this point has a certain distance from this one, and that is different from its distance to that one, and so on. Think about it: if I have n points in one series and m points in the other, and I compute the distance of each of the n points to each of the m points, that means I have an n by m matrix, right? So I can store my distances as a matrix like this. That is where dynamic programming comes into play: we calculate the pairwise distance between every pair of points in two time series and we store it, and then we do something with that and define a distance based on it. This matrix organization — I could dig much deeper into it, but I will spare you the boring description.
Essentially, the diagonal is 0 because that is the distance of an object from itself, and for the off-diagonal elements, between object A and object B, it is a symmetric matrix, based on my choice of distance, which says that the distance from A to B is the same as the distance from B to A — that was one of the core properties of the Minkowski distances, and it does not have to hold for all possible distances. Okay? So shall we start coding? We will use this; it is a single call that I need to make to build that matrix, and we will use it in a second. Okay. So, the obvious thing would be to only consider aligned points. If I have a distance between two vectors, and the time series are synchronous, have the same timestamps, the obvious thing would be to say: what is the distance between the points of the two time series at each timestamp? That is what we have done before, that is what we did to cluster. So far so good. There are a couple of things that we typically do to ensure that those distances are sensible. For example, we have been worrying about whether or not there is an absolute offset between the time series, a difference in the mean, and we dealt with that by removing the mean, by standardizing. Otherwise the distance between these two time series, even though the shape is identical, would end up being larger than 0: it would be the offset entering the distance at every one of the data points. All I have to do is remove the mean, align them vertically, and then that distance becomes 0 if the time axis is identical. So far so good. Remember that when you standardize, you can use the scale function in sklearn's preprocessing, but the default axis along which that function works is the vertical one, down the columns, not along the rows; I have seen that mistake many times. Okay? So what I am asking here is: how do I deal with misalignment? If I have some deformation of a time series, a deformation in phase with respect to another time series, then that distance would no longer be 0 even if the shape is consistent, even though it would be 0 if the time series were aligned properly. So the idea of dynamic time warping is that essentially you warp the x axis of one of the two time series to realign them effectively. And keep in mind that we are talking about warping and not just shifting: we are not interested in just a shift, we are interested in a real deformation of the time series, where some features are stretched compared to the other one. But there is a potentially optimal solution where, if I stretch the x axis of one time series, the top line in this case, I can realign my points exactly. So that is the idea. How do I do that? Let me get back to the notebook, and again I encourage you to work with me; you can start a completely different notebook or continue on the previous one. I am going to hand out some numbers for us to work with, because they are the same numbers that I have on this slide. So let's start by creating — I only need numpy and scipy, and I will import scipy in a second — let's just make two arrays of random numbers. I am going to use integers for simplicity, and so that I can visualize them a little bit better. Let's draw from 0 to 10, and let's make them short, four numbers each; that will make it compact enough that I can easily visualize it. Then import pylab as pl, and plot r1 and r2.
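A minimal sketch of that setup (the seed and the variable names r1, r2 are my choices):

```python
import numpy as np
import pylab as pl

np.random.seed(0)                       # any seed, just so the example is reproducible
r1 = np.random.randint(0, 10, 4)        # four integers between 0 and 9
r2 = np.random.randint(0, 10, 4)

pl.plot(r1, label="r1")
pl.plot(r2, label="r2")
pl.legend()
pl.show()
```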
Let me suppress the outputs. Now, if I want to calculate the distance between these arrays, calculate the distance matrix like I did before, I am going to invoke the scipy package. So, from scipy import spatial: scipy is a very large package, so not everything gets imported automatically; you import the module you need, and that is my preferred way to do it. First of all, there is distance: this is a module that allows me to do things such as calculating pairwise distances. Or I can go even faster and just calculate the distance matrix directly. The distance_matrix function will produce a two-dimensional object, such as the one that I showed you earlier, from my one-dimensional objects. So, spatial.distance_matrix of r1 comma r2: this is not going to work, because the orientation of my arrays, as I created them, is wrong. This is a numpy array of one dimension, the typical shape in which you might have the y values of a time series. But really what I want is essentially the transpose of it, a column, which I can obtain in a couple of ways. What I am going to do is reshape minus one comma one. You remember we have seen this before: you have to do it when you pass data to a linear regression model in sklearn. Yes, that is correct. It happens a lot when you are doing operations that are at their heart linear algebra, vector multiplications. We have seen it when using sklearn for linear regression, and you could see a version of it in statsmodels' ordinary least squares fit as well, where you need to change the dimensionality of the array; essentially you are just turning your array around. So I am going to use reshape. Another way to do it, and I have it in the slides just to give you a breadth of options, is np.newaxis: that adds a dimension, it adds a bracket to my vector, and then I can transpose it; I need to add a dimension before I can transpose. Does that make sense? I see some concerned faces. Do it with both arrays, and that will produce my matrix. So far so good. So what does this matrix represent? If I look at this number right here, what does it represent? You can also help yourself with the colorbar. It is the distance between the element at index 2 in one of the arrays and the element at index 0 in the other: the distance between the third value in the orange array and the zeroth value in the blue array, which is the value you see there. So why am I doing this? Bear with me, because it is going to get complicated in a second, so don't lose track of this right now. Now, how do I calculate the distance, like I said before, along these lines, the cumulative distance? Where are my slides? How do I calculate my cumulative distance based on this matrix? How do I get to my D from this matrix? Right: the sum of the diagonal, because the diagonal is what represents the distance between points that are aligned. So I am going to add up the diagonal. Very good. np.diag will give me the diagonal, and np.sum, or the .sum method, will give me the sum of that. So that is the distance between these two arrays. Great. Now I am going to copy and paste a series of numbers.
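Continuing from the r1, r2 sketch above, the distance matrix and the aligned (diagonal) distance would look something like this:

```python
import numpy as np
from scipy import spatial

# distance_matrix expects 2-D inputs of shape (n_points, n_features), hence the reshape
d = spatial.distance_matrix(r1.reshape(-1, 1), r2.reshape(-1, 1))

# equivalent reshaping with np.newaxis plus a transpose
d_alt = spatial.distance_matrix(r1[np.newaxis, :].T, r2[np.newaxis, :].T)

# d[i, j] is |r1[i] - r2[j]|; the no-warping distance sums the diagonal (aligned timestamps)
aligned_distance = np.diag(d).sum()
print(d)
print(aligned_distance)
```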
If you are working in a notebook with me, we can use the same numbers: these are the numbers that were used in the core paper that describes dynamic time warping, for the demonstration that they give. So if you go back to their paper, you will find essentially the matrix that we are going to create right now. So I am going to define a vector — actually, let me swap them, because I realized that they were the other way around. It does not really matter, but we are going to have to swap them when we plot, because Python plots along the y axis what you might expect to be plotted on the x axis in linear algebra notation, so just keep track of them. First I define the arrays, and then I do the same thing as before: spatial.distance_matrix of x dot reshape minus one comma one, and y dot reshape minus one comma one. Actually, let me swap them, let me do y and x. It will not matter for the distances, of course; it will only matter when I make the plot, to be consistent with the paper as it was written. So we are kind of stuck with my original x and y definition, sorry about that. So that is a somewhat larger matrix, but nothing has changed. Notice, by the way, that this morning I had a brain meltdown and I was wondering: why does it not change when I pass it a different Minkowski order? The p here, if you read the documentation, is which Minkowski norm you use, which p goes into your sum: the differences to the p, then the p-th root. It is not changing because this is just a pairwise distance between single numbers: all I am doing is taking the difference of two numbers, raising it to the p power, and taking the p-th root of that, which is the same as just taking the absolute difference. So it does not actually matter here. Let's pretend we are using the Minkowski metric of order 2; it's fine. So when I have a matrix like this, I can plot it: since it is a two-dimensional object, I plot it with imshow. Let me first save it as dist equals that call, and then let's make a plot. Now, actually, let me describe something more and then I will change that part of it. Let me do a few things to this plot. Let me put the numbers of my arrays, x and y, at the tick marks; this will reproduce one of the plots that is in the paper, and it is always an interesting thing to know how to do. So plt.xticks is the function that I can use — actually, let me make a figure object first, I will need it later. So fig comma ax equals plt.subplots of 1 comma 1: that means I am going to make a figure that contains one subplot, one row of subplots and one column of subplots, in other words one plot, and this call returns both the figure and the axis object. Then I can use them; let me not use the axis for now, but later. So right now I have ticks that say 0, 1, 2, 3, and so on; I want ticks that are actually the values of my arrays. The arguments of the function are which ticks you want to use and what labels you want to put on those ticks. I am going to use all of them, so np.arange with the length of the array, and for the labels I want to use my x and y values. There is one little thing: I cannot use x directly, because the label is text, it is a string, and x contains integers, not strings. So what I am going to do now is a list comprehension, which is a compact syntax for writing a for loop that returns a list with the content of the loop.
And it is something that I encourage you to look into, because the syntax is both easily readable, so fairly transparent, and more efficient than an actual for loop, for reasons that are honestly not entirely clear to me, and it is very common in Python. We talk about coding in a "Pythonic" way — that is an actual word that is used in the literature — and what I am about to do is very Pythonic. So I want the x values on the x ticks. What I am writing in this comprehension is essentially: for i in x, take that value of x and turn it into a string; the syntax is just turned around: the string of i, for i in x. It is just a for loop turned inside out. Someone asks: x on the x axis and y on the y axis? That's right, and if not, we will change it later. And this is what the y ticks would be; I forgot a parenthesis. Okay, so now I have the numbers that I actually wanted. And just to be an overachiever, let me put the ticks at the top instead of at the bottom, because that is how they are in the paper; for that I need the axis object. The command for this is ax.xaxis.tick_top — I always have to look that up, and even when I look it up I never seem to remember it. Okay, here it is. So that is my plot. So, back to the idea of dynamic time warping; now we have this matrix, and we are going to use it in a second. The idea of dynamic time warping is that I am going to define a path along that matrix. If I were not doing dynamic time warping, the path would just be the diagonal: I would take only the diagonal elements and sum them. Now, instead, I am going to define a path on that matrix that is optimal in some sense, the sense being that it minimizes the warped distance between my time series, and I am going to sum the elements along the path that I choose instead of summing over the diagonal, subject to some rules. So let's look at that. Say that the numbers are identical at the beginning. I always start at the first entry, the corner of my matrix, the (0, 0) element, and I take whatever is there. In this case, let's say that we are super lucky: the time series are aligned at the beginning and I removed the mean, so that distance is 0. The next one along the diagonal, same, the distance is 0; the next one, the distance is 0. But now, instead of taking the diagonal element, whatever that distance happens to be, let me also consider the element that is the distance between object i and object j minus 1, so I am just off the diagonal: one of the indices takes a minus one, the other one stays the same. What if I consider those as well, and instead of choosing the diagonal element by default, I give myself the ability to choose whichever is the smallest of the three neighboring numbers? Essentially what I am doing is, at every point where I am, I ask: is the distance to one of the neighbors a better, smaller distance than the distance to the aligned point? The problem with this is that it may lead to very significant warping: you might end up saying that all the points in array B are most similar to the first point in array A, and you end up with a distance that is not meaningful. So at the end of the day, you are going to choose the path with some constraints, and there is a variety of algorithms that define what constraints you set when you look for the distances.
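Putting the plotting steps together, a minimal sketch looks like this; the x and y values below are arbitrary placeholders, not the numbers from the paper, which are handed out in class:

```python
import numpy as np
import pylab as pl
from scipy import spatial

# placeholder arrays standing in for the values copied from the slides
x = np.array([0, 2, 3, 4, 3, 2, 0, 1, 0])
y = np.array([1, 1, 2, 4, 3, 1, 0, 0, 1])

dist = spatial.distance_matrix(y.reshape(-1, 1), x.reshape(-1, 1))

fig, ax = pl.subplots(1, 1)
ax.imshow(dist)

# tick labels must be strings, hence the list comprehensions
ax.set_xticks(np.arange(len(x)))
ax.set_xticklabels([str(i) for i in x])
ax.set_yticks(np.arange(len(y)))
ax.set_yticklabels([str(i) for i in y])
ax.xaxis.tick_top()                     # ticks at the top, as in the paper's figure
pl.show()
```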
So these are two possible constraints; typically the parameter is called r, and it is the r parameter in the search. You can say: I will only look this far from the diagonal — that is the easiest criterion. But you could also define a constraint that forces the path to cover the full time series, or almost the full time series, on both sides, so that you do not neglect the end of one time series completely, while still allowing the maximum freedom; in that case you have more room to look around in the neighborhood for points near the center of the time series than near the endpoints. So, fine. We are not going to write dynamic time warping from scratch — that is not something I am going to actually code here — but we are going to continue with the visual representation of this, just to get a little bit more intuition. First, let me ask you a visualization question. Here I have a plot of the matrix where the color represents the distance. The distance, by definition of a distance, is bounded by 0 at the bottom, and it can grow, in principle, infinitely large — with some constraints, of course, since there are only that many values in my array. Is this the best way for me to represent this array? This is a colormap called viridis, and it goes like this: you have seen it before, because it is the default in matplotlib, and you are lucky, because the default used to be a rainbow map. It goes from blue at the bottom to yellow at the top. The thing that I want to be able to pick out is the smallest possible value: is this the most appropriate colormap for that? It is not that bad. But when you think about choosing colors for your plot, think about what the perception is and what the emphasis is. Here the emphasis, for me, is simply to identify which value is smaller. So I really want something that very intuitively shows me the relative ordering, and a single gradient of a single color is the most effective way to represent my data by value. That is typically the best choice. Now, in this part I have a chunk of code which I am too lazy to retype, because I am working on one laptop and presenting from the other, but I can put it in the chat if you want; it will print the distance value on each cell of the plot. I did write it earlier, so you can just use it — let me put it in the chat. Okay? And here, this cell is the distance between this value and that value. So what I want to do is find a path that starts at the corner; as a default I always start at the first element. At every step, let's use the simple rule limited to the neighbors at (i-1, j), (i, j-1), and (i-1, j-1): we are only looking at those specific neighboring points to choose which way to go in the next step. Here, one of the neighbors is obviously smaller than the aligned next point, so I realign my time series, move down there, and so on and so forth. Notice that I can end up going for quite some time along one direction — and that is where the constraints come in.
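The class does not code DTW from scratch, but for intuition, here is a minimal sketch of the textbook cumulative-cost recurrence that the "smallest of the three neighbors" rule corresponds to (no window constraint applied, and p = 1 pairwise distances as an assumption):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: accumulate pairwise distances along the cheapest warping path."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)      # cumulative-cost matrix with an infinite border
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])        # pairwise distance between the two points
            # each cell adds its own distance to the cheapest of the three neighboring cells
            cost[i, j] = d + min(cost[i - 1, j],        # step forward in a only
                                 cost[i, j - 1],        # step forward in b only
                                 cost[i - 1, j - 1])    # step in both (the aligned move)
    return cost[n, m]

# identical shapes shifted by one step still give a small DTW distance
print(dtw_distance([0, 1, 3, 1, 0], [0, 0, 1, 3, 1]))
```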
So that is basically what dynamic time warping is. The problem is that even once you have implemented the dynamic programming aspect of it — pre-calculating the pairwise distances, building the matrix to start from — it can still be extremely computationally expensive to then find the path. And so there are some limitations and some bottlenecks. I do have an exercise on dynamic time warping; it is kind of fun because it uses sound clips, and you can see whether, by dynamic time warping, you can find which recordings contain the same sentence, said by different people in different clips. But if you do that, it will take some time, because even once you have pre-calculated all the options, you still have to index the time series. The few slides that come next are about how you index time series and what possible solutions exist for algorithmic indexing of time series. Is there anything else that I need to tell you other than giving you the slides? I think that is enough; you do not really need to know all of that in detail. The effective time series implementations index in a smart way, use a variety of tricks to make sure that they have bounds on where they look for things — how far you look away from the diagonal, et cetera — and they enable searches over millions or even trillions of time series. This is a quite revolutionary algorithm that has really made a mark in the computational modelling of time series. At the end of the day, you are going to use Python to do this, and there are a couple of packages to use. There is one that is kind of neat because it gives you the actual plot — that is the one on the left in the slide — and it shows you the path that you have taken along the diagonal. It is not particularly hard to use, but I do not know that it is implemented in the most computationally efficient way; it does take some time to run, so that is a warning. Instead, consider tslearn, which is a machine learning package for time series; it implements a variety of other things, and the piece that you want is the dynamic time warping metric. Because remember that, at the end of the day, all that we have done is define a distance. So today's lecture was a very computationally intense and complicated method to define a distance between two time series. This will be your input to a machine learning model that, I don't know, clusters or classifies; this is just the element that goes in. So we are still talking about data preparation, feature extraction, and feature engineering. Any questions on dynamic time warping? Alright, so next class we will very quickly do a couple of models, and you will tell me how it goes. Now, there is a message in the chat — thank you. Okay, good.
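As a usage sketch, assuming tslearn is installed, the DTW distance (and, if you want it, the warping path) can be computed like this; the two short series are made up for illustration:

```python
import numpy as np
from tslearn.metrics import dtw, dtw_path

# two short series with similar shapes, one shifted by a step
s1 = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
s2 = np.array([0.0, 0.0, 1.0, 3.0, 1.0])

print(dtw(s1, s2))                      # DTW distance: small despite the shift

path, score = dtw_path(s1, s2)          # the optimal alignment as a list of index pairs
print(path)
```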
MLTSA 2022 7.1 | Dynamic Time Warping
From Federica Bianco March 22, 2022
Zoom Recording ID: 92136770169
Meeting Time: 2022-03-22 07:19:17pm