All right, welcome everyone. This week's seminar is by our own Willow Fortino, who is going to present on work that she did at UPenn as part of the Dark Energy Survey. And I guess Gary Bernstein, who some of you may know, was the adviser. So just take it away.

Hello, thank you very much. Fantastic. Okay, I'm going to ask the usual questions: can you see my screen? Can you hear me okay? Excellent. All right, I'm going to put you in the corner and then I'm going to press go. All right, fantastic. Hello everybody. As John said, my name is Willow, Willow Fortino. I'm a first-year grad student here at the University of Delaware, and today I'm going to be talking to you about work that I did while I was a senior undergrad last year at the University of Pennsylvania. It culminated in a paper, and the paper is called "Reducing ground-based astrometric errors with Gaia and Gaussian processes." Here is the arXiv number for the paper. I also want to point out that someone else, Pierre-François Léget, was working on very similar work for the Hyper Suprime-Cam, and here is the arXiv number for that as well; I might be mentioning it later.

But anyway, let's dive right in. Before I do, please let me say: May the Fourth be with all of you. Happy May 4th, everyone. And if you're not from the United States, then "May 4 with you be," of course, for those who have an alternative date format. I wanted to show this image because I thought it was really cool; it's from the NASA Twitter, and of course this is Tatooine. Something I didn't know is that in 1977, when Star Wars came out, astronomers didn't really know that binary stars were so common. Very interesting.

Okay, let's get on to the talk. Here is the Víctor M. Blanco telescope at the Cerro Tololo Inter-American Observatory. This is in Chile, and this is the observatory where all of the Dark Energy Survey data comes from. I'll explain what the Dark Energy Survey is soon, but I just want you to appreciate this beautiful picture and that I got the opportunity to work with data coming out of this wonderful thing. Here is the telescope in more detail; I believe this is while it was being renovated to change the camera, and I'll get to that in a second. Back here you can see the four-meter primary mirror, which is very shiny, and here in the foreground is the camera. And here is another shot of the same thing with the camera there. Okay, and then here is the focal plane of that camera. It takes beautiful, beautiful pictures like this with a very interesting array: each of these CCD tiles is arranged in this peculiar pattern that I don't fully understand. But nonetheless it's a beautiful camera and it takes wonderful images.

So, this Víctor M. Blanco telescope is at the Cerro Tololo Inter-American Observatory, and at the observatory is the Dark Energy Camera, also known as DECam. I know people pronounce it differently, "DEE-cam" or "DEH-cam"; I don't really know which one is right, but I'm going to say it the way I always have. It was given to CTIO in exchange for observation time, or at least that's my simplified understanding of the situation. It has this amazing camera that I was pointing out, which has 62 CCD tiles for a total of about 520 megapixels.
If you're familiar with the DES data, then you may know that often it's slightly less than this because some of the tiles are malfunctioning, but that's beside the point. Just for reference, an iPhone 12 Pro has a 12-megapixel camera, so you'd need about 45 iPhone 12s or so to get a picture comparable to what the Dark Energy Camera can take.

Now, the Dark Energy Survey is a survey that was hosted at this Blanco telescope, in exchange for the Dark Energy Camera. The Dark Energy Survey covered about 5,000 square degrees, really big, much bigger than the size of a full moon. It took place over six years, about 750 observing nights, and its goal was looking for evidence of dark energy and dark matter and lots of other things that I in particular am not really concerned with.

Okay, now to shift gears a little bit, let's talk about the Rubin Observatory. This is the Vera Rubin Observatory being built. It's also at CTIO, or maybe not the exact same site, but I believe you can see it from the Blanco telescope. Here is its focal plane array: it has a 3,200-megapixel camera. It's enormous, the size of a small car, and it's just incredible; I love this thing. It was built at SLAC, at Stanford. And here is an amazing picture that I took. This is at the Richard F. Caris Mirror Lab at the University of Arizona. This is where they make all of the up-to-8.4-meter mirrors for various telescopes like the Rubin Observatory's. This is the kiln specifically, and in here is the glass. The whole thing spins at a very particular angular frequency so that the parabolic shape of the mirror is already created to zeroth order as soon as it comes out of the kiln. After that, of course, they do many rounds of polishing; I believe they get the parabolic shape they want to within something like one part in a billion. It's incredible what they do there. And you should ask me more about how they choose the glass and such for these mirrors; it's incredible.

Okay, so the Vera C. Rubin Observatory and the Legacy Survey of Space and Time. As I said, it's near CTIO. Some people probably consider it the successor to the Dark Energy Survey because it has a similar scope and similar science goals. It's much, much bigger, though: 18,000 square degrees on the sky. As I said, it has a 3.2-gigapixel camera; I don't even know how many iPhone 12s that is. And it's also a ten-year survey, so we're going to get a lot more data than the six-year survey from DES. As I said, the goals are similar: dark energy, dark matter, et cetera. But that's not really what I'm concerned about. I'll come back to the Vera Rubin Observatory towards the end and explain why it's relevant to this work, although obviously, like everybody else, I haven't been using any data from LSST yet.

Okay, so now to get into the talk and the motivation, the problem of my research. The issue is atmospheric turbulence, or in other words, why do stars twinkle? In a very basic sense, and this is very important in choosing your observing location for a telescope, what's happening is that laminar airflow comes in over the ocean, hits your continent, and becomes very, very turbulent. And turbulent airflow causes images to appear blurry. You can see that in this image of Jupiter.
The left image is what you might expect to see at a good observing location on a good day, or rather on a good night, and the right image is what you might expect at a poor observing location on a very turbulent night in the upper atmosphere. How this manifests for astronomy, or rather astrometry, is that the turbulence in the atmosphere is changing, ever so slightly, the measured position of a point source. I'll get into the specifics in a little bit, but that's what's happening. This is why atmospheric turbulence is a problem for ground-based observatories: as the light passes through the atmosphere, the location of the image is being distorted.

And that's where Gaia comes in. This is the Gaia space-based observatory from ESA, the European Space Agency; this is an artist's rendition, of course. It's a full-sky survey. I believe it's primarily an astrometric survey, but it does take some beautiful full-sky maps like this. And this is the primary mirror here. The reason I'm talking about this is because it is a space-based observatory, which means it is above all of the troubles of atmospheric turbulence. Even though this mirror is much, much smaller than the four-meter mirror on the Blanco telescope, and even though it's much, much smaller than the Vera Rubin Observatory mirror, it can still get much more accurate astrometry than either of those because it's above the atmosphere. I'll quantify that in a bit.

So the main idea of this research is that we say, okay, we have this DES data, or in a few years you can say, okay, we have this LSST data. We also have this Gaia data, and it's a full-sky map. Is there some machine-learning way to use this Gaia data to inform a model that removes this atmospheric turbulence from the ground-based observations? And the answer from this research is: we think yes.

Oh, I forgot I included this. This is an amazing image that compares the sizes of all these different telescope mirrors. You can see the Rubin Observatory telescope here, we have Hubble over here, and Gaia is very, very small. So despite this small size, Gaia still gets these amazingly small errors, and we can one hundred percent take advantage of that; at least, that's what this research shows, and the research I mentioned earlier by Pierre-François Léget also shows it.

And this is how we do it. Please ignore the image on the right for a moment. The idea is that we're registering the Y6 DES catalog to Gaia DR2. Y6 DES just means the year-six Dark Energy Survey catalog, and Gaia DR2 just means Data Release 2 from Gaia; the specifics of that don't matter. What's important here is that we take this DES data, match these astrometric observations to the Gaia data, and then just subtract them and construct the residual field of positions. This is a bit of an oversimplification, but essentially what the plot on the right is showing is that each of these vectors points from a DES observation to a Gaia observation.
If we assume that the Gaia data gives, in some sense, the true positions of the stars, or as close to true as we can get, then we can say that this residual field is a map of the atmospheric turbulence in the DES catalog. You might be wondering: are there any other systematics, and what about shot noise? The answer is that, as far as we can tell based on previous research, there are no other systematics. I suppose it's possible that we missed some, but I wasn't involved in that research, so it's not my problem. So when we begin this research, we assume that we're left with just atmospheric turbulence errors and shot noise, and shot noise is random error; I'll explain how we deal with that soon.

What we're doing here is training a Gaussian process regression model and minimizing the correlated variance in the residual field. The goal is: we take this residual field on the right, we give it to a model, and we ask, can you tell us what the residuals would be at another point on this focal plane? The idea is that there are stars that are not in the Gaia catalog because they're too faint, but are in the DES catalog, and will be in the LSST catalog, because those telescopes have bigger mirrors and can detect fainter objects that Gaia can't see at all. For those stars, we can interpolate this atmospheric distortion field at their positions and remove it. Then we can say we've reduced, or hopefully removed entirely, the atmospheric turbulence even for the stars Gaia doesn't know about.

You can kind of see in this image that each of these little vectors knows about its neighbors. If you think about what turbulence is, it's blowing stuff across the sky, and maybe you can convince yourself that, yeah, it makes sense that these vectors know about their neighbors. That's what we mean by minimizing the correlated variance in this residual field: the fact that these neighboring stars know about each other is what we're trying to minimize. The way we're going to do that, and this is a bit of Gaussian process jargon that I'll get into soon, is by using a custom kernel based off of known atmospheric physics. This is something no one has done before; no one has written this particular kernel, because it's very specific to atmospheric turbulence, and I hope you find it interesting. And then finally, as with any good model, at the end you ask: how well did we do? Were we actually able to reduce the correlated variance?

Okay, so Gaussian process regression: what is it, for those who don't know? First of all, a Gaussian process is, and I'm just going to read it here, a collection of random variables, any finite number of which has a joint Gaussian distribution. It can also be thought of as a distribution over functions. So we can see here that f(x) is distributed according to a GP, a Gaussian process, and just like a regular Gaussian it is determined by a mean, in this case a mean function, and a covariance, in this case a covariance function: f(x) ~ GP(m(x), k(x, x')). So you can see what we mean when we say it's a distribution over functions.
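To make that "distribution over functions" idea concrete, here is a minimal sketch in Python: a kernel evaluated on all pairs of input points gives an n-by-n covariance matrix, and sample functions can then be drawn from the resulting Gaussian. The squared-exponential kernel below is just a generic stand-in for illustration, not the atmospheric kernel discussed later in the talk.

    import numpy as np

    def sq_exp_kernel(x, x_prime, amplitude=1.0, length_scale=1.0):
        # Generic squared-exponential kernel k(x, x'); a stand-in only,
        # not the von Karman kernel used in the actual work.
        d2 = np.subtract.outer(x, x_prime) ** 2
        return amplitude ** 2 * np.exp(-0.5 * d2 / length_scale ** 2)

    x = np.linspace(0.0, 10.0, 200)          # n input locations
    K = sq_exp_kernel(x, x)                  # n x n covariance matrix
    K += 1e-10 * np.eye(len(x))              # tiny jitter for numerical stability

    # The mean function is taken to be zero, as in the talk; each row of
    # `samples` is one function drawn from the GP prior.
    rng = np.random.default_rng(0)
    samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)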
Now, this mean function we typically take to be zero; I won't get into the full motivation behind that. And this kernel function here is kind of where all of the art in machine learning comes from. Choosing the form of the kernel function is choosing the relationship that your data points have with one another: you're modeling how they co-vary with your choice of covariance function. So getting this right is very, very important. It turns out you can use a fairly generic kernel and still get okay results, but choosing an optimal kernel can be a really big boon to reducing these atmospheric distortions.

Sorry, quick question, please. Yeah, absolutely. On the top line, should we read that f of x to actually be f of x and x prime, or have you somehow integrated over x prime? Yes, so this kernel, when you evaluate it, you're always evaluating it at pairs of points, but f of x only depends on one argument. This isn't necessarily the same notation as what I'm using for the actual residual field, but some function f of x depends on one argument, while the kernel function is evaluated on all pairs of arguments, and x prime is just the other member of each pair. Does that make sense? It would if you were averaging over all possible values of the second argument, but if not, then it seems to me it's a joint distribution at two points. But go on; maybe this is just a schematic outline you're showing right now. It is an outline, and essentially you can think of this as the covariance matrix: if you want to draw from an n-dimensional Gaussian, then you need an n-by-n covariance matrix, and this is essentially in place of that. If you were to actually put this into code, you would generate an n-by-n covariance matrix where each element is evaluated using this kernel function. But okay, I'll come back to more about the kernel later.

So now I'm going to get into the GPR model, just the basics of it. We have our inputs x, and these are just the DES astrometric positions. When you go to your model, it says, okay, what do you want me to predict on, and you provide these DES positions. Then the targets are what we're trying to predict: the y's are just the DES positions minus the Gaia positions, so these are just the residuals, the residual field that I showed you before. And as I said, the whole crux of the model is the kernel, and that's the k(x, x') here. I'm going to gloss over this a little bit, but this is the joint distribution, where you take your y values to be your known target values and f star to be the values that you're predicting. This is distributed according to a Gaussian with mean zero, and this is a block matrix: each of these capital K's is a matrix where each element is the result of evaluating your kernel function k at a pair of points.
So if you want to know your residual field f star at points x star, then this is what determines that behavior. The actual linear algebra for figuring this out, I'm not an expert in, but it's really rather simple. If you look at the noisy case here, it's really just a matrix inversion, which is a problem I'll come back to later, and some matrix multiplication to figure out your posterior predictive mean, which is the model's best guess for the interpolation.

Okay, so this is the model, and now let me come back to atmospheric turbulence. This is the same image as before. What Dr. Bernstein, my PI, discovered in previous work is essentially what's summarized in this image. He looked at the Y6A1 data, which is just the DES year-six data, and he discovered that this atmospheric distortion has a coherence length of ten arcminutes. What that means is that these vectors know about each other, but only up to about ten arcminutes; two stars separated by more than ten arcminutes don't really know about each other that much, in a colloquial sense. I'll come back to coherence length later on. He also found that the amplitude and patterns change unpredictably. What that means is that you can't take two consecutive exposures and say, well, since these two exposures were taken right after one another, maybe the atmospheric turbulence is roughly the same across both; the answer is no. He also found that the atmospheric turbulence is clearly anisotropic, in both amplitude and coherence length. If you look at the correlation function of the atmospheric turbulence errors, and I'll talk more about the correlation function very soon, it will have a major axis and a minor axis, meaning the wind is blowing strongly in one direction and not so strongly in the other. So some exposures will have very high errors in one direction and lower errors in the other direction, and that's what this distribution is showing. In blue you have the major axis: you can see most exposures have around seven or eight milliarcseconds RMS turbulence along the major axis, and most exposures have about four or five milliarcseconds along the minor axis. So this is clearly anisotropic, and if you know anything about Gaussian processes and how they're implemented in common Python packages, you'll know that a lot of the kernels they give you right out of the box aren't anisotropic. So that kind of means we need to, or maybe it's useful to, write our own kernel.

I have a quick question. Yeah. When you're referring to atmospheric turbulence, which property of the turbulence is relevant? Is it the density, or is it the velocity, or the mass flux? I'll get into that soon. I'm not really an expert in exactly what's causing it, or the exact mechanisms of the atmospheric turbulence, but I will go into our model of the turbulence shortly.
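Backing up to the linear algebra from a moment ago, here is a minimal sketch of the standard noisy-case GP posterior predictive mean. It uses a Cholesky solve rather than an explicit inverse (same cubic scaling, just more stable), and it is the generic scalar formula, not the paper's 2n-by-2n curl-free variant; `kernel` stands for whatever covariance function has been chosen.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def gpr_posterior_mean(kernel, x_train, y_train, x_star, noise_var=1e-6):
        # Standard GP regression posterior mean for the noisy case:
        #   f_star = K(X*, X) [K(X, X) + sigma^2 I]^(-1) y
        # In the talk's application, x_train would be DES positions, y_train the
        # DES-minus-Gaia residuals, and x_star the positions of fainter stars.
        K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
        K_star = kernel(x_star, x_train)
        alpha = cho_solve(cho_factor(K), y_train)   # solves (K + sigma^2 I) alpha = y
        return K_star @ alpha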
But before I get to our turbulence model, this is where Gaia DR2 astrometry comes in. Gaia has very, very low errors: as I said before, it has less than one milliarcsecond RMS error for bright stars, G less than 20th magnitude. It's also an all-sky survey, and it has about one star per square arcminute, which is dense enough for us; it turns out that corresponds to about 10,500 sources on a DES focal plane.

Something else I should mention is that we've noticed these residual fields are curl-free. What you're seeing here, this divergence and curl, is not from the image you saw before, it's from a different exposure, but what you're seeing is the divergence of a residual field and its curl, and you can see that the curl is practically nonexistent except for some noise. I'll quantify that later, but we assume these residual fields are curl-free. The point is that if these residual fields are curl-free, then the x and y components of the residual field actually know about each other, and this allows us to create a more interesting kernel than we otherwise would have if x and y didn't know about each other. Specifically, x and y can inform each other about how they co-vary, and I'll talk about that soon.

This image was drawn by Dr. Bernstein; it's a beautiful image, and it's essentially our model for what's happening in the atmosphere and how that relates to the residual field. Phi in this image is the index of refraction integrated along the line of sight. I won't get deep into this because I'm not an expert in it, but you might be able to convince yourself that as starlight comes in, these parallel rays hit this turbulent atmosphere, and because of the density and temperature variations in the atmosphere, the index of refraction is changing, so the path that the starlight takes is changing, and that is what causes the distortion you capture on the ground. And you might be able to convince yourself, again I won't go into it, that this residual field is the gradient of this index of refraction integrated along the line of sight. That is so useful because we see that this field is curl-free, and if it's curl-free, then it makes sense that it's the gradient of a scalar field, because of course the curl of a gradient is always zero.

Okay, so now a little more about our model. As I said, we're assuming that this residual field is caused by fluctuations in the index of refraction integrated along the line of sight, which we're calling phi, and our residual field is then the gradient of that scalar field. Now, phi is well approximated as a Gaussian random field, and it has some power spectrum; you can see that below, and I'll get to it soon. The point is that Gaussian process interpolation is going to be the optimal method if we choose the right power spectrum, or alternatively the right correlation function, because as I said before, choosing your kernel function is making a choice about how your points co-vary, and that is encapsulated in the power spectrum.
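As an aside before getting to that power spectrum, here is a minimal sketch of how a curl-free assumption lets the x and y components inform each other. This is a generic illustration, not the paper's kernel: it assumes the residual field is the gradient of a scalar phi whose covariance is a squared-exponential (a stand-in for the atmospheric covariance), in which case the component-by-component covariances follow from the second derivatives of that scalar covariance, giving one 2n-by-2n matrix for n stars. The names `pos`, `amplitude`, and `length_scale` are illustrative parameters.

    import numpy as np

    def curl_free_block_covariance(pos, amplitude=1.0, length_scale=1.0):
        # pos: (n, 2) array of star positions.
        # Covariance of u = grad(phi) when phi has covariance
        #   C(r) = a^2 exp(-|r|^2 / (2 l^2))   (illustrative stand-in kernel).
        # Then Cov(u_a(x), u_b(x')) = (a^2/l^2) (delta_ab - r_a r_b / l^2)
        #   * exp(-|r|^2 / (2 l^2)),  with r = x - x'.
        n = len(pos)
        r = pos[:, None, :] - pos[None, :, :]            # (n, n, 2) separations
        r2 = np.sum(r ** 2, axis=-1)
        c = (amplitude ** 2 / length_scale ** 2) * np.exp(-0.5 * r2 / length_scale ** 2)
        K = np.zeros((2 * n, 2 * n))
        for a in range(2):
            for b in range(2):
                block = c * ((a == b) - r[..., a] * r[..., b] / length_scale ** 2)
                K[a * n:(a + 1) * n, b * n:(b + 1) * n] = block
        return K   # one 2n x 2n covariance coupling the x and y components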
So I write here: if the turbulence is confined to a single layer in the atmosphere, then we expect a power spectrum of index fluctuations following this model. Again, I'm not an expert in von Kármán turbulence, but this is essentially the meat of the model. This is von Kármán turbulence, a particular model of how the atmosphere works, and these two extra terms have to do with the fact that the star field is being blown across the telescope aperture. These are the five model hyperparameters that we're trying to optimize: we have the outer scale, which is essentially the angular size of the turbulence patch; we have the diameter D, which is the diameter of the telescope aperture at the height of the turbulence; we have the wind vector, which is just the direction the wind is blowing; and we have the total variance, or the amplitude of the power spectrum.

Okay, so here are some two-dimensional two-point correlation functions. Let me draw your attention to the top-left one. This is labeled "raw" because it's just the two-point correlation function of the raw residual field. First of all, this definitely looks anisotropic: you have the major axis I was talking about here, and the minor axis perpendicular to it. There's a lot to read into here; we don't go into it much in the paper and I won't go into it much now, but you can see there's a lot of interesting structure: there seems to be something circular happening here, and maybe this inner feature isn't aligned with the outer one. There's a lot to dig into, but we don't really delve into it.

Now let me direct your attention to the bottom-left image. This is again a two-dimensional two-point correlation function, but what we're doing here is just fitting our power spectrum to the top-left image. We're not doing any Gaussian process regression yet; we're just seeing whether we can recreate this correlation function with our chosen power spectrum. When you fit this with a simple least-squares or chi-squared fit, this is what you get. It kind of looks the same: it's going in the same direction, it's not as long and skinny as you might expect, but it looks similar.

Even more weirdness comes from the bottom-right image, what we're calling the optimized von Kármán model. Here we are using Gaussian process regression to optimize over these hyperparameters. The reason we do the fitted step first is because it's much faster, computationally much cheaper; this optimized von Kármán model takes a very long time to run, but the results are in theory better because you're actually doing Gaussian process regression. Pierre-François Léget doesn't do this step, I believe; they just do the fitted von Kármán model, and they actually get similar results, but that's another topic of conversation. Anyway, you might be wondering: why the heck is this so circular? It doesn't look anything at all like the raw one, and you'd be right; we don't fully understand why this is the case. We would expect that, since the raw one is kind of elongated, the optimized one would be even more elongated to match it, but that's not what we see at all.
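For concreteness, here is a rough sketch of what a von Kármán-style power spectrum with wind and aperture terms can look like in code. The (k^2 + 1/theta0^2)^(-11/6) power law with an outer scale is the standard von Kármán form; the sinc-squared wind-smearing term and the Bessel-function aperture term below are illustrative guesses at the two extra factors mentioned above, not the paper's exact expressions, and the parameter names, normalization, and units are all assumptions made for this sketch.

    import numpy as np
    from scipy.special import j1   # first-order Bessel function of the first kind

    def von_karman_like_power_spectrum(kx, ky, variance=1.0, outer_scale=10.0,
                                       wind=(1.0, 0.0), t_exp=90.0, diameter=4.0):
        # kx, ky: arrays of spatial frequencies (units consistent with the other
        # parameters; everything here is illustrative, not the paper's formula).
        k2 = kx ** 2 + ky ** 2
        base = variance * (k2 + outer_scale ** -2) ** (-11.0 / 6.0)

        # Wind smearing: the turbulence pattern is blown across the field during
        # the exposure, suppressing power along the wind direction.
        k_dot_v = kx * wind[0] + ky * wind[1]
        wind_term = np.sinc(k_dot_v * t_exp) ** 2     # np.sinc(x) = sin(pi x)/(pi x)

        # Averaging over a telescope pupil of the given diameter.
        kr = np.pi * np.sqrt(k2) * diameter
        aperture = np.ones_like(kr, dtype=float)
        nonzero = kr > 0
        aperture[nonzero] = 2.0 * j1(kr[nonzero]) / kr[nonzero]

        return base * wind_term * aperture ** 2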
The curious thing is that when you say, okay, this is my kernel, and I actually run the method, predicting the residuals for my validation set, removing them from the data, and then calculating the correlation function again to see what happens, what you get is the top-right image: pretty much no correlation, just some noise. That's kind of interesting, because the bottom-right doesn't look anything like the raw correlation function, yet it seems to perform very well at reducing this correlated variance. This is kind of a mystery; this isn't a Scooby-Doo episode, and unfortunately I don't have the answer for you at the end of this talk about why these shapes are the way they are. But that's the way they are.

Okay, so now a bit more overview before I get into the results of the paper. Some non-standard GPR aspects. Again, we have this anisotropic kernel; that's not hard to include. The two dimensions of the residual field y are not independent of one another; again, we expect a curl-free field, and this is really interesting and potentially very useful because the x and y data can inform each other and hopefully make the model even better. We've derived a variant of the GPR formula which can enforce and exploit this. It forms a single 2n-by-2n covariance matrix for the n (x, y) points on the residual field, which means everything takes a bit longer because you're dealing with a larger matrix, but the idea is that hopefully it's a better model in the end. And I think this is quite important: this also allows for anisotropic measurement errors from Gaia. Gaia will give a full covariance matrix for the astrometry it provides. That wasn't the case for the DES data I was working with, and I'm not sure how LSST will handle it, but this is another boon for the method.

Now, to go over the procedure again. What we do is take the DES and Gaia data for all of our bright stars, for each exposure, and calculate the residual field; of course we clip outliers before we do that, and there are a lot of outliers in the DES data. Then we calculate psi, the two-point correlation function of the residual field, and we fit the von Kármán model to psi_raw, which is just the correlation function of the raw data; this is what I was calling the quote-unquote fitted model. Then we execute the GP, actually do the prediction, ask how well it did, and clip outliers again. Then we use the hyperparameters from this fit to jump-start the optimizer for the actual Gaussian process regression. Here we run an optimizer to minimize psi_resid, the correlation function of the residuals. Essentially, instead of using the built-in Gaussian process regression likelihood, which you kind of get for free out of the linear algebra, we minimize what we actually care about, which is the correlated variance at small angular separations. Specifically, we chose, fairly arbitrarily, to minimize the correlation at separations less than about half an arcminute. This gets us to our final kernel parameters to use for the model.
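A heavily simplified sketch of that second tuning stage might look like the following. The statistic, the units (degrees), and the optimizer choice are placeholders, and `gp_predict` is a hypothetical helper standing in for the full GP prediction; the point is only the structure: start from the cheap fit to the raw correlation function, then adjust the kernel parameters to minimize the correlated variance of what remains after subtraction at separations below roughly half an arcminute.

    import numpy as np
    from scipy.optimize import minimize

    def correlated_variance(positions, residuals, max_sep=0.5 / 60.0):
        # Mean dot product of residual pairs closer than max_sep (degrees here,
        # roughly half an arcminute); a simple stand-in for the statistic that
        # the optimizer drives toward zero. O(n^2) pairs, so slow for n ~ 8000.
        sep = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
        close = (sep > 0) & (sep < max_sep)
        dots = residuals @ residuals.T
        return dots[close].mean()

    def tune_kernel(theta0, positions, residuals, gp_predict):
        # theta0: kernel parameters from the cheap fit to the raw correlation
        # function. gp_predict(theta, positions, residuals) is a hypothetical
        # helper returning the GP prediction of the turbulence at each star
        # (in practice evaluated on a held-out validation split).
        def objective(theta):
            cleaned = residuals - gp_predict(theta, positions, residuals)
            return correlated_variance(positions, cleaned)
        return minimize(objective, theta0, method="Nelder-Mead")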
Now, this final optimization is really, really slow. You might be familiar with matrix inversion and the fact that you can do somewhat better than the naive algorithm, but in general matrix inversion is an n-cubed computational complexity task. Our n in this case is about 8,000, and 8,000 cubed is quite a large number, so evaluating the GP is quite computationally intensive. I think if you were to do this at large scale, on all 80,000 DES exposures, you would really have to do it on some sort of supercomputing resource. But we managed to need only about 50 to 100 evaluations of the GP per exposure, which in my view is pretty impressive. And after that, you get the model. We ran this model on about 300 exposures, so let's look at some of the results.

Okay, here we go. These results are the average of about 300 exposures spread over about five years. What you're seeing here is the angle-averaged two-point correlation function. Let me draw your attention to the red curve: this is the raw data. If you just take the angle-averaged two-point correlation function of the raw data, this is what you get: an atmospheric distortion of about 120 milliarcseconds squared at zero lag, and a correlation length of about 5.7 arcminutes. By correlation length I mean specifically the angular separation at which the correlated variance drops to half its zero-lag value; that's a good measure of the size of these turbulence patches. This red curve corresponds to the top-left raw plot from before, with all of the atmospheric turbulence still in it. Then if you take the fitted model, the green curve, you can see that even from what I would call a naive approach to the problem, you're able to reduce the correlated variance by quite a lot, down to about 25 milliarcseconds squared, so about a factor of five reduction in variance. The correlation length is also reduced by quite a lot, down to 1.34 arcminutes. This is quite good, in my opinion. The next step is to go back and use the proper optimized von Kármán model with the full Gaussian process regression machinery, and that's the blue line here. We're able to reduce the variance by another factor of about two, to about 10 milliarcseconds squared or thereabouts. So in general, averaged over these roughly 300 exposures, we can reduce this correlated variance at zero lag by about 12 times on average. It depends on the band and on the particular exposure, but in general it's about a 12-times reduction in variance, and a very good reduction in correlation length as well.

Okay, let's look at some more results. Here is a different plot. Each point here represents an exposure, and on the x-axis we have the original correlation at zero lag, which we're calling psi naught. You can see this point is a Y-band exposure with quite large error bars; I can talk later about why the Y band has such large error bars in this plot. It started at about 300 milliarcseconds squared, and after applying the GPR algorithm we were able to reduce it to about 55 milliarcseconds squared, so just over a five-times reduction. And you can see the no-change line, which is essentially just y equals x.
Any exposure that lies on this line means our GPR algorithm did nothing; anything above it means it somehow made things worse, and you can see it did make one of these over here worse. But for the vast majority we're doing better than a five-times reduction in variance, and for quite a lot we're doing more than ten times. I believe the median is somewhere around 11 or 12, with the average being about 12.

Now I'm going to show you a slightly different version of the same data over here, and of course we can come back to these plots. Instead of looking at the variance, we're looking at just the RMS turbulence per component. You can see this point here: it was an i-band exposure, and it started at about 10 milliarcseconds RMS per component. The star density of this exposure's field was approximately 1.2 or 1.4 stars per square arcminute, and after putting it through the Gaussian process regression algorithm we developed, we were able to reduce its RMS turbulence per component to about, what is that, 1.5 or so. You can see that for the majority we're able to reduce the RMS per component by a factor of two, and for many of them more than that. Just by looking at it, I don't think there are quite enough exposures to say there is a strong dependence on star density, but based on the idea of the model you should expect some performance dependence on star density, and we do see that: the ones up here are reduced less, and those have lower stellar density to begin with. And this group of red stars over here is from a specific set of exposures that we analyzed because we wanted to fit an orbit to the trans-Neptunian object Eris, and that's what I'm going to go to next.

We're calling this a stringent test: we're calculating the residuals to this trans-Neptunian object called Eris and fitting its orbit as it moves a few degrees across the sky over about five years. DES has more observations than this, but we took only the r, i, and z bands, so we have about 20 observations here. After fitting an orbit to Eris, we were able to reduce the errors from about ten milliarcseconds RMS per component down to about five, in line with the results I showed you before, reducing the RMS per component by a factor of two. So this is quite interesting; we were able to actually do something, in a sense, using this solar system object Eris.

Okay, this just about wraps it up, so I'll go over the key takeaways. Again, for the turbulence-induced variance, we were able to reduce from about seven milliarcseconds RMS down to two. The correlation length we were able to reduce from about 5.7 arcminutes down to about 1.2. And we were able to fit an orbit to Eris and get those errors down from ten milliarcseconds to five. We also have some room for improvement, and this is where LSST and the Rubin Observatory come back in. As I said, LSST is in some ways a successor to DES, and we think that applying this method to the LSST data could be really quite useful and helpful for the survey.
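As a reference for the statistics quoted above, here is a minimal sketch of an angle-averaged two-point correlation function and the correlation-length definition used in these results (the separation at which the correlation falls to half its zero-lag value). The binning, units, and zero-lag estimate are arbitrary illustrative choices, not the paper's implementation.

    import numpy as np

    def angle_averaged_correlation(pos, res, bin_edges):
        # pos: (n, 2) star positions, res: (n, 2) residual vectors,
        # bin_edges: separation bin edges in the same units as pos.
        # Returns xi(theta) = <u_i . u_j> averaged over pairs in each bin.
        sep = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        dots = res @ res.T
        i, j = np.triu_indices(len(pos), k=1)        # each unique pair once
        sep, dots = sep[i, j], dots[i, j]
        return np.array([dots[(sep >= lo) & (sep < hi)].mean()
                         for lo, hi in zip(bin_edges[:-1], bin_edges[1:])])

    def correlation_length(bin_centers, xi, xi_zero_lag):
        # Smallest separation at which xi drops to half the zero-lag value
        # (the zero-lag value could be estimated from the mean squared residual).
        below = np.where(xi <= 0.5 * xi_zero_lag)[0]
        return bin_centers[below[0]] if below.size else np.nan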
Additionally, something that we didn't implement in the paper I'm talking about, but that is proposed in its appendix, is a simultaneous solution for the turbulence and the proper motions. In theory, you can say: these stars appear in multiple exposures, so perhaps we can fit a proper motion to each one and solve for the turbulence field at the same time, and maybe get better proper motions and better estimates of the turbulence simultaneously. We didn't implement that, but it's something someone else could do.

(There was a question from Zoom here, partly garbled in the recording, asking about the proper motions.) Yes, proper motions of the stars. I should explain quickly what proper motions are. Proper motions are the apparent motion of stars across the sky: as you take pictures of stars over years and years, you can actually see that the stars are moving. It's not much, but you can measure it. Of particular interest to many people are the proper motions of stars in the Milky Way, because these contain lots of information about the dark matter halo around our galaxy; very interesting stuff that I don't know a lot about. The idea with this method is that you can say: these stars are moving across multiple exposures, so maybe we can combine these exposures together, knowing that the same stars appear in them, and fit for the proper motion and the turbulence at the same time. That's something you could in theory do that we haven't done yet.

The final room for improvement that we highlight is that the von Kármán kernel is not necessarily optimal. As I said at the start of this talk, Pierre-François Léget did the same kind of thing. They didn't use the same method as we did; they just did the initial fit, not the full Gaussian process optimization, which is really quite slow, and so they were able to test on a few thousand exposures. They get similar results to this DES data. However, it's important to mention that, I believe I'm right in saying, they have a better observing location with the Hyper Suprime-Cam on Maunakea, and they also have a larger-diameter telescope. I haven't investigated the exact comparison between our works, but definitely the von Kármán kernel is not necessarily optimal.

So thank you very much for listening. That was my talk, and I'm happy to answer any questions you have. Oops, I didn't actually mean to stop sharing. Thank you all for your animated hands.

Well, this is Bill. I'm very interested in the turbulence modeling that you're doing, because I study turbulence, and I'm not going to try to get too deep into that, but I was wondering about the anisotropy. You showed some pictures of correlations which, prior to manipulating the data, were obviously anisotropic, and you talked about that quite a bit. Part of my question is: what are those two axes, the vertical and horizontal axes, relative to? The rotation of the telescope? Relative to vertical? Is that what it is? The separation axes, how are they defined? Right, so these are defined by the separation in terms of... oh, that's a good question.
What I'm thinking is, and maybe this is obvious, but if you rotate the telescope, say 10 degrees relative to straight up, then there's one direction on the image that isn't rotated, and the other one is projected onto a plane by 10 degrees. Is that the dominant source of the anisotropy we see in the upper left there? I mean, how do you pick those two directions? You know, I'm not entirely sure. I'm really not the best person to ask about the atmospheric turbulence itself. As far as I know, the separations here are relative to the axes of the telescope focal plane. Yeah, that's what I was thinking. And you're assuming that the turbulence lies in planes that are parallel to the ground? Oh, I see what you're saying. As far as I know, yes, that's correct. And in fact, we don't make a lot of assumptions about the turbulence except for the power spectrum. Encapsulated in that, in the sinc-squared term and this Bessel-function factor, is the assumption that the pattern is being blown across the aperture, and as far as I can think, and it's been a few months since I worked on this, that's the dominant assumption about the atmosphere that we make.

I'm sorry, I don't want to dominate the questions, but one more quick one. How do your exposure times compare to the correlation times of the turbulence? Are they so short that you're modeling an essentially instantaneous picture of the turbulence, or do your integration times span another axis, that is, correlation in time as well as correlation in space? Right, exactly. So we don't look into the correlation in time at all. However, that is something I forgot to mention: all of this, except for this image here, which is a 30-second exposure, all of the work that we did was on 90-second exposures. I don't know the extent to which the turbulence typically decorrelates across 90 seconds, but we were working with 90-second exposures. Yeah, I would say that probably depends on the scale you're looking at. Smaller-scale stuff, little features on clouds, might decorrelate pretty substantially, maybe in seconds, but really big features might be on weather-pattern kinds of timescales, more like minutes or hours or something. I was just curious about how space and time might separately enter into this problem. Right, exactly. And that's another reason why we think we might see similar results with LSST: this might be wrong, but I believe LSST plans for 30-second exposures, and these are 90-second exposures with a smaller aperture than Rubin has, so we think those effects roughly cancel out, and you'd see similar turbulence effects and hopefully similar results with LSST. Great, thanks a lot. Very interesting talk, really. Thanks. Thank you very much.

Should I call on you? Please, Will. Yeah, great. I did enjoy it. Just a quick comment on the previous question, following my question about the angle on your slide. What we saw is that you have one of those plots for every exposure, so you can look in the data and see whether those directions change.
And if it does change, whether it correlates with the telescope pointing or something like that, right? Yeah, you could go back through the data, if you had infinite time to do the analysis, and find that, absolutely. My question is actually much more mundane than the previous one. You mentioned execution time per exposure, and it sounded like that was with an n of 8,000 stars, I presume. Yeah, that's about as many as we used. Roughly how long does it take per exposure? So, each evaluation of the GP: okay, I was running on the University of Pennsylvania computing cluster, which is not a supercomputer, but we did have access to, I believe, 12 CPUs. Python by default does release the GIL for certain linear algebra operations, so it was using all of those cores, and each evaluation of the GP would take about three to five minutes. So typically, and I think the number I gave earlier is probably a little inaccurate, it's more like 75 to 150 evaluations of the GP, so it would take a few hours for each exposure to run. Also, the Python module Numba has changed a lot, and I would like to revisit this problem with it. I don't know if I'm actually going to be able to do that, but I would like to, because the speed-ups there are insane, and if Numba supports the linear algebra methods that we use, then you could get a huge speed-up.

The reason I'm asking is because I'm curious to see what happens when you run this framework on high-density star regions, regions where you have maybe five to ten or more stars per square arcminute, or even more than that. In principle you would think that would give you better spatial resolution, but then you run into the computational scaling. Yeah, there's kind of a limit. We did investigate looking at specific subsets of exposures, though we didn't look at this for very long, but there's the idea that you can say, okay, let's look at this patch, and then maybe an overlapping patch, and put together an overlapping mosaic for regions that are incredibly dense. Because the thing is, a DES exposure could have 150,000 objects in it, and even more for LSST, and trying to predict on 150,000 stars is totally beyond the limits of anything that most people have access to. So yeah, definitely some sort of tiling scheme would probably be the way to go. It might be that the length scale of the features you're trying to correct for is such that you don't need all the stars; maybe there's some way to select just the best-measured ones and still do as well as you need at the scale you're looking for. Yeah, absolutely, because if you look at this, you can kind of see what the contribution of the stars that are very far away is. Exactly, you hit the nail on the head there.

Federico? Oh, no, sorry, actually I just wanted to say what Will was saying, which is that I don't think you need to use all the stars; you can pick a random sample of your stars in a star-dense region, so long as you've covered all of the length scales that you're interested in. But while I'm at it: the behavior of the atmosphere is actually quite localized, right?
The atmosphere behaves very well at Blanco; that's why they put the telescope there. So I'm wondering, what survey could you use to see what the improvement would be at a site that has a different coherence length, different seeing, and a generally very different setup? Is there a survey that you can think of that could use this? Oh gosh, I don't know. I think it's worthwhile for whoever is in charge of managing the astrometry pipeline for LSST to definitely compare with Pierre-François Léget's paper, because they use, of course, the Hyper Suprime-Cam, which as far as I know is at a better observing site. So as far as getting two baselines goes, there you go. Beyond that, I don't know. My impression is that it's really not very complicated to deploy this on data; if you have a lot of data it gets complicated, but in my view this can be applied to as many surveys as you have. So if there are surveys coming out of, say, Keck, or around the Tucson area, those would be other places too. But as for a specific location, I don't know.

If nobody else has another question: this is Bill again, sorry for dominating the questions, but I realized I had another kind of really elementary question to ask. You are using a data-science formalism that's based on jointly normal distributions, Gaussian distributions with correlations. Have you actually checked that your displacements, or whatever your fundamental variables are, are distributed according to a Gaussian distribution? Good question. I didn't do that; I believe that legwork was done in Bernstein et al. 2017. I could be wrong about that, but I didn't do it myself. A Gaussian process doesn't require you to have Gaussian distributions, though, does it? Sorry, I'm not sure I heard the question right, but just because you're using a Gaussian process, that doesn't mean the underlying distribution of the variable has to be Gaussian; it's the resulting joint distribution that ends up being normal. Right, okay, but what I mean is that the one-point distributions are a subset of the joint normal distribution. So it seems to me that if the vector displacements are the things you're calling the surrogates for the turbulence, then those displacements should have, for example, components that are close to a normal distribution in order to assume that multipoint distribution. Oh, okay, I see what you mean. If you're asking whether I have plotted the histogram of the x and y components of those displacements: yes, and they do fit a Gaussian very well. All right, thanks.

Can I ask? I don't know if Tina can unmute herself, but it looks like she has an interesting question. Oh, hi. Well, a very nice talk, thank you. I was curious about how you approached fitting your model depending on the band of interest, so on what the wavelength range was, because the angle of diffraction varies as a function of wavelength. Are you fitting the model independently for each band? If you change the filters on the telescope, or use different wavelengths, how does the von Kármán model change? That's a good question.
To answer your last question: this is the extent of the comparison we did between different bands, plotting the performance of different exposures by band. To answer your first question: we don't do anything different for different bands, but because we have to fit this model for every exposure, we are necessarily getting different parameters for every single exposure, and therefore for every band. We didn't try to compare the fitted parameters across bands, though. The problem we ran into was that there didn't really seem to be any structure in the parameters we were getting, and that gets back to the earlier issue: these parameters don't really seem that physical. The parameters we were getting should in theory describe something physical about the sky, but that's not really what we were seeing. So to that effect, we didn't go into more detail asking, okay, what do the parameters look like for the r band versus the i band; we didn't really do that. But you can see here that the Y band has higher errors in general. It's far infrared, well, not technically far infrared, but it's more toward the infrared, and so it has higher errors, and then the r, i, and z bands are much better behaved, despite the fact that i and z are also toward the infrared. So there is some peculiarity there. I can also talk a little bit about the g band, which isn't pictured here. You might think the g band would be really good, because that's kind of the center of human vision or whatever, and you might expect to see that here. But what we found was that almost universally the g-band exposures had definitely worse than a five-times reduction, and many of them were actually made worse by the GP. We don't really understand that; we kind of decided it's someone else's problem to look into why the g band is so weird.

So would you say it's reasonable to think that the effects of the turbulence, the changes in the turbulence between different exposures, are more significant for your model parameters than the effect of what wavelength of light it is for the same object? Is that why you're not seeing a correlation between the model parameters and the wavelength? I want to say that, but I don't think I can, because we really didn't do any analysis to support it. I guess what I'm trying to say is: tentatively, yes, we didn't really see any correlation there with the bands, but I want to be careful, just because we didn't really do that analysis. And the reason we didn't do that analysis is that the kernel parameters always looked so messy and disparate; we didn't really have the time or the motivation to look into it further. Awesome, thanks. And I suppose at that point the other thing is that the different bands have different noise characteristics, which vary quite dramatically. Well, and there's also, of course, another comparison to be done: the seeing conditions for every night are of course known, so it might be worthwhile to go back and actually compare the results we have here to what the observing conditions were and try to find patterns. But we didn't do that either.
Well, that previous question does prompt another question. Does your data allow you to explore whether and how the results depend on airmass, that is, on the path through the atmosphere, the glancing angle? Right. I'm not sure; yeah, I think the answer is no. As I was just saying, it's possible in theory to correlate this with the records of seeing on different nights, but based off of the parameters we have here, I don't think so, not just from what we're doing here anyway. But you could; the comment is just this: plot them against the seeing, right? Right, exactly. And that kind of gets at it: I think the reason we didn't explore that is because the parameters we're getting, well, we really don't understand why this one is so circular and this one is so anisotropic, and the parameters are all over the place. It could be. I know Pierre-François Léget's paper suggests that the outer scale parameter is actually not important at all, so perhaps we could get rid of that and see more realistic parameters in the other four; maybe that's a reason. Perhaps it would be worth trying to understand why you don't see a dependence of the parameters on the wavelength, because if you do find one, that could speed up the computation a lot, right? You could say: I'm going to optimize fully on my least noisy band and then use a prior from that optimization on the other bands. I think that would be computationally important. Of course, that's only if you actually find the correlation with the wavelength, and the observations in the different bands would have to be close enough in time. I guess they would have to be, but I think that comes back to the point that was found in Bernstein et al. 2017. I think it has to do with the fact that the exposure times are 90 seconds, which is quite long, and so after an exposure, or even after a filter change, it might be too long to really say anything about the turbulence from one exposure to the next. I don't think Dr. Bernstein was very optimistic about that, but I don't really know myself. You could use really old-school SDSS data; if the observations were simultaneous in the different bands, then it would be interesting. But the thing is, they aren't: they drift-scanned across things, so you went across each filter in turn. They're close together in time, but they're not actually simultaneous, and of course the errors would be so large that it's not going to have any value.

Well, okay, maybe we'll stop here, because it's eight after and we've had a really great discussion. Thank you again, Willow, for a really great talk. Thank you. Thank you very much for having me. Yeah, I'm going to stop the recording now.
Astronomy seminar - Willow Fortino
From John Gizis May 04, 2021
Talk title: Reducing Ground-Based Astrometric Errors with Gaia and Gaussian Processes
Department: Physics and Astronomy