Welcome everyone. This is the Data Science Community Hour, and today we're pleased to have Mike De Lucia. He has his PhD in Electrical and Computer Engineering here at the University of Delaware. I was lucky enough to serve on his committee, where he was pushing forward machine learning in the cybersecurity application domain, looking both at ways to detect cybersecurity attacks and at how people would then fool the machine learning that was doing the detection. That second part is what he's going to talk to us about today: adversarial machine learning, where essentially you're trying to fool the machine learning. You're an adversary, just like you'd have in cybersecurity. Mike's got a ton of experience, and Professor Jay Scott is also on the line, so together I think this will be an interesting talk. Like always, we'll keep this informal. Be sure to put any questions in the chat, or interrupt and unmute if you need to, and we'll keep going. Thanks, Mike.

Thanks, Austin. If you guys see any questions in the chat, just let me know, since I'm not watching the chat at the same time. So my name is Mike De Lucia, and I work at the Army Research Laboratory. I've been with the Army for almost 15 years now, working in computer science and mostly specializing in cybersecurity-type solutions. Today let's talk a little bit about adversarial machine learning. It's a little bit slanted toward machine learning for cybersecurity, because that's my background, but I think it will still transfer well.

The problem we're looking at is that you have adversaries trying to circumvent the machine learning algorithms, either to cause misclassification in general, so you lose trust in the system, or to do something targeted. In the cybersecurity domain, the attacker might be trying to make something malicious look benign so it's not detected, so those attacks go unnoticed. In the top right-hand corner is one of the most famous adversarial machine learning pictures: you have a panda picture, some noise is added to it, and that causes the machine learning algorithm to misclassify it as a gibbon. The bottom half just shows an example of transferring that to the cybersecurity domain. Some of the types of algorithms that have been developed for adversarial machine learning don't necessarily translate directly to the cybersecurity domain. Of the work that's out there now, most of it has been done in the image domain, some has been done in spam classification, and there's also been some limited work in the malware domain. But very little work has been done applying adversarial machine learning from the perspective of network security.

The perspective in the image domain is that we normally have some image, and we represent it as a matrix of pixel intensities. Then we convert that into, in this case, what's called a raster-scan feature vector, where you take the rows of the matrix and keep concatenating them, so you end up with a 1D vector.
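To make the raster-scan idea concrete, here's a minimal sketch in Python (NumPy is my choice here, not something named in the talk): flattening a grayscale image matrix row by row into a single 1D feature vector.

```python
import numpy as np

# A tiny 3x3 grayscale "image" of pixel intensities (0-255).
image = np.array([
    [ 12,  80, 255],
    [  0,  34, 128],
    [200,  90,  10],
])

# Raster-scan feature vector: concatenate the rows into one 1D vector.
feature_vector = image.flatten()   # row-major (C order) by default
print(feature_vector)
# [ 12  80 255   0  34 128 200  90  10]
```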
If we're doing adversarial machine learning in this case, we normally need as an input either the actual model itself, say the deep neural network, or, as the attacker, we may have to build a surrogate representation of it in order to carry out these adversarial attacks. Once we have that, we look at the features of this input image, and we perturb the image pixels by very little bits, so it would be unnoticeable to the actual human. We keep doing that until we get to the point where the image is misclassified as whatever we, as the attacker, want it to be misclassified as. One of the constraints in this area is that we want to preserve the image: the distortion is kept small enough that a human wouldn't notice it.

Now let's look at it from a different perspective, not an image but a network packet. We have a lot of different features here, which are a bit different from the image case, because these are more like metadata, I guess you would call it. They're not standardized features like we have in the image domain, where everything is a pixel intensity; these are different types of features with different contexts and different meanings, and they have different scales. Some of them are not continuous, so we have to convert them; sometimes we have to apply things like one-hot encoding in order to use them in feature vectors. So it's quite different when we're talking about network packets as features.

This is just a different representation at the network level: instead of looking at individual packets, we look here at network flows. There's again different kinds of metadata you can pull out of these flows, where a flow is just the connection between two end hosts. You can pull out things like the number of bytes, the average number of bytes, the inter-arrival time of packets, and so on; there are many more. Again, these features don't all have the same context or meaning, so you can see this domain is much different from the image domain. That's why some of the adversarial machine learning techniques for images don't necessarily transfer directly to network security problems.

Now I'm going to talk a little bit about the different adversarial machine learning attack types. There are four: evasion, poisoning, exploratory/reverse engineering, and Trojan attacks. I'll talk about each of these. In an evasion attack, the adversary is trying to perturb the malicious traffic at the time of detection. This is often referred to as test time, but it can also be at inference, when you put the machine learning algorithm into production and you're actually classifying, in this case, traffic as either malicious or benign. The attacker is trying to perturb these packets and make it so that they're misclassified as benign, and then their attack evades that particular detector at that point. This is probably the most common attack; there are a lot of papers out there about evasion.
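Backing up to the flow and packet metadata features for a moment, here's a minimal sketch of the one-hot encoding step Mike mentions. The feature names, values, and the use of pandas are all my own illustration, not something from the talk:

```python
import pandas as pd

# Hypothetical flow records with mixed-type metadata features.
flows = pd.DataFrame({
    "protocol":         ["TCP", "UDP", "TCP"],  # categorical -> needs one-hot encoding
    "total_bytes":      [14200, 820, 5100],     # counts, very different scale
    "avg_interarrival": [0.012, 0.430, 0.055],  # seconds
})

# One-hot encode the categorical column so every feature is numeric; the
# resulting rows can then be used as feature vectors for a classifier.
encoded = pd.get_dummies(flows, columns=["protocol"])
print(encoded.columns.tolist())
# ['total_bytes', 'avg_interarrival', 'protocol_TCP', 'protocol_UDP']
```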
We can also think of this as analogous to what we've been doing in computer security for a long time with anomaly detection and things like that: the attacker is always looking for ways to evade it. Now the attacker has to be a little more sophisticated, because evading these machine learning algorithms is a bit harder; it's not quite as straightforward as a traditional attack against a normal, signature-based type of system.

Another attack that's a little bit harder to do is something called the poisoning attack. But actually, one second, let me go back to one thing first. The figure on the bottom here shows the objective we're dealing with in the evasion attack: we're trying to take this malicious packet here on the top and push it over the decision boundary onto the side of the benign class, so that it's misclassified at that point.

The other one, the poisoning attack, is where the attacker inserts a bunch of mislabeled samples into the training data. This causes misclassification because we're ever so slightly moving the boundary, until eventually the decision boundary learned by the machine learning changes. Then some of these malicious pieces might end up misclassified as benign, because they're now on the other side of the decision boundary that we've shifted by adding a certain number of mislabeled samples. There are other ways we can do this as well. We can do it through transfer learning, where we poison an original model; since that model is poisoned, the poisoning transfers to any other models that are built off of it. Another thing with poisoning attacks is that the attacker doesn't necessarily need direct access to the training set; the labels can simply end up being off. In the case of network security problems, in order to create these datasets we need to collect traffic for training. So an attacker who knows, say, that you're going to be doing traffic collection could insert small amounts of data that end up mislabeled, just to cause misclassification when you go to train. Another example of this, I would say, is the most recent attack we've heard about, the SolarWinds attack, which was a supply-chain type of attack. With poisoning, if these training datasets aren't really securely stored, then perhaps an attacker is able to insert poisoned samples into them and cause misclassifications from that. So this attack occurs during training time.

Another type of attack is the exploratory, or reverse engineering, attack. Here we have a network traffic classifier, and what the attacker does is just keep sending samples to this classifier and observing the labels it outputs. Eventually the attacker builds up their own surrogate dataset that will be used for building a surrogate classifier, and that surrogate classifier represents the original classifier. This could be because the attacker wants to steal intellectual property, where the intellectual property is the classifier that some company built up: by doing this type of attack, the attacker is able to reproduce a similar classifier.
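To illustrate the mislabeled-sample idea in the simplest way, here's a hedged sketch of label-flip poisoning with scikit-learn. The data is synthetic and the fraction flipped is arbitrary; this shows the mechanism, not the attack from the talk:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic "benign vs. malicious" data (0 = benign, 1 = malicious).
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Poisoning: flip the labels of a small fraction of malicious training samples to "benign".
rng = np.random.default_rng(0)
malicious_idx = np.where(y_tr == 1)[0]
flip = rng.choice(malicious_idx, size=int(0.2 * len(malicious_idx)), replace=False)
y_poisoned = y_tr.copy()
y_poisoned[flip] = 0

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)

# The poisoned model's decision boundary shifts toward "benign", so malicious
# test samples typically slip through more often (lower recall on class 1).
print("clean recall:   ", (clean.predict(X_te)[y_te == 1] == 1).mean())
print("poisoned recall:", (poisoned.predict(X_te)[y_te == 1] == 1).mean())
```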
Therefore, they're essentially stealing intellectual property. The other thing is that the attacker could take this surrogate classifier and use it to do the evasion attacks we talked about earlier: any adversarial samples they create by running against the surrogate classifier, they can then use in evasion attacks against the original classifier. This attack obviously occurs during test time, or inference, on the production classifier.

The other type of attack is something called the Trojan attack, which is very similar to the poisoning attack: again, we're manipulating the training data. But in this case we're adding what I guess we can call a trigger, something you insert into the sample, so that when the attacker later sends a sample with that particular trigger in it, it causes a misclassification. Like in the example here, I have a packet with a certain bit pattern in it. Any time the attacker sends malicious traffic to the classifier with that particular bit pattern, the classifier classifies it as benign. This type of attack can also show up with transfer learning, where someone, say an attacker, builds the original model and shares it, and then other people build off of that particular model for their own specific task. Since they're building off of a model that's already been poisoned, the misclassification keeps happening. So this is something important: you want to be able to really protect the machine learning models you're developing, and if you're using a model built by a third party and building from it, you need a way to verify that there are no Trojans in that particular model. That's actually an open research topic; there are many people in the community looking at ways to verify whether these machine learning models have already been poisoned.

Now, for adversarial machine learning, some terminology. You have your samples x, and you have your y, which is the label; together these build up your feature vectors and, from those, your training dataset, where d is the dimension of your particular feature vector. When we talk about adversarial machine learning, we generally talk about the amount of knowledge the adversary has. We think of that in terms of the training dataset D; the features, or feature space, X; the actual machine learning algorithm being used, f; and the weights that parameterize that particular algorithm, w. All of these combined give the different, varying levels of knowledge the adversary could have.

So let's talk a little bit about that knowledge. There are different types, and again this is very similar to cybersecurity, where you have different levels of adversarial knowledge. From a white-box perspective, the adversary has complete knowledge of the feature space and knows which machine learning algorithm is being used.
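For reference, here's that terminology written out in the standard notation. This is my reconstruction (Biggio-style) of what such a slide typically shows, not a verbatim copy of Mike's slide:

```latex
% Training set of n labeled samples, each with a d-dimensional feature vector
\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}, \qquad
\mathbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{d}, \quad
y_i \in \{\text{benign}, \text{malicious}\}

% Adversary's knowledge: training data, feature space, learning algorithm, trained weights
\theta = (\mathcal{D}, \mathcal{X}, f, \mathbf{w})
```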
They also have the actual trained model, which means they know all the parameters and weights. At that point they have everything they need to conduct an attack. One thing that's a little different here between cyber and image recognition, I would say, is the feature space. In images, you have the pixels, which are all pixel intensities, so they all have the same context and the same meaning. In the network space, as we saw, each feature has its own context, and there are so many different features you could use that the adversary has to know which features you're actually using to do the detection in a network type of detector.

That's where the next level comes in, the gray box, where the attacker has some limited knowledge. In the first variant, the attacker has access to the feature space and knows the actual machine learning algorithm being used. For example, if it's a support vector machine, they know that, and they also know what the feature space is. What they don't have is the dataset that was used for training, and they don't have the parameters of the model; in other words, they don't have the actual model that's being used. In this case the attacker can do an exploratory type of attack and create a surrogate training dataset, and from that surrogate training dataset they can also create a surrogate classifier, so now they have access to parameters and weights as well. That's what they can do in the case of a gray box. In the second variant of the gray box, the attacker only has access to the feature space: they don't have the dataset, they don't know the type of machine learning algorithm (if it's an SVM, they don't know it's an SVM, they just know there's some machine learning there), and they don't have access to the parameters. Again, they can build up a surrogate dataset and a surrogate model, and that surrogate may not even use the same machine learning algorithm: maybe instead of an SVM they use deep learning to create their model. It still gives them a way to estimate what the original classifier looks like, and they can use that to try to attack the original model. There has been a lot of work in that area showing that you can transfer attacks built against one type of machine learning algorithm to a different type of machine learning algorithm.

The hardest setting in adversarial machine learning is the black-box attack. In that case the attacker has access to pretty much none of these parameters. In the network case, they wouldn't even have access to the feature space, so they wouldn't know which features the particular algorithm was trained on. The only thing they can do is try to build up a surrogate dataset, and that's where it also gets a lot harder: if you don't know which features to perturb and keep changing, then it's hard to build up that dataset, because you don't know how to change the input to get different outputs.
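Here's a minimal sketch of the exploratory/surrogate idea from the gray-box discussion, with a scikit-learn stand-in for the victim classifier. Everything here (the victim model, the probe distribution, the query budget) is a hypothetical illustration, not the setup from the talk:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Pretend this is the deployed "victim" classifier the attacker can only query.
X, y = make_classification(n_samples=3000, n_features=6, random_state=1)
victim = RandomForestClassifier(random_state=1).fit(X, y)

def query_victim(samples):
    """Attacker's only interface: send samples, observe predicted labels."""
    return victim.predict(samples)

# Exploratory attack: generate probe samples, label them with the victim's
# outputs, and use that as a surrogate training set.
rng = np.random.default_rng(1)
probes = rng.normal(size=(2000, 6))          # attacker-chosen query inputs
surrogate_labels = query_victim(probes)

# Train a surrogate classifier; it doesn't have to be the same algorithm.
surrogate = DecisionTreeClassifier(random_state=1).fit(probes, surrogate_labels)

# Agreement between surrogate and victim on fresh inputs gives a rough sense
# of how well the surrogate mimics the original decision boundary.
test = rng.normal(size=(500, 6))
print("agreement:", (surrogate.predict(test) == query_victim(test)).mean())
```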
There's been quite a bit of work, I'd say, in the white-box area, some in the gray box, and some in the black box as well, but again almost all of it has been in the image domain. When we look at it from a network perspective, the black-box case is a little different, because the adversary can observe the inputs. Say the attacker is trying to attack an intrusion detection system that's using machine learning: the attacker can send network traffic to the intrusion detection system, but normally, in production, the attacker is not going to see the output label saying the traffic was flagged as malicious or benign, because that goes to a log file the attacker doesn't have access to. Instead, the attacker would need to infer whether their attack worked or didn't work, and use that to label the particular packet as either having caused an alert (it was detected) or having gotten through fine. The other thing is that as they do this to build up their dataset, they're also tipping their hand, because the intrusion detection system may detect and flag some of these probes, and that alerts the defender that this type of thing is happening. From the attacker's point of view, that's not a good thing. So again, it's a lot harder in the black-box case. There is another, almost-black-box case where, if the intrusion detection system is open source, the attacker can get a copy of that appliance, run it in their own lab, attack it all they want, view the logs, build up the surrogate dataset, and eventually build a surrogate machine learning model. In the gray-box case, the attacker can observe the input traffic and the output labels directly and then build that surrogate classifier. Actually, in both of these network cases, black box and gray box, I would assume the features are standard and known by the attacker; or, in the open-source case, there might be reports or public information showing which features the particular algorithm uses; or the attacker may be able to reverse engineer the software and figure out what those features are, if they have access to it.

When we talk about adversarial machine learning formally, we have the attacker model. Given the knowledge theta, we have the initial attack samples D_c, and these are the samples we're going to perturb to create the adversarial samples. The adversary can manipulate those attack samples within constraints. In the network case, you can only do certain things, because the traffic has to abide by the protocols; in the image case, the constraint might be the distortion amount, so that it's not perceptible to a human. So the adversary manipulates those samples within the constraints and comes up with D_c prime.
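In symbols, this attacker model is usually written as follows. This is my reconstruction in the standard Biggio-style notation (the slide's own equation isn't visible in the transcript): Phi(D_c) is the set of manipulations allowed by the constraints, and W is the attacker's objective function that Mike describes next.

```latex
% Given knowledge \theta and initial attack samples D_c, the attacker searches,
% within the feasible manipulations \Phi(D_c), for the samples that maximize
% the attack objective W (e.g., the classifier's confidence that the malicious
% samples are benign):
D_c^{*} \;=\; \arg\max_{\,D_c' \in \Phi(D_c)} \; W\!\left(D_c';\, \theta\right)
```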
At that point, the adversary has a goal that can be defined with this objective function, and the optimal strategy is to find the best samples that cause that particular machine learning model to misclassify. Those are your optimal attack samples, which the attacker would then use to cause misclassification in the model. From a network perspective, the attacker has different objectives. The primary one is misclassification of the adversary's network traffic: the traffic is malicious, and the goal is to get it misclassified as benign and evade detection. A secondary objective might instead be to cause mistrust in the particular detector, or to cause it to eventually not be used at all, by creating a large number of false positives. At that point the human using that system starts to mistrust it, because it keeps creating false positives. How would they do that? They would take the normal traffic and perturb it so that it causes false positives: it gets detected as an attack, but it really isn't one.

Some of the constraints on how features in the network domain can be perturbed: it's not as straightforward here, it's a little more ad hoc, because these features are all hand-crafted. A subject matter expert looks at the attack and determines the best hand-crafted features for detecting that particular attack, and each of them has a different context. There are also features that have to satisfy mathematical and physical constraints. Some of them are averages, or totals like the total number of bytes, so they need to stay mathematically consistent. Some of the physical constraints involve timing: you can't make something happen in less time if you're far away from it, so you can't reduce those kinds of features; they're harder to change. And you can't take traffic back off the network once it's there, because it's there and it can be seen, so that's another physical constraint. The attack is also bounded by the network protocol standards: you can't create a packet that violates the protocols, otherwise it just won't work. The other thing is that once we start introducing encryption in network security, what the attacker tries to do is something called mimicry, where the attacker takes their malicious traffic and makes it look like, say, Facebook traffic. At that point the human analyst can't view the actual payload of that packet, because it's encrypted; they can only go by some of these other features, like timing or packet sizes. So it's much harder now from a human perspective: it's hard to tell which traffic is benign and which is malicious, because it's not the kind of pattern a human can really easily see.
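Before moving to the image-versus-network comparison, here's a tiny illustration of the feasibility constraints Mike just listed. The feature names and rules are hypothetical; the point is only that counts can't shrink, derived features must stay consistent, and timing has a physical floor:

```python
def project_to_feasible(original, perturbed):
    """Clamp an adversarially perturbed flow feature vector back to values that
    could actually occur on the wire (hypothetical rules for illustration)."""
    feasible = dict(perturbed)
    # Byte and packet counts are non-negative integers, and you can't "un-send"
    # traffic that was already observed, so counts can only grow.
    feasible["total_bytes"] = max(original["total_bytes"], int(round(perturbed["total_bytes"])))
    feasible["packet_count"] = max(original["packet_count"], int(round(perturbed["packet_count"])))
    # Mathematical consistency: the average is determined by the totals.
    feasible["avg_bytes"] = feasible["total_bytes"] / feasible["packet_count"]
    # Physical constraint: inter-arrival time can't drop below the path's latency floor.
    feasible["min_interarrival"] = max(perturbed["min_interarrival"], original["min_interarrival"])
    return feasible

original = {"total_bytes": 5100, "packet_count": 12, "avg_bytes": 425.0, "min_interarrival": 0.020}
perturbed = {"total_bytes": 3000, "packet_count": 30, "avg_bytes": 100.0, "min_interarrival": 0.001}
print(project_to_feasible(original, perturbed))
```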
Now, if we look at a comparison between the image recognition domain and the network security domain: on the left-hand side, as we looked at earlier, you have the image represented by its pixels. From that you create a feature vector, and in the white-box case you have the actual classifier. You perturb the features of the image, and eventually you come up with the adversarial sample that's misclassified; in this case it's a three that's misclassified as a seven, and the example just shows some of the pixels that were changed. On the right-hand side, with network security, we have a network flow and its features; in this case maybe our features are the average bytes, total bytes, and protocol, and those make up the feature vector. If it's the white-box case, we have the actual classifier, and we perturb these features until the sample is misclassified as benign by that classifier. The one difference here is that at this point we have a feature vector, in feature space, that's misclassified, but now we need to take that and transform it back into an actual network flow. We may have to add some traffic to the network, change certain things in the packets, or split them up into smaller sizes, different kinds of things, in order to actually realize that misclassification. Image recognition doesn't have this step, because the perturbed image already is the adversarial sample, whereas here it takes a bit more extra work, and I would say it's probably more difficult, to take it from feature space to something practical in reality.

This is just some related work. All the way back in 1999 there was work that wasn't called adversarial machine learning but robust machine learning in the presence of malicious errors; essentially it was adversarial machine learning, it just wasn't called that at the time. And there's been a lot of work since: Biggio has a really good survey of all the different adversarial machine learning work, and again very little of it is from the network perspective, the majority of it is images. There are some things that transfer, like the work on PDF malware detectors. From the network security perspective, there is some work out of Northeastern University that looks at white-box adversarial machine learning attacks against cyber analytics, which is basically network security. There's also some other work that isn't called adversarial machine learning but essentially is, where the authors use a generative adversarial network to augment a piece of malware. The detector in that case was using things like packet sizes and the timing of inter-arrival times, so the generative adversarial network figures out the optimal values for those features, and then the malware changes its features to match, for example adjusting packet sizes by segmenting the packets, and sends the traffic to see if it actually bypasses the intrusion detection system. Again, it's not called adversarial machine learning, but in practice that's what it is. Those are just the references for that, and I had some other things in there.

Mike, that's an excellent survey. I mean, I don't know the cyber space, though I was on your committee, so maybe I'm a bit biased; that's how I got introduced to the lingo of cybersecurity. I think it's interesting to see it from a different perspective. And there were some more slides?
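As a side note on the feature-space-to-traffic step described a moment ago, here's a toy calculation of one way to realize a target feature value by adding traffic (all numbers and the padding strategy are made up for this sketch):

```python
import math

# Hypothetical flow summary the detector computes features from.
total_bytes = 14200
n_packets = 10
print("current avg bytes:", total_bytes / n_packets)     # 1420.0

# Suppose the adversarial feature vector calls for avg_bytes <= 400. We can't
# remove bytes already sent, but we can add small "padding" packets so the
# average drops. Solve (total_bytes + k*pad) / (n_packets + k) <= target for k.
target_avg = 400
pad_size = 40                                              # bytes per extra packet
k = math.ceil((total_bytes - target_avg * n_packets) / (target_avg - pad_size))
new_avg = (total_bytes + k * pad_size) / (n_packets + k)
print(k, "extra packets ->", round(new_avg, 1), "avg bytes")
# Note: this also raises total_bytes and packet counts, which is exactly why
# realizing a feature-space perturbation in real traffic is the hard part.
```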
I'm not sure what's going on here.

So, one of the things I've seen in adversarial machine learning, and I know we have a variety of people here, is that there's a lot to explore in, and Chad's probably going to get me for this, the topology of the label space, right? In this case of only having two labels, benign or malicious, did you simplify that down for us to understand, or are there in fact lots of different labels you could assign to traffic in cybersecurity?

It depends. Some people look at it from the perspective of having different types of attacks, like botnets or other types, so there could be several classes. Or you can do binary classification, which is just malicious or benign. So it varies; it's not like there's a standard that everybody uses. But I would say, actually from experience, the binary classifier works better than trying a bunch of different labels, because we usually end up getting worse performance doing it that way.

Thank you.

If I may, would having a label that is strictly correct versus questionable mess things up? Say you have benign labels, some correct and some inaccurate, and you maybe have a sense of the subset that is definitely correct, but you don't know which of the rest may or may not be accurate, bunched in with the not-right ones. Would that mess things up?

Yeah, I would say so. But there's been lots of work where people look at having that type of class structure where a sample is either benign, malicious, or an anomaly, so you don't know whether it's one or the other. That's actually some of the research I'm working on with some others: if the classifier doesn't have much confidence in its label, should it output "that's malicious" or "that's benign", or should it just say "I don't know"? Some of the other protections look at it from an ensemble point of view, where you have a consensus ensemble that looks at the particular problem from different perspectives. The example I could give, not from network security but from biometrics, would be using a retina scan and then fingerprints: two different modalities to verify the same person. Similarly, you can do the same type of thing in cybersecurity, where you have different types of features, with different contexts, that look at the network traffic from different perspectives, and then you get a consensus from that ensemble to say whether it's malicious or not.

That makes sense. It can give you some certainty rather than needing that gold standard of some subset, right?

Right, the ensemble methods do very well in those respects.

And it's questionable whether you should ever deploy a single decision-boundary model, right, without good uncertainty quantification.

So I have more slides here that I can show.

I have one more question, Mike. When I was working in text mining, we were trying to organize documents, right? In a way it's like these packets: you're trying to organize the ones that are relevant and not relevant, or here malicious versus benign. And what was important for me was ensuring that the features for defining something as relevant were separate from the features defining that it was irrelevant, because I wanted positively correlated features.
Have people looked into that aspect of it, where they constrain, say, a linear SVM so that every weight contributes positively to being malicious? Because that affects the interpretation you describe, where you kind of bias it one way or the other. It might be more robust if you only allow features with a positive correlation to define what is malicious.

I would say the only thing that was kind of close to that was what I did in my dissertation, which was looking at subsets of features: you don't necessarily look at all of the features but a smaller subset, and then have each classifier in the ensemble use different features. So maybe something similar to that, I guess.

Yeah. Constraining the sign, or the polarity, seemed to help make it much more interpretable, rather than a mismatch of "this helps in this direction and that hurts in that direction"; you can end up with most things pointing one way or the other. You can have a classifier that's only positive for one class and another for the other class, and put them in an ensemble. Something fun to talk about sometime when we get the chance.

Sure. So I found the rest of the slides. The other thing I did, and this was actually work in my dissertation, was that I came up with an adversarial machine learning technique to work against a network security dataset. This dataset was composed of nmap scanning; nmap is something attackers use for reconnaissance, to determine what ports are open and what services are running on a machine. This particular classifier was using three different types of features. One was the number of unsuccessful TCP flows, which is indicative of scanning traffic: you normally wouldn't see a lot of unsuccessful TCP connections, but when you're scanning ports, there are a bunch of ports on the machine that are not open and you're scanning a bunch of them, so you're going to see a high number of unsuccessful TCP flows. The other one is UDP traffic: there's usually less UDP traffic from a machine and more TCP traffic, because things like web browsers and a lot of applications use TCP as opposed to UDP, so you would see more UDP traffic from a scanner. And the same thing with ICMP. So those are the three features this classifier used. The idea I came up with was that you have a benign dataset where each sample has those three features, and then we take a scanning dataset; those three features are computed over an hour's worth of traffic coming from a specific host, and that's what a scanning sample is, with those same three features. We look at a scanning sample and determine its nearest neighbor within the benign dataset in feature space. I did it two different ways: one was to look at the single nearest neighbor, and the other was to take the 10 nearest neighbors and average their features to determine what my optimal adversarial features should look like. That's how I came up with S*, the optimal adversarial sample. Then, how would an attacker take that and do it in practice, now that we have a new feature vector? The attacker would add traffic, because those three features were just percentages.
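Here's a minimal sketch of the nearest-neighbor step in feature space that Mike just described. The synthetic data, feature ranges, and use of scikit-learn are my stand-ins, not the dissertation's actual setup:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical per-host features over an hour of traffic:
# [fraction of unsuccessful TCP flows, fraction of UDP traffic, fraction of ICMP traffic]
rng = np.random.default_rng(42)
benign = rng.uniform([0.00, 0.00, 0.00], [0.10, 0.20, 0.05], size=(500, 3))
scan_sample = np.array([0.85, 0.60, 0.30])   # a scanning host stands out on all three

# Find the 10 nearest benign neighbors of the scanning sample and average
# their features; that average is the target adversarial feature vector S*.
nn = NearestNeighbors(n_neighbors=10).fit(benign)
_, idx = nn.kneighbors(scan_sample.reshape(1, -1))
s_star = benign[idx[0]].mean(axis=0)
print("target adversarial features S*:", np.round(s_star, 3))
```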
So in practice we can change those percentages by adding traffic, for example more successful TCP connections and more normal UDP traffic. There are also ways to avoid ICMP scanning altogether, so we can take that out, and the other two features we can change easily enough to make them much lower percentages so the host looks benign. Basically, we try to match and mimic those benign features. When I took the dataset and split it into 80 percent training and 20 percent testing, the original classifier was, and this is not usual, but yeah, it was perfect, 100 percent accuracy. After applying the attack, it was down to, I guess, 47 percent accuracy. The important part here is that these 57 scanning samples were, before the adversarial machine learning, all predicted as scanning; after I used my technique, all 57 of them were misclassified as benign. In this case this was a simpler classifier; if you had many more features, it would be a bit harder and more complex to do this.

The other thing I'm going to walk through is a really good example up on PyTorch's website that goes through how to generate an adversarial example: you actually create the image classifier and then create the actual perturbed images. Just to backtrack a little bit, this is more of a practical example with a very simple classifier. As we said, security and robustness are overlooked a lot of the time when we're designing machine learning algorithms. We're not saying that machine learning is bad; it's just that there are things we have to look at to make sure our algorithms are robust. I know that if you say this to machine learning people, they think we're attacking machine learning; we're not, we're just saying it needs to be a little more robust, because machine learning definitely has a lot of potential to make things better. The other thing is that by adding these perturbations, even in small amounts, we can drastically change the performance of the model. Like what we just saw: we basically made that model not even usable, because it can't detect anything anymore. In this case we're just going to use a simple image classifier, an MNIST model, as the target of the attack, and it uses something called the fast gradient sign attack, which is a well-known attack. I think maybe I do have a slide on that. So first we'll talk about the threat model.

I don't think we have time to go too far; I thought maybe just one more figure, if that's okay.

Yes. Do you want me to just stop here? I can look these slides up later on.

I wondered if you had a nice visualization of that fast gradient attack.

Okay, I do, yeah, I could do that.

Yeah, a figure is worth a thousand words, or however that goes.

All right, I'll skip to that figure.

Ideally we have a nice mix of discussion and questions; that's why I want to keep it moving. But sure, you prepared quite a bit. Are you able to see that?

Yep.

Okay. So this is just the script we'll run through, and I'll just run it quickly this way so I can get to the actual images. This is all implemented in PyTorch, and it's up on the website; you can download this code.
You can also download the dataset and run through it on your own, too. Essentially, that's just the model there, and this is the actual attack being run; it might take a minute or two.

Do you know the trick where you do "Run All"? Yeah, if you go to Cell... yeah, there's only one more left. I really enjoy clicking, but I was just teaching this trick today because I got exasperated. Look at that, see, it's going one by one.

I'll run this; it should come back with plots, but it might take a couple of minutes. If you want, maybe ask questions while we're waiting for this.

I had a question, but it's very high level, because I'm definitely not an expert in this. Along the lines of the previous question, I wonder how much of what we're learning as we think about adversarial AI for cybersecurity specifically ends up being a legacy we can apply to different AI problems, such as difficult classification problems where there isn't a deliberate attempt to derail us, where it's just that life is hard and we don't know what we're doing.

I would say not all of it necessarily transfers, because different modalities are usually different with respect to the attacks. I think you'll find most of it seems to be very specific to the modality.

Yeah, that's my impression. Do you think it's possible that's because the field is still relatively new, and so it's still very case-based?

No, I don't think so, because the image domain has been around for a while; I might say "a while," but at least for the last four or five years, which I realize still sounds like its infancy. In terms of AI and cybersecurity, that's probably even more in its infancy, whereas the image domain is much more established. So I don't know if it would really transfer.

I see Cesar, well, I don't know if he's still on the call. Cesar was working on this for his master's thesis, and one thing he found is that he applied uncertainty quantification to reviews. Yelp and everybody else wants to automatically characterize sentiment and those sorts of things. So I do think, or, you undervalue yourself a little bit: I think some of the tricks you're noticing for traffic will definitely carry over to structured cases like text, right? Where you're identifying certain strings or packets and saying, usually when you see this, it's that. But if anyone uses something contradictory, or satire, or things like that, it's going to be a similar situation, where you read it at a shallow level and think one thing, but then you really look at it deeper and realize it's actually the opposite label from what you expected, right?

It's interesting that you say uncertainty quantification, because one of my colleagues, actually, that's his niche, that's what he's worked on for a long time. And we've thought about some future research applying uncertainty quantification to adversarial machine learning, because I think that would definitely help with trying to detect some of these attacks, to show how confident the model is in its decision.

It's also possibly a good way for the psychologists to probe: how much do you have to change the text before somebody actually changes their opinion on something?
That's getting a little too far into the philosophical side. It's still running, but from the basic output here you can see that epsilon is the amount of perturbation. The first one is no perturbation at all, and there you see 98 percent accuracy. Then as you keep adding more and more perturbation, you see the accuracy goes down and down until eventually it gets to about 20 percent. I think there might be one more in there, but if it populates at some point it should show some of the pictures, and you'll see how some of them are really distorted. When you get down to around 0.15 or 0.2, the pictures really start to be distorted and noticeable to the naked eye; there's just something wrong with the picture. Any other questions? It didn't show that stop-sign example, so that's good. So this is just basically showing the accuracy and how, with higher amounts of perturbation, the accuracy goes down. But here's the really nice picture. You can see at the top there's no perturbation at all, and as you keep going down, probably even around 0.2 or so you start to see it's noticeable, and by the time you get to the bottom it's really noticeable. These types of examples obviously are not good for adversarial machine learning, because as a human you can notice that there's something wrong there.

You probably said this and I showed up late, so it's probably my bad, but did you discuss how the strategy for perturbation impacts this? I'm a physicist, so the only real kind of perturbation I think about is adding noise from a Gaussian. That probably doesn't have a lot of impact, even if it's loud noise, right?

So there are different types. This specific one uses something called the fast gradient sign method, and it takes advantage of the way neural networks learn via gradients. Instead of using the gradient to modify the weights of the neural network, it uses the gradient to modify the input values, so it's modifying pixels based on that.

Correct me if I'm wrong, but I think it also involves just the polarity, the sign. I don't work with JPEGs in physics, but it's like adding a small positive or negative value to each pixel, or leaving it the same, and it's patchy, right, so it's correlated across the image. That seems like it's probably what helps it. If you were to break that correlation, it's unlikely you'd get it to perform so poorly, right?

There are different types of attacks. This is just one of the first types, developed by Ian Goodfellow from Google, and there have been many more developed since then.

I'll stop the recording now so we can have more informal discussion. Thank you so much, Mike, that was really great, and we're going to post these on the web, so it'll live in posterity.
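For reference, here is the core of the fast gradient sign method being discussed, as a minimal PyTorch sketch in the spirit of the official tutorial Mike is running (this is my own condensed version, not the tutorial's code; `model`, `x`, and `y` in the usage comment are assumed to exist):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon):
    """Fast Gradient Sign Method: nudge each input pixel by +/- epsilon in the
    direction that increases the classification loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    model.zero_grad()
    loss.backward()
    # Perturb in the direction of the sign of the gradient w.r.t. the input,
    # then clamp back to the valid pixel range [0, 1].
    perturbed = image + epsilon * image.grad.sign()
    return torch.clamp(perturbed, 0.0, 1.0).detach()

# Usage (assuming `model` is a trained MNIST classifier and x, y is a batch):
# x_adv = fgsm_attack(model, x, y, epsilon=0.15)
# print((model(x_adv).argmax(dim=1) == y).float().mean())  # accuracy drops as epsilon grows
```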
Data Science Community Hour (March 18th, 2021): Adversarial Machine Learning, Dr. Michael De Lucia
From Austin Brockmeier March 18, 2021
“Introduction to Adversarial Machine Learning” - Michael De Lucia, Army Research Lab (ARL)
We will introduce the concept of adversarial machine learning, a technique where carefully constructed or perturbed data instances, observations, or training data can be used to cause a machine learning model, such as an image classifier, to make wildly inaccurate predictions.
Michael De Lucia (Ph.D. ECE, UD, ’20; MS & BS CS, NJIT, ‘05/’06, CSSA) is a Computer Scientist at the U.S. Army Research Laboratory (ARL), Aberdeen Proving Ground, MD, where he researches computer and network security, and machine learning for network security applications. Dr. De Lucia is an Adjunct Professor in ECE at UD, and was an NJIT Adjunct Instructor 2006-2011.