Defenses against Adversarial Attacks
Institute of Technology
Abstract: In recent years, with the rapid development of computer science and hardware technologies, artificial intelligence, big data, and deep learning have received more attention than ever before in both research and industry. Everything is being digitized nowadays, and data is an important part of that. Thanks to advances in machine learning and deep learning algorithms, many products now rely on these models, so it is important to make them robust against unexpected inputs. Traditional software engineering offers methods such as code coverage, but these do not work well here. One approach is to test our software against different test cases, but in a task like image classification we cannot generate every possible test case. A common way to attack deep learning models is the adversarial attack. In this paper, I describe different methods of attack and ways to make systems robust against these attacks.
I Introduction
As the number of machine learning and deep learning models deployed in real-time scenarios increases, it becomes necessary to make them robust against different types of attacks. Except for RBF networks, most machine learning models are easily attacked; it is especially easy to fool linear models. In this paper I focus on deep neural networks, because they have become popular in recent years, and more precisely on attacks against image classifiers. This type of attack was first described in 2013, independently by two groups of authors. One might wonder why we need this kind of safety in our models. Most of the time it does not matter, but in security-sensitive scenarios we need our models to be robust against such attacks. Consider a self-driving car: suppose we trained models to detect different traffic signs. If someone crafts an adversarial example for this kind of picture, things can go wrong if our model is not robust against it. For instance, if someone replaces the original stop sign with an adversarially perturbed "no parking" sign, an accident might happen. There are many cases like this. One such attack is shown in the figure, and we will discuss these cases in more detail later.
This image is taken from the Rogue Signs paper. The left image is not classified with any confidence as a traffic sign, but after adding some noise that is not present in the original (left) image, we see only a slight change in appearance; yet when we pass the result to our CNN model, it classifies it as a stop sign with high confidence. So our car would stop here, which is not what we want.
II Different Methods to Create Adversarial Examples
The Fast Gradient Sign Method
In this method, we slightly modify the usual gradient-based update. Instead of changing the input values by the gradient multiplied by a learning rate, we take only the sign of the gradient with respect to the input, multiply it by a small constant, and add the result to the input. This constrains the change to the input image by that constant, so the original appearance of the image barely changes, while at the same time the modified input produces the wrong output. The cost function for this method is shown in figure 2.
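The update in figure 2 produces an adversarial input of the form x_adv = x + ε · sign(∇ₓ J(θ, x, y)). A minimal sketch is below; it uses a binary logistic-regression model rather than a deep network purely so that the input gradient has a closed form, and the weights and inputs are made-up toy values.

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(dJ/dx).

    For logistic regression with cross-entropy loss J, the gradient of
    the loss with respect to the input is (p - y) * w, where
    p = sigmoid(w.x + b) is the model's predicted probability.
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # model confidence for class 1
    grad_x = (p - y) * w                           # dJ/dx in closed form
    return x + eps * np.sign(grad_x)               # each feature moves by at most eps

# Toy model and input (illustrative values, not from any real dataset).
w = np.array([1.0, 1.0, -1.0])
b = 0.0
x = np.array([0.5, 0.5, 0.1])          # clean input, confidently class 1
x_adv = fgsm_perturb(x, y=1.0, w=w, b=b, eps=0.3)

# The perturbation is bounded by eps, yet the logit for the true class drops.
print(np.dot(w, x), np.dot(w, x_adv))
```

Note that every feature changes by exactly ±ε, which is why the image still looks essentially unchanged to a human while the model's confidence collapses.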
This method takes only a few steps to converge the input image to an adversarial example; when I tried it myself, it usually took only about 10 steps. One might ask why the input converges so much faster than the weights do during training. The simple reason is that the mapping from input to output in a neural network is piecewise linear, while the mapping from weights to output is not linear but much more complex, so the weights are harder to optimize.
However, this attack is easily defended against. When more computation is spent on generating attacks, they become much more difficult to defend against. I will not go into every method for this type of attack, since many methods are available; instead I will describe the problem abstractly.
We can classify such systems into two types: in the first, the attacker goes first; in the second, the defender goes first. In the first setting, the defender already knows which type of attack is going to happen, so it can train the model to correctly classify those adversarial examples. This is not very interesting; it is essentially data augmentation, since we are likewise generating new data from existing data. In most practical scenarios this method does not work, because the defender does not know the attacker's strategy. The second setting is the defender going first, where the attacker gets to react to the deployed defense. This problem is extremely difficult to solve and is still unsolved.
III Evaluating Defenses
In practice, the defender has to go first. When we look at the performance of a defense against adversarial examples, it is important to think clearly about our goals for the new model. A lot of the time we see models that increase the error rate on the clean test set while decreasing the error rate on the adversarial test set. How should we navigate this trade-off? First, the trade-off is not necessarily fundamental: there are cases where adversarially trained models perform better on the clean test set than the original undefended model did. But in most of the recent literature, giving a model strong robustness to adversarial examples usually costs a little accuracy on the clean test set.

The way to think about this is to consider the composition of the actual test set the model will encounter once deployed. In the machine learning literature, the "test set" usually refers to clean i.i.d. data drawn from the same distribution as the training set. That is probably not what the model will actually encounter when deployed, so how you evaluate your model depends on what you think it will see at deployment time. The adversarial-example literature often benchmarks the error rate on adversarial examples alone; that is the metric you would care about if you expected the model to face an adversary on every single input, which is probably not realistic. Instead, an adversary will be present some fraction of the time. One useful tool is a curve where the x-axis gradually increases the proportion of inputs that are adversarial examples rather than clean i.i.d. examples, and the y-axis plots the accuracy of each model under consideration.

The figure plots top-5 accuracy on the ImageNet dataset for three models this way. The green curve is an undefended baseline, an Inception v3 network. The other two are defenses: adversarial logit pairing (ALP), which was state of the art on ImageNet at the time, and a mixed PGD (MPGD) defense included in the same paper. On the far left the baseline is best, because it has the highest accuracy on clean data; on the far right adversarial logit pairing is best, because it has the highest accuracy on adversarial examples. MPGD is slightly better than ALP on clean data and worse on adversarial data, so one might think it navigates a trade-off between clean and adversarial performance; but the plot shows that MPGD is not on top of the trade-off curve at any point across the whole sweep, so it does not occupy a useful point in the trade-off space.

We can also read off what kind of test set we would need before preferring a defense over the undefended baseline: it is where the green curve intersects the orange one, at about 7.1% adversarial examples. If fewer examples than that are adversarial, we prefer the undefended baseline, because there is not enough of an adversarial threat to justify a defense that reduces clean test accuracy. To make the defense more widely usable, we can do two things: improve its accuracy on adversarial examples, so that we gain more from trading off clean accuracy, or improve its accuracy on clean data, so that there is less of a trade-off to be paid in the first place.
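The trade-off curve described above is just a linear mixture of clean and adversarial accuracy. The sketch below computes the crossover fraction at which a defense becomes preferable; the accuracy numbers are illustrative placeholders, not the actual ImageNet results from the figure.

```python
def mixed_accuracy(acc_clean, acc_adv, p_adv):
    """Expected accuracy when a fraction p_adv of test inputs is adversarial."""
    return (1.0 - p_adv) * acc_clean + p_adv * acc_adv

# Hypothetical models: (clean accuracy, adversarial accuracy).
c_b, a_b = 0.94, 0.05   # undefended baseline
c_d, a_d = 0.90, 0.60   # adversarially trained defense

# Crossover point: solve (1-p)*c_b + p*a_b == (1-p)*c_d + p*a_d for p.
p_star = (c_b - c_d) / ((c_b - c_d) + (a_d - a_b))
print(p_star)  # fraction of adversarial inputs above which the defense wins
```

Below p_star the baseline's clean accuracy dominates; above it, the defense's robustness pays for the clean-accuracy loss.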
IV Defense Methods
The best defenses under this benchmark are mostly based on directly optimizing the metric itself. Adversarial training approaches essentially train the model to do well on exactly the adversarial examples we are going to benchmark on, and as a result these defenses do not generalize beyond that one particular threat model. A promising future direction is what we might call indirect methods. If we think of adversarial training as a direct method, where we write down the performance under a particular threat model and optimize it directly, then it is set up to fail to generalize outside that threat model. Instead, we should ask what flaws in machine learning algorithms lead them to perform badly in this threat model in the first place, and try to address those flaws. What methods that are not specifically designed for the norm-based threat model still perform well on that benchmark?

One of the best methods so far is logit pairing, where for two different inputs you regularize the logits to be similar. The adversarial logit pairing paper found that even non-adversarial logit pairing reduces the error rate on adversarial examples: pairing the logits of two different clean examples can reduce the adversarial error rate. Another technique, label smoothing, trains the model to output probabilities less than one. With ordinary maximum-likelihood training we continuously maximize the probability of the correct class and never tell the model to stop boosting its confidence, even past the point where it believes there is less than a one-in-ten-million chance the example was labeled wrong. With label smoothing, instead of shooting for a probability of 1.0, the model shoots for, say, 0.9; if it becomes more confident than that, it stops boosting its confidence on that example. This seems to result in models that perform better on adversarial examples. A third technique, logit squeezing, simply regularizes the logits to be small; like label smoothing, it asks the model to be less confident and not to extrapolate quite so wildly. These are all relatively weak defenses, but they are indirect methods that do not have the threat model baked into them; they just do not yet perform nearly as well as adversarial training.
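Label smoothing amounts to a one-line transformation of the training targets: the hard 1 becomes 1 − ε and the remaining ε is spread uniformly over all classes. A minimal sketch (the smoothing factor 0.1 matches the "shoot for 0.9" example above):

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Label smoothing: soften a one-hot target so the correct class gets
    probability 1 - eps + eps/k and every other class gets eps/k, where k
    is the number of classes. Training against these targets stops the
    model from pushing its confidence all the way to 1.0."""
    k = y_onehot.shape[-1]
    return y_onehot * (1.0 - eps) + eps / k

y = np.array([0.0, 0.0, 1.0])   # hard target for class 2 of 3
print(smooth_labels(y))          # approximately [0.033, 0.033, 0.933]
```

The smoothed targets still sum to 1, so they can be dropped into a standard cross-entropy loss unchanged.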
This figure shows the objective function for the logit-pairing method, which was state of the art for ImageNet top-5 accuracy. The first part of the function is the ordinary softmax cross-entropy loss; the second part is a regularizer that penalizes the difference between the logits of the original image and the logits of its adversarial example.
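The pairing term in the objective can be sketched as a squared-L2 penalty between the two logit vectors. The weighting coefficient `lam` is an assumed hyperparameter name for illustration; in the full objective this term is added to the usual cross-entropy loss.

```python
import numpy as np

def logit_pairing_loss(logits_clean, logits_adv, lam=0.5):
    """Logit-pairing regularizer: a squared L2 penalty that pulls the
    logits of a clean image and its adversarial counterpart together.
    This is only the pairing term; the full training objective adds it
    to the standard softmax cross-entropy on the clean (or adversarial)
    examples."""
    return lam * np.mean((logits_clean - logits_adv) ** 2)

# Identical logits incur no penalty; diverging logits are penalized.
z = np.array([2.0, -1.0, 0.5])
print(logit_pairing_loss(z, z))        # 0.0
print(logit_pairing_loss(z, z + 1.0))  # positive
```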
In this paper, we saw a new problem for software engineering, or more precisely for software testing. I also described different types of possible attacks, such as white-box and black-box attacks, and how they can create problems in real-life scenarios. To overcome them, we also saw different defense methods, but most are direct methods that solve one type of attack while failing against others. Finding a general method to defend against adversarial attacks is still an active area of research.
References
[1] Rogue Signs: Deceiving Traffic Sign Recognition with Malicious Ads and Logos. arXiv:1801.02780 [cs.CR]
[2] Adversarial Logit Pairing. arXiv:1803.06373 [cs.LG]