### Contents

- 1 Introduction
- 2 White-box attacks
- 2.1 Biggio's attack
- 2.2 Szegedy's limited-memory BFGS (L-BFGS)
- 2.3 Fast gradient sign method (FGSM)
- 2.4 DeepFool
- 2.5 Jacobian-based saliency map attack (JSMA)
- 2.6 Basic iterative method (BIM) / Projected gradient descent (PGD) attack
- 2.7 Carlini & Wagner's attack (C&W's attack)
- 2.8 Ground truth attack
- 2.9 Other $l_{p}$ attacks
- 2.10 Universal attack
- 2.11 Spatially transformed attack
- 2.12 Unconstrained adversarial examples

- 3 Physical-world attacks
- 4 Black-box attacks
- 5 Grey-box attacks
- 6 Poisoning attacks
- References

# 1 Introduction

Compared with other domains, generating adversarial examples in the image domain has the following **advantages**:

1) Real and fake images are intuitive to a human observer;

2) The structure of image data and image classifiers is relatively simple. **Main coverage**: taking fully connected networks and convolutional neural networks as example models, with samples drawn from MNIST, CIFAR10, and ImageNet, the survey focuses on **evasion attacks**, covering white-box, black-box, grey-box, and physical-world generation of adversarial image examples.

# 2 White-box attacks

Given a classifier $C$ and a victim sample (*victim sample*) $(x,y)$, the attacker's goal is to synthesize a fake image that is perceptually similar to the original image but misleads the classifier into a wrong prediction:

$$\text{find } x' \text{ satisfying } \|x'-x\|\le\epsilon, \text{ such that } C(x')=t\ne y, \quad (1)$$

where $\|\cdot\|$ measures the dissimilarity between $x'$ and $x$, usually an $l_{p}$ norm. The main methods under this setting are introduced next.

## 2.1 Biggio's attack

Biggio's attack generates adversarial examples on the MNIST dataset against traditional machine learning classifiers, such as SVMs and 3-layer fully connected neural networks, misleading the classifier by optimizing its discriminant function.

For example, in **Figure 1**, a linear SVM has discriminant function $g(x)=\langle w,x\rangle+b$. Suppose a sample $x$ is correctly classified as 3. For this model, Biggio's attack generates a new sample $x'$ that minimizes $g(x')$ while keeping $\|x'-x\|_{1}$ small. If $g(x')<0$, $x'$ will be misclassified.
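The linear-SVM case above can be sketched in a few lines. This is a hedged illustration, not Biggio et al.'s exact optimizer: the weights `w`, `b` and sample `x` are made up, and the attack simply descends the discriminant $g$ until its sign flips.

```python
import numpy as np

# Illustrative Biggio-style attack on a linear SVM with
# discriminant g(x) = <w, x> + b.  The sample starts correctly
# classified (g(x) > 0); we step against the gradient of g (which
# for a linear model is just w) until the sign flips.

def biggio_attack(x, w, b, step=0.1, max_iter=100):
    """Decrease g(x') = <w, x'> + b by gradient descent on g."""
    x_adv = x.copy()
    for _ in range(max_iter):
        if np.dot(w, x_adv) + b < 0:   # misclassified: done
            break
        x_adv -= step * w              # gradient of g w.r.t. x is w
    return x_adv

w = np.array([1.0, -0.5])
b = -0.2
x = np.array([1.0, 0.5])               # g(x) = 0.55 > 0, correctly classified
x_adv = biggio_attack(x, w, b)
print(np.dot(w, x_adv) + b < 0)        # True: the sign has flipped
```

Each step reduces $g$ by `step`$\cdot\|w\|^2$, so the loop terminates quickly for any sample at finite distance from the boundary.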

## 2.2 Szegedy's limited-memory BFGS (L-BFGS)

This was the first attack applied to neural networks for image classification. It seeks an adversarial example by optimizing the following objective:

$$\min \|x-x'\|_{2} \quad \text{s.t. } C(x')=t \text{ and } x'\in[0,1]^{m}. \quad (2)$$

The problem is solved approximately by introducing a loss function:

$$\min \lambda\|x-x'\|_{2} + L(\theta,x',t), \quad \text{s.t. } x'\in[0,1]^{m}, \quad (3)$$

where $\lambda$ is a scale parameter. By adjusting $\lambda$, one can find an $x'$ that is sufficiently similar to $x$ yet misleads the classifier $C$.
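A minimal sketch of the penalized objective in Formula 3 on a toy binary logistic classifier. This is an assumption-laden illustration: it uses plain gradient descent rather than L-BFGS, a squared distance term for a smooth gradient, and made-up weights and target class.

```python
import numpy as np

# Hedged sketch of Eq. (3):  min  lam * ||x - x'||^2 + L(theta, x', t)
# on a toy logistic model with target class t = 1.  Gradient descent
# stands in for L-BFGS; the [0,1]^m box constraint is enforced by clipping.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lbfgs_style_attack(x, w, b, lam=0.1, lr=0.5, steps=200):
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(np.dot(w, x_adv) + b)        # P(class 1 | x')
        grad_loss = -(1.0 - p) * w               # d/dx' of -log p (target t=1)
        grad = 2 * lam * (x_adv - x) + grad_loss
        x_adv = np.clip(x_adv - lr * grad, 0.0, 1.0)  # keep x' in [0,1]^m
    return x_adv

w = np.array([-2.0, 1.0]); b = -0.5
x = np.array([0.9, 0.1])                          # classified as class 0
x_adv = lbfgs_style_attack(x, w, b)
print(sigmoid(np.dot(w, x_adv) + b) > 0.5)        # True: now predicted as t=1
```

Raising `lam` pulls $x'$ back toward $x$; lowering it prioritizes fooling the classifier, which is exactly the trade-off the text describes.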

## 2.3 Fast gradient sign method (FGSM)

Goodfellow et al. designed a one-step method for generating adversarial examples:

$$x' = x + \epsilon\,\mathrm{sign}(\nabla_{x}L(\theta,x,y)) \ \ \text{(non-targeted)}; \qquad x' = x - \epsilon\,\mathrm{sign}(\nabla_{x}L(\theta,x,t)) \ \ \text{(targeted at } t\text{)}. \quad (4)$$

Under the **targeted attack** setting, this amounts to solving the following problem by one step of gradient descent:

$$\min L(\theta,x',t) \quad \text{s.t. } \|x'-x\|_{\infty}\le\epsilon \text{ and } x'\in[0,1]^{m}. \quad (5)$$

One reason for FGSM's popularity is that it requires only a single backpropagation pass, so **it is well suited to generating large numbers of adversarial examples**; for its application on ImageNet, see **Figure 2**.
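The one-step update in Formula 4 can be sketched on a toy logistic classifier where the gradient is available in closed form. The weights, sample, and $\epsilon$ below are illustrative; for a real DNN the gradient would come from one backward pass.

```python
import numpy as np

# Hedged sketch of non-targeted FGSM (Eq. 4):
#   x' = x + eps * sign(grad_x L(theta, x, y))
# on a toy binary logistic classifier with cross-entropy loss.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.4):
    p = sigmoid(np.dot(w, x) + b)
    # gradient of cross-entropy -[y log p + (1-y) log(1-p)] w.r.t. x
    grad = (p - y) * w
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

w = np.array([3.0, -1.0]); b = -1.0
x = np.array([0.8, 0.2]); y = 1                # correctly classified: p > 0.5
x_adv = fgsm(x, y, w, b)
print(sigmoid(np.dot(w, x_adv) + b) < 0.5)     # True: one step flips the label
```

Note only one gradient evaluation is needed, which is why the text calls FGSM suitable for mass-producing adversarial examples.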

## 2.4 DeepFool

DeepFool studies the decision boundary of a classifier $F$ around a data point and tries to find a path across that boundary, as in **Figure 3**, so that the sample $x$ gets misclassified. For example, to push a sample $x_{0}$ classified as 4 into class 3, the decision boundary can be described as $F_{3}=\{z: F(z)_{4}-F(z)_{3}=0\}$. Let $f(x)=F(x)_{4}-F(x)_{3}$. In each iteration, the attack linearizes the decision hyperplane using the Taylor expansion $F_{3}'=\{x: f(x)\approx f(x_{0})+\langle\nabla_{x}f(x_{0}),\,x-x_{0}\rangle=0\}$, and computes from $x_{0}$ the vector $\omega$ orthogonal to the hyperplane $F_{3}$. The vector $\omega$ serves as a perturbation that pushes $x_{0}$ off the hyperplane. By moving along $\omega$, the algorithm finds an adversarial version of $x_{0}$ that is classified as 3.

The DeepFool experiments show that for common DNN image classifiers, almost all test samples lie very close to the decision boundary. For example, for LeNet trained on MNIST, only a tiny perturbation is needed to misclassify over 90% of test samples, which shows that DNN classifiers are not robust to perturbations.
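For an affine classifier, the orthogonal step DeepFool takes has a closed form, which makes a compact sketch possible. The weights and sample below are made up; for a deep network DeepFool would relinearize $f$ around the current iterate and repeat this step.

```python
import numpy as np

# Hedged sketch of the DeepFool step for an affine binary classifier
# f(x) = <w, x> + b: the minimal perturbation reaching the decision
# hyperplane f = 0 is the orthogonal projection
#   r = -f(x) / ||w||^2 * w,
# scaled by a small overshoot so the iterate actually crosses the boundary.

def deepfool_linear(x, w, b, overshoot=0.02):
    f = np.dot(w, x) + b
    r = -f / np.dot(w, w) * w          # orthogonal step onto the hyperplane
    return x + (1 + overshoot) * r

w = np.array([2.0, -1.0]); b = 0.5
x = np.array([1.0, 0.5])               # f(x) = 2.0 > 0
x_adv = deepfool_linear(x, w, b)
print(np.dot(w, x_adv) + b < 0)        # True: crossed the boundary
# perturbation norm = (1 + overshoot) * |f(x)| / ||w||, the distance
# from x to the hyperplane (up to the overshoot factor)
print(np.linalg.norm(x_adv - x))
```

The perturbation norm here is exactly the sample's distance to the boundary, which is the quantity DeepFool uses to argue that typical test samples sit very close to the decision boundary.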

## 2.5 Jacobian-based saliency map attack (JSMA)

JSMA introduces a scoring method based on computing the Jacobian matrix of $F$. It iteratively manipulates the pixels with the greatest influence on the model output, and can be regarded as a greedy attack algorithm.

Specifically, the authors use the Jacobian matrix $J_{F}(x)=\frac{\partial F(x)}{\partial x}=\left\{\frac{\partial F_{j}(x)}{\partial x_{i}}\right\}_{i\times j}$ to model how $F(x)$ responds to changes in $x$. Under the targeted attack setting, the attacker tries to misclassify the sample as class $t$. JSMA therefore repeatedly searches for and manipulates pixels whose increase/decrease raises $F_{t}(x)$ while lowering $\sum_{j\ne t}F_{j}(x)$. In the end the classifier gives $x$ a higher score on class $t$.
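The pixel-scoring idea can be sketched directly from a Jacobian. This is a hedged illustration of one common form of the saliency map (for pixel increases); the 3-class, 3-pixel Jacobian below is made up, whereas in practice each row would come from backpropagation.

```python
import numpy as np

# Hedged JSMA-style saliency map: given jacobian[j, i] = dF_j / dx_i,
# score each pixel i by how much increasing it raises the target score
# F_t while lowering the summed scores of the other classes.  Pixels
# that hurt the target or help the other classes get score 0.

def saliency_map(jacobian, t):
    dt = jacobian[t]                          # dF_t / dx_i for each pixel i
    others = jacobian.sum(axis=0) - dt        # sum_{j != t} dF_j / dx_i
    scores = dt * np.abs(others)
    scores[(dt < 0) | (others > 0)] = 0.0     # pixel not helpful for target t
    return scores

J = np.array([[ 0.2, -0.1,  0.0],
              [-0.3,  0.4,  0.1],
              [ 0.5, -0.2, -0.4]])            # 3 classes x 3 pixels (made up)
s = saliency_map(J, t=2)
print(int(np.argmax(s)))                      # 0: the pixel JSMA perturbs first
```

The greedy loop of JSMA would perturb that top-scoring pixel, recompute the Jacobian, and repeat until the classifier prefers class $t$.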

## 2.6 Basic iterative method (BIM) / Projected gradient descent (PGD) attack

This method is the iterative version of FGSM. Under the non-targeted setting, it iteratively generates $x'$:

$$x_{0}=x; \qquad x_{t+1}=\mathrm{Clip}_{x,\epsilon}\big(x_{t}+\alpha\,\mathrm{sign}(\nabla_{x}L(\theta,x_{t},y))\big) \quad (6)$$

Here $\mathrm{Clip}$ denotes the function that projects its argument onto the $\epsilon$-neighborhood hypersphere of $x$, $B_{\epsilon}(x)=\{x': \|x'-x\|_{\infty}\le\epsilon\}$. The step size $\alpha$ is usually set quite small, for example so that each pixel changes by only one unit per step, and the number of steps is chosen so that the perturbation can reach the boundary, e.g. $\text{steps}=\epsilon/\alpha+10$. When $x_{0}$ is randomly initialized, the algorithm is usually called PGD.

BIM heuristically searches the $l_{\infty}$ neighborhood of the sample $x$ for the sample $x'$ with the largest loss. Such samples are called "most adversarial" samples: under a bounded perturbation budget, they are the most aggressive and the most likely to fool the classifier. **Finding such adversarial examples helps reveal the weaknesses of deep learning models.**
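The iterate-and-project loop of Formula 6 can be sketched on the same toy logistic model used for FGSM above. All weights and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of BIM (Eq. 6): repeat small signed-gradient steps,
# clipping each iterate back into the l_inf ball B_eps(x) and into the
# valid pixel range [0, 1].

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bim(x, y, w, b, eps=0.4, alpha=0.1, steps=10):
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(np.dot(w, x_adv) + b)
        grad = (p - y) * w                         # cross-entropy gradient
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # Clip_{x,eps}: project into B_eps(x)
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv

w = np.array([3.0, -1.0]); b = -1.0
x = np.array([0.8, 0.2]); y = 1
x_adv = bim(x, y, w, b)
print(sigmoid(np.dot(w, x_adv) + b) < 0.5)         # True: prediction flipped
print(np.max(np.abs(x_adv - x)) <= 0.4 + 1e-9)     # True: inside the eps-ball
```

Adding a random start inside $B_\epsilon(x)$ before the loop would turn this sketch into the PGD variant mentioned above.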

## 2.7 Carlini & Wagner's attack (C&W's attack)

C&W's attack targets the defense strategies built on FGSM and L-BFGS, and aims to solve for the minimum-distortion perturbation defined in L-BFGS. It approximates **Formula 2** with the following strategy:

$$\min \|x-x'\|_{2} + c\cdot f(x',t), \quad \text{s.t. } x'\in[0,1]^{m}, \quad (7)$$

where $f(x',t)=\big(\max_{i\ne t}Z(x')_{i}-Z(x')_{t}\big)_{+}$ and $Z(\cdot)$ denotes the input to the softmax layer (the logits). By minimizing $f(x',t)$ one can find an $x'$ whose score on class $t$ is much higher than on any other class. A subsequent line search then finds an $x'$ that is close to $x$.

The function $f(x,y)$ can be viewed as a loss function for the data $(x,y)$: it penalizes the situation where some label $i$ has a score $Z(x)_{i}>Z(x)_{y}$. The only difference between C&W's attack and L-BFGS is that the former uses $f(x,t)$ in place of the latter's cross-entropy loss $L(x,t)$. The advantage is that when the classifier outputs $C(x')=t$, the loss $f(x',t)=0$, and the algorithm then directly minimizes the distance from $x'$ to $x$.
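The margin loss $f$ itself is only a few lines; this sketch evaluates it on made-up logits to show the property just described (it vanishes exactly when class $t$ already leads).

```python
import numpy as np

# Hedged sketch of the C&W margin loss
#   f(x', t) = max( max_{i != t} Z(x')_i - Z(x')_t , 0 )
# computed from pre-softmax logits Z.  It is zero precisely when class t
# has the highest logit, at which point minimizing ||x - x'|| takes over.

def cw_loss(logits, t):
    other = np.max(np.delete(logits, t))   # best competing logit
    return max(other - logits[t], 0.0)

print(cw_loss(np.array([2.0, 5.0, 1.0]), t=0))   # 3.0: class 0 trails by 3
print(cw_loss(np.array([6.0, 5.0, 1.0]), t=0))   # 0.0: attack already succeeded
```

In the full attack this loss is plugged into Formula 7 and minimized jointly with the $l_2$ distance term.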

The authors claim their method is one of the strongest attack strategies, having defeated many defenses. It can therefore serve as a benchmark for DNN security evaluation, or be used to assess the quality of adversarial examples.

## 2.8 Ground truth attack

Attacks and defenses escalate in a tit-for-tat fashion. To break the deadlock, Carlini et al. set out to find the strongest attack: one that finds the adversarial example with provably minimal distortion. The attack builds on an algorithm for verifying properties of neural networks. It encodes the model parameters $F$ and the data $(x,y)$ as a system of linear-programming-like constraints, and handles the system by checking whether the neighborhood $B_{\epsilon}(x)$ of the sample $x$ contains a sample $x'$ that misleads the classifier. By shrinking the neighborhood until no such $x'$ exists, the last $x'$ found has minimal dissimilarity to $x$ and is called the **ground truth adversarial example** (*ground truth adversarial example*).

The ground truth attack is the first work to seriously certify the robustness of a classifier exactly. However, the method relies on a **satisfiability modulo theories** (*satisfiability modulo theories, SMT*) solver (a complex algorithm for checking the satisfiability of a set of theories), which makes it slow and unable to scale to large networks. Later work improved its efficiency.

## 2.9 Other $l_{p}$ attacks

The attacks in 2.1–2.8 mainly focus on perturbations under $l_{2}$ or $l_{\infty}$ constraints. Some other choices:

1) One-pixel attack: differs from L-BFGS in using an $l_{0}$ constraint, whose advantage is that it limits the number of pixels allowed to change. The work shows that on the CIFAR10 dataset, changing just one pixel is enough to flip a well-trained CNN classifier's prediction on more than half of the samples;

2) Elastic-net attack (ENA): differs from L-BFGS in constraining both the $l_{1}$ and $l_{2}$ norms.

## 2.10 Universal attack

The methods in 2.1–2.9 attack only a specific sample $x$. A universal attack instead aims to mislead the classifier on the whole test set, by finding a single perturbation $\delta$ satisfying:

1) $\|\delta\|_{p}\le\epsilon$;

2) $\underset{x\sim D(x)}{\mathbb{P}}\big(C(x+\delta)\ne C(x)\big)\ge 1-\sigma$.

In the corresponding experiments, a perturbation $\delta$ was successfully found that fools a ResNet152 network on $85.4\%$ of the samples in the ILSVRC 2012 dataset.

## 2.11 Spatially transformed attack

**Traditional adversarial attack algorithms directly modify the pixels of an image, which changes its color intensities.** A spatially transformed attack instead applies small spatial perturbations to the image, including translation, rotation, and distortion of local image features. Such perturbations are small enough to evade human detection yet can still fool the classifier, as in **Figure 4**.

## 2.12 Unconstrained adversarial examples

The works in 2.1–2.11 all add imperceptible perturbations to an image. This work instead generates unconstrained adversarial examples: samples that need not look similar to any victim image, but are images that fool the classifier while appearing legitimate to a human observer.

To attack a classifier $C$, an auxiliary classifier generative adversarial network (AC-GAN) $G$ first generates a legitimate sample $x$ of class $y$ from a noise vector $z_{0}$, conditioned on the class. It then finds a noise vector $z$ close to $z_{0}$ such that $G(z)$ misleads $C$. Because $z$ is similar to $z_{0}$ in the latent space, the output $G(z)$ still carries label $y$ in the observer's eyes, which achieves the attack.

# 3 Physical-world attacks

All the attack methods in Chapter 2 are applied digitally: the attacker feeds the input image directly to the machine learning model. This is not always the case in practice, for example when a system receives its input from cameras or other sensors. Can one still attack such systems by creating physical-world adversarial objects? Such attacks do exist, for example placing stickers on road signs, which seriously threatens the sign recognizers of autonomous vehicles. Adversarial objects of this kind are more destructive to deep learning models, because they directly challenge many practical applications of DNNs such as face recognition and autonomous driving.

## 3.1 Exploring adversarial examples in the physical world

The feasibility of making physical adversarial objects is explored by checking whether generated adversarial images (FGSM, BIM) remain "robust" under natural transformations (such as changes of viewpoint, lighting, etc.). Here, "robust" means the crafted image remains adversarial after the transformation. To apply these transformations, the carefully crafted images are first printed out, and participants are asked to photograph the printouts with their cell phones. The shooting angle and lighting environment are not restricted, so the resulting photos are transformed versions of the previously generated adversarial examples. Experimental results show that after transformation, a large fraction of these adversarial examples, especially those generated by FGSM, remain adversarial to the classifier. These results demonstrate that physical adversarial objects can plausibly deceive sensors across different environments.

## 3.2 Eykholt's attack on road signs

In **Figure 5**, the sign recognizer is fooled by sticking tape at appropriate positions on the sign. The authors' attack consists of:

1) An $l_{1}$-norm-based attack to roughly locate the regions to perturb, where the tape will later be applied;

2) An $l_{2}$-norm-based attack within the roughly located regions to generate the colors of the tape;

3) Applying tape of the specified colors to the specified regions. Such attacks confuse autonomous driving systems from various angles and distances.

## 3.3 Athalye's 3D adversarial objects

A work that successfully produced physical 3D adversarial objects is shown in Figure 6. The authors used 3D printing to make an adversarial turtle. To achieve this, they implemented a 3D rendering technique: given a textured 3D object, they first optimize the object's texture so that the rendered image is adversarial from any viewpoint. In the process, they also ensure that the perturbation remains adversarial across different environments: camera distance, lighting conditions, rotation, and background. After finding the 3D rendering perturbation, they 3D-print an instance of the object.

# 4 Black-box attacks

## 4.1 Model substitution

The attacker can only use the label $y$ obtained after feeding a sample $x$ to the model. In addition, the attacker may have the following information available:

1) the domain of the classified data;

2) the architecture family of the classifier, e.g. CNN or RNN.

This work exploits the transferability of adversarial examples: if a sample $x'$ can attack a classifier $F_{1}$, it can often also attack a classifier $F_{2}$ with a structure similar to $F_{1}$. The authors therefore train a substitute model $F'$ to imitate the victim model $F$, and then attack $F'$ to generate adversarial examples. The **main steps** are:

1) Synthesize a substitute training set: for example, in a handwriting recognition task, the attacker can reproduce test samples or other handwritten data;

2) Train the substitute model: feed the synthetic dataset $X$ to the victim model to obtain labels $Y$, then train a DNN model $F'$ on $(X,Y)$. Based on their own knowledge, the attacker selects for $F'$ the architecture most similar to the victim model's;

3) Data augmentation: iteratively augment $(X,Y)$ and retrain $F'$. This process increases both the diversity of the replicated data and the accuracy of $F'$;

4) Attack the substitute model: use existing methods such as FGSM to attack $F'$; the generated adversarial examples are then used against $F$. **How should one choose the attack on $F'$?** A successful substitute-model black-box attack must transfer, so attack methods with high transferability are chosen, such as FGSM, PGD, and momentum iterative attacks.
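The four steps above can be sketched end to end with toy linear models. Everything here is an illustrative assumption: the hidden victim is a linear classifier, the substitute is a logistic regression trained on queried labels, and FGSM plays the role of the white-box attack; the data-augmentation step is omitted for brevity.

```python
import numpy as np

# Hedged sketch of the substitute-model pipeline:
# (1) query the black-box victim for labels on synthetic data,
# (2) train a substitute classifier F' on those (X, Y) pairs,
# (3) run a white-box attack (FGSM) on the substitute,
# (4) check that the adversarial example transfers to the victim.

rng = np.random.default_rng(0)

w_victim = np.array([2.0, -1.5])                 # hidden from the attacker
victim = lambda X: (X @ w_victim > 0).astype(float)

# 1) synthetic data labeled by querying the victim
X = rng.uniform(-1, 1, size=(500, 2))
Y = victim(X)

# 2) train a logistic-regression substitute by gradient descent
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w_sub = np.zeros(2)
for _ in range(500):
    p = sigmoid(X @ w_sub)
    w_sub -= 0.5 * X.T @ (p - Y) / len(X)

# 3) FGSM on the substitute for a point the victim labels as class 1
x = np.array([0.5, 0.1]); y = 1.0
grad = (sigmoid(x @ w_sub) - y) * w_sub
x_adv = x + 0.6 * np.sign(grad)

# 4) transfer: the substitute's adversarial example fools the victim too
print(victim(x_adv[None])[0] != y)               # True
```

The sketch works because the substitute's decision boundary roughly aligns with the victim's, which is exactly the transferability assumption the text describes.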

## 4.2 ZOO: black-box attack based on zeroth-order optimization

This method assumes **the prediction confidences can be obtained from the classifier**, in which case there is no need to build a substitute dataset or a substitute model. Chen et al. observe the change in the confidence $F(x)$ while adjusting $x$, and thereby obtain gradient information about $x$. As shown in **Formula 8**, by introducing a sufficiently small perturbation $h$, the gradient can be estimated from the output information:

$$\frac{\partial F(x)}{\partial x_{i}} \approx \frac{F(x+he_{i})-F(x-he_{i})}{2h}. \quad (8)$$

ZOO succeeds better than model substitution because it can exploit more of the model's prediction information.
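The symmetric finite difference of Formula 8 is easy to verify numerically. The "model" below is a toy sigmoid with a known analytic gradient, used here only to check the estimate; ZOO would apply the same estimator to a black-box classifier's confidence scores, at the cost of two queries per coordinate.

```python
import numpy as np

# Hedged sketch of the ZOO gradient estimate (Eq. 8): approximate
# dF/dx_i from confidence queries alone, no backpropagation.

def zoo_gradient(F, x, h=1e-4):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        grad[i] = (F(x + e) - F(x - e)) / (2 * h)   # two queries per pixel
    return grad

# toy "black box": a sigmoid confidence with known analytic gradient
F = lambda x: 1.0 / (1.0 + np.exp(-(2.0 * x[0] - 1.0 * x[1])))
x = np.array([0.3, 0.7])
est = zoo_gradient(F, x)
p = F(x)
true = p * (1 - p) * np.array([2.0, -1.0])          # analytic gradient
print(np.allclose(est, true, atol=1e-6))            # True
```

The estimation error is $O(h^2)$, but the query count scales with the input dimension, which motivates the query-efficient attacks of the next subsection.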

## 4.3 Query-efficient black-box attacks

The methods in 4.1–4.2 require many queries of the model's output, which is prohibitive in some applications. It is therefore necessary to **improve the efficiency of black-box adversarial example generation within a limited query budget**. For example, one line of work introduces natural evolution strategies to obtain gradient information efficiently: it samples query results around $x$ and then estimates the expectation of the gradient of $F$ at $x$. Other work uses genetic algorithms to search the neighborhood of the victim image for adversarial examples.

# 5 Grey-box attacks

A typical grey-box strategy first trains a GAN on the model of interest, and then generates adversarial examples directly from the trained generative network. The authors argue that GAN-based attacks can speed up adversarial example generation and produce more natural, less perceptible images. This strategy has since also been used to intrude on face recognition systems.

# 6 Poisoning attacks

All of the preceding discussion takes place after the classifier is trained. **Poisoning attacks instead generate adversarial samples before training**: they craft samples and inject them into the training set, so as to lower the overall accuracy of the resulting model or influence its predictions on specific classes. Attackers in this setting usually know the model architecture that will be trained on the poisoned data. Poisoning attacks are often applied to graph neural networks, because they require specific knowledge of the graph.

## 6.1 Biggio's poisoning attack on SVM

The attack finds a sample $x_{c}$ such that, once mixed into the training data, the learned SVM model $F_{x_{c}}$ suffers a large loss on the validation set. This attack works against SVMs; for deep learning models, however, finding such a sample is difficult.

## 6.2 Koh's model explanation

Koh and Liang introduced a method for interpreting neural networks: how would the model's predictions change if a training sample were changed? When a single training sample is modified, their method can precisely quantify the change in the final loss without retraining the model. By finding the training samples with the greatest influence on the model's predictions, this work can naturally be used for poisoning attacks.

## 6.3 Poison frogs (*poison frogs*)

Poison frogs mixes into the training set an adversarial image that carries a true label, so as to cause a wrong prediction on a target test sample. Given a target test sample $x_{t}$ with label $y_{t}$, the attacker first takes a base sample $x_{b}$ with label $y_{b}$, and finds $x'$ through the following optimization:

$$x' = \underset{x}{\arg\min}\; \|Z(x)-Z(x_{t})\|_{2} + \beta\|x-x_{b}\|_{2} \quad (9)$$

Because $x'$ is close to $x_{b}$, a model trained on $X_{\text{train}}+\{x'\}$ will predict $x'$ as $y_{b}$. When the new model predicts $x_{t}$, the optimization objective has forced the predicted scores of $x_{t}$ and $x'$ close together, so $x_{t}$ is predicted as $y_{b}$.
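The feature-collision objective of Formula 9 can be sketched with a toy linear feature map. This is a hedged illustration: $Z$ below is a made-up matrix (in the paper it is the network's penultimate layer), squared norms are used for smooth gradients, and the samples are invented.

```python
import numpy as np

# Hedged sketch of the poison-frogs objective (Eq. 9): craft a poison
# x' that collides with the target x_t in feature space Z while staying
# close to the base image x_b in input space, via gradient descent.

A = np.array([[1.0, 2.0], [0.5, -1.0]])        # toy linear feature map
Z = lambda x: A @ x                            # Z(x) = A x

def poison(x_t, x_b, beta=0.1, lr=0.05, steps=2000):
    x = x_b.copy()
    for _ in range(steps):
        # gradient of ||Z(x)-Z(x_t)||^2 + beta*||x-x_b||^2
        grad = 2 * A.T @ (Z(x) - Z(x_t)) + 2 * beta * (x - x_b)
        x -= lr * grad
    return x

x_t = np.array([0.2, 0.9])                     # target test sample
x_b = np.array([0.8, 0.1])                     # base sample with label y_b
x_p = poison(x_t, x_b)
# the poison is far closer to the target in feature space than the base is
print(np.linalg.norm(Z(x_p) - Z(x_t)) < 0.1 * np.linalg.norm(Z(x_b) - Z(x_t)))
```

A small $\beta$ prioritizes the feature collision (the attack's teeth), while a large $\beta$ keeps the poison visually close to the base image so its label $y_b$ looks correct to a human curator.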

# References

[1] **Adversarial Attacks and Defenses in Images, Graphs and Text: A Review**

Thanks for reading.