Counter attack and defense (1): counter sample generation in image domain

Inge2022-06-23 18:03:41

1 introduce

Compared with other fields , The generation of countermeasure samples in the image field has the following advantage
1) Real images and false images are intuitive to the observer ;
2) The structure of image data and image classifier is relatively simple .
primary coverage : Take fully connected network and convolutional neural network as examples , With MNIST、CIFAR10, as well as ImageNet Based on the sample , The research is based on Avoid confrontation , Including white box 、 Black box 、 Gray box , And physical attack image countermeasure sample generation .

2 White box attack

The attacker received a message C C C With the victim sample (victim sample) ( x , y ) (x,y) (x,y) after , The goal is to synthesize an image that is perceptually similar to the original image , But false images that may mislead the classifier to give wrong prediction results :
find x ′ Satisfy ∥ x ′ − x ∥ ≤ ϵ ,   example Such as C ( x ′ ) = t ≠ y , (1) \tag{1} \text{ find }x'\text{ Satisfy }\|x'-x\|\leq\epsilon,\ for example C(x')=t\neq y, find x Satisfy xxϵ,  example Such as C(x)=t=y,(1) among ∥ ⋅ ∥ \|\cdot\| Used to measure x ′ x' x And x x x The dissimilarity of , Usually it is l p l_p lp norm . Next, the main methods under this attack method are introduced .

2.1 Biggio

stay MNIST Generate countermeasure samples on the dataset , The attack target is the traditional machine learning classifier , Such as SVM and 3 Layer full connection neural network , And mislead the classifier by optimizing the discriminant function .
for example chart 1 in , For linear SVM, Its discriminant function g ( x ) = < w , x > + b g(x)=<w,x>+b g(x)=<w,x>+b. Suppose there is a sample x x x Be correctly classified as 3. Then for this model ,biggio First generate a new sample x ′ x' x, It minimizes g ( x ′ ) g(x') g(x) While maintaining ∥ x ′ − x ∥ 1 \|x'-x\|_1 xx1 Minimum . If g ( x ′ ) < 0 g(x')<0 g(x)<0, x ′ x' x Will be misclassified .

chart 1:Biggio Attack in SVM The diagram on the classifier

2.2 Szegedy’s limited-memory BFGS (L-BFGS)

It is applied to neural network for image classification for the first time , It seeks countermeasures by optimizing the following objectives :
min ⁡ ∥ x − x ′ ∥ 2 2 s.t. C ( x ′ ) = t  and  x ′ ∈ [ 0 , 1 ] m . (2) \tag{2} \begin{array}{l} & \min &\|x-x'\|_2^2\qquad \text{s.t.} C(x') = t\ \text{and }x'\in[0,1]^m. \end{array} minxx22s.t.C(x)=t and x[0,1]m.(2) The problem is solved approximately by introducing the loss function :
min ⁡   λ ∥ x − x ′ ∥ 2 2 + L ( θ . x ′ , t ) , s.t.  x ′ ∈ [ 0 , 1 ] m , (3) \tag{3} \min\ \lambda\|x-x'\|_2^2+\mathcal{L}(\theta.x',t), \qquad\text{s.t. }x'\in[0,1]^m, min λxx22+L(θ.x,t),s.t. x[0,1]m,(3) among λ \lambda λ Is a scale parameter . Through adjustment λ \lambda λ, You can find one with x x x Similar enough x ′ x' x, And mislead the classifier C C C.

2.3 Fast gradient sign method (FGSM)

Goodfellow Et al. Designed a one-step fast countermeasure sample generation method :
x ′ = x + ϵ  sign ( ∇ x L ( θ , x , y ) ) , Not a goal x ′ = x − ϵ  sign ( ∇ x L ( θ , x , t ) ) , The goal is t (4) \tag{4} \begin{aligned} &x'=x+\epsilon\text{ sign}(\nabla_x\mathcal{L}(\theta,x,y)),\qquad\text{ Not a goal }\\ &x'=x-\epsilon\text{ sign}(\nabla_x\mathcal{L}(\theta,x,t)),\qquad\text{ The goal is }t \end{aligned} x=x+ϵ sign(xL(θ,x,y)), Not a goal x=xϵ sign(xL(θ,x,t)), The goal is t(4) stay Target attack Under design , The problem can be solved by one-step gradient descent :
min ⁡ L ( θ , x ′ , t ) s.t.  ∥ x ′ − x ∥ ∞  and  x ′ ∈ [ 0 , 1 ] m . (5) \tag{5} \min\mathcal{L}(\theta,x',t)\qquad\text{s.t. }\|x'-x\|_\infty\text{ and }x'\in[0,1]^m. minL(θ,x,t)s.t. xx and x[0,1]m.(5) FGSM One reason for this is that it requires only one back propagation , therefore It is suitable for generating a large number of confrontation samples , Its presence ImageNet Applications on, such as chart 2.

chart 2: Just a few perturbations , Panda map will be misjudged

2.4 DeepFool

Study classifiers F F F Decision boundaries around data points , Try to find a path that goes beyond the boundaries of decision-making , Such as chart 3, So as to misclassify the sample points x x x. for example , The misjudgment category is 4 The sample of x 0 x_0 x0 To category 3, The decision boundary can be described as F 3 = { z : F ( x ) 4 − F ( x ) 3 = 0 } \mathcal{F}_3=\{ z:F(x)_4 - F(x)_3 = 0 \} F3={ z:F(x)4F(x)3=0}. Make f ( x ) = F ( x ) 4 − F ( x ) 3 f(x)=F(x)_4 - F(x)_3 f(x)=F(x)4F(x)3, In every attack , It will use Taylor expansion F 3 ′ = { x : f ( x ) ≈ f ( x 0 ) + < ∇ x f ( x 0 ) − ( x − x 0 ) > = 0 } \mathcal{F}_3'=\{ x:f(x)\approx f(x_0) + < \nabla_xf(x_0)-(x-x_0)>=0 \} F3={ x:f(x)f(x0)+<xf(x0)(xx0)>=0} To linearize the decision hyperplane , And calculate ω 0 \omega_0 ω0 To the hyperplane F 3 ′ \mathcal{F}_3' F3 Orthogonal vector of ω \omega ω. vector ω \omega ω It can be used as a disturbance to make x 0 x_0 x0 Free from hyperplane . By moving ω \omega ω, The algorithm will find that can be classified as 3 A counter sample of x 0 ′ x_0' x0.

chart 3: Decision boundaries

DeepFool The experimental results show that , For general DNN Image Classifier , All the test samples are very close to the decision boundary . for example LeNet stay MNIST After training on the dataset , Just a little disturbance , exceed 90% All samples will be misclassified , This surface DNN The classifier is not robust to disturbances .

2.5 Jacobian-based saliency map attack (JSMA)

JSMA This paper introduces a scoring function based on calculation F F F The method of Jacobian matrix , It iteratively operates the pixels that have the greatest impact on the model output , It can be regarded as a greedy attack algorithm .
In particular , The author uses Jacobian matrix J F ( x ) = ∂ F ( x ) ∂ x = { ∂ F j ( x ) ∂ x i } i × j \mathcal{J}_F(x)=\frac{\partial F(x)}{\partial x}=\left\{ \frac{\partial F_j(x)}{\partial x_i} \right\}_{i\times j} JF(x)=xF(x)={ xiFj(x)}i×j Come on F ( x ) F(x) F(x) Respond to x x x Change modeling when changing . Under the target attack setting , The attacker tried to misclassify the sample as t t t. therefore ,JSMA Repeatedly search and operate such pixels , Its increase / Reduction will lead to F t ( x ) F_t(x) Ft(x) increase / Reduce ∑ j ≠ t F j ( x ) \sum_{j\neq t} F_j(x) j=tFj(x). The final classifier will be in the category t t t Give up x x x Higher scores .

2.6 Basic iterative method (BIM) / Projected gradient descent (PGD) attack

The method is FGSM The iterative version of , Under non target attack , Will iteratively generate x ′ x' x
x 0 = x ; x t + 1 = C l i p x , ϵ ( x t + α  sign ( ∇ x L ( θ , x t , y ) ) ) (6) \tag{6} x_0=x; x^{t+1}=Clip_{x,\epsilon}(x^t+\alpha\text{ sign}(\nabla_x\mathcal{L}(\theta,x^t,y))) x0=x;xt+1=Clipx,ϵ(xt+α sign(xL(θ,xt,y)))(6) there C l i p Clip Clip Indicates that the received content is projected to x x x Of ϵ \epsilon ϵ Neighborhood hypersphere B ϵ ( x ) : { x ′ : ∥ x ′ − x ∥ ∞ ≤ ϵ } B_\epsilon(x):\{ x':\|x'-x\|_\infty\leq \epsilon \} Bϵ(x):{ x:xxϵ} Function of . step α \alpha α Usually set to a fairly small value , For example, make each pixel change only one unit at a time , The number of steps is used to ensure that the disturbance can reach the boundary , for example s t e p = ϵ a l p h a + 10 step=\frac{\epsilon}{alpha}+10 step=alphaϵ+10. If x x x It's randomly initialized , The algorithm can also be called PGD.
BIM Enlightening to the sample x x x Neighborhood l ∞ l_\infty l Search for the sample with the largest loss x ′ x' x, Such samples are also called “ The most confrontational ” sample : When the disturbance intensity is limited , Such samples are the most aggressive , It is most likely to fool the classifier . Finding such countermeasure samples will help to detect the defects of deep learning model .

2.7 Carlini & Wagner′s attack (C&W′s attack)

C&W′s attack Used against in FGSM and L-BFGS Defense strategy on , The goal is to solve L-BFGS Minimum distortion disturbance defined in . Use the following strategy to approximate The formula 2
min ⁡ ∥ x − x ′ ∥ 2 2 + c ⋅ f ( x ′ , t ) , s.t.  x ′ ∈ [ 0 , 1 ] m , (7) \tag{7} \min \|x-x'\|_2^2+c\cdot f(x',t),\qquad\text{s.t. }x'\in[0,1]^m, minxx22+cf(x,t),s.t. x[0,1]m,(7) among f ( x ′ , t ) = ( max ⁡ i = t Z ( x ′ ) i − Z ( x ′ ) t ) + f(x',t)=(\max_{i=t}Z(x')_i-Z(x')_t)^+ f(x,t)=(maxi=tZ(x)iZ(x)t)+, Z ( ⋅ ) Z(\cdot) Z() Used to get softmax Network layer input before . By minimizing f ( x ′ , t ) f(x',t) f(x,t) You can find one in the category t t t The score is much higher than that of other classes x ′ x' x. Next, use linear search , Will find a place away from x x x Current x ′ x' x.
function f ( x , y ) f(x,y) f(x,y) It can be seen as about data ( x , y ) (x,y) (x,y) Loss function of : You can punish some labels i i i Score of Z ( x ) i > Z ( x ) y Z(x)_i>Z(x)_y Z(x)i>Z(x)y The situation of .C&W’s attack And L-BFGS The only difference is that the former uses f ( x , t ) f(x,t) f(x,t) Instead of the cross entropy of the latter L ( x , t ) \mathcal{L}(x,t) L(x,t). The advantage of this is , When the classifier outputs C ( x ′ ) = t C(x')=t C(x)=t when , Loss f ( x ′ , t ) = 0 f(x',t)=0 f(x,t)=0, The algorithm will directly minimize x ′ x' x To x x x Distance of .
The authors claim that their method is one of the strongest attack strategies , It defeated many means of counterattack . therefore , This method can be used as DNN The reference point of safety detection , Or used to evaluate the quality of the countermeasure sample .

2.8 Ground truth attack

A tit for tat attack , To break the deadlock ,Carlini And others are trying to find the strongest attack , It is used to find the theoretical minimum distortion countermeasure sample . The attack method is based on an algorithm used to verify the characteristics of neural network , It will model parameters F F F And data ( x , y ) (x,y) (x,y) Coding is the subject of a class of linear programming systems , And by examining the sample x x x The neighborhood of B ϵ ( x ) B_\epsilon(x) Bϵ(x) Whether there is a sample that can mislead the classifier x ′ x' x To handle the system . By narrowing the neighborhood until it doesn't exist x ′ x' x, Well, due to the last search x ′ x' x And x x x There is minimal dissimilarity between , At this time x ′ x' x It is called Basic facts against samples (ground truth adversarial example).
Ground truth attack It is the first time to seriously the robustness of accurate classifier . However , This method uses Module theory of satisfiability (satisfiability modulo theories, SMT) solver ( A complex algorithm for checking the satisfiability of a series of theories ), This will make it slow and unable to scale to large networks . Then there is work to improve its efficiency .

2.9 other l p l_p lp attack

2.1–2.8 The main focus of the attack is l 2 l_2 l2 or l ∞ l_\infty l Perturbations under constraints , Here are some other :
1)One-pixel attack: And L-BFGS The difference is that the constraint uses l 0 l_0 l0, The advantage is that you can limit the number of pixels allowed to change . The job shows , stay CIFAR10 On dataset , Just changing one pixel can make the training good CNN The classifier predicts more than half of the samples ;
2)Elastic-net attack (ENA): And L-BFGS The difference is to use l 1 l_1 l1 and l 2 l_2 l2 Norm to constrain .

2.10 Global attack (universal attack)

2.1–2.9 The method is only for a specific sample x x x The attack . This attack aims to mislead the classifier's results on all test sets , It attempts to find disturbances that meet the following conditions δ \delta δ
1) ∥ δ ∥ p ≤ ϵ \|\delta\|_p\leq\epsilon δpϵ;
2) R x ∼ D ( x ) ( C ( x + δ ) ≠ C ( x ) ) ≤ 1 − σ \mathbb{R}_{x\sim D(x)}(C(x+\delta)\neq C(x))\leq1-\sigma RxD(x)(C(x+δ)=C(x))1σ.
In the corresponding experiment , Successfully found a disturbance δ \delta δ, bring ResNet152 The network ILSVRC 2012 On dataset 85.4 % 85.4\% 85.4% Our sample was attacked .

2.11 Space conversion attack (spatially transformed attack)

The traditional adversarial attack algorithm directly modifies the pixels in the image , This will change the color intensity of the image . Spatial transformation attack attacks by adding some spatial disturbances to the image , Translation distortion including local image features 、 rotate , And twist . Such disturbance is enough to avoid manual detection , Can also deceive the classifier , Such as chart 4.

chart 4: Space conversion attack

2.12 Unconstrained countermeasure sample

2.1–11 All of our work adds unobtrusive disturbances to the image , This work generates some unconstrained confrontation samples : These samples need not look similar to the victim image , But an image that can fool the classifier and is legal in the eyes of the observer .
To attack the classifier C C C, Enhanced class confrontation generation network (AC-GAN) G \mathcal{G} G First of all, based on c c c Noise like vector z 0 z^0 z0 Generate a legal sample x x x. Then find one close to z 0 z^0 z0 Noise vector z z z, It makes G ( z ) \mathcal{G}(z) G(z) Can mislead C C C. because z z z In potential space with z 0 z^0 z0 be similar , Output G ( z ) \mathcal{G}(z) G(z) Still have labels y y y, So as to achieve the purpose of attack .

3 Physical world attack

chapter 2 All attack methods in are applied in digital form , The attacked party directly provides the input image to the machine learning model . However , In some cases, this is not always the case , For example, using a camera 、 The situation of receiving signals from sensors or other sensors . In this case, still attack these systems by generating physical world confrontation objects ? Such an attack exists , For example, stick stickers on road signs , This will seriously threaten the sign recognizer of autonomous vehicle . Such antagonistic objects are more destructive to the deep learning model , Because they can directly challenge DNN Many practical applications of , For example, face recognition 、 Autopilot, etc .

3.1 Exploration of confrontation samples in the physical world

For example, by checking the generated countermeasure image (FGSM、BIM) In natural transformation ( Such as changing the viewpoint 、 Light, etc ) Check whether “ steady ” To explore the feasibility of making physical confrontation objects . ad locum ,“ robust ” It means that the produced image is still antagonistic after conversion . To apply this transformation , First, print out the carefully made image , The test subjects were asked to take photos of these printouts with their mobile phones . In the process , The shooting angle or lighting environment is not limited , Therefore, the obtained photo is a sample converted from the previously generated countermeasure sample . Experimental results show that , After conversion , A large part of these confrontation samples , In especial FGSM Generated samples , Still against the classifier . These results show that the possibility of physical confrontation object can deceive the sensor in different environments .

3.2 Of road signs Eykholt attack

chart 5 in , Fool the signal recognizer by pasting tape at the appropriate position of the signal sign . The author's attacks include :
1) The base On l 1 be based on l_1 The base On l1 The norm attack is used to roughly locate the disturbed region , Adhesive tape will be affixed behind these areas ;
2) In the rough positioning area , Using a l 2 l_2 l2 The attack of norm generates the color of tape ;
3) Paste the tape of the specified color in the specified area . Such attacks confuse the auto drive system from different angles and distances .

chart 5: Stick adhesive tape on traffic signal signs

3.3 Athaly Of 3D Against

A successful production of Physics 3D The work of the confrontation object is shown in the figure 6 Shown . Author use 3D Print to make antagonistic turtles . In order to achieve the goal , They implemented 3D Rendering technology . A given texture band 3D object , First, optimize the texture of the object , Make rendered images antagonistic from any point of view . In the process , It also ensures that disturbances remain antagonistic in different environments : Camera distance 、 Light conditions 、 rotate , And the background . Find 3D After rendering the disturbance , They print 3D An instance of an object .

chart 6:3D Against

4 Black box attack

4.1 Model replacement

Attackers can only enter samples x x x Tag information obtained after y y y To carry out the attack . Besides , An attacker can have the following information available :
1) Areas of classified data ;
2) The framework of classifier , for example CNN still RNN.
This work explores the mobility of countermeasure samples : A sample x ′ x' x If you can attack the classifier F 1 F_1 F1, Then it can also attack and F 1 F_1 F1 Classifiers with similar structures F 2 F_2 F2. therefore , The author trained a replacement model F ′ F' F To the victimization model F F F To simulate , Then attack F ′ F' F To generate countermeasure samples , Its Main steps as follows :
1) Synthetic replacement training data set : For example, in handwriting recognition task , Attackers can reproduce test samples or other handwritten data ;
2) Training replacement model : The composite dataset X X X Enter the victim model to get the tag Y Y Y, Then based on ( X , Y ) (X,Y) (X,Y) Training DNN Model F ′ F' F. Attackers will be based on their own knowledge , Select one of the training models with the most similar structure to the victim model F ′ F' F;
3) Data to enhance : Iterative enhancement ( X , Y ) (X,Y) (X,Y) Pay equal attention to training F ′ F' F. This process will enhance the diversity of reprographic data and F ′ F' F The accuracy of the ;
4) Attack replacement model : Use existing methods such as FGSM To attack F ′ F' F, The generated countermeasure samples will be used to play F F F
You should choose how to attack F ′ F' F? A successful replacement model black box attack should be portable , Therefore, we choose attack methods with high mobility, such as FGSM、PGD, And momentum iteration attack .

4.2 ZOO: Black box attack based on zero order optimization

This method assumes The prediction confidence can be obtained from the classifier , In this case, there is no need to establish replacement data sets and replacement models .Chen By adjusting x x x To observe F ( x ) F(x) F(x) Change in confidence , In order to get x x x Relevant gradient information . Such as The formula 8 Shown , By introducing a disturbance small enough h h h, We can push the gradient information by outputting information :
∂ F ( x ) ∂ x i ≈ F ( x + h e i ) − F ( x − h e i ) 2 h . (8) \tag{8} \frac{\partial F(x)}{\partial x_i}\approx\frac{F(x+he_i)-F(x-he_i)}{2h}. xiF(x)2hF(x+hei)F(xhei).(8) ZOO What is more successful than the replacement model is that more prediction information can be used .

4.3 Efficient query black box attack

4.1-2 The method in requires multiple queries of the output information of the model , This is prohibited in some applications . therefore Improve the generation efficiency of black box attack countermeasure samples within a limited number of times It's necessary . For example, natural evolution strategy is introduced to obtain gradient information efficiently , It is based on x x x Sample the query results , Then evaluate F F F The gradient is in x x x On the expectations of . Besides , They use genetic algorithm to search the neighborhood of the victim image for the countermeasure sample .

5 Grey box attack

The strategy of gray box attack , for example , Firstly, a model of interest is trained GAN, Then the countermeasure samples are generated directly based on the countermeasure generation network . The author believes that based on GAN The attack method can speed up the generation of countermeasure samples , And can get more natural and imperceptible images . Then this strategy is also used in the intrusion of face recognition system .

6 Poison attack

The existing discussions are carried out after classifier training , Poisoning attack generates countermeasure samples before training : Generate some countermeasure samples and embed them into the training set , So as to reduce the overall accuracy of the classification model or affect the samples of specific categories . Usually , The attacker under this setting has the model structure for training poisoning data . Poisoning attacks are usually used in graph neural networks , This is because it requires specific graph knowledge .

6.1 Biggio stay SVM Poisoning attack on

Find such a sample x c x_c xc, After it is mixed with training data , Will lead to learned SVM Model F x c F_{x_c} Fxc There is a big loss on the validation set . Such an attack method is right for SVM It worked , However, for deep learning , Finding such a sample is difficult .

6.2 Koh The model explains

Koh and Liang A neural network interpretation method is introduced : If the training sample changes , How the prediction results of the model will change ? When only one training sample is modified , Their model can clearly quantify the change in the final loss , Without retraining the model . By finding the training samples that have a great impact on the model prediction , This work can naturally be used for poisoning attacks .

6.3 Poisonous frog (poison frogs)

Poison frog mixes a confrontation image with a real label in the training set , So as to achieve the purpose of wrong prediction test set . Give a tag as y t y_t yt Target test sample x t x_t xt, The attacker first uses the tag y b y_b yb Benchmark sample of x b x_b xb, And find... Through the following optimization x ′ x' x
x ′ = arg min ⁡ x ∥ Z ( x ) − Z ( x t ) ∥ 2 2 + β ∥ x − x b ∥ 2 2 (9) \tag{9} x'=\argmin_x\|Z(x)-Z(x_t)\|_2^2+\beta\|x-x_b\|_2^2 x=xargminZ(x)Z(xt)22+βxxb22(9) because x ′ x' x And x b x_b xb lately , Based on the training set X t r a i n + { x } ′ X_{train}+\{x\}' Xtrain+{ x} The training model will put x ′ x' x Forecast as y b y_b yb. Use the new model to predict x t x_t xt, The optimization goal will be forced closer x t x_t xt And x ′ x' x The predicted score of , the x t x_t xt Forecast as y b y_b yb.


【1】Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Similar articles