Adversarial Attacks and Defenses (2): Defense Strategies Against Adversarial Examples

Inge 2022-06-23 18:04:05

1 Introduction

To secure neural network algorithms, several different types of defenses have been proposed:
1) Gradient masking/obfuscation: many attacks rely on the classifier's gradient information, so masking or obfuscating the gradients can confuse the attacker;
2) Robust optimization: retraining the DNN classifier can improve its robustness so that it predicts adversarial examples correctly;
3) Adversarial example detection: learn the distribution of the clean data, so that adversarial examples can be detected and blocked from reaching the classifier.

2 Gradient masking/obfuscation

2.1 Defensive distillation

Distillation, originally a technique for reducing the size of DNNs, was adapted into a defense against the FGSM, L-BFGS, and DeepFool attacks. The main steps are as follows:
1) Choose a softmax temperature $T$ and train a network $F$ on the training set $(X,Y)$, where the softmax with temperature $T$ is defined as
$$\mathrm{softmax}(x,T)_i=\frac{e^{x_i/T}}{\sum_j e^{x_j/T}},\quad \text{where } i=0,1,\dots,K-1 \tag{1}$$
2) Compute the softmax scores $F(X)$ at temperature $T$ (the soft labels);
3) Train the distilled model $F_T'$ at temperature $T$ on $X$ with the soft labels $F(X)$;
4) Set the temperature of $F_T'$ to 1, denote the resulting model $F_1'$, and use it to predict the test set $X_{test}$, which may contain adversarial examples.
This works because training with a large $T$ makes the inputs to the softmax large. For example, with $T=100$, the logits $Z(x)$ and $Z(x')$ of a sample $x$ and a neighboring point $x'$ are scaled up by a factor of about 100 (and so is their difference), where $Z(\cdot)$ denotes the output that is fed into the softmax. When $T$ is then reset to 1, the output of $F_1'$ takes a form like $(\epsilon,\epsilon,\dots,1-(K-1)\epsilon,\epsilon,\dots,\epsilon)$, where $\epsilon$ is so close to 0 that the computer cannot distinguish it from 0. The score of the output class is pushed to 1, which saturates the softmax and makes it hard for an attacker to obtain gradient information from $F_1'$.
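The saturation effect can be checked numerically. The sketch below is a toy illustration, not a training procedure: it assumes hypothetical logits and simply multiplies them by 100 to mimic a network distilled at $T=100$, then evaluates the softmax at $T=1$ as in step 4 and compares the gradient of the top-class probability with respect to the logits. The helper names `softmax` and `softmax_jacobian_row` are illustrative, not from the original work.

```python
import math

def softmax(logits, T=1.0):
    """Temperature softmax from Eq. (1): exp(x_i/T) / sum_j exp(x_j/T)."""
    scaled = [z / T for z in logits]
    top = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian_row(probs, i):
    """Row i of the softmax Jacobian at T = 1: d p_i / d z_j = p_i * (delta_ij - p_j)."""
    return [probs[i] * ((1.0 if i == j else 0.0) - p_j) for j, p_j in enumerate(probs)]

# Hypothetical logits Z(x) of an ordinary (undistilled) network F.
z_plain = [1.0, 2.0, 3.0]
# Assumption: after training at T = 100 the distilled logits are ~100x larger.
z_distilled = [100.0 * z for z in z_plain]

soft_labels = softmax(z_plain, T=100.0)     # step 2: soft labels at high temperature
p_plain = softmax(z_plain, T=1.0)
p_distilled = softmax(z_distilled, T=1.0)   # step 4: F'_1 evaluated at T = 1

grad_plain = softmax_jacobian_row(p_plain, 2)
grad_distilled = softmax_jacobian_row(p_distilled, 2)
print("soft labels at T=100:", soft_labels)
print("F'_1 output:", p_distilled)
print("max |d p_top / d z|, plain vs distilled:",
      max(abs(g) for g in grad_plain), max(abs(g) for g in grad_distilled))
```

For the plain logits the top-class gradient is of order 0.1, while for the scaled logits it is numerically zero, even though the predicted class (the argmax) is the same in both cases.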

2.2 Shattered gradients

This approach protects the model by preprocessing the data: a non-smooth or non-differentiable preprocessor $g(\cdot)$ is added, and the model $f$ is trained on $g(X)$. The classifier $f(g(\cdot))$ is then not differentiable with respect to $x$, which makes gradient-based attacks fail. For example, thermometer encoding discretizes each pixel value $x_i$ of an image into an $l$-dimensional vector $\tau(x_i)$; with $l=10$, $\tau(0.66)=1111110000$. The DNN is then trained on these encoded vectors. Other such methods include image cropping, compression, and total variance minimization. All of them break the smooth connection between the model's input and output, making it difficult for the attacker to obtain the gradient $\partial F(x)/\partial x$.
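A minimal sketch of thermometer encoding as the preprocessor $g(\cdot)$ is shown below. It assumes pixel values in $[0,1]$ and a quantization rule (set the first $\lfloor x_i \cdot l \rfloor$ positions to 1) chosen because it reproduces $\tau(0.66)=1111110000$; the downstream `toy_model` is a hypothetical stand-in for $f$. A finite-difference estimate then shows why gradient-based attacks get no useful signal.

```python
import math

def thermometer_encode(value, levels=10):
    """Thermometer code tau(x_i) of one pixel value in [0, 1].

    Assumed rule: the first floor(value * levels) positions are 1, the rest 0,
    which reproduces tau(0.66) = 1111110000 for levels = 10.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("pixel value must lie in [0, 1]")
    filled = min(levels, math.floor(value * levels))
    return "1" * filled + "0" * (levels - filled)

def preprocess(pixels, levels=10):
    """g(x): encode every pixel; the classifier f would be trained on g(X)."""
    return [thermometer_encode(p, levels) for p in pixels]

def toy_model(pixels):
    """Hypothetical f(g(x)): counts the 1-bits of the encoded image."""
    return sum(code.count("1") for code in preprocess(pixels))

def finite_difference(pixels, i, h=1e-3):
    """Numerical estimate of d f(g(x)) / d x_i."""
    bumped = list(pixels)
    bumped[i] += h
    return (toy_model(bumped) - toy_model(pixels)) / h

print(thermometer_encode(0.66))             # 1111110000
print(finite_difference([0.66, 0.2], 0))    # 0.0: flat between thresholds
print(finite_difference([0.6999, 0.2], 0))  # large: crosses the 0.7 threshold
```

The estimated gradient is zero almost everywhere and blows up at the quantization thresholds, so it carries no useful direction for an attacker who differentiates through $g$.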

2.3 Stochastic/randomized gradients

These methods randomize the DNN to confuse the attacker. For example, one can train a set of classifiers $s=\{F_t:t=1,2,\dots,k\}$ and, when evaluating a sample $x$, randomly select one model from $s$ to predict its label $y$. Since the attacker does not know which classifier is used, the probability of a successful attack is reduced. Other randomization operations include randomly dropping some nodes of the network, and randomly resizing the input image and padding it with zeros.
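The random model selection described above can be sketched as follows. The three linear classifiers, the sample points, and the class name `RandomizedEnsemble` are hypothetical, chosen so that a perturbation crafted against one model fools only that model; real defenses would use trained DNNs in place of these toy functions.

```python
import random

class RandomizedEnsemble:
    """Stochastic defense: each query is answered by a model drawn at random from s = {F_1, ..., F_k}."""

    def __init__(self, models, seed=None):
        self.models = list(models)
        self.rng = random.Random(seed)  # private generator so runs are reproducible

    def predict(self, x):
        model = self.rng.choice(self.models)  # the attacker cannot observe this choice
        return model(x)

def make_linear_classifier(w, b=0.0):
    """Hypothetical binary classifier F_t: label 1 if w.x + b > 0, else 0."""
    def classify(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
    return classify

models = [make_linear_classifier(w) for w in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
ensemble = RandomizedEnsemble(models, seed=0)

x_clean = [0.2, 1.0]  # every F_t predicts 1
x_adv = [-0.1, 1.0]   # perturbation crafted against F_1 only
outputs = [ensemble.predict(x_adv) for _ in range(1000)]
print("fraction of queries fooled:", outputs.count(0) / len(outputs))
```

Because only one of the three models was targeted, roughly one query in three is fooled.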

2.4 Exploding and vanishing gradients

PixelDefend and Defense-GAN use a generative model to project potentially adversarial examples onto the manifold of benign data before classification, so the final classification model becomes an extremely deep neural network. This approach succeeds because the cumulative product of the partial derivatives of all the layers makes the gradient $\frac{\partial\mathcal{L}(x)}{\partial x}$ extremely small or extremely large, which prevents the attacker from accurately locating adversarial examples.
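The cumulative-product argument can be seen on a toy chain of scalar layers. This is a simplifying assumption (real layers have Jacobian matrices rather than scalar derivatives), but the chain rule works the same way: with local derivatives below 1 the product vanishes, and above 1 it explodes. The helpers `chain_gradient` and `scaled_layer` are hypothetical.

```python
def chain_gradient(x, layers):
    """Accumulate d out / d x through a chain of scalar layers via the chain rule.

    Each layer is a (forward, derivative) pair; the total derivative is the
    product of the local derivatives evaluated along the forward pass.
    """
    grad = 1.0
    for forward, derivative in layers:
        grad *= derivative(x)  # local partial derivative at this layer's input
        x = forward(x)
    return x, grad

def scaled_layer(w):
    """Hypothetical scalar layer h(x) = w * x, whose derivative is w."""
    return (lambda x: w * x, lambda x: w)

depth = 100
_, g_vanish = chain_gradient(1.0, [scaled_layer(0.8)] * depth)
_, g_explode = chain_gradient(1.0, [scaled_layer(1.2)] * depth)
print(g_vanish, g_explode)  # roughly 2.0e-10 and 8.3e7
```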

2.5 Gradient masking/obfuscation methods are not secure

The drawback of these methods is that they only confuse the attacker instead of eliminating adversarial examples. For example, the C&W attack broke defensive distillation, and the methods in Sections 2.2 to 2.4 have also been broken one after another.

3 Robust optimization

Robust optimization methods change how the DNN is learned in order to improve its robustness, i.e., they study how to learn model parameters that give the desired predictions on potential adversarial examples. Methods of this type mainly focus on:
1) Learning model parameters $\theta^*$ that minimize the average adversarial loss:
$$\theta^*=\operatorname*{arg\,min}_{\theta\in\Theta}\ \mathbb{E}_{x\sim\mathcal{D}}\max_{\|x'-x\|\leq\epsilon}\mathcal{L}(\theta,x',y), \tag{2}$$
2) Learning model parameters $\theta^*$ that maximize the average minimal perturbation distance:
$$\theta^*=\operatorname*{arg\,max}_{\theta\in\Theta}\ \mathbb{E}_{x\sim\mathcal{D}}\min_{C(x')\neq y}\|x'-x\|. \tag{3}$$
A robust optimization algorithm should have prior knowledge of its potential threat, i.e., the adversarial space $\mathcal{D}$, and the defender then builds a classifier targeted at those attacks. In most related work, the goal is to defend against adversarial examples generated by small $l_p$-norm perturbations (especially $l_\infty$ and $l_2$), which is also the focus of this section.
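Eq. (2) is a min-max problem. In one special case the inner maximization has a closed form, which makes the structure easy to see: for a linear logistic classifier with labels $y\in\{-1,+1\}$ and an $l_\infty$ ball of radius $\epsilon$, the worst-case $x'$ lowers the margin by exactly $\epsilon\|w\|_1$. The sketch below assumes that setting and a hypothetical toy data set; for DNNs the inner max has no closed form and is usually approximated with an iterative attack such as PGD.

```python
import itertools
import math

def margin(w, b, x, y):
    """Signed margin y * (w.x + b) for a label y in {-1, +1}."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def logistic_loss(m):
    """log(1 + exp(-m)), computed without overflow."""
    return max(0.0, -m) + math.log1p(math.exp(-abs(m)))

def robust_loss(w, b, x, y, eps):
    """Inner max of Eq. (2) for a linear model and the ball ||x' - x||_inf <= eps.

    The worst-case x' moves every coordinate by eps against the label, which
    lowers the margin by exactly eps * ||w||_1, so the max has a closed form.
    """
    return logistic_loss(margin(w, b, x, y) - eps * sum(abs(wi) for wi in w))

def brute_force_robust_loss(w, b, x, y, eps):
    """Sanity check: for a linear model the max is attained at a corner of the ball."""
    return max(
        logistic_loss(margin(w, b, [xi + eps * s for xi, s in zip(x, signs)], y))
        for signs in itertools.product((-1.0, 1.0), repeat=len(x))
    )

def adversarial_train(data, eps, lr=0.1, steps=200):
    """Outer min of Eq. (2): (sub)gradient descent on the average robust loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            m = margin(w, b, x, y) - eps * sum(abs(wi) for wi in w)
            dloss_dm = -1.0 / (1.0 + math.exp(m))  # derivative of log(1 + exp(-m))
            for i in range(dim):
                sign_wi = (w[i] > 0) - (w[i] < 0)  # subgradient of |w_i|
                grad_w[i] += dloss_dm * (y * x[i] - eps * sign_wi)
            grad_b += dloss_dm * y
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical toy data set: 2-D points with labels in {-1, +1}.
data = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-2.0, -1.0], -1), ([-1.0, -2.5], -1)]
eps = 0.3
w, b = adversarial_train(data, eps)
avg_robust = sum(robust_loss(w, b, x, y, eps) for x, y in data) / len(data)
print("weights:", w, "bias:", b, "average robust loss:", avg_robust)
```

After training, every point stays correctly classified even under the worst-case $l_\infty$ perturbation of size $\epsilon$, which is exactly what minimizing Eq. (2) asks for.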

3.1 Regularization methods

Some early work on defending against adversarial examples focused on properties that a robust DNN should have in order to resist them. For example, Szegedy et al. argued that a robust model should remain stable when its input is distorted, and imposed this "stability" on the model output by constraining its Lipschitz constant. Training with such regularization can sometimes help, heuristically, to make the model more robust.
1) Penalizing each layer's Lipschitz constant: when Szegedy et al. first discovered the vulnerability of DNNs to adversarial examples, they also proposed adding regularization to make the model more stable. Specifically, they suggest enforcing a Lipschitz constant $L_k$ between any two network layers, where $h_k$ is the $k$-th layer with weights $W_k$:
$$\forall x,\delta,\qquad\|h_k(x;W_k)-h_k(x+\delta;W_k)\|\leq L_k\|\delta\|. \tag{4}$$
In this way the output of the network is not easily affected by small perturbations of the input. Parseval networks show that the adversarial risk of the model can be bounded in terms of the $L_k$:
$$\begin{aligned}
\underset{x \sim \mathcal{D}}{\mathbb{E}}\mathcal{L}_{adv}(x) &\leq \underset{x \sim \mathcal{D}}{\mathbb{E}}\mathcal{L}(x)+\underset{x \sim \mathcal{D}}{\mathbb{E}}\left[\max_{\|x'-x\| \leq \epsilon}\left|\mathcal{L}\left(F(x'), y\right)-\mathcal{L}(F(x), y)\right|\right] \\
&\leq \underset{x \sim \mathcal{D}}{\mathbb{E}}\mathcal{L}(x)+\lambda_{p} \prod_{k=1}^{K} L_{k},
\end{aligned} \tag{5}$$
where $\lambda_p$ is the Lipschitz constant of the loss function. The formula shows that penalizing the $L_k$ of each hidden layer during training reduces the adversarial risk of the model and thus increases its robustness. This approach was later extended to semi-supervised and unsupervised defenses.
2) Penalizing each layer's partial derivatives: for example, deep contractive networks were introduced to regularize training. They add a penalty on the partial derivatives of each layer to the standard back-propagation framework, so that changes in the input data do not cause large changes in the output of any layer. As a result, it is difficult for the classifier to give different predictions for perturbed data samples.
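Both penalties can be computed for a single layer. The sketch below assumes a hypothetical layer $h(x)=\mathrm{sigmoid}(Wx+b)$ and the $l_2$ norm. For item 1) it bounds $L_k$ in Eq. (4) by $\|W\|_2/4$ (the sigmoid's slope never exceeds 1/4), estimates $\|W\|_2$ by power iteration, and multiplies the per-layer bounds as in the last term of Eq. (5). For item 2) it computes the squared Frobenius norm of the layer Jacobian as a contractive penalty. None of these functions is taken from the original papers, and how the penalties are weighted and added to the training loss is left open.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_layer(W, b, x):
    """Hypothetical layer h(x) = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]

def matvec(W, v):
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]

def spectral_norm(W, iters=100):
    """Largest singular value ||W||_2, estimated by power iteration on W^T W."""
    cols = len(W[0])
    v = [1.0] * cols
    for _ in range(iters):
        Wv = matvec(W, v)
        v = [sum(W[i][j] * Wv[i] for i in range(len(W))) for j in range(cols)]  # W^T (W v)
        norm = math.sqrt(sum(t * t for t in v))
        if norm == 0.0:
            return 0.0
        v = [t / norm for t in v]
    return math.sqrt(sum(t * t for t in matvec(W, v)))

def layer_lipschitz_bound(W):
    """Upper bound on L_k in Eq. (4): sigmoid' <= 1/4, so L_k <= ||W||_2 / 4."""
    return spectral_norm(W) / 4.0

def lipschitz_penalty(weights, lam):
    """Hypothetical regularizer lam * prod_k L_k, mirroring the last term of Eq. (5)."""
    return lam * math.prod(layer_lipschitz_bound(W) for W in weights)

def contractive_penalty(W, b, x):
    """||dh/dx||_F^2 for h = sigmoid(W x + b); the Jacobian is J_ij = h_i (1 - h_i) W_ij."""
    h = sigmoid_layer(W, b, x)
    return sum((hi * (1.0 - hi)) ** 2 * sum(w * w for w in row) for hi, row in zip(h, W))

def finite_difference_penalty(W, b, x, step=1e-6):
    """Numerical check of contractive_penalty via central differences."""
    total = 0.0
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += step
        down[j] -= step
        h_up, h_down = sigmoid_layer(W, b, up), sigmoid_layer(W, b, down)
        total += sum(((a - c) / (2 * step)) ** 2 for a, c in zip(h_up, h_down))
    return total

# Hypothetical weights and input for one layer.
W = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.8]]
b = [0.1, -0.2, 0.0]
x = [0.4, -0.6]
print("L_k bound:", layer_lipschitz_bound(W))
print("contractive penalty:", contractive_penalty(W, b, x))
```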

3.2 Adversarial training

1) Adversarial training based on FGSM:


4 Adversarial example detection


[1] Adversarial Attacks and Defenses in Images, Graphs and Text: A Review
