Analysis and Implementation of Adversarial Example Generation Techniques
1. Applications of Adversarial Examples in Cyberspace Security
Adversarial example techniques are widely used across many areas of network security:
1.1 Evading Malware Detection
- Small perturbations added to a malware sample shift it slightly in feature space
- This can bypass machine-learning-based malware detection engines
- New adversarial malware variants can be generated, making detection harder
1.2 Deceiving Intrusion Detection Systems
- Adversarial network traffic can be crafted to deceive intrusion detection systems
- The system then fails to recognize genuine intrusion behavior
- The system can be pushed toward false positives or false negatives, weakening its protection
1.3 Attacks on IoT Devices
- Adversarial inputs sent to smart devices can mislead their sensors or control systems
- This can enable unauthorized control of the devices
- Targets include voice assistants, smart cameras, and other smart-home components
1.4 Web Security Threats
- Highly deceptive phishing emails can be generated
- Users are lured into clicking malicious links or downloading malware
- Machine-learning-based anti-phishing systems can be bypassed
2. PGD (Projected Gradient Descent) Attack
2.1 Theoretical Basis
PGD comes from the paper "Towards Deep Learning Models Resistant to Adversarial Attacks"; its core idea is grounded in robust optimization:
- Saddle-point (min-max) problem framework:
  - Inner maximization: find the adversarial example that maximizes the loss
  - Outer minimization: find the model parameters that minimize the expected worst-case loss
- Formal definition:
  - A data distribution D produces sample pairs (x, y)
  - Robust risk: ρ(θ) = E_{(x,y)~D}[max_{δ∈S} L(θ, x+δ, y)]
  - S is the set of allowed perturbations and defines the attack's scope
- Key points (a minimal sketch of the resulting update step follows this list):
  - The aim is an explicit security guarantee, not merely robustness to particular attacks
  - The adversary's capability is formalized as a norm ball
  - Models need larger capacity to withstand strong attacks
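For the L∞ case, the inner maximization is approximated by the iteration x_{t+1} = clip_{x,ε}(x_t + α·sign(∇_x L(θ, x_t, y))). The following is a minimal, self-contained sketch of one such step, assuming a differentiable PyTorch classifier model; it is illustrative, not the reference implementation from the paper.
import torch

def pgd_linf_step(model, loss_fn, x_adv, x, y, alpha, eps, clip_min=0.0, clip_max=1.0):
    # One gradient-ascent step on the loss, then projection onto the eps-ball around x
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = x_adv.detach() + alpha * grad.sign()
    x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # stay inside the L-inf ball
    return torch.clamp(x_adv, clip_min, clip_max)          # stay inside the valid pixel range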
2.2 Implementation
2.2.1 Basic PGD Implementation
class PGDAttack(Attack, LabelMixin):
    def __init__(self, predict, loss_fn=None, eps=0.3, nb_iter=40,
                 eps_iter=0.01, rand_init=True, clip_min=0, clip_max=1,
                 ord=np.inf, l1_sparsity=None, targeted=False):
        # Initialize attack parameters
        self.predict = predict
        self.eps = eps
        self.nb_iter = nb_iter
        self.eps_iter = eps_iter
        self.rand_init = rand_init
        self.clip_min = clip_min
        self.clip_max = clip_max
        self.ord = ord
        self.targeted = targeted
        self.loss_fn = loss_fn if loss_fn else nn.CrossEntropyLoss(reduction="sum")

    def perturb(self, x, y):
        # Validate and preprocess the inputs
        x, y = self._verify_and_process_inputs(x, y)
        # Initialize the perturbation
        delta = torch.zeros_like(x)
        delta = nn.Parameter(delta)
        if self.rand_init:
            delta.data = self.rand_init_delta(delta.data, x, self.eps, self.ord)
        # Run the iterative attack
        rval = self.perturb_iterative(x, y, delta)
        return rval.data
2.2.2 L∞-Norm PGD Attack
class LinfPGDAttack(PGDAttack):
def __init__(self, predict, loss_fn=None, eps=0.3, nb_iter=40,
eps_iter=0.01, rand_init=True, clip_min=0, clip_max=1,
targeted=False):
super().__init__(predict=predict, loss_fn=loss_fn, eps=eps,
nb_iter=nb_iter, eps_iter=eps_iter,
rand_init=rand_init, clip_min=clip_min,
clip_max=clip_max, ord=np.inf, targeted=targeted)
2.2.3 L2-Norm PGD Attack
class L2PGDAttack(PGDAttack):
def __init__(self, predict, loss_fn=None, eps=0.3, nb_iter=40,
eps_iter=0.01, rand_init=True, clip_min=0, clip_max=1,
targeted=False):
super().__init__(predict=predict, loss_fn=loss_fn, eps=eps,
nb_iter=nb_iter, eps_iter=eps_iter,
rand_init=rand_init, clip_min=clip_min,
clip_max=clip_max, ord=2, targeted=targeted)
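A usage sketch of the classes above. It assumes the base class Attack, the LabelMixin mixin, and helpers such as rand_init_delta and perturb_iterative behave as in an advertorch-style toolbox; the model and data below are stand-ins, not part of the original code.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in classifier
images = torch.rand(4, 3, 32, 32)                                 # batch of inputs in [0, 1]
labels = torch.randint(0, 10, (4,))

adversary = LinfPGDAttack(model, eps=8/255, nb_iter=40, eps_iter=2/255,
                          rand_init=True, clip_min=0.0, clip_max=1.0, targeted=False)
adv_images = adversary.perturb(images, labels)     # untargeted L-inf attack
print((adv_images - images).abs().max())           # perturbation stays within eps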
2.3 Attack Results
- A giant panda image can be misclassified as a switch (L∞ norm)
- A giant panda image can be misclassified as a hog (L2 norm)
3. Momentum Iterative Attack
3.1 Theoretical Basis
From the paper "Boosting Adversarial Attacks with Momentum":
- Core idea:
  - Introduce a momentum term into the iterations to stabilize the update direction
  - Help the algorithm escape poor local optima
  - Improve the transferability of the adversarial examples
- Mathematical formalization:
  - Initialization: choose input x and label y; set the perturbation budget ε, the number of iterations T, and the decay factor μ
  - Iteration:
    - Compute the gradient ∇_x L(x_t*, y) at the current sample x_t*
    - Update the momentum: g_{t+1} = μ·g_t + ∇_x L(x_t*, y)/||∇_x L(x_t*, y)||_1
    - Update the adversarial example: x_{t+1}* = clip_{x,ε}(x_t* + α·sign(g_{t+1}))
  - Output: the final adversarial example x_T*
- Key advantages (a minimal sketch of the momentum update follows this list):
  - Higher success rates for both white-box and black-box attacks
  - Can attack models equipped with defense mechanisms
  - Transferability can be boosted further by attacking an ensemble of models
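What distinguishes this attack from plain iterative FGSM is the L1-normalized gradient accumulated into a momentum buffer before the sign step. A minimal illustrative sketch of one iteration (grad is the loss gradient at the current iterate x_adv; this is not the reference implementation):
import torch

def momentum_step(x_adv, x, grad, g, alpha, eps, mu, clip_min=0.0, clip_max=1.0):
    # g_{t+1} = mu * g_t + grad / ||grad||_1
    g = mu * g + grad / grad.abs().sum().clamp_min(1e-12)
    # x_{t+1}* = clip_{x,eps}(x_t* + alpha * sign(g_{t+1}))
    x_adv = x_adv + alpha * g.sign()
    x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
    return torch.clamp(x_adv, clip_min, clip_max), g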
3.2 Implementation
3.2.1 Basic Momentum Iterative Attack
class MomentumIterativeAttack(Attack, LabelMixin):
    def __init__(self, predict, loss_fn=None, eps=0.3, nb_iter=40,
                 decay_factor=1.0, eps_iter=0.01, clip_min=0, clip_max=1,
                 targeted=False, ord=np.inf):
        # Initialize attack parameters
        super().__init__(predict, loss_fn, clip_min, clip_max)
        self.eps = eps
        self.nb_iter = nb_iter
        self.decay_factor = decay_factor
        self.eps_iter = eps_iter
        self.targeted = targeted
        self.ord = ord
        self.loss_fn = loss_fn if loss_fn else nn.CrossEntropyLoss(reduction="sum")

    def perturb(self, x, y):
        # Validate and preprocess the inputs
        x, y = self._verify_and_process_inputs(x, y)
        # Initialize the perturbation and the momentum buffer
        delta = torch.zeros_like(x)
        g = torch.zeros_like(x)
        delta = nn.Parameter(delta)
        # Iterative attack
        for _ in range(self.nb_iter):
            if delta.grad is not None:
                delta.grad.detach_()
                delta.grad.zero_()
            img_adv = x + delta
            outputs = self.predict(img_adv)
            loss = self.loss_fn(outputs, y)
            if self.targeted:
                loss = -loss
            loss.backward()
            # Accumulate the L1-normalized gradient into the momentum buffer
            g = self.decay_factor * g + delta.grad / torch.norm(delta.grad, p=1)
            # Update the perturbation and project it onto the chosen norm ball
            if self.ord == np.inf:
                delta.data += self.eps_iter * torch.sign(g)
                delta.data = torch.clamp(delta.data, -self.eps, self.eps)
            elif self.ord == 2:
                delta.data += self.eps_iter * g / torch.norm(g, p=2)
                delta.data *= torch.clamp(self.eps / delta.data.norm(p=2), max=1.0)
            # Keep x + delta inside the valid pixel range
            delta.data = torch.clamp(x + delta.data, self.clip_min, self.clip_max) - x
        return x + delta.data
3.2.2 L2-Norm Momentum Attack
class L2MomentumIterativeAttack(MomentumIterativeAttack):
def __init__(self, predict, loss_fn=None, eps=0.3, nb_iter=40,
decay_factor=1.0, eps_iter=0.01, clip_min=0, clip_max=1,
targeted=False):
super().__init__(predict=predict, loss_fn=loss_fn, eps=eps,
nb_iter=nb_iter, decay_factor=decay_factor,
eps_iter=eps_iter, clip_min=clip_min,
clip_max=clip_max, targeted=targeted, ord=2)
3.2.3 L∞-Norm Momentum Attack
class LinfMomentumIterativeAttack(MomentumIterativeAttack):
def __init__(self, predict, loss_fn=None, eps=0.3, nb_iter=40,
decay_factor=1.0, eps_iter=0.01, clip_min=0, clip_max=1,
targeted=False):
super().__init__(predict=predict, loss_fn=loss_fn, eps=eps,
nb_iter=nb_iter, decay_factor=decay_factor,
eps_iter=eps_iter, clip_min=clip_min,
clip_max=clip_max, targeted=targeted, ord=np.inf)
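Usage mirrors the PGD classes; a sketch reusing the stand-in model and data from the PGD example above:
adversary = L2MomentumIterativeAttack(model, eps=2.0, nb_iter=40, decay_factor=1.0,
                                      eps_iter=0.1, clip_min=0.0, clip_max=1.0,
                                      targeted=False)
adv_images = adversary.perturb(images, labels)     # untargeted L2 momentum attack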
3.3 Attack Results
- A giant panda image can be misclassified as a maze (basic momentum attack)
- A giant panda image can be misclassified as a hog (L2 norm)
- A giant panda image can be misclassified as a bubble (L∞ norm)
4. SPSA (Simultaneous Perturbation Stochastic Approximation) Attack
4.1 Theoretical Basis
From the paper "Adversarial Risk and the Dangers of Evaluating Against Weak Attacks":
- Core idea:
  - Estimate the gradient from random perturbations
  - No direct computation of the model's gradient is required
  - Suitable for non-differentiable or hard-to-differentiate models
- Mathematical formalization:
  - Initialization: choose an initial image x_0 and a finite-difference step δ
  - Iteration:
    - Sample random perturbation vectors v_1, ..., v_n ~ {-1, 1}^d
    - Evaluate the change in the objective: f(x_t + δv_i) - f(x_t - δv_i)
    - Estimate the gradient: g_t ≈ (1/(2nδ)) Σ_i [f(x_t + δv_i) - f(x_t - δv_i)]·v_i
    - Update the image: x_{t+1} = clip_{x,ε}(x_t + α·sign(g_t))
  - Output: the final adversarial example x_T
- Key advantages (a small numeric illustration of the estimator follows this list):
  - Can bypass defenses that rely on gradient masking
  - Suitable for black-box attack scenarios
  - Reasonably computationally efficient
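A small, self-contained illustration of the two-sided estimator on a function whose gradient is known. It only demonstrates that the estimate needs function evaluations rather than autograd; it is not taken from the paper.
import torch

def spsa_estimate(f, x, delta=1e-2, nb_sample=2048):
    # g ≈ (1/(2·n·delta)) Σ [f(x + delta·v) - f(x - delta·v)]·v, with v uniform over {-1, +1}^d
    grad = torch.zeros_like(x)
    for _ in range(nb_sample):
        v = torch.randint_like(x, low=0, high=2) * 2 - 1   # Rademacher +/-1 direction
        grad += (f(x + delta * v) - f(x - delta * v)) / (2 * delta) * v
    return grad / nb_sample

x = torch.tensor([1.0, -2.0, 3.0])
print(spsa_estimate(lambda z: (z ** 2).sum(), x))   # close to the true gradient 2x = [2., -4., 6.]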
4.2 Implementation
4.2.1 Helper Functions
def linf_clamp_(dx, x, eps, clip_min, clip_max):
    # Clamp the perturbation elementwise to [-eps, eps]
    dx_clamped = batch_clamp(eps, dx)
    # Compute the adversarial example and clip it to the valid pixel range
    x_adv = torch.clamp(x + dx_clamped, clip_min, clip_max)
    # Rewrite the perturbation so that x + dx satisfies both constraints
    dx.data = x_adv - x
    return dx

def _get_batch_sizes(n, max_batch_size):
    # Split n samples into batches of at most max_batch_size
    num_batches = n // max_batch_size
    remainder = n % max_batch_size
    batch_sizes = [max_batch_size] * num_batches
    if remainder > 0:
        batch_sizes.append(remainder)
    return batch_sizes
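For example, under the assumed values nb_sample=130 and max_batch_size=64, _get_batch_sizes(130, 64) returns [64, 64, 2], so spsa_grad below evaluates the 130 random directions in three batches.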
4.2.2 SPSA Gradient Estimation
def spsa_grad(predict, loss_fn, x, y, delta, nb_sample, max_batch_size):
    with torch.no_grad():
        # Gradient accumulator; x is expected to be a single sample with batch dimension 1
        grad = torch.zeros_like(x)
        # Replicate the input so several random directions are evaluated per forward pass
        x = x.expand(max_batch_size, *x.shape[1:])
        y = y.expand(max_batch_size, *y.shape[1:])

        def f(xvar, yvar):
            return loss_fn(predict(xvar), yvar)

        # Process the nb_sample random directions in batches
        batch_sizes = _get_batch_sizes(nb_sample, max_batch_size)
        for batch_size in batch_sizes:
            x_ = x[:batch_size]
            y_ = y[:batch_size]
            # Sample Rademacher (+1/-1) perturbation directions
            vb = torch.randint_like(x_, low=0, high=2).float() * 2 - 1
            v_ = vb * delta
            # Two-sided finite difference of the loss along each direction
            df = f(x_ + v_, y_) - f(x_ - v_, y_)
            df = df.view(-1, *([1] * (x.dim() - 1)))
            grad_ = df / (2 * delta) * vb
            # Sum the per-direction estimates into the single-sample accumulator
            grad += grad_.sum(dim=0, keepdim=True)
        grad = grad / nb_sample
        return grad
4.2.3 SPSA Perturbation Generation
def spsa_perturb(predict, loss_fn, x, y, eps, delta, lr, nb_iter,
                 nb_sample, max_batch_size, clip_min, clip_max):
    # Initialize the perturbation
    dx = torch.zeros_like(x)
    dx = nn.Parameter(dx)
    # Optimize the perturbation with Adam
    optimizer = optim.Adam([dx], lr=lr)
    for _ in range(nb_iter):
        optimizer.zero_grad()
        # Estimate the gradient of loss_fn at the current adversarial example
        grad = spsa_grad(predict, loss_fn, x + dx, y, delta,
                         nb_sample, max_batch_size)
        # Adam descends on loss_fn; the caller already flips its sign for untargeted attacks
        dx.grad = grad
        optimizer.step()
        # Project the perturbation into the eps-ball and the valid pixel range
        dx.data = linf_clamp_(dx.data, x, eps, clip_min, clip_max)
    x_adv = torch.clamp(x + dx.data, clip_min, clip_max)
    return x_adv
4.2.4 L∞-Norm SPSA Attack
class LinfSPSAAttack(Attack, LabelMixin):
def __init__(self, predict, eps=0.3, delta=0.01, lr=0.01,
nb_iter=20, nb_sample=128, max_batch_size=64,
targeted=False, loss_fn=None, clip_min=0, clip_max=1):
super().__init__(predict, loss_fn, clip_min, clip_max)
self.eps = eps
self.delta = delta
self.lr = lr
self.nb_iter = nb_iter
self.nb_sample = nb_sample
self.max_batch_size = max_batch_size
self.targeted = targeted
self.loss_fn = loss_fn if loss_fn else MarginalLoss(reduction="none")
def perturb(self, x, y):
x, y = self._verify_and_process_inputs(x, y)
if self.targeted:
loss_fn = lambda x, y: self.loss_fn(x, y)
else:
loss_fn = lambda x, y: -self.loss_fn(x, y)
x_adv = spsa_perturb(self.predict, loss_fn, x, y, self.eps,
self.delta, self.lr, self.nb_iter,
self.nb_sample, self.max_batch_size,
self.clip_min, self.clip_max)
return x_adv
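A usage sketch (MarginalLoss, batch_clamp, and the Attack/LabelMixin base classes are assumed to come from an advertorch-style toolbox, with torch.optim imported as optim; a single sample is used because spsa_grad above expands the input along the batch dimension):
image, label = images[:1], labels[:1]              # reuse the stand-in data from the PGD example
adversary = LinfSPSAAttack(model, eps=8/255, delta=0.01, lr=0.01,
                           nb_iter=20, nb_sample=128, max_batch_size=64,
                           targeted=False, clip_min=0.0, clip_max=1.0)
x_adv = adversary.perturb(image, label)            # black-box L-inf attack via SPSA estimates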
4.3 Attack Results
- A giant panda image can be misclassified as a window
5. References
- Madry A, Makelov A, Schmidt L, et al. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv:1706.06083
- Dong Y, Liao F, Pang T, et al. Boosting Adversarial Attacks with Momentum. arXiv:1710.06081
- Uesato J, O'Donoghue B, van den Oord A, et al. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. arXiv:1802.05666
- Li B, Zhou H. Adversarial Example Attacks Toward Android Malware Detection. Springer, 2017
- Apruzzese G, Conti M, et al. SpacePhish: The Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning. ACSAC 2022