Mathematical Statistics and Learning
Adversarial examples in random neural networks with general activations
Article
Andrea Montanari [1], Yuchen Wu [1]
[1] Stanford University
Keywords: adversarial example; neural network
DOI: 10.4171/msl/41
Source: European Mathematical Society
【 Abstract 】
A substantial body of empirical work documents the lack of robustness in deep learning models to adversarial examples. Recent theoretical work proved that adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations, and in multi-layer ReLU networks with sub-exponential width. We present a result of the same type, with no restriction on width and for general locally Lipschitz continuous activations. More precisely, given a neural network $f(\,\cdot\,;\mathbf{\theta})$ with random weights $\mathbf{\theta}$ and a feature vector $\mathbf{x}$, we show that an adversarial example $\mathbf{x}'$ can be found with high probability along the direction of the gradient $\nabla_{\mathbf{x}}f(\mathbf{x};\mathbf{\theta})$. Our proof is based on a Gaussian conditioning technique. Instead of proving that $f$ is approximately linear in a neighborhood of $\mathbf{x}$, we characterize the joint distribution of $f(\mathbf{x};\mathbf{\theta})$ and $f(\mathbf{x}';\mathbf{\theta})$ for $\mathbf{x}' = \mathbf{x}-s(\mathbf{x})\nabla_{\mathbf{x}}f(\mathbf{x};\mathbf{\theta})$, where $s(\mathbf{x}) = \operatorname{sign}(f(\mathbf{x}; \mathbf{\theta})) \cdot s_d$ for some positive step size $s_d$.
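The gradient-step construction $\mathbf{x}' = \mathbf{x}-s(\mathbf{x})\nabla_{\mathbf{x}}f(\mathbf{x};\mathbf{\theta})$ can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a two-layer network with tanh activation (one example of a locally Lipschitz activation), Gaussian weights with an illustrative $1/d$ and $1/m$ variance scaling, and a doubling search that stands in for the step size $s_d$, whose value the abstract does not state. The names `make_network`, `grad_f`, and `gradient_step_example` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def make_network(d, m, rng):
    """Sample a random two-layer network f(x) = a . tanh(W x).

    The scaling (W_ij ~ N(0, 1/d), a_i ~ N(0, 1/m)) is an illustrative
    assumption, not necessarily the normalization used in the paper.
    """
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))
    a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)
    return W, a


def f(x, W, a):
    """Network output at input x."""
    return float(a @ np.tanh(W @ x))


def grad_f(x, W, a):
    """Gradient of f with respect to x, using tanh'(z) = 1 - tanh(z)^2."""
    z = W @ x
    return W.T @ (a * (1.0 - np.tanh(z) ** 2))


def gradient_step_example(x, W, a, s0=1e-3, max_doublings=30):
    """Search for x' = x - sign(f(x)) * s * grad f(x) whose output sign differs.

    The doubling search over s is a hypothetical stand-in for the step
    size s_d of the abstract. Returns (x_adv, s, flipped).
    """
    fx = f(x, W, a)
    g = grad_f(x, W, a)
    s = s0
    x_adv = x - np.sign(fx) * s * g
    for _ in range(max_doublings):
        if np.sign(f(x_adv, W, a)) != np.sign(fx):
            return x_adv, s, True
        s *= 2.0
        x_adv = x - np.sign(fx) * s * g
    # Report whether the largest step tried changes the sign of the output.
    flipped = np.sign(f(x_adv, W, a)) != np.sign(fx)
    return x_adv, s, bool(flipped)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m = 200, 500
    W, a = make_network(d, m, rng)
    x = rng.normal(size=d)
    x *= np.sqrt(d) / np.linalg.norm(x)  # place x on the sphere of radius sqrt(d)
    x_adv, s, flipped = gradient_step_example(x, W, a)
    print("f(x) =", f(x, W, a), " f(x') =", f(x_adv, W, a))
    print("flipped:", flipped, " step size:", s)
    print("relative perturbation:",
          np.linalg.norm(x_adv - x) / np.linalg.norm(x))
```

Running the script prints the network output before and after the step, together with the size of the perturbation relative to $\|\mathbf{x}\|$. The sketch only illustrates the construction on a single random draw; it is not a substitute for the high-probability statement of the paper.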
【 License 】
Unknown