AI Researchers Advance Noise Injection Defences With New 2022–2025 Techniques
Image Credit: Jacky Lee
Researchers in artificial intelligence are continuing to investigate noise-injection strategies as a way to strengthen AI models against adversarial examples — carefully engineered, often imperceptible perturbations that can cause neural networks to misclassify inputs.
Noise-based defences typically involve adding controlled Gaussian or adaptive noise to inputs, activations or weights during training or inference. The approach builds on long-standing evidence that random noise can act as a regulariser, reducing a model’s sensitivity to tiny, malicious perturbations.
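As a simple illustration of the idea, the sketch below adds zero-mean Gaussian noise to training inputs in PyTorch. It assumes a generic classifier model, data loader and optimiser; the noise level sigma is a hypothetical hyperparameter rather than a value taken from any of the papers discussed here.

    # Minimal sketch of input-level Gaussian noise injection during training.
    # Assumes a generic PyTorch classifier `model`, a data loader `loader` and
    # an optimiser; `sigma` is an illustrative hyperparameter.
    import torch
    import torch.nn.functional as F

    def train_with_input_noise(model, loader, optimizer, sigma=0.1, device="cpu"):
        model.train()
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            # Add zero-mean Gaussian noise to the inputs as a regulariser.
            noisy = images + sigma * torch.randn_like(images)
            loss = F.cross_entropy(model(noisy), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()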
What Noise Injection Involves
Adversarial examples first gained widespread attention through a 2013 study by Christian Szegedy and colleagues at Google, which showed that imperceptible pixel-level changes could reliably fool deep neural networks. This discovery catalysed various defensive strategies, including noise-injection-based methods.
A widely cited early method is Parametric Noise Injection (PNI), introduced in a 2018 arXiv preprint and later presented at CVPR 2019 by Adnan Siraj Rakin, Zhezhi He and Deliang Fan. PNI injects trainable Gaussian noise into network weights or activations, integrating this into an adversarial training framework. The parameters governing the noise distribution are optimised via a robust min–max objective, enabling models to retain stronger performance under white-box attacks such as Projected Gradient Descent (PGD).
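The snippet below is a minimal PNI-style layer in PyTorch: Gaussian noise scaled by a trainable coefficient is added to the weights during training. The layer name, initialisation and exact noise parameterisation are assumptions made for clarity, not the authors' reference implementation.

    # Illustrative PNI-style linear layer: weight noise with a learnable scale.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.bias = nn.Parameter(torch.zeros(out_features))
            # Trainable coefficient controlling how much noise is injected.
            self.noise_scale = nn.Parameter(torch.tensor(0.25))

        def forward(self, x):
            if self.training:
                # Noise magnitude is tied to the empirical spread of the weights;
                # the scale is optimised jointly with the weights themselves.
                noise = torch.randn_like(self.weight) * self.weight.std().detach()
                weight = self.weight + self.noise_scale * noise
            else:
                weight = self.weight
            return F.linear(x, weight, self.bias)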
Later work extended this approach. Adaptive Noise Injection (AdaNI) — published in the journal Computer Vision and Image Understanding in 2024 (with an online DOI dated 2023) — introduces feature-dependent, non-uniform noise. This design allows noise intensity to be allocated based on feature relevance, improving robustness while maintaining clean accuracy.
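A rough sketch of the feature-dependent idea is shown below, using per-channel activation magnitude as a stand-in importance score; the published AdaNI method may compute importance differently, so this should be read as an illustration of the principle rather than the exact algorithm.

    # Feature-dependent noise: noise amplitude varies per channel according to
    # an importance score (here, mean activation magnitude, as an assumption).
    import torch

    def adaptive_feature_noise(features, base_sigma=0.1):
        # features: (batch, channels, height, width) activation map.
        importance = features.abs().mean(dim=(0, 2, 3), keepdim=True)  # per-channel score
        weights = importance / (importance.mean() + 1e-8)              # normalise around 1
        return features + base_sigma * weights * torch.randn_like(features)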
The rapid progress in diffusion models has inspired a parallel branch of adversarial purification techniques. These include DiffPure (ICML 2022), the Adversarial Diffusion Bridge Model (ADBM), first released as a 2024 preprint and later published at ICLR 2025, and MANI-Pure (2025). These methods inject noise at inference time, using forward diffusion steps to mask adversarial perturbations and reverse-diffusion processes to reconstruct cleaned samples. MANI-Pure further applies frequency-aware, magnitude-adaptive noise, focusing on high-frequency regions where many adversarial attacks tend to concentrate.
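Conceptually, the purification step can be sketched as follows, assuming a pre-trained diffusion model exposed through a hypothetical reverse_denoise function and a cumulative noise-schedule value alpha_bar_t for the chosen timestep; the real DiffPure and ADBM pipelines are considerably more involved.

    # Conceptual sketch of diffusion-based purification at inference time.
    import torch

    def purify(x, reverse_denoise, alpha_bar_t, t_star):
        # Forward diffusion to timestep t_star: scale the image and add Gaussian
        # noise so that small adversarial perturbations are drowned out.
        # `alpha_bar_t` is the cumulative noise-schedule value at t_star (a float
        # in (0, 1)); `reverse_denoise` is an assumed wrapper around a pre-trained
        # diffusion model, not an actual DiffPure API.
        noise = torch.randn_like(x)
        x_t = (alpha_bar_t ** 0.5) * x + ((1 - alpha_bar_t) ** 0.5) * noise
        # Reverse process: the diffusion model walks x_t back toward a clean
        # sample, which is then handed to the downstream classifier.
        return reverse_denoise(x_t, t_star)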
Together, these approaches cover both training-time defences and inference-time purification, addressing different points in the model pipeline.
Background and Development
Adversarial vulnerabilities became more widely recognised after the introduction of the Fast Gradient Sign Method (FGSM) by Ian Goodfellow and colleagues in 2014, which demonstrated how single-step gradient-based perturbations could reliably fool image classifiers. This was followed by a surge of research into robust optimisation, most notably the influential adversarial training formulation proposed by Aleksander Madry and colleagues in 2017.
Noise-injection techniques emerged as complementary strategies rather than replacements for adversarial training. PNI showed that combining trainable noise with PGD-based adversarial training could outperform earlier PGD-only defences on CIFAR-10. AdaNI extended this by adjusting noise intensity based on feature importance, improving robustness with less impact on clean performance.
Diffusion-based defences further broadened the landscape. Their purification pipelines rely on the inherent noise-injection processes of diffusion models: a forward step adds controlled Gaussian noise to overwhelm adversarial signals, and a reverse step denoises the sample. Because these techniques use pre-trained diffusion models, they often avoid retraining the classifier itself, making them appealing in practice.
How the Techniques Work
Training-Based Defences (PNI, AdaNI)
PNI injects Gaussian noise sampled from a learnable distribution into network layers. Training follows a min–max structure: an inner maximisation generates adversarial or noise-amplified perturbations, while an outer minimisation optimises the model for accuracy and robustness. AdaNI builds on this by using feature-importance-aware noise, injecting stronger noise into more vulnerable or high-impact feature regions.
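The min–max structure can be illustrated with the simplified PyTorch training step below, which pairs a PGD inner loop with an outer update on a mix of clean and adversarial loss. The attack budget, step size, number of steps and the equal loss weighting are illustrative choices, not the settings used in the PNI or AdaNI papers.

    # Sketch of the min-max structure: inner PGD maximisation, outer minimisation.
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
        x_adv = x.clone().detach() + eps * (2 * torch.rand_like(x) - 1)
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Inner maximisation: ascend the loss, then project back into the
            # epsilon-ball around the original input.
            x_adv = (x_adv + alpha * grad.sign()).detach()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv

    def robust_training_step(model, optimizer, x, y):
        # Outer minimisation: balance clean accuracy and robustness.
        x_adv = pgd_attack(model, x, y)
        loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()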
Inference-Time Purification (DiffPure, ADBM, MANI-Pure)
Incoming inputs are processed through a forward diffusion step, introducing Gaussian noise to obscure adversarial patterns. A reverse denoising process, powered by a pre-trained diffusion model, reconstructs a cleaner sample. MANI-Pure enhances this with frequency-domain analysis, adjusting noise strength based on magnitude spectra to prevent over-smoothing benign features.
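The frequency-aware component can be sketched as below, where noise is shaped in the Fourier domain so that higher-frequency components receive more of it. The radial weighting used here is a simplifying assumption for illustration and not the published MANI-Pure algorithm.

    # Rough sketch of frequency-aware noise shaping: more noise is placed on
    # high-frequency components, where many attacks concentrate.
    import torch

    def frequency_aware_noise(x, base_sigma=0.1):
        # x: (batch, channels, H, W) image tensor.
        b, c, h, w = x.shape
        fy = torch.fft.fftfreq(h, device=x.device).abs().view(h, 1)
        fx = torch.fft.fftfreq(w, device=x.device).abs().view(1, w)
        radius = (fy ** 2 + fx ** 2).sqrt()        # distance from the DC component
        weight = radius / radius.max()             # 0 at low frequencies, 1 at high
        noise_spec = torch.fft.fft2(torch.randn_like(x)) * weight
        shaped_noise = torch.fft.ifft2(noise_spec).real
        return x + base_sigma * shaped_noise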
Impact and Comparisons
Studies show that noise-injection-based defences can achieve higher clean accuracy than traditional adversarial training while maintaining competitive robust accuracy on benchmarks such as CIFAR-10. PNI and similar approaches demonstrated state-of-the-art robustness against PGD attacks during their initial evaluations.
Diffusion-based purification methods have delivered strong empirical performance, particularly on high-resolution datasets such as ImageNet, where denoising-based defences historically struggled. These methods focus on empirical robustness rather than formal guarantees.
By contrast, randomized smoothing, a certified defence that averages predictions over many noisy samples, provides mathematical robustness guarantees but often at the cost of clean accuracy. Recent hybrid methods that combine diffusion purification with randomized smoothing frameworks show improved certified robustness over traditional smoothing-only baselines.
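For comparison, the prediction-averaging step of randomized smoothing can be sketched as a majority vote over Gaussian-noised copies of the input, as below; a full certified defence additionally computes a statistical certificate, and the sample count and noise level here are illustrative.

    # Minimal sketch of the voting step in randomized smoothing.
    import torch

    @torch.no_grad()
    def smoothed_predict(model, x, num_classes, sigma=0.25, n_samples=100):
        # x: a single input of shape (channels, H, W).
        counts = torch.zeros(num_classes)
        batch = x.unsqueeze(0)
        for _ in range(n_samples):
            noisy = batch + sigma * torch.randn_like(batch)
            counts[model(noisy).argmax(dim=1)] += 1   # tally the predicted class
        return counts.argmax().item()                 # class winning the vote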
From a resource perspective, methods based solely on noise augmentation or purification can be less computationally demanding than full multi-step PGD adversarial training, which requires iterative attack generation. However, noise-injection techniques that incorporate adversarial training, such as PNI and AdaNI, retain most of that cost, since adversarial examples must still be generated at each training step.
Noise-augmented training has also been observed to generalise well to natural corruptions, aligning with broader findings that Gaussian or structured noise improves performance on datasets such as CIFAR-10-C and ImageNet-C.
Future Research Directions
As AI models are increasingly deployed in safety-critical fields such as autonomous driving, robotics and medical imaging, researchers expect hybrid, multi-layered defences to become more prominent, combining adversarial training, adaptive noise, frequency-domain analysis and diffusion-based purification.
Current studies also focus on improving computational efficiency for deployment on edge devices and strengthening protection against black-box and transfer-based attacks.
While no defence fully eliminates adversarial risk, targeted noise-injection strategies remain an active and promising direction in improving the reliability of modern AI systems.
Source: arXiv, ScienceDirect, PMLR, ECVA, ACM Digital Library, NeurIPS, UCL, Harvard, CEUR
