πŸš€ Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks

Welcome! This repository hosts the official implementation of our paper, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks."

Paper link: https://arxiv.org/abs/2504.01308


🌟 What’s New?

We propose state-of-the-art solutions to enhance the robustness of Vision-Language Models (VLMs) against Gaussian noise and adversarial attacks. Key highlights include:

  • 🎯 Robust-VLGuard: A pioneering multimodal safety dataset covering both aligned and misaligned image-text pair scenarios.

    (Figure: Robust-VLGuard overview)

  • πŸ›‘οΈ DiffPure-VLM: A novel defense framework that leverages diffusion models to neutralize adversarial noise by transforming it into Gaussian-like noise, significantly improving VLM resilience.

    (Figure: DiffPure-VLM pipeline)

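The core idea behind DiffPure-VLM can be sketched numerically: running the adversarial image through a diffusion model's forward process injects Gaussian noise whose magnitude dwarfs the adversarial perturbation, so what reaches the VLM is Gaussian-like corruption rather than a crafted attack. Below is a minimal, hedged numpy sketch of that intuition only; the beta schedule, timestep `t_star`, and attack budget are illustrative assumptions, not the paper's exact settings, and the reverse (denoising) step of a real diffusion model is omitted.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """DDPM-style forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)             # common DDPM beta schedule
clean = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # stand-in for a normalized image
eps_budget = 8.0 / 255.0                          # hypothetical L-inf attack budget
adv = clean + eps_budget * np.sign(rng.standard_normal(clean.shape))

t_star = 150                                      # hypothetical diffusion timestep
x_t = forward_diffuse(adv, t_star, betas, rng)

# Std of the injected Gaussian noise at t_star; it is far larger than the
# adversarial budget, so the perturbation is effectively drowned out and the
# residual corruption looks Gaussian to the downstream (noise-robust) VLM.
gauss_sigma = np.sqrt(1.0 - np.cumprod(1.0 - betas)[t_star])
```

This is why pairing diffusion purification with a Gaussian-noise-robust VLM is complementary: purification converts the attack into exactly the noise family the fine-tuned model tolerates.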

✨ Key Contributions

  • πŸ” Conducted a comprehensive vulnerability analysis revealing the sensitivity of mainstream VLMs to Gaussian noise.
  • πŸ“š Developed Robust-VLGuard, a dataset designed to improve model robustness without compromising helpfulness or safety alignment.
  • βš™οΈ Introduced DiffPure-VLM, an effective pipeline for defending against complex optimization-based adversarial attacks.
  • πŸ“ˆ Demonstrated strong performance across multiple benchmarks, outperforming existing baseline methods.
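The vulnerability analysis above rests on perturbing inputs with simple pixel-level Gaussian noise. A minimal sketch of that perturbation, assuming images normalized to [0, 1] and hypothetical sigma values for the sweep (the paper's exact noise levels may differ):

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Add i.i.d. Gaussian noise to an image in [0, 1], then clip back to range."""
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(42)
img = rng.uniform(0.0, 1.0, size=(224, 224, 3))   # stand-in for an RGB input

# Sweep over noise levels, as a robustness analysis would do per image.
noisy_variants = {s: add_gaussian_noise(img, s, rng) for s in (0.05, 0.1, 0.2)}
```

Each noisy variant would then be fed to the VLM alongside the original prompt to measure how much the response quality or safety behavior degrades as sigma grows.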

πŸ“¦ Model Details

  • Format: Safetensors
  • Model size: 8B params
  • Tensor type: BF16


πŸ“‚ Training Dataset

Jarvis1111/InternVL2-8B-RobustVLGuard was fine-tuned on the Robust-VLGuard dataset introduced in this work.