
Adversarial Reprogramming of Neural Networks (2018, arXiv)

Published Nov. 29, 2018 on arXiv.


5 stars (1 review)

Deep neural networks are susceptible to adversarial attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce attacks that instead reprogram the target model to perform a task chosen by the attacker, without the attacker needing to specify or compute the desired output for each test-time input. This attack finds a single adversarial perturbation that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary, even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on …

1 edition

The first "RCE" against ML that I came across

5 stars

I have sent this paper to a number of people in the years since it first came out, and I am surprised this type of attack has not received more attention, even though it requires white-box access to the model. This is the first class of attack that lets an attacker reprogram an image classification model to perform an attacker-determined task (e.g., turning the classifier into a counting model).
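To make the mechanics concrete, here is a minimal sketch of the idea in PyTorch: a single learned perturbation (the "adversarial program") is added to every input of a frozen pretrained classifier, and the classifier's original labels are remapped to the adversary's task. This is an illustration under my own assumptions, not the authors' code; the ResNet-50 target, the input sizes, and the `counting_loader` dataset are all hypothetical stand-ins.

```python
# Sketch of adversarial reprogramming (assumes a recent torchvision).
# `counting_loader` is a hypothetical DataLoader for the adversary's task.
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen target model: the attacker never changes its weights.
target = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
for p in target.parameters():
    p.requires_grad_(False)

IMG = 224     # input size expected by the target model
SMALL = 36    # size of the adversary's task inputs (assumed 3-channel)

# The "adversarial program": one perturbation shared by every test-time input.
W = torch.zeros(3, IMG, IMG, device=device, requires_grad=True)

# Mask out the central patch where the small task input will be embedded,
# so the program only occupies the border region.
mask = torch.ones(3, IMG, IMG, device=device)
top = (IMG - SMALL) // 2
mask[:, top:top + SMALL, top:top + SMALL] = 0.0

def reprogram(x_small):
    """Embed the small task input in the centre and add the learned program."""
    x = torch.zeros(x_small.size(0), 3, IMG, IMG, device=device)
    x[:, :, top:top + SMALL, top:top + SMALL] = x_small
    program = torch.tanh(W * mask)   # bounded perturbation outside the patch
    return x + program

def remap_logits(logits):
    """Hard-coded label mapping: reuse the first 10 ImageNet classes as counts 0-9."""
    return logits[:, :10]

opt = torch.optim.Adam([W], lr=0.05)
for x_small, y in counting_loader:   # hypothetical labelled data for the new task
    x_small, y = x_small.to(device), y.to(device)
    logits = remap_logits(target(reprogram(x_small)))
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()                  # gradients flow only into W, not the model
    opt.step()
```

The key point the sketch tries to capture is that the only trainable object is `W`: once optimized, that single perturbation acts as a program that repurposes the unchanged classifier for the attacker's task.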

Reviewing this paper five years after its release, I find it still holds up, and I see a small body of follow-up work in this lineage, including similar attacks against NLP classifiers. I would count this paper as the starting point for this class of attack, which has grown into an impressive and high-impact line of research.