Malware for Neural Networks: Let’s Get Hacking!
Posted by robkraft on March 24, 2017
I don’t intend to infect any artificial intelligence systems with malware. But I do intend to provide an overview of the techniques that can be used to damage the most popular AI in use today, neural networks.
With traditional hacking attempts, bad actors attempt to plant their own instructions, their own computer code, into an existing software environment to cause existing software to behave badly. But these techniques will not work on neural networks. Neural networks are nothing more than a big collection of numbers and mathematical algorithms that no human can understand well enough to alter in order to obtain a malicious desired outcome. Neural networks are trained, not programmed.
But I am not implying that damage cannot be done to neural networks, or that they can’t be corrupted for evil purposes. I am implying that the techniques for malware must be different.
I have identified five types of malware, or perhaps I should say five techniques, for damaging a neural network.
The simplest technique for changing the behavior of an existing neural network is probably to transplant the existing neural network with a new one. The new, malicious, neural network presumably would be one that you have trained using the same inputs the old one expected, but the new one would produce different outcomes based on the same inputs. To successfully implement this malware, the hacker would first need to train the replacement neural network, and to do so the hacker needs to know the number of input nodes and the number of output nodes, and also the range of values for each input and the range of results of each output node. The replacement neural net would need to be trained to take the inputs and produce the outputs the hacker desires. The second major task would be to substitute the original neural network with the new neural network. Neural networks accessible to the Internet could be replaced once the hacker had infiltrated the servers and software of the existing neural network. It could be as simple as replacing a file, or it could require hacking a database and replacing values in different tables. This all depends on how the data for the neural network is stored, and that would be a fact the hacker would want to learn prior to even attempting to train a replacement neural network. Some neural networks are embedded in electronic components. A subset of these could be updated in a manner similar to updating firmware on a device, but other embedded neural networks may have no option for upgrades or alterations and the only recourse for the hacker may be to replace the hardware component with a similar hardware compare that has the malicious neural network embedded in it. Obviously there are cases where physical access to the device may be required in order to transplant a neural network.
If a hacker desires to damage a neural network, but is unable or unwilling to train a replacement neural network, the hacker could choose the brute force technique called the lobotomy. As you might guess, when the hacker performs a lobotomy the hacker is randomly altering the weights and algorithms or the network in order to get it to misbehave. The hacker is unlikely to be able to choose a desired outcome or make the neural network respond to specific inputs with specific outputs, but the random alterations introduced by the hacker may lead the neural network to malfunction and produce undesirable outputs. If a hackers goal is to sow distrust in the user community of a specific neural network or of neural networks in general, this may be the best technique for doing so. If one lobotomy can lead a machine to choose a course of action that takes a human life, public sentiment against neural networks will grow. As with a transplant, the hacker also needs to gain access to the data of the existing neural network in order to alter that data.
Of the five hacking techniques presented here I think that paraphasia is the most interesting because I believe it is the one a hacker is most likely to have success with. The term is borrowed from psychology to describe a human disorder that causes a person to say one word when they mean another. In an artificial neural network, paraphasia results when a saboteur maps the response from the neural network to incorrect meanings. Imagine that Tony Stark, aka Iron Man, creates a neural network that uses face recognition to identify each of the Avengers. When the neural network inputs send an image of Captain America through the neural network layers, the neural network recognizes him, and then assigns the label “Captain America” to the image. But a neural network with paraphasia, or I should say a neural network that has been infected with paraphasia, would see that image and assign the label of “Loki” to it. Technically speaking, paraphasia is probably not accomplished by manipulating the algorithms and weights of the neural networks. Rather, it is achieved by manipulating the labels assigned to the outputs. This makes it the most likely candidate for a successful neural network hacking attempt. If I can alter the software consuming the output of a neural network so that when it sees my face it doesn’t assign my name to it, but instead assigns “President of the United States” to it, I may be able to get into secret facilities that I would otherwise be restricted from.
Open and Closed Networks
The first three hacking techniques could be applied to neural networks that are open, or that are closed. A closed neural network is a network that no longer adjusts its weights and algorithms based on new inputs. Neural networks embedded in hardware will often be closed, but the designers of any neural network may choose to close the neural network if they feel it has been trained to an optimal state. An open neural network is a network that continues to adjust its weights and algorithms based on new inputs. This implies that the neural network is open to two additional forms of attack.
Many neural networks we use today continue to evolve their learning algorithms in order to improve their responses. Many voice recognition systems attempt to understand the vocalizations of their primary users and adapt their responses to produce the desired outcomes. Some neural networks that buy and sell stocks alter their algorithms and weights with feedback from the results of those purchases and sales. Neural network designers often strive to create networks that can learn and improve without human intervention. Others attempt to crowdsource the training of their neural networks, and one example of this you may be familiar with is captcha responses that ask you to identify items in pictures. The captcha producer is probably not only using your response to confirm that you are a human, but also to train their neural network on image recognition. Now, imagine that you had a way to consistently lie to the captcha collection neural network. For dramatic effect, let’s pretend that the captcha engine showed you nine images of people and asked you to click on the image of the President of the United States. Then imagine that, as a hacker, you are able to pick the image of your own face millions of times instead of the face of the President. Eventually you may be able to deceive the neural network into believing that you are the President of the United States. Once you had completed this brainwashing of the neural network, you could go to the top secret area and the facial recognition software would let you in because it believed you to be the President. I am not saying that brainwashing would be easy. I think it would be really difficult. And I think it would only work in the case where you could programmatically feed a lot of inputs to the neural network and have some control over the identification of the correct response. For best results, a hacker might attempt to use this technique on a neural network that was not receiving updates through a network like the Internet, but was only receiving updates from a local source. A neural network running inside an automated car or manufacturing facility may operate with this design. Brainwashing is similar to paraphasia. The difference is that in brainwashing, you train the neural network to misidentify the output, but in paraphasia you take a trained neural network and map its output to an incorrect value.
Like a lobotomy, the overstimulation technique only allows the hacker to cause mischief and cause the neural network to make incorrect choices. The hacker is very unlikely to achieve a specific desired outcome from the neural network. Overstimulation can only occur on poorly designed neural networks and essentially these are neural networks that are subject to the overfitting flaw of neural network design. A neural network that is not closed and designed with an inappropriate number of nodes or layers could be damaged by high volumes of inputs that were not examples from the original training set.
Layers of difficulty
To all you aspiring hackers, I also warn you that our neural networks are getting more complex and sophisticated every day and I think this makes it even more difficult to hack them describing the techniques mentioned here. The deep learning revolution has been successful in many cases because multiple neural networks work in sequence to produce a response. The first neural network in the sequence may just try to extract features from the incoming sources. The identified features are the output of the first network and these are passed into a second neural network for more grouping, classification, or identification. After that these results could be passed on to another neural network that makes responses based upon the outputs of the previous neural network. Therefore, any attempted hack upon the process needs to decide which of the neural networks within the sequence to damage.
I am not encouraging you to try to introduce malware into neural networks. I am strongly opposed to anyone attempting to do such things. But I believe it is important for system engineers to be aware of potential ways in which a neural network may be compromised, and raising that awareness is the only purpose of this article.