Creating a GAN in TensorFlow for Anomaly Detection
In my previous post, we discussed how anomaly detection is performed and how it can be implemented using deep learning methods. Today, we’re going to take a look at one of those methods, the generative adversarial network (GAN), and see how it can be used for anomaly detection within logs. Because we covered the theory in the previous post, we’re going to focus on implementing a GAN in TensorFlow, a deep learning framework widely used in artificial intelligence development.
First, we need to construct the generator and the discriminator. The generator is responsible for producing fake data, and the discriminator is responsible for distinguishing that fake data from the actual training data, which in this case comes from your log sources. Let’s take a look at the generator:
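The original snippet isn’t reproduced here, so below is a minimal sketch that matches the description that follows: a Keras Sequential model with 6 hidden Dense layers plus a final output layer. The layer widths, latent_dim, and data_dim are illustrative assumptions, not the post’s actual values.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

latent_dim = 100  # size of the random noise vector fed to the generator (assumed)
data_dim = 10     # number of numeric features per log entry (assumed)

generator = Sequential()
generator.add(Input(shape=(latent_dim,)))
generator.add(Dense(64, activation="relu"))
generator.add(Dense(128, activation="relu"))
generator.add(Dense(256, activation="relu"))
generator.add(Dense(256, activation="relu"))
generator.add(Dense(128, activation="relu"))
generator.add(Dense(64, activation="relu"))
generator.add(Dense(data_dim, activation="linear"))  # final layer: one output per log feature
```

The generator maps a random noise vector to a vector shaped like one featurized log entry; the widening-then-narrowing layer sizes here are just one reasonable choice.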
Without diving too deep into the technical details, this generator has 6 layers (denoted by the generator.add(Dense…) lines), not including the final output layer. Each layer has nodes that perform the mathematical calculations for learning what the data in your log sources “looks like.” GANs can be customized as you see fit: you can modify the number of layers, the number of nodes per layer, and other hyperparameters.
Now let’s take a look at the discriminator:
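Again, the original code isn’t shown here, so this is a sketch consistent with the description below: 5 hidden Dense layers followed by a single sigmoid output that scores a sample as real or fake. The specific layer sizes and data_dim are assumptions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

data_dim = 10  # number of numeric features per log entry (assumed)

discriminator = Sequential()
discriminator.add(Input(shape=(data_dim,)))
discriminator.add(Dense(256, activation="relu"))
discriminator.add(Dense(128, activation="relu"))
discriminator.add(Dense(64, activation="relu"))
discriminator.add(Dense(32, activation="relu"))
discriminator.add(Dense(16, activation="relu"))
discriminator.add(Dense(1, activation="sigmoid"))  # probability the sample is real
discriminator.compile(loss="binary_crossentropy", optimizer="adam")
```

Binary cross-entropy is the natural loss here, since the discriminator is doing two-class classification (real vs. fake).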
The discriminator has 5 layers, each with its own number of nodes, that learn to distinguish the fake data produced by the generator from the real data in the training set. Like the generator, the discriminator can be customized as you see fit: you can modify the number of layers, the number of nodes per layer, and other hyperparameters.
Putting the generator and discriminator together forms the entirety of the GAN:
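A minimal sketch of that combination, using compact stand-ins for the generator and discriminator described above (the sizes are illustrative). The key detail is that the discriminator is frozen inside the combined model, so that training the GAN end-to-end updates only the generator:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

latent_dim, data_dim = 100, 10  # illustrative sizes

# Compact stand-ins for the generator and discriminator described above
generator = Sequential([Input(shape=(latent_dim,)),
                        Dense(64, activation="relu"),
                        Dense(data_dim, activation="linear")])
discriminator = Sequential([Input(shape=(data_dim,)),
                            Dense(64, activation="relu"),
                            Dense(1, activation="sigmoid")])
discriminator.compile(loss="binary_crossentropy", optimizer="adam")

# Freeze the discriminator inside the combined model: when the GAN trains,
# only the generator's weights should move. The discriminator's own earlier
# compile still lets it be trained directly.
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(loss="binary_crossentropy", optimizer="adam")
```

The combined model takes noise as input and outputs the discriminator’s real-vs-fake score for the generated sample.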
And finally, here is a snippet of the code that performs part of the training process:
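The original snippet isn’t reproduced here, so below is a sketch of one training step consistent with the description that follows. The model definitions at the top are compact stand-ins, X_train is a random placeholder for featurized log data, and the batch size and dimensions are assumptions:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

latent_dim, data_dim, batch_size = 100, 10, 32  # illustrative values

# Compact stand-ins for the generator and discriminator described earlier
generator = Sequential([Input(shape=(latent_dim,)),
                        Dense(64, activation="relu"),
                        Dense(data_dim, activation="linear")])
discriminator = Sequential([Input(shape=(data_dim,)),
                            Dense(64, activation="relu"),
                            Dense(1, activation="sigmoid")])
discriminator.compile(loss="binary_crossentropy", optimizer="adam")
discriminator.trainable = False  # frozen only inside the combined model
gan = Sequential([generator, discriminator])
gan.compile(loss="binary_crossentropy", optimizer="adam")

# Placeholder for featurized log data; real code would load actual log features
X_train = np.random.normal(0, 1, (1000, data_dim))

# Gather a batch of fake samples from the generator and real samples from the data set
noise = np.random.normal(0, 1, (batch_size, latent_dim))
fake_data = generator.predict(noise, verbose=0)
real_data = X_train[np.random.randint(0, X_train.shape[0], batch_size)]

# Train discriminator: real samples are labeled 1, fake samples 0
d_loss_real = discriminator.train_on_batch(real_data, np.ones((batch_size, 1)))
d_loss_fake = discriminator.train_on_batch(fake_data, np.zeros((batch_size, 1)))

# Train generator: label the fakes as "real" so the combined model pushes
# the generator toward output the discriminator can't tell apart
g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
```

In a full training run this step would sit inside an epoch loop, with the losses logged to monitor whether the two networks stay balanced.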
The first 3 lines are responsible for gathering data from both the generator and the actual data set and putting them together. The lines under # Train discriminator train the discriminator to distinguish the fake generator output from the real data. Finally, the lines under # Train generator make the generator better at producing fake data that fools the discriminator.
The best part of performing anomaly detection using deep learning is that it can be used within your cybersecurity program, both as a tool for threat hunting and for user and entity behavior analytics (UEBA). If you’re interested in viewing the entirety of the code, you can go to the QFunction GitHub to see how it was used on Windows logs to detect anomalies. You should also check out this post on the issues with SIEMs and UEBA solutions and why you should consider using custom-made AI for your anomaly detection needs. And if you’re interested in implementing this technology within your SIEM or cybersecurity program, contact QFunction to see how we can help!