Creating an Autoencoder in TensorFlow for Anomaly Detection
In previous posts, we looked at how anomaly detection can be performed with deep learning methods and how a GAN can be implemented in TensorFlow. Today we’re going to look at an implementation of a simple autoencoder created in TensorFlow. Let’s take a look at a code snippet of an autoencoder below:
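A minimal sketch of such an autoencoder, assuming the Keras Sequential API bundled with TensorFlow 2 and input samples with 128 numeric features. The layer sizes follow the description in this post; the ReLU activations, linear output, Adam optimizer, and mean-squared-error loss are assumptions and not necessarily what the original code uses:

```python
import tensorflow as tf

# Assumption: every input sample is a vector of 128 numeric features.
N_FEATURES = 128


def build_autoencoder(n_features: int = N_FEATURES) -> tf.keras.Model:
    """Build a 7-layer dense autoencoder shaped 128-64-32-16-32-64-128."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        # Encoder: progressively "compress" the input.
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        # Latent space: the narrowest layer.
        tf.keras.layers.Dense(16, activation="relu"),
        # Decoder: progressively "decompress" back to the input width.
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        # Linear output (an assumption), so inputs need not be bounded.
        tf.keras.layers.Dense(128),
    ])
    # Assumed training setup: Adam optimizer, mean squared error loss.
    model.compile(optimizer="adam", loss="mse")
    return model


autoencoder = build_autoencoder()
```

Because an autoencoder learns to reproduce its input, training would pass the same array as both input and target, for example `autoencoder.fit(x_train, x_train, epochs=10)`, where `x_train` and the epoch count are hypothetical.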
This is a very basic autoencoder. You’ll notice that it has 7 layers, with the first 4 layers having 128, 64, 32, and 16 nodes respectively. The decreasing number of nodes in each layer “compresses” the data down to the latent space, which is the 16-node layer. This is the layer that learns the most important features of the data, and with them what it expects the data to look like. The last 3 layers, with 32, 64, and 128 nodes, then take the data in the latent space and attempt to “decompress” it back to its original form of 128 nodes. This is why the first and last layers of this model have the same number of nodes. In essence, you have an hourglass-shaped model whose layer widths run 128, 64, 32, 16, 32, 64, 128.
The training process involves feeding the autoencoder large amounts of data and having it learn to reproduce each input at its output. After training, the latent space captures how the data it has seen should look, and the autoencoder should do a reasonable job of decompressing familiar data back to its original form. However, if the autoencoder is given data unlike what it was trained on, it’ll have a hard time reconstructing it. The difference between the reconstructed data and the original data, known as the reconstruction error, can be measured, and if it exceeds an established threshold (which you can set), then you have an anomaly on your hands.
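The thresholding step can be sketched as follows. This is an illustration under stated assumptions, not the code from the original post: the error metric (per-sample mean squared error), the threshold rule (99th percentile of errors on normal data), and all function names are hypothetical choices. Synthetic arrays stand in for real data and for the model's reconstructions.

```python
import numpy as np


def reconstruction_error(original: np.ndarray, reconstructed: np.ndarray) -> np.ndarray:
    """Return the mean squared error of each sample (one value per row)."""
    return np.mean(np.square(original - reconstructed), axis=1)


def fit_threshold(normal_errors: np.ndarray, percentile: float = 99.0) -> float:
    """Choose a threshold from errors on normal data (assumed rule: a high percentile)."""
    return float(np.percentile(normal_errors, percentile))


def flag_anomalies(errors: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask that is True where the error exceeds the threshold."""
    return errors > threshold


# Synthetic stand-ins: familiar data is reconstructed with only small noise.
rng = np.random.default_rng(0)
x_train = rng.random((1000, 128))
x_train_rec = x_train + rng.normal(0.0, 0.01, x_train.shape)
threshold = fit_threshold(reconstruction_error(x_train, x_train_rec))

# New data: simulate one sample that the model reconstructs badly.
x_new = rng.random((5, 128))
x_new_rec = x_new + rng.normal(0.0, 0.01, x_new.shape)
x_new_rec[0] += 0.5

flags = flag_anomalies(reconstruction_error(x_new, x_new_rec), threshold)
```

With a trained Keras model, the reconstructions would come from `model.predict(x)` instead of the synthetic arrays. The percentile controls the trade-off: a lower value flags more events, including more false positives.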
The best part of performing anomaly detection using deep learning is that it can be used within your cybersecurity program. It can be used as a tool for threat hunting as well as user and entity behavior analytics (UEBA). If you’re interested in viewing the entirety of the code, you can go to the QFunction GitHub to see how it was used on Windows logs to detect anomalies. You should also check out this post on the issues with SIEMs and UEBA solutions and why you should consider using custom-made AI for your anomaly detection purposes. Also, if you’re interested in implementing this technology within your SIEM or cybersecurity program, contact QFunction to see how we can help!