Automated Threat Hunting With Splunk and AI
With the rapid rise of artificial intelligence tools in our daily lives and careers, it’s no surprise that cybersecurity tools are using AI to better secure computing environments. These days, it’s hard to find a modern cybersecurity defense tool that doesn’t use AI in some form: EDR solutions, cloud security platforms, and SIEMs are among the most popular tools that incorporate some kind of AI into their offerings.
I do believe that the use of AI in cybersecurity is not going anywhere. In fact, its use will only continue to grow as the volume of security data to analyze increases and cybersecurity teams remain understaffed. Simply put, most cybersecurity teams do not have the manpower to investigate every possible anomaly. AI, when implemented properly, can bring hidden security threats to the surface and spare analysts from needle-in-a-haystack searches that normally only larger, more established teams have the resources and manpower to pursue.
However, the biggest challenge for companies and businesses using AI is the monetary investment required. Whether you train your analysts to use AI or buy an outside tool that does it for you, you will have to spend some money. As someone who has used Splunk quite a bit in his cybersecurity career, I wanted to find a free way to bring AI-assisted threat hunting to Splunk, one that helps cybersecurity teams automate the threat hunting process and produces high-fidelity results that analysts can use in their own threat hunting activities.
Fortunately, I came across the Splunk Data Science and Deep Learning (DSDL) app. The DSDL app allows you to create your own AI models, train them with your own Splunk data, and deploy them to production, where you can run new data through the trained model for whatever purpose you are trying to accomplish. I obviously chose to use the app for threat hunting. The only investments required are Splunk itself and some training in building AI models; I personally recommend the SANS SEC595 course. While pricey, it’s definitely worth the investment.
The Splunk DSDL app works by creating a development environment on a container platform of your choosing (either Docker or Kubernetes). The idea is to create your AI model within the development environment using typical AI tools such as TensorFlow, PyTorch, and pandas. Once created, you can train it using standard Splunk Machine Learning Toolkit syntax. I personally did not have great results training the model within Splunk, because training took far too long given the amount of data involved. Instead, I trained the model outside of DSDL and imported it into the development environment. You can then deploy a dedicated container holding your trained model and run your Splunk data against it to gain insights into the data.
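For orientation, driving a DSDL model from SPL looks roughly like the following. This is only a sketch: the model name dc_cmdline_autoencoder, the index, and the feature_* field naming are placeholders of mine, and the exact MLTKContainer options depend on your notebook, so check the DSDL documentation before copying it.

```
index=wineventlog EventCode=4688
| fit MLTKContainer algo=dc_cmdline_autoencoder epochs=100 feature_* into app:dc_cmdline_autoencoder
```

Once a model exists, scoring new events uses the matching `| apply app:dc_cmdline_autoencoder` command.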
I managed to create an AI model that determines whether a command executed on the domain controller within my lab is normal or anomalous. Interestingly enough, it accurately detects commands that are not commonly executed on the domain controller. In the next section I’ll show the approach that I used in order to create this AI.
Threat Hunting on Domain Controllers Using Deep Learning
This section assumes that the domain controllers are already logging to Splunk. In order to create an AI to perform automated threat hunting on domain controllers, we will follow five steps:
Understand the problem space
Gather the appropriate data for the AI
Create and train the AI using the data
Test the AI
Future Considerations
Understand the Problem Space
Any kind of threat hunting requires the appropriate logs. You can’t hunt what you can’t see. In my case, I want to see any anomalies that occur on the command line of my domain controller. My hypothesis is that if an attacker compromises a domain controller, they will most likely have some kind of command line access to it. Therefore, I need logs of what happens on my domain controller’s command line. I can get them by enabling command line logging on my domain controller via GPO, the steps for which can be found on Microsoft’s site here.
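If you want to replicate this in a lab without GPO tooling, the same two settings behind that policy can be applied locally on the domain controller. These are the documented audit policy subcategory and registry value that make Windows emit process creation events (Event ID 4688) with the full command line; a GPO remains the right approach for production:

```
auditpol /set /subcategory:"Process Creation" /success:enable
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Audit" /v ProcessCreationIncludeCmdLine_Enabled /t REG_DWORD /d 1 /f
```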
For my fellow penetration testers and red teamers, I realize that a lot of attacks don’t occur directly on the command line (PowerShell, C#, direct syscalls, etc), but this is just a starting point, so bear with me.
Gather the Appropriate Data for the AI
Now that the domain controller is creating logs for anything executed on the command line, I can see what command line logs it is producing within Splunk:
I will focus on the following four fields for each command line log: the account domain, the account name, the actual command line log, and the creator process name. Here are the reasons why I’m focusing on each of these four fields:
Account_Domain – if a command is executed by a user from a domain that does not normally show up on the domain controller, then this is an anomaly and I want to know about it
Account_Name – if a command is executed by a user that does not normally log into the domain controller to run commands, then this is an anomaly and I want to know about it
Process_Command_Line – if a command that does not normally execute on my domain controller is executed, then this is an anomaly and I want to know about it
Creator_Process_Name – if a process is spawning a child process that normally does not occur on my domain controller, then this is an anomaly and I want to know about it
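Before a neural network can consume these four fields, they have to become numbers. The article doesn’t prescribe an encoding, so here is one illustrative possibility: a simple hashing-trick sketch in Python, where each field gets its own one-hot block of a fixed-length vector (the bucket count and field handling are my assumptions, not the production encoding).

```python
# Hashing-trick encoding of the four Windows 4688 fields into a fixed-length
# numeric vector. Illustrative only: real feature engineering for command
# lines usually needs richer features (e.g. character n-grams).
import hashlib

import numpy as np

N_BUCKETS = 64  # slots per field; larger reduces hash collisions

FIELDS = ["Account_Domain", "Account_Name",
          "Process_Command_Line", "Creator_Process_Name"]

def bucket(value: str) -> int:
    """Stable hash of a field value into one of N_BUCKETS slots."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_BUCKETS

def encode_event(event: dict) -> np.ndarray:
    """Encode one process-creation event; each field owns its own block."""
    vec = np.zeros(len(FIELDS) * N_BUCKETS)
    for i, field in enumerate(FIELDS):
        vec[i * N_BUCKETS + bucket(event.get(field, ""))] = 1.0
    return vec

example = {
    "Account_Domain": "LAB",
    "Account_Name": "Administrator",
    "Process_Command_Line": 'net group "Domain Admins" /domain',
    "Creator_Process_Name": "C:\\Windows\\System32\\cmd.exe",
}
vec = encode_event(example)
```

Because the encoding is deterministic, the same event always maps to the same vector, which is what lets the autoencoder learn stable patterns.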
Create and Train the AI Using the Data
With the command line logs ready to go, I can now create the AI needed to perform automated threat hunting. The approach is simple: I need to build a model in TensorFlow that learns what “normal” command line activity for the domain controller looks like, so that it can tell me what falls outside that range of “normal” with some degree of statistical certainty. To do this, I’m going to use deep learning. Specifically, I’m going to create a neural network known as an autoencoder. An autoencoder learns to compress and reconstruct its training data, so it reconstructs frequently observed patterns well and rare ones poorly.

In my case, I will train the autoencoder with a week’s worth of command line logs from my domain controller. Once trained, it will know the “normal” characteristics of a command line log from my domain controller. I can then send new command line logs from Splunk to the autoencoder, which will assign a “score” to each log; a higher score means a higher likelihood that the log is an anomaly. Keep in mind that this explanation of how the autoencoder works is very high level and does not cover the technical details of its implementation.
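To make the reconstruction-error idea concrete, here is a minimal sketch using a linear autoencoder in plain NumPy. My actual model was a deeper network built in TensorFlow, and the 2-D toy features here are purely illustrative stand-ins for encoded command line logs.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" events: 2-D feature vectors clustered along the line y = x.
t = rng.normal(size=(500, 1))
X = np.hstack([t, t + 0.05 * rng.normal(size=(500, 1))])

d, k = 2, 1                                  # input dim, bottleneck dim
W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weights

lr = 0.05
for _ in range(1500):        # plain gradient descent on reconstruction MSE
    Z = X @ W_enc            # encode into the bottleneck
    X_hat = Z @ W_dec        # decode back to input space
    err = X_hat - X
    grad_dec = (Z.T @ err) / len(X)
    grad_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

def anomaly_score(x: np.ndarray) -> float:
    """Reconstruction error: low for inputs like the training data."""
    x_hat = (x @ W_enc) @ W_dec
    return float(np.mean((x_hat - x) ** 2))

normal_event = np.array([1.0, 1.0])    # fits the learned pattern
weird_event = np.array([1.0, -1.0])    # off-pattern, so it scores high
```

The bottleneck forces the model to keep only the dominant structure of the data, so anything off that structure reconstructs poorly and earns a high score; sorting events by that score is exactly the hunting signal described above.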
The Splunk DSDL app requires that you implement the fit() and apply() methods, which are responsible for training and executing your model respectively. However, I will say that training the model within the DSDL app using the Splunk Machine Learning Toolkit commands is nearly impossible due to the volume of data needed to train the autoencoder properly. I had to train the autoencoder outside of Splunk using my Nvidia GPU and then import it back into Splunk. I had to open a support ticket to understand exactly how to do this, but the process Splunk recommended does work. Once the model was trained and imported back into DSDL, I implemented the apply() function, which executes the model and assigns each event a “score” indicating how anomalous the command line log appears to be.
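The DSDL notebook template hands these hooks pandas DataFrames of events from Splunk. A skeletal apply() in that style might look like the following; treat it as a hedged sketch, since the feature field names are simplified placeholders and the exact template contract is defined by the DSDL app itself (anomaly_score_0 matches the output field shown later).

```python
import numpy as np
import pandas as pd

def apply(model, df, param):
    """DSDL-style apply hook: score each Splunk event by reconstruction error."""
    # "model" holds whatever the notebook's load() returned; here, any
    # callable mapping a feature matrix to its reconstruction stands in
    # for the trained autoencoder.
    X = df[["feature_0", "feature_1"]].to_numpy(dtype=float)
    X_hat = model["autoencoder"](X)
    scores = np.mean((X_hat - X) ** 2, axis=1)
    # Returned columns surface as new fields on each event in Splunk.
    return pd.DataFrame({"anomaly_score_0": scores})
```

Keeping apply() this thin is deliberate: all the heavy lifting happened at training time, so scoring stays fast enough to run inside a search.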
Test the AI
Finally, it’s time to test the autoencoder. We’ll perform the test by running Active Directory reconnaissance commands that are not frequently executed on the domain controller. I’ll execute some basic reconnaissance commands using the net.exe utility and dsquery as well as a basic PowerShell command on the command line for good measure. With these commands successfully executed, I can view them within Splunk:
Now I’ll run the events through the AI and have it assign each event a score. I’ll also run other events through the AI as a baseline. Notice how the reconnaissance commands and the PowerShell command score much higher than the baseline events under the anomaly_score_0 column:
I can now automate this search, and therefore the threat hunting process, using Splunk’s scheduled and real-time search capabilities.
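As a sketch, a scheduled hunting search could look something like this. The index, model name, and score threshold are all placeholders specific to my lab (EventCode 4688 is the Windows process-creation event), so tune each to your own environment:

```
index=wineventlog EventCode=4688
| table _time Account_Domain Account_Name Process_Command_Line Creator_Process_Name
| apply app:dc_cmdline_autoencoder
| where anomaly_score_0 > 0.9
```

Saving this as a scheduled search that alerts on any results turns the autoencoder into a continuously running hunt.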
Future Considerations
New tactics, techniques, and procedures arise all the time within cybersecurity. Even with the knowledge contained in the MITRE ATT&CK framework, it’s almost impossible to defend against every kind of attack that can hit a domain controller. Sophisticated attacks can also completely bypass Endpoint Detection and Response (EDR) solutions, so while installing EDR on domain controllers greatly strengthens their defenses, it is not a perfect defense, especially when attacks come from legitimate users. Having an AI routinely check your logs can provide some peace of mind when it comes to defending your domain controllers. What’s nice is that this autoencoder approach can be applied to all kinds of logs (firewall, cloud provider, etc.) for anomaly detection.
Something to keep in mind is that as new software is installed on the domain controller and new users access it, the domain controller may start generating new logs that are not anomalies/threats but are not known by the AI. Therefore, this AI will need to be updated periodically as new events establish a new “normal” for the domain controller. This retraining of the AI should be done on some regular cadence, whether that is quarterly or annually. Knowing what is normal for your environment is critical for performing useful threat hunts.
If you’d like to see this kind of automated threat hunting implemented in your Splunk environment, feel free to contact QFunction and schedule a consultation!