Threat Hunting Network Connections Using Zeek and AI

Threat hunting over the network is always a challenge. With the vast number of inbound and outbound connections that occur within any organization, it’s critical to be able to hunt for threats that happen over the wire. Malicious network connections often involve unusual ports, large numbers of packets or bytes, beaconing, and long-lived connections. Luckily, there’s a way to exercise your organization’s hunting capabilities: the team at Active Countermeasures releases threat hunting challenges that test whether your existing processes can detect these kinds of threats. The challenge we’ll address today involves locating a command and control (C2) beacon that runs over a network tunnel. We will use AI to analyze network connections and find the threat in the logs.

First, we’ll need to collect the logs. The challenge provides PCAPs of the network activity as well as 24 hours of Zeek logs. Most organizations collect network logs similar to Zeek’s, so we’ll use the Zeek logs for this challenge. Let’s take a look at the logs, specifically the conn.log file, to see what it contains:

Zeek Network Logs

As we can see, Zeek’s conn.log file contains a lot of useful information. Each log entry describes various properties of a network connection. For this challenge, we will focus on the following properties:

  1. id.resp_p - the destination port of the network connection

  2. duration - how long the connection was open

  3. orig_bytes - how many bytes were sent from the host to the destination

  4. resp_bytes - how many bytes were sent from the destination to the host

  5. orig_pkts - how many packets were sent from the host to the destination

  6. resp_pkts - how many packets were sent from the destination to the host
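A short parser can pull the six fields above out of conn.log. This is a minimal Python sketch. It assumes the log uses Zeek’s default tab-separated format (a `#fields` header line followed by one connection per line) rather than JSON. The function name `parse_conn_log` is hypothetical. Recording unset values (`-`) as 0 is also an assumption about how missing durations and byte counts should be handled.

```python
# The six conn.log fields this hunt relies on.
FIELDS = ["id.resp_p", "duration", "orig_bytes", "resp_bytes", "orig_pkts", "resp_pkts"]

# Placeholders Zeek writes for unset and empty values in its TSV logs.
MISSING = {"-", "(empty)"}


def parse_conn_log(lines):
    """Parse Zeek conn.log lines (default TSV format) into dicts of FIELDS.

    Assumption: unset values are recorded as 0.0 (for example, connections
    that never completed have no duration).
    """
    columns = None
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#fields"):
            # Header line "#fields<TAB>ts<TAB>uid<TAB>..." gives the column order.
            columns = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue  # other metadata (#separator, #types, #close, ...) or blank lines
        if columns is None:
            raise ValueError("data line found before the #fields header")
        values = dict(zip(columns, line.split("\t")))
        records.append({
            field: 0.0 if values.get(field, "-") in MISSING else float(values[field])
            for field in FIELDS
        })
    return records
```

For the challenge data, something like `with open("conn.log") as f: records = parse_conn_log(f)` would produce one dict per connection. Columns are looked up by name, so their order in the file doesn’t matter.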

Because we’re looking for anomalous behavior in the logs, we first need to establish what “normal” looks like. This can be done with artificial intelligence, specifically a GAN in TensorFlow, which was covered in a previous post. Because the GAN is the same, we’ll focus on how we represent the Zeek data properties:

Representing the Zeek data in TensorFlow

We’ll represent each connection as a 10-dimensional tensor. Five dimensions represent the destination port, and the remaining five represent duration, orig_bytes, resp_bytes, orig_pkts, and resp_pkts. The destination port gets one dimension per digit because the highest port number (65535) has 5 digits, and we’re looking for port numbers that don’t normally show up in the logs. The other dimensions are treated as ordinary numbers, since we’re establishing the “normal” range of values for these properties. Each dimension is scaled to between 0 and 1 so that it works with the GAN anomaly detector.
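The encoding described above can be sketched in Python with NumPy. Two details are assumptions, since the original code isn’t reproduced here. First, each port digit is scaled to [0, 1] by dividing by 9. Second, the five numeric fields are min-max scaled column by column across the whole log. The input is assumed to be a list of dicts keyed by the conn.log field names, and `encode_connections` is a hypothetical helper name.

```python
import numpy as np

NUMERIC_FIELDS = ["duration", "orig_bytes", "resp_bytes", "orig_pkts", "resp_pkts"]


def port_digits(port):
    """Split a port into 5 zero-padded digits, each scaled to [0, 1] by dividing by 9."""
    return [int(d) / 9.0 for d in f"{int(port):05d}"]


def encode_connections(records):
    """Turn conn.log records into an (n, 10) float32 array with every column in [0, 1].

    Columns 0-4 hold the destination port digits; columns 5-9 hold the
    min-max scaled duration, orig_bytes, resp_bytes, orig_pkts, resp_pkts.
    """
    ports = np.array([port_digits(r["id.resp_p"]) for r in records], dtype=np.float32)
    numeric = np.array([[r[f] for f in NUMERIC_FIELDS] for r in records], dtype=np.float32)
    # Min-max scale each numeric column; a constant column (max == min) maps to 0.
    col_min = numeric.min(axis=0)
    col_range = numeric.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0
    numeric = (numeric - col_min) / col_range
    return np.hstack([ports, numeric])
```

For example, port 11601 becomes the digits 1, 1, 6, 0, 1 and port 443 becomes 0, 0, 4, 4, 3 before scaling. Byte and packet counts tend to be heavy-tailed, so min-max scaling squeezes most connections toward 0. Applying `np.log1p` to those columns before scaling is a common alternative worth testing.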

Once the GAN has been trained on the conn.log data, its discriminator acts as an anomaly detector for that data. We then feed the conn.log data through the discriminator one more time to score each entry. A basic z-score system measures how far each entry’s score falls from the median. The higher the z-score, the further the entry is from the median, signifying an anomalous entry.
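Here is a minimal sketch of that scoring step in Python with NumPy. It assumes the discriminator outputs have already been collected into a 1-D array. Following the description above, each entry’s distance from the median is divided by the standard deviation of all scores. A textbook z-score centres on the mean instead, and the robust "modified" z-score divides by the median absolute deviation. The function names and the threshold of 3 are illustrative choices, not taken from the original code.

```python
import numpy as np


def median_zscores(scores):
    """Distance of each discriminator score from the median, in standard deviations.

    Assumption: centred on the median, as described in the post; a textbook
    z-score would subtract the mean instead.
    """
    scores = np.asarray(scores, dtype=np.float64)
    spread = scores.std()
    if spread == 0:
        return np.zeros_like(scores)  # identical scores: nothing stands out
    return np.abs(scores - np.median(scores)) / spread


def flag_anomalies(scores, threshold=3.0):
    """Return (index, z-score) pairs above the threshold, most anomalous first."""
    z = median_zscores(scores)
    idx = np.flatnonzero(z > threshold)
    order = idx[np.argsort(-z[idx])]
    return [(int(i), float(z[i])) for i in order]
```

With a Keras discriminator, the scores might come from something like `discriminator.predict(X).ravel()`. Here `discriminator` and `X` are hypothetical names for the trained model and the encoded connections. Passing the result to `flag_anomalies` returns the row indices to inspect in conn.log.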

Basic z-score code

At this point, we have a z-score for each log entry. Most of the z-scores fall between 1 and 3, as expected. However, the final entry had a z-score of 10.

This log entry falls well outside the established normal range on every property: it uses an anomalous port, has a long connection duration, and involves a large number of bytes and packets. Researching vulnerabilities associated with this port (11601) points to Ivanti exploitation as a likely explanation.

Overall, this was a great exercise in threat hunting and AI. The challenge can be found in Active Countermeasures’ post, and the code used for this challenge can be found on the QFunction GitHub.
