Talking to Your SIEM Using a Large Language Model

Using Large Language Models

As you probably already know, AI investment has been directed primarily toward large language models. Large language models, also known as LLMs, provide the means of having AI “understand” natural language. This “understanding” results from the LLM being trained on very large datasets of text, from which it learns how to respond to the natural language given to it. Please note that the LLM doesn’t “understand” natural language the way humans do, but rather “predicts” the best response to the natural language given to it based on its training.

LLMs suggest an interesting future for the way we interact with and create software. Instead of having to conform to a user interface or an established way of interacting with software, why not prompt an LLM with your request and have the LLM interact with the software for you? Instead of having to manually comb through data sources for some search you’re performing, why not have the LLM perform the search for you and present the results in whatever manner you choose? Instead of having to invest in an entire computer science or software engineering education, why not have the LLM write the code for you?

While the idea of having LLMs write code for you may seem hyperbolic, it’s exactly where big tech companies like Nvidia see the future of LLM and AI development in general. QFunction is not a fan of the “abandon computer science and software engineering” idea, but it’s understandable why the idea exists in the first place. There’s a reason why millions of dollars are being put into LLM development: these kinds of game-changing developments are well within the realm of possibility. For example, the field of agentic AI is gaining steam, in which autonomous agents perform actions on behalf of humans without the need for human intervention. Can you guess the foundation on which agentic AI is built? That’s right: large language models.

Function Calling via LLMs

People have different ideas about what LLMs can do. Most people’s exposure to LLMs is dictated by ChatGPT and whatever the news puts out as the latest developments in LLMs. However, what a lot of people don’t fully comprehend is what separates a general LLM from a more specialized LLM (think agentic AI) that can act upon whatever determinations it makes. This separation is dictated by the use of function calls.

Function calls, from a computer programming point of view, occur when a program branches off from its main execution in order to run some previously established lines of code. These lines of code, also known as a function, can take input and produce output as dictated by how the function was coded. From an LLM point of view, function calls are what allow LLMs to interact with the outside world. If you want an LLM to tell you today’s weather in Tokyo, the LLM can make a function call to an online weather service, providing it the name of the place you want the weather for (in this case, Tokyo). The service will provide today’s weather in Tokyo to the LLM, and the LLM will present it back to you. The LLM did not know the weather in Tokyo, so it branched off to run some previously established lines of code, where the lines of code call the online weather service. This may not seem like a huge deal until you consider the steps the LLM had to perform in order to get Tokyo’s weather:

  1. Accept the natural language prompt of “What is today’s weather in Tokyo?”

  2. Parse the prompt in order to understand the service that needs to be called (the online weather service)

  3. Supply the name of the city (Tokyo) in the correct format to the online weather service

  4. Accept the service’s response and present the results back to the user

That the LLM has the ability to perform all of these steps is a significant achievement and cannot be overstated. While beyond the scope of this post, something to keep in mind is that these developments in LLMs will most likely act as the foundation of AI agents performing autonomous tasks, which will be quite the spectacle.
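To make those steps concrete, here is a minimal sketch of the weather example in Python. The weather service, its response format, and the LLM’s output are all hypothetical stand-ins:

```python
import requests

def get_weather(city: str) -> str:
    """Fetch today's weather for a city from a (hypothetical) online weather service."""
    response = requests.get("https://api.example-weather.com/today", params={"city": city})
    return response.json()["summary"]

# Step 1: the user supplies a natural language prompt.
prompt = "What is today's weather in Tokyo?"

# Steps 2-3: the LLM parses the prompt, determines which function to call,
# and supplies the city name in the correct format. Its output might look like:
llm_output = "get_weather(city='Tokyo')"

# Step 4: executing the generated call produces the results for the user.
print(eval(llm_output))  # demo only; never eval untrusted output in production
```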

NexusRaven LLM

With a better understanding of LLMs and function calls, we can now move to the LLM responsible for enabling this more advanced capability: the NexusRaven LLM. Contrary to popular belief, not all large language models need to act like ChatGPT. In fact, there are special-purpose LLMs geared toward specific tasks, such as powering AI agents and function calling. The NexusRaven LLM is a specialized function-calling LLM that can translate natural language into the proper function calls needed by an application. The maintainers of the NexusRaven LLM at NexusFlow.ai describe its goal as advancing “open source models for copilots and agents”. More information about NexusRaven can be found on the NexusFlow blog here. We will be using it to demonstrate how to add LLM capabilities to everyone’s favorite cybersecurity tool: the SIEM.

How to Add an LLM to your SIEM

In order to integrate the LLM into your SIEM, you’ll need the following ingredients:

  1. A functional SIEM with API capabilities. All modern SIEMs (Splunk, SumoLogic, etc.) should already have this, so this approach should work with whichever modern SIEM you have. This demonstration will use the Splunk API.

  2. An account on HuggingFace where you can deploy models on dedicated inference endpoints. Specifically, you will deploy the NexusRaven model on an Nvidia A100 endpoint. While you may be able to run the NexusRaven model locally, it is too slow on most systems without GPUs, making it easier to run the model elsewhere.

  3. Money. As of the date of this post, it costs $4 an hour to run the Nvidia A100 with 1 GPU continuously. QFunction recommends that you configure the model to scale to zero after 1 hour of inactivity.

  4. A Jupyter Notebook in order to interface with your SIEM and the NexusRaven model. This is the easy part.

First, you’ll need to run the NexusRaven model. You can go to the NexusRaven model page on HuggingFace and click on Deploy -> Inference Endpoints (dedicated). You will need to choose Amazon Web Services for the Instance Configuration as well as the Nvidia A100 with 1 GPU. Be sure to set the Automatic Scale-to-Zero in order to save yourself from accidentally running up your HuggingFace bill. After setting everything up, click Create Endpoint in order to start the instance hosting the model:

Instantiating the NexusRaven Model Using HuggingFace

With the instance running, you can now create the function that will hit your SIEM’s API in order to search data within your SIEM. Below is the implementation of a function named splunk_search_windows_system that will do exactly this within Splunk:

Creating a function within Python that launches a search within Splunk via the API
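A minimal sketch of what this function can look like is below, assuming Splunk’s REST API on the default management port 8089. The index, credentials, and search string are placeholders for your own environment:

```python
import requests

SPLUNK_URL = "https://localhost:8089"  # your Splunk management endpoint
SPLUNK_AUTH = ("admin", "changeme")    # placeholder credentials

def splunk_search_windows_system(hostname, earliest_time, latest_time):
    """Collects the logs from a specific Windows system over a certain timeframe."""
    # The export endpoint runs the search and streams the results back
    # in a single request, which keeps the demo simple.
    response = requests.post(
        f"{SPLUNK_URL}/services/search/jobs/export",
        auth=SPLUNK_AUTH,
        data={
            "search": f'search index=* host="{hostname}"',
            "earliest_time": earliest_time,
            "latest_time": latest_time,
            "output_mode": "json",
        },
        verify=False,  # demo only; validate certificates in production
    )
    return response.text
```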

This function collects the logs from a specific Windows system over a certain timeframe. It takes three arguments: the name of the Windows system, the earliest time to retrieve the logs, and the latest time to retrieve the logs. Remember that the NexusRaven LLM will be responsible for taking natural language (containing the arguments for the function) and translating it into the proper function call, as we will see in a bit. In order for the NexusRaven LLM to do this, it needs to know information about the function we created. We will give it information about the function (also known as the function signature) as well as information about the arguments and, ultimately, what the function does:

Describing the splunk_search_windows_system function
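A sketch of this description is below, following the prompt format documented on the NexusRaven model card: the function is presented as a Python signature with a plain-English docstring covering the function’s purpose and each argument (the exact time format shown is an assumption):

```python
FUNCTION_SIGNATURE = '''
Function:
def splunk_search_windows_system(hostname, earliest_time, latest_time):
    """
    Collects the logs from a specific Windows system over a certain timeframe.

    Args:
    hostname (str): The name of the Windows system to retrieve logs for.
    earliest_time (str): The earliest time of the search, e.g. "02/03/2024:00:00:00".
    latest_time (str): The latest time of the search, e.g. "02/04/2024:00:00:00".
    """
'''
```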

Notice how the function is described in plain English. NexusRaven will take this description as well as the query and form the proper function call out of it.

Now we need to establish how to interact with the NexusRaven LLM. Fortunately, a couple of lines of code are all that is needed to perform this. Be sure to replace the API_URL and the Authorization header with the URL to your own NexusRaven LLM and authorization token from HuggingFace, respectively:

Defining how to interact with the NexusRaven model on HuggingFace
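A sketch following the standard request pattern from the HuggingFace Inference Endpoints documentation; the URL and token below are placeholders:

```python
import requests

API_URL = "https://your-endpoint.endpoints.huggingface.cloud"  # your endpoint URL
HEADERS = {
    "Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxx",  # your HuggingFace token
    "Content-Type": "application/json",
}

def query(payload):
    """Send a generation request to the hosted NexusRaven endpoint."""
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    return response.json()
```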

Finally, we have everything we need to demonstrate how to use natural language to talk to the Splunk SIEM. We will ask the NexusRaven LLM to give us the logs for a specific Windows system (WIN-HDJETKALNBL) between two dates (February 3rd-4th):

Using natural language for the NexusRaven LLM to translate to a function call
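A sketch of that request is below. The generation parameters and the <human_end> marker follow the NexusRaven model card’s conventions, and the response shape assumes the endpoint’s standard text-generation output:

```python
prompt = FUNCTION_SIGNATURE + '''
User Query: Get me the logs for WIN-HDJETKALNBL from February 3rd 2024 to February 4th 2024<human_end>
'''

output = query({
    "inputs": prompt,
    "parameters": {"temperature": 0.001, "max_new_tokens": 200, "do_sample": False},
})
print(output[0]["generated_text"])
# Expected output along the lines of:
# Call: splunk_search_windows_system(hostname='WIN-HDJETKALNBL',
#       earliest_time='02/03/2024:00:00:00', latest_time='02/04/2024:00:00:00')
```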

Notice how the output of the final statement contains the function call needed in order to query Splunk. Not only did NexusRaven feed the arguments to the function and format them correctly, but it even formatted the dates in the way needed by the function. All that’s needed is to execute the resulting function call:

Running the resulting function call in order to execute the Splunk search
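One (deliberately naive) way to do that is to strip NexusRaven’s “Call:” prefix and evaluate what remains:

```python
# Isolate the generated call, dropping the "Call:" prefix and any
# trailing reasoning the model appends after it.
generated = output[0]["generated_text"]
call_text = generated.split("Call:")[-1].split("Thought:")[0].strip()

results = eval(call_text)  # demo only; eval on model output is risky in production
print(results)
```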

As you can see, we have successfully spoken to an LLM in order to interact with a SIEM.

Keep in mind that there are drawbacks to using this approach. As it’s currently coded, you will need to establish a function for every use case that you may have with the NexusRaven LLM. That being said, there are ways to generalize the functions to make them more flexible as needed, especially if you have specific use cases in your organization (see the sketch below). This can be useful for organizations that want non-cybersecurity personnel to interact with a SIEM without requiring them to learn the SIEM’s search language. It also opens up new avenues for SOAR that were not possible before. Regardless, the limit to this functionality is your own creativity with programming.
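As one example of that generalization, the system-specific function above could be swapped for a broader search function that lets the LLM supply the search string itself. A hypothetical sketch, reusing the connection settings from the earlier Splunk sketch:

```python
def splunk_search(search_query, earliest_time, latest_time):
    """Runs an arbitrary Splunk search over a certain timeframe."""
    # Identical to splunk_search_windows_system, except the search string
    # is now chosen by the caller (or composed by the LLM).
    response = requests.post(
        f"{SPLUNK_URL}/services/search/jobs/export",
        auth=SPLUNK_AUTH,
        data={
            "search": search_query,
            "earliest_time": earliest_time,
            "latest_time": latest_time,
            "output_mode": "json",
        },
        verify=False,
    )
    return response.text
```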

Conclusion

It’s inevitable that LLMs will play a larger role in cybersecurity going forward. This was just a basic example of the merging of AI and cybersecurity in the form of LLMs, and there will no doubt be work in this field for the foreseeable future. As always, the code for this can be found on the QFunction GitHub here. If you’re interested in enhancing your SIEM with this kind of AI, contact QFunction to see how we can help you explore AI in your cybersecurity practice. If you’re interested in getting consultation for your SIEM, check out our QFunction SIEM Services! And if you’re a small business looking to implement AI in your cybersecurity practice, check out this post on how AI works within small business cybersecurity!
