Splunk data models are a security professional’s best friend in terms of alerting, investigation, and audit. Splunk ES has an entire suite of baked-in correlation searches, but I want to talk about models a bit.
If you don’t know what Splunk is, hey, stop and go check out their free demo. I’ve never made a dime from Splunk as of this writing, but I like their software and wish Kibana were as functional for security purposes.
Splunk Data Models: Definition
It’s important to start with what a data model is (for our purposes): at its root, a saved search that defines a set of events. We’re particularly interested in root events, because root events can be accelerated. Acceleration lets an analyst cache vast quantities of data so that searches run very quickly compared to normal Splunk searching.
Say you have a new security device FooAppliance. You might create a new data model named FooAppliance with a Root Event named AllAlerts under it defined as:
host=foo-lan-host.local sourcetype=foo-type source=foo-log
Save the model, and access this search from Splunk normally as:
| from datamodel:FooAppliance.AllAlerts
| search
I break search commands out per line so that I can add more terms and adjust during an investigation. Collating a large number of search terms and sources into a single model simplifies access to the data for people, but the real value here is the ability to extract data under the model.
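As a rough sketch of that per-line style, here is how I might narrow the model search during an investigation. It assumes action and src_ip are extracted fields in the model (both names appear elsewhere in this post) and that the from command returns them without a dataset prefix; adjust the names to whatever your model actually defines.

```
| from datamodel:FooAppliance.AllAlerts
| search action=Blocked
| search src_ip="192.168.1.*"
| stats count by src_ip
```

Each line can be removed or swapped out on its own, which makes it easy to widen or tighten the search as the investigation moves along.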
Splunk Data Models: Child objects
You can create child objects under the root event AllAlerts, such as a child search that has an added constraint:
action=Blocked
Child searches are useful for extracting fields that aren’t always present but do exist under a certain set of circumstances.
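To make the inheritance concrete: a child object returns only the events that match both the root constraint and its own constraint. Assuming the child is named Blocked (the name used for this node in the tstats example later in the post), it should be roughly equivalent to this plain search:

```
host=foo-lan-host.local sourcetype=foo-type source=foo-log action=Blocked
```

Under the same naming assumption, you should be able to reach the child through the model like this:

```
| from datamodel:FooAppliance.Blocked
```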
Splunk Data Models: Data Extraction
Splunk uses an auto-extraction methodology that works well for common data formats, but can miss things.
I find a solid understanding of RegEx is critical to building useful extractions from data sets. RegEx needs testing, which you can do in a normal search window with the rex command. Splunk’s regex is PCRE with a few quirks around field naming, if you’re wondering. Learn to love the non-greedy .*? quantifier!
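Here is a small, self-contained sketch for testing a pattern with rex before committing it to the model. The sample event and the field names greedy and lazy are made up for illustration; makeresults just fabricates a single event so the search runs without any FooAppliance data.

```
| makeresults
| eval _raw="action=Blocked alert_name=\"Port Scan\" note=\"internal\""
| rex field=_raw "alert_name=\"(?<greedy>.*)\""
| rex field=_raw "alert_name=\"(?<lazy>.*?)\""
| table greedy lazy
```

The greedy .* runs on to the last double quote in the event, so it also swallows the note field. The non-greedy .*? stops at the first closing quote and captures just Port Scan.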
I strongly recommend enabling the automatically extracted date_mday, date_month, and other date_* fields by default for every model. They can make testing and investigation nicely uniform across all models.
After you’ve set up the field extractions, we can move into where Splunk shines: data acceleration. Enable data acceleration in the data model window, which will generate your extracted fields across the selected time frame. An accelerated data set is a collection of extracted data, which necessarily uses more storage.
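Once acceleration has had time to build, one quick sanity check is to count only summarized events with tstats, which is covered in the next section. This is a sketch that assumes the model and root event names used above. Setting summariesonly=true restricts the search to accelerated data, so days with no count suggest the summary has not been built yet for that part of the time range.

```
| tstats summariesonly=true count from datamodel=FooAppliance.AllAlerts by _time span=1d
```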
Splunk Data Models: tstats
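Use tstats to search an accelerated model’s summary data, as it is very fast. Here’s an example search of our FooAppliance data model using tstats, our child node Blocked, and a few other extracted fields (the summariesonly and drop_dm_object_name macros come from the Splunk Common Information Model add-on, which ES includes):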
| tstats `summariesonly` values(AllAlerts.src_ip) as src_ip values(AllAlerts.dst_ip) as dst_ip values(AllAlerts.alert_name) as alert_name values(AllAlerts.alert_type) as alert_type count from datamodel=FooAppliance.AllAlerts where nodename=AllAlerts.Blocked by AllAlerts.hostname, AllAlerts.date_mday, AllAlerts.date_hour
| `drop_dm_object_name("AllAlerts")`
This should give you a summary of these extracted data fields, over any accelerated length of time you want. This search groups by hostname, day of month, and hour, but obviously you can set up any criteria you like. Warning: if you group by a field that is empty for some events, those events won’t show up in the results, which can be a problem. Search and group by fields you know are populated.
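If you do need to group by a field that is sometimes empty, one option is the fillnull_value argument to tstats, which substitutes a placeholder for null group-by values so those rows are kept. This is a sketch, not part of the search above: the placeholder string "unknown" is arbitrary, alert_type is assumed to be the sometimes-empty field, and the argument is only available on reasonably recent Splunk versions, so check the tstats documentation for yours.

```
| tstats summariesonly=true fillnull_value="unknown" count from datamodel=FooAppliance.AllAlerts where nodename=AllAlerts.Blocked by AllAlerts.hostname, AllAlerts.alert_type
| rename AllAlerts.* as *
```

Rows where alert_type was empty should then appear grouped under "unknown" instead of disappearing.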
Splunk Data Models: Extend!
You can also narrow the results (.. or whitelist, query, and so on) by appending terms with the where command:
| where match(src_ip,"^192\.168\.1\.")
| where NOT (match(hostname,"^SecurityScanner$") AND match(date_hour,"^3$"))
| where ..
And so on. Once you’re comfortable with your results, you can save your newly made tstats search as a report or alert. Since it’s very fast, you can schedule it to run regularly without stressing out your search heads too much.