Agentic Threat Hunting

Disclaimer: Opinions expressed are solely my own and do not reflect the views or opinions of my employer or any other affiliated entities. Any sponsored content featured on this blog is independent and does not imply endorsement by, nor relationship with, my employer or affiliated organisations.

If you have been watching the AI SOC space, you already know what comes right after it. Agentic threat hunting. AI threat hunting. Pick your favorite name. Every vendor has one now. It is the second wave, and it is getting loud.

So before we judge it, let me back up and talk about what threat hunting actually is, why people love it, and why automating it is trickier than the demos make it look.

A bit of history

Threat hunting got popular around 2010, back when everyone was chasing APTs. It was the buzzword of the moment, and honestly it was one of the cooler things you could put on your plate as a practitioner. You were not just clearing a queue. You were going after the stuff that slipped past the alerts.

To be a threat hunter you needed two things. First, deep knowledge of your environment. Second, you had to know your SIEM and your query language cold. That second part is where most of the real work lives.

The two kinds of hunts

There are two flavors.

Reactive hunts start from an incident. You have a TTP or an indicator, and you go look for the same activity across the rest of the environment. Did this land anywhere else? How far did it spread?

Proactive hunts start from a hypothesis. Usually that hypothesis comes from threat intel. You read about a technique or an actor, you ask "could this work against us," and then you go check. From there you take it wherever the data leads.

The hardest part in both cases is the same: writing the queries. You need to understand your SIEM, every log source feeding it, and how the data is structured. Then you need to be good at reading and parsing what comes back. That is the skill that takes years to build.

What you get out of it

Outcomes depend on the type of hunt.

Reactive hunts usually give you blast radius. You find out where else the activity showed up, and you spot gaps in your detections or your coverage along the way.

Proactive hunts go one of two ways. Either you find something and it becomes an incident, or you find nothing actionable and the work feeds into detection engineering, where it turns into new detections. This is why threat hunting and detection engineering are joined at the hip. A hunt that does not feed detections is half a hunt.

So, agentic threat hunting. Is it any good?

Here is the part you came for.

Agentic threat hunting uses LLMs and AI agents to run the hunt for you. And yes, a lot of the hard parts can be automated now. For real, not just on a slide.

An agent can:

Process threat intel and build a threat profile for your org
Turn that into hypotheses worth checking
Write the SIEM (or EDR/NDR/XDR) queries
Run the hunt and hand you the results
Suggest coverage gaps and new detection rules off the back of it

For reactive hunts it works the same way. Hand it an indicator and it runs the queries and does the deeper analysis for you.

So the tech is real and it is useful. But the demo is the easy 20 percent. The hard 80 percent is everything that makes a hunt actually work in your environment. Here is what I would push any vendor on.

Question 1: Does it actually know your org?

A hunt is only as good as the context behind it. The agent needs to know your infrastructure, your security stack, your crown jewels, and how you operate. Without that, the hypotheses are generic and the queries point at the wrong things. A hunting agent that does not know your environment is just running someone else's playbook against your logs.

This lines up with a point Anton Chuvakin made at last year's Gartner Security and Risk Management Summit. He put hunting against top-tier nation-state adversaries on the short list of things he does not expect AI to handle any time soon. The easy hunts automate well. The hard ones still need a human who knows the terrain.

Question 2: Is it tuned for your SIEM and your tooling?

This one is non-negotiable. The agent has to be trained and tailored for whatever you run: SIEM, EDR, NDR, or XDR. It needs to write correct queries in your query language, not generic pseudo-SQL it picked up off the internet.

From what I have used and built, the thing that separates a good hunting agent from a bad one is simple. Before it writes anything, it should run two quick checks. One, pull the latest log sources. Two, pull the latest schemas. That way it is not guessing at field names, not querying a source you turned off last month, and not burning steps on failed queries.

This is a known failure mode, not me being paranoid. LLMs invent table and field names that sound right but do not exist in your data. Research on enterprise query agents calls this schema hallucination and lists it as one of the main error classes. If the agent does not ground itself in your current schema first, it will hand you a perfectly formatted query against columns you do not have.
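To make the grounding step concrete, here is a minimal sketch in Python of a check that runs before a query is sent. Everything in it is an assumption for illustration: the SCHEMA snapshot, the table and field names, and the check_query_fields helper are hypothetical and do not come from any real SIEM or product. In practice the snapshot would be pulled fresh from your platform at the start of each hunt.

```python
# Hypothetical schema snapshot, as an agent might pull it before writing queries.
# Table and field names are illustrative only, not taken from a real SIEM.
SCHEMA = {
    "auth_logs": {"timestamp", "user", "src_ip", "result"},
    "dns_logs": {"timestamp", "host", "query", "answer"},
}


def check_query_fields(table, fields, schema):
    """Return a list of problems; an empty list means every name exists."""
    if table not in schema:
        return [f"unknown table: {table}"]
    missing = sorted(set(fields) - schema[table])
    return [f"unknown field: {table}.{name}" for name in missing]


# The agent guessed "source_address", but the real field is "src_ip".
print(check_query_fields("auth_logs", ["user", "source_address"], SCHEMA))
# ['unknown field: auth_logs.source_address']
```

The point of the sketch is the ordering, not the code. The query only goes out if the list of problems is empty, so a hallucinated field costs one cheap lookup instead of a failed query.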

Question 3: Can it handle the output without drowning?

Anyone who has written a bad SIEM query knows the pain. You burn your license, you slow the SIEM to a crawl, or you get back a million rows you cannot do anything with.

Agents have all of that plus one more problem: the context window. A bad query that returns a wall of raw logs fills that window fast, and then the agent loses the plot. So the queries cannot just be correct. They have to be efficient. Aggregate before you pull raw events. Select only the fields that matter. Build the query so the SIEM does the stats work and hands back something the agent can actually reason over.

This is not theoretical. Practitioners building these agents already follow the same discipline: run aggregations before pulling raw documents, filter to only the fields you need, use tested query templates instead of building from scratch, and cap how many queries the agent can fire per cycle. The pattern is the same every time. Keep it lean, or the agent chokes.
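Two of those guardrails are easy to sketch in Python: a cap on queries per cycle and a summary step that hands the agent counts instead of raw events. The names and limits below (QueryBudget, summarize, MAX_QUERIES_PER_CYCLE, MAX_ROWS_TO_AGENT) are hypothetical, and the numbers are placeholders rather than recommendations. The local Counter stands in for an aggregation that should really run inside the SIEM.

```python
from collections import Counter

MAX_QUERIES_PER_CYCLE = 5   # assumed cap, tune for your own platform
MAX_ROWS_TO_AGENT = 50      # assumed cap on rows handed to the model


class QueryBudget:
    """Refuses to run more than a fixed number of queries per hunt cycle."""

    def __init__(self, limit=MAX_QUERIES_PER_CYCLE):
        self.limit = limit
        self.used = 0

    def spend(self):
        if self.used >= self.limit:
            raise RuntimeError("query budget exhausted for this cycle")
        self.used += 1


def summarize(events, group_by, top_n=MAX_ROWS_TO_AGENT):
    """Collapse events into (value, count) pairs, most common first.

    Stand-in for a server-side aggregation; events missing the field are skipped.
    """
    counts = Counter(event[group_by] for event in events if group_by in event)
    return counts.most_common(top_n)


events = [{"src_ip": "10.0.0.1"}, {"src_ip": "10.0.0.1"}, {"src_ip": "10.0.0.2"}]
print(summarize(events, "src_ip"))
# [('10.0.0.1', 2), ('10.0.0.2', 1)]
```

The agent calls spend() before each query and only ever sees the output of summarize(). Raw events get pulled later, for the handful of values that look worth a closer look.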

Question 4: Does it know what normal looks like for you?

Oran Yitzhak pushed me on this one, and he is right. It is not just your stack. It is your behavior. A hypothesis needs a baseline, and the baseline is what is normal in your environment. Without it, the agent produces threats that are technically plausible but operationally meaningless. That is not a hunt. That is a false positive factory with a nicer UI.

This is the part that separates "knows your org" from "knows your tooling." You can wire an agent into the right SIEM with the right schemas and it will still waste your time if it has no sense of what your users, services, and machines do on a normal Tuesday.
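A baseline does not have to be fancy to be useful. The sketch below, in Python, flags a value only when it sits well above what the same entity normally does. The is_unusual helper, the threshold of 3 standard deviations, and the sample numbers are all assumptions picked for illustration. A real baseline would account for seasonality, peer groups, and much more.

```python
from statistics import mean, pstdev


def is_unusual(history, observed, threshold=3.0):
    """Flag `observed` if it is more than `threshold` standard deviations
    above the mean of `history`. Returns False when there is too little
    history to judge."""
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: anything above it is a change.
        return observed > mu
    return (observed - mu) / sigma > threshold


# Hypothetical daily login counts for one service account.
normal_week = [10, 12, 11, 9, 10, 11, 12]
print(is_unusual(normal_week, 12))   # False
print(is_unusual(normal_week, 40))   # True
```

Twelve logins is a normal Tuesday for this account, so it stays quiet. Forty is not, and that is the kind of result worth putting in front of a human.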

The cascading error problem

One more thing worth saying. In a chain like this, mistakes compound. A weak hypothesis leads to a bad query, which gives garbage output, which the agent then reasons over and confidently gets wrong. Each step looks fine on its own. The end result is nonsense delivered with a straight face.
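A bit of back-of-the-envelope arithmetic shows why this matters. The Python sketch below assumes each of the four steps is right 90 percent of the time and that the steps fail independently. Both are assumptions made up for illustration, not measurements of any real agent.

```python
from math import prod

# Assumed per-step reliabilities, for illustration only; not measured values.
step_reliability = {
    "hypothesis": 0.9,
    "query": 0.9,
    "output": 0.9,
    "verdict": 0.9,
}


def chain_reliability(steps):
    """Probability that every step is right, assuming independent failures."""
    return prod(steps.values())


print(round(chain_reliability(step_reliability), 2))
# 0.66
```

Four steps that each look 90 percent reliable give you an end-to-end answer that is right about two times in three under these assumptions.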

And the confident part is what keeps me up at night. A weak hypothesis does not just produce a bad query. It produces a confident wrong answer. In DFIR, a confident wrong answer is worse than no answer at all. It moves people in the wrong direction with conviction, and nobody pushes back, because the AI said so.

One more way to think about it. Agents make good hunters faster. They also make mediocre hunters loudly mediocre. That is a useful signal if leadership is watching for it. Most leadership is not watching that metric yet.

That is why the human stays in the loop. Not to write every query, but to sanity-check the hypothesis going in and the verdict coming out. The agent should also show its work: which sources it hit, which queries it ran, what it found and what it did not. "No results" is itself a finding, and a good hunting agent should tell you whether it found nothing or simply could not look.
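One way to force that distinction is to make it part of the report structure. The Python sketch below is a hypothetical shape for a hunt report. The class and field names are my own illustration, not any vendor's format. It takes the conservative view that if any query failed and nothing was found, the honest answer is "could not look", not "nothing found".

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    FOUND = "found"
    NOT_FOUND = "searched, nothing found"
    COULD_NOT_LOOK = "could not search"


@dataclass
class QueryRecord:
    source: str        # log source the query ran against
    query: str         # exact query text, kept for review
    rows: int = 0      # number of matching rows returned
    error: str = ""    # non-empty if the query failed


@dataclass
class HuntReport:
    hypothesis: str
    records: list = field(default_factory=list)

    def outcome(self):
        # Any hit from a query that ran cleanly counts as a finding.
        if any(r.rows > 0 for r in self.records if not r.error):
            return Outcome.FOUND
        # No queries at all, or any failed query, means coverage is incomplete.
        if not self.records or any(r.error for r in self.records):
            return Outcome.COULD_NOT_LOOK
        return Outcome.NOT_FOUND


report = HuntReport("Password spraying against VPN accounts")
report.records.append(QueryRecord("auth_logs", "<query text>", rows=0))
report.records.append(QueryRecord("vpn_logs", "<query text>", error="source unavailable"))
print(report.outcome().value)
# could not search
```

Because every QueryRecord keeps the source and the query text, the human reviewing the verdict can see exactly what was searched and what was skipped.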

So where does that leave us?

Agentic threat hunting is real, and it is useful for the repetitive, query-heavy grind. It can take the parts that used to eat your afternoon and give them back. But it is not a hunter in a box. It needs your context, your schemas, tight queries, and a human checking the start and the end of every hunt.

Get those right and it earns its keep. Skip them and you have an expensive agent writing beautiful queries against logs that do not exist.

Same rule as always. It does not replace the hunter. It makes a good hunter faster, and it makes a bad setup fail quicker.