Measuring ROI of AI agents in security operations

Introducing the PICERL Index

Disclaimer: Opinions expressed are solely my own and do not reflect the views or opinions of my employer or any other affiliated entities. Any sponsored content featured on this blog is independent and does not imply endorsement by, nor relationship with, my employer or affiliated organisations.

If you've been around this blog for a while, you know we love cutting through the noise. Last time we geeked out over the shift from rule-based playbooks to adaptive AI agents. Today we’re diving into something even messier: how do we measure ROI and KPIs for AI tools within Security Operations?

Spoiler alert: It’s not just about how many alerts they auto-close.

Yeah, I get it: everyone wants clean dashboards and KPIs, and some might still want that pew-pew cyber map (yes, they are still a thing: https://threatmap.checkpoint.com/).

But here’s the uncomfortable truth: closing alerts doesn’t mean your SOC is getting smarter. It just means you’re sweeping faster. And with a whole staff of AI agents, it's no longer just about how fast you process alerts. Yes, you still want to show that you now close alerts in 15 minutes rather than an hour, but in the end, I don't care whether the Autonomous (AI) SOC solution closes the alert in 10 seconds or 60 seconds. We are now in an era of trust; I'm interested in how accurate the analysis was, why it involved a human, why it closed a false positive, and whether it fed back to detection engineering for improvement.

What really matters is whether your AI agents are helping you improve over time (please don't measure how they replace you over time; that won't work, and we're not there yet. Remember when SOAR promised it would automate and replace analysts? Ten years later, we still haven't won that metric).

So, because I like to invent terms (in cybersecurity we will never stop doing that, and I've realized I'm not the one who will change it), I’ve mapped out how to measure whether your AI tools are actually pulling their weight across the entire security incident lifecycle, from preparation all the way through to learning and adapting. I’m calling this framework the PICERL Index for the Autonomous (AI) SOC. PICERL (that’s P-I-C-E-R-L) stands for Preparation, Identification, Containment, Eradication, Recovery, and the critical Learning phase. The whole idea behind this Index is to break down when to use which metric, so you can see if your AI SOC is truly getting smarter, not just faster.

This edition is sponsored by Prophet Security

Preparation Phase: Where Engineering Sets the Stage

Alright, let's be brutally honest: start here or don’t bother starting at all. If your AI doesn’t have decent visibility, good context, or sharp detection logic to chew on, it’s basically just guessing with a very expensive straight face. You’re setting it up to fail.

Log Source Coverage - Are you even logging the right stuff? You can’t investigate what you can’t see, and neither can your AI. This isn't about vanity metrics; it's about answering the hard questions before you blame the AI for missing something:

  • What critical parts of your environment are actually monitored? And what’s still in the dark ages?

  • What telemetry genuinely supports threat detection versus what’s just noise or only good for a leisurely investigation after the fact?

  • Where are the glaring data gaps that are basically inviting attackers to waltz through your threat coverage blind spots? If your AI is working with data scraps, what miracle are you actually expecting? Garbage in, AI-powered garbage out.

Time-to-value - Since this is a blog post on the return on AI investment, time-to-value should not be ignored, and is often one of the first metrics that will become self-evident. It can also set the stage for all other metrics to either falter or shine. 

  • Speed of initial deployment: The clock starts the moment you pay for an AI tool (or sign the initial POV agreement), and every delay in deployment is a negative for the return on your investment. And these delays are the canary in the coal mine for the AI’s future, often speaking to an architectural challenge.

  • Speed of additional integrations: Deployment and integration challenges have been the proverbial bane of many SOC automation solutions. AI tools need access to organizational data (SIEM, cloud, data lakes, data stores, case management, workflow tools, etc.) to be effective. Building and maintaining these integrations will be a key factor in determining the return on the AI investment.

Detection Engineering Metrics - Yeah, these are the old-school classics, but they’re still terrifyingly relevant. If your human-built detections suck, your AI is just going to automate that suckage at machine speed.

  • True/False Positive Rates: Still the undisputed king. How many actual incidents are your detections catching (True Positives) versus how much digital chaff and time-wasting noise are they spewing out (False Positives)? If your AI is trying to drink from a firehose of false positives from your existing detection layer, guess what its "intelligent" decisions will be based on? More noise. (A minimal way to compute these rates per rule is sketched just after this list.)

  • MITRE ATT&CK Coverage: This isn't about collecting ATT&CK techniques like Pokémon cards for your next "cyber bingo" presentation. It's about ruthlessly assessing if your detections give you real-world visibility into how attackers actually operate. Are you covered against common (and not-so-common) TTPs, or are you just hoping for the best?

  • Detection Effectiveness Over Time: Are your detections getting sharper, stagnating, or (God forbid) getting dumber as your environment and the threat landscape morph? If this is degrading, especially when you're feeding these detections into an AI, your AI is essentially learning from a C-student. Not a recipe for genius.
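
To make the True/False Positive Rate bullet concrete, here's a minimal Python sketch of how you might compute per-rule rates from alert dispositions exported out of your case management tool. The record format and field names are invented for illustration; adapt them to whatever your SIEM or ticketing system actually exposes.

```python
from collections import defaultdict

# Hypothetical alert dispositions: which detection rule fired and the final verdict.
alerts = [
    {"rule": "suspicious_powershell", "verdict": "true_positive"},
    {"rule": "suspicious_powershell", "verdict": "false_positive"},
    {"rule": "impossible_travel",     "verdict": "false_positive"},
    {"rule": "impossible_travel",     "verdict": "false_positive"},
]

counts = defaultdict(lambda: {"tp": 0, "fp": 0})
for alert in alerts:
    bucket = "tp" if alert["verdict"] == "true_positive" else "fp"
    counts[alert["rule"]][bucket] += 1

for rule, c in counts.items():
    total = c["tp"] + c["fp"]
    print(f"{rule}: {c['tp']}/{total} true positives ({c['tp'] / total:.0%})")
```

Run the same calculation per week or per month and you get Detection Effectiveness Over Time for free: the same rates, trended.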

SOP Efficiency - Your Standard Operating Procedures, those lovingly crafted (or ancient and dusty) playbooks. Are they actually helping your human analysts, or are they a bureaucratic nightmare they secretly ignore?

  • How many steps are still painfully manual versus smoothly automated?

  • More importantly, are analysts consistently skipping steps or going off-script because the SOP is out of touch with reality? If your human team finds your playbooks useless, what divine intervention makes you think an AI will magically make them effective? This is about asking: are these playbooks even fit for human consumption, let alone as a reliable script for your AI?

Automation/Orchestration Usage - This is where the AI rubber starts to meet the SOC road, often via SOAR or some preliminary AI tooling. Sure, measure how many alerts your automation touches and the theoretical time saved – that’s the shiny dashboard stuff.

  • But the real gold is in the "oops" metrics: how often does that fancy automation fall flat on its face, requiring a human to rush in and clean up the digital mess? High failure or rollback rates here are your early warning sign that the AI (or the processes it's trying to follow) isn't ready for prime time.

Identification Phase

This is where the true grit of your PICERL Index starts to show. Can your AI actually separate the signal from the overwhelming noise?

MTTD/MTTA (Mean Time to Detect / Mean Time to Acknowledge) — Yeah, these old dogs still hunt, and they're foundational. How fast do you spot something potentially malicious (Detect)? And how quickly does someone – or something – officially notice it (Acknowledge)?

  • But here's the twist: AI is blurring these lines. "Acknowledge" used to be a human analyst begrudgingly clicking a button. Now, your AI might be the one giving the nod, or it might detect, acknowledge, and even decide on next steps in one seamless, sub-second blur. Are you measuring this new reality, or are you stuck in the last decade?

Mean Time to Triage (MTTT) — Okay, I might be evangelizing this one a bit, but hear me out, because it's crucial for AI. This measures how lightning-fast (or embarrassingly slow) your AI moves from the moment an alert lands in its lap (ingestion) to making that critical first-pass decision: "Is this junk? Is this a five-alarm fire? Or does this need a closer, human look?"

  • If you’re only tracking MTTA, you’re missing the AI's internal "thinking" and sorting time. If your AI takes an hour to triage an alert a human could nail in 5 minutes (or vice-versa!), that tells you a hell of a lot about its efficiency, doesn't it?
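
To ground MTTD, MTTA, and MTTT in something measurable, here's a minimal sketch that derives all three from per-alert timestamps. It assumes you can pull when the activity occurred, when it was detected, when the alert was ingested/acknowledged, and when the first triage verdict landed; the field names are made up for illustration, not taken from any specific product.

```python
from datetime import datetime
from statistics import mean

# Hypothetical alert timeline records (ISO 8601 timestamps).
alerts = [
    {
        "occurred":     "2025-05-01T10:00:00",
        "detected":     "2025-05-01T10:04:00",
        "acknowledged": "2025-05-01T10:05:30",
        "triaged":      "2025-05-01T10:09:00",   # first-pass verdict recorded
    },
]

def minutes(start, end):
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 60

mttd = mean(minutes(a["occurred"], a["detected"]) for a in alerts)
mtta = mean(minutes(a["detected"], a["acknowledged"]) for a in alerts)
mttt = mean(minutes(a["acknowledged"], a["triaged"]) for a in alerts)

print(f"MTTD {mttd:.1f} min | MTTA {mtta:.1f} min | MTTT {mttt:.1f} min")
```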

Auto-Closed Alerts — Ah, the siren song of green dashboards! Everyone loves to see alerts automatically closed because it looks like efficiency. But let's get real: you need both volume and precision here.

  • A high auto-close rate is fantastic... unless your analysts are constantly diving back in, muttering "Nope, AI, you totally blew that one," and reopening those same alerts. That high-reversal scenario isn't progress; it’s just digital whack-a-mole, creating more work and eroding trust. Track this auto-close-to-reversal ratio like your job depends on it – because it might.
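
A rough sketch of that ratio, assuming you can count auto-closed alerts and the subset a human later reopened (all numbers below are invented):

```python
# Hypothetical monthly counts from your case management system.
total_alerts = 4200
auto_closed  = 3150   # closed by the AI with no human touch
reopened     = 95     # auto-closed alerts a human later reopened

auto_close_rate = auto_closed / total_alerts   # how much the AI takes off your plate
reversal_rate   = reopened / auto_closed       # how often it gets overruled

print(f"Auto-close rate: {auto_close_rate:.0%}")
print(f"Reversal rate on auto-closures: {reversal_rate:.1%}")
```

A rising auto-close rate paired with a flat or falling reversal rate is the trend you actually want on that green dashboard.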

Escalation Rate — A truly smart AI isn't a know-it-all; it knows when to raise its hand and ask a human for help. But this metric cuts both ways.

  • How often is the AI right to escalate something that genuinely needs human eyeballs?

  • And how often is it just getting spooked by shadows and crying wolf, flooding your team with non-issues? If your AI is the digital boy who cried wolf, your analysts will tune it out. If it never cries wolf, are you sure it's not silently missing the whole damn pack?

AI Decision Accuracy — This is where the trust is forged – or shattered into a million cynical pieces. Can it actually tell the good from the bad, the threat from the trivial? You have to break it down:

  • True Positive Accuracy: When there's a real baddie, can the AI reliably sniff it out and call it correctly? This is table stakes.

  • False Positive Accuracy: Is your AI smart enough to dismiss the endless torrent of benign alerts and operational BS without breaking a sweat? This is where a lot of AI falls down.

  • Roll these (and maybe other factors) into an overall AI Confidence Score. Is your AI getting more sure-footed and reliable over time, or is it still tripping over its own digital shoelaces with every other alert?
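
Here's one simple way to break decision accuracy down, assuming you retain both the AI's verdict and the final human-confirmed verdict for each alert. The sample data is invented and the blended score is just one naive option, not the definitive formula:

```python
# Hypothetical (AI verdict, confirmed ground truth) pairs.
decisions = [
    ("malicious", "malicious"),
    ("benign",    "benign"),
    ("benign",    "benign"),
    ("benign",    "malicious"),   # the dangerous miss
    ("malicious", "benign"),      # noisy over-escalation
]

tp_hits  = sum(1 for ai, truth in decisions if truth == "malicious" and ai == "malicious")
tp_total = sum(1 for _, truth in decisions if truth == "malicious")
fp_hits  = sum(1 for ai, truth in decisions if truth == "benign" and ai == "benign")
fp_total = sum(1 for _, truth in decisions if truth == "benign")

tp_accuracy = tp_hits / tp_total      # real threats correctly called out
fp_accuracy = fp_hits / fp_total      # benign noise correctly dismissed
confidence_score = (tp_accuracy + fp_accuracy) / 2   # naive blend for illustration

print(f"TP accuracy {tp_accuracy:.0%} | FP accuracy {fp_accuracy:.0%} | score {confidence_score:.0%}")
```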

Feedback Loop Metrics — Think of this as the AI's ongoing "schooling" by your seasoned human experts. How often are your analysts giving the AI an 'attaboy' for a good call, versus a 'nope, try again, kiddo,' or adding crucial context the AI missed?

  • More importantly, is this valuable feedback actually being used to make the AI smarter? Is it being systematically fed back into your detection rules, your playbooks, or even into retraining the underlying model? If not, you're just shouting helpful advice at a very expensive, unlistening wall.

Explainability Time — If your AI makes a call, how long does it take your (very expensive) human analyst to decipher why the AI did what it did? This isn't just about transparency; it's about operational speed.

  • If the AI's "show your work" is faster, clearer, and more trustworthy than an analyst digging through raw logs from scratch, you’ve got a genuine productivity win. If its decisions come out of a black box like cryptic decrees from an oracle, good luck building the trust needed for real automation.

Containment & Eradication 

Alright, let's be clear before we get too excited about AI taking over in the Containment & Eradication phase. While fully autonomous response is the ultimate goal for some, and AI can be incredibly fast at identifying what needs to be contained or eradicated, I strongly advise caution before letting AI autonomously execute widespread or irreversible actions. Direct human oversight or extremely well-vetted, trusted automation playbooks should still be involved for high-impact changes.

AI's immediate and immense strength in this phase is accurately identifying and preparing the right response actions with precision, pinpointing exactly what to block, what to isolate, or what process to terminate. It should then primarily trigger your robust, pre-approved automation (the kind with built-in safety mechanisms and comprehensive logging) or alert your skilled human responders to perform the final execution, particularly for critical system changes. The AI provides high-quality intelligence and the recommended action; the execution still needs a reliable, controlled mechanism.

With that critical point about controlled execution made, let's look at metrics that measure how AI contributes to making this process fast and accurate:

MTTI/MTTR (Mean Time to Isolate/Mean Time to Remediate) - How fast can your AI identify and help initiate the lockdown (Isolate) or the cleanup (Remediate)? Even if it's handing off to another system or a human for the final decision on major actions, AI's speed in reaching that 'go/no-go' point and preparing the necessary steps is what we're heavily measuring here. If your AI is permitted to take direct (but carefully scoped and validated!) actions, this metric then also captures its raw execution speed. The objective is to drastically shrink the window of compromise.

Containment Accuracy - This is absolutely crucial, whether AI is merely suggesting the containment action or has the authority to initiate the command itself. Did it target the right infected host for isolation, or did it misidentify based on superficial data and attempt to affect critical infrastructure unnecessarily? Did it precisely identify the malicious process, or did it interfere with a legitimate critical business application? Precision here is what prevents your security response from turning a nasty incident into a self-inflicted business catastrophe. "Oops" doesn't cut it when you're talking about containment.

Recovery 

This is the "getting back to normal" phase, the part everyone wants to rush through but is critical for resilience.

Mean Time to Recover — How long does it take to get affected systems fully cleaned up, restored, and back to business-as-usual operation? Your business definitely notices this one, even if your security team is already chasing the next fire.

  • AI might not be racking new physical servers for you just yet (though, give it a few years!), but it damn well better be helping to document the entire chaotic episode: what happened, what actions were taken (by humans and AI), what the impact was, and what was learned. This feeds directly into making the next recovery faster and less painful.

Learning Phase - Continuous Improvement & Trust

Alright, let's dig into some more advanced concepts. Pulling from some interesting research (and a healthy dose of common sense, frankly), if you're serious about an AI SOC that doesn't just stagnate, these advanced ideas deserve a prime spot in your PICERL Index strategy:

Model Drift Tracking — Is your AI's expensive brainpower actually getting sharper over time, or is it slowly degrading into digital mush as the real-world environment, your business, and attacker techniques change while the model stays static? A model trained six months or a year ago might be dangerously clueless against today's evolving threats. If you're not tracking this, your once-genius AI might now be a well-meaning but ultimately ineffective digital paperweight.
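
A bare-bones way to watch for drift, assuming you already track a weekly "verdict accuracy" number (the fraction of AI verdicts later confirmed correct by humans). The figures and the alert threshold below are purely illustrative:

```python
from statistics import mean

# Hypothetical weekly verdict-accuracy samples, oldest first.
weekly_accuracy = [0.94, 0.93, 0.95, 0.92, 0.90, 0.88, 0.87, 0.85]

baseline = mean(weekly_accuracy[:4])    # first month as the reference window
recent   = mean(weekly_accuracy[-4:])   # most recent month
drop     = baseline - recent

print(f"Baseline {baseline:.1%} | recent {recent:.1%} | drop {drop:+.1%}")
if drop > 0.03:  # arbitrary threshold for illustration
    print("Accuracy is sliding - time to retrain, re-tune, or at least investigate.")
```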

Escalation-to-Accuracy Ratio — This metric tells you how much your human team can actually trust what the AI decides to flag for them. Of the alerts the AI does escalate, what percentage are actually critical, correctly identified, and warranting human intervention? You want few escalations, but those few should be high-fidelity, spot-on alerts. High accuracy on a small number of escalations? That’s the AI sweet spot. That’s an AI your team will listen to, not one they'll eventually learn to ignore.
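
In numbers, that sweet spot looks something like this (counts invented for illustration):

```python
# Hypothetical monthly escalation figures.
total_alerts    = 4200
escalated       = 120   # alerts the AI handed to humans
escalated_valid = 102   # of those, confirmed to genuinely need human action

escalation_rate      = escalated / total_alerts
escalation_precision = escalated_valid / escalated

print(f"Escalation rate {escalation_rate:.1%} | precision {escalation_precision:.0%}")
# Low rate + high precision = an AI your analysts will keep listening to.
```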

Explainability Score — Going beyond just "Explainability Time," how good are the AI's explanations for its decisions? Are they consistently clear, accurate, and genuinely useful to your analysts? This might be more qualitative, maybe even a rubric-based score your analysts contribute to. Do the explanations help them understand the 'why' and learn? Bonus points if your analysts start saying, "Yeah, I can see why the AI flagged that; it makes sense." That's when you know you're building real synergy and trust, not just throwing tech at a problem.

Adversarial Robustness — Let's be blunt: the bad guys aren't stupid, and they aren't going to play nice with your shiny new AI. They will actively try to fool it, evade it, or even poison its data inputs. How well does your AI hold up when poked, prodded, or fed deliberately deceptive inputs designed to make it misclassify threats? Frankly, this is prime territory for your red team to have some fun and earn their keep. If your AI crumbles or goes haywire at the first sign of intelligent opposition, it’s not much of a cyber defender, is it?

I’ve created a dedicated page where you can get an overview of all the metrics.
Check it out: https://reports.cybersec-automation.com/picerl-index-ai-soc

Resources & Inspiration

Some of the thinking in this blog was shaped by excellent resources across the industry. If you want to go deeper into the metrics, philosophy, and practical guidance for evaluating AI in security ops, here are some must-reads:

Final Thoughts: Build a System That Thinks and Learns

Look, the PICERL Index I've laid out here isn't some magic formula I'm trying to sell you for your next board presentation. Forget generating more charts just to prove your AI isn't expensive shelfware. My core point is this: as an industry, we have to get serious about measuring the things that genuinely tell us if these AI systems are actually pulling their weight, truly learning, and justifying their often-hefty price tags in a real-world SOC.

And let's be brutally honest for a moment – a truth many vendors conveniently gloss over when they’re pushing their latest 'AI-powered miracle' – real value from AI in security isn't found in how many alerts it can auto-close per minute. That’s just sweeping the digital floor faster. Frankly, I don't give a damn about that kind of speed if the SOC itself isn't getting demonstrably smarter or more effective as a result.

What I care about, and what I believe you should be laser-focused on, is whether these sophisticated (and let's face it, complex) AI tools actually help us cut through the oppressive noise and make sense of the chaos. Can we genuinely trust the decisions they make? Are they actually freeing up our (already overworked and often burnt-out) human analysts to do the proactive hunting and strategic thinking that requires human ingenuity, not just brute-force processing power? If your AI isn't actively contributing to a SOC that learns, adapts, and demonstrably improves its defenses, ideally week over week, then it risks becoming just another expensive cog in an already overburdened machine, making us feel more efficient while we're still fundamentally struggling against the tide.

So, here’s my challenge, or perhaps just my strong opinion: let’s collectively ditch the 'dashboard bingo' and stop chasing vanity metrics that make us look busy but don't prove we're any better. It's high time we put our AI investments under a harsher, more critical lens, focusing squarely on whether they're fostering security operations that can actually evolve.

Building a genuinely smarter SOC isn't about blindly throwing money at the next shiny AI tool that promises to solve all our problems. It’s about fostering a culture of critical thinking, asking uncomfortable questions, and rigorously measuring genuine progress. And that, unfortunately, is a hell of a lot harder but infinitely more valuable than just counting how many alerts your AI auto-closed by lunchtime.

Still haven’t taken the survey?

Prophet Security is running a short, 8-10 minute survey to better understand how security teams are using AI in the SOC today. The first 100 qualified respondents get a $50 Amazon gift card and early access to the anonymized results.

Now a closer look at the company behind the survey

Vendor Spotlight: Prophet Security

Prophet Security is redefining what it means to bring AI into the SOC with purpose and precision. At the center is Prophet AI, an agentic AI SOC Analyst that comes pretrained out of the box and ready to plug into your environment. No months-long onboarding. No brittle logic trees.

How Prophet AI works

Unlike traditional automation platforms that rely on playbooks or manual tuning, Prophet AI works autonomously from the moment an alert is triggered. It mirrors the investigative process of a seasoned analyst, asking the right questions, pulling relevant context, and delivering full investigations from day one.

Plans: Prophet AI builds a dynamic investigation plan for every alert, identifying the key questions needed to determine if it's a true threat or benign.

Investigates: It pulls and correlates evidence from SIEM, EDR, IAM, cloud, and more, so your team doesn’t have to. Analysts can dive deeper and ask additional questions, no pivoting or copy-pasting required.

Responds: Each alert is closed with a clear verdict, context, and next steps. True positives get remediation guidance. False positives yield insights detection engineers can use.

Adapts: Prophet AI learns from your team’s feedback, improving its performance and adapting to your org’s unique threat landscape.

Transparency by design

Every conclusion Prophet AI reaches is traceable and explainable. You can see what it did, why it did it, and how it got there. It’s a trust-building exercise.

Fast time to value

Want to see it in action? Most teams spin up a POV in under 30 minutes to evaluate Prophet AI’s real-time performance on their own alerts.

If you're ready to transform your SOC with AI that delivers real, measurable impact, Prophet Security provides a clear, fast, and effective way forward. Request a demo today to see it in action.

🏷️  Blog Sponsorship

Want to sponsor a future edition of the Cybersecurity Automation Blog? Reach out to start the conversation. 🤝

🗓️  Request a Services Call

If you want to get on a call and have a discussion about security automation, you can book some time here

Join as a top supporter of our blog to get special access to the latest content and help keep our community going.

As an added benefit, each Ultimate Supporter will receive a link to the editable versions of the visuals used in our blog posts. This exclusive access allows you to customize and utilize these resources for your own projects and presentations.

Newsletter Recommendations

Sponsored
SheHacksPurple Newsletter: Learn to Code Securely, with Tanya Janca

 
