Disclaimer: Opinions expressed are solely my own and do not reflect the views or opinions of my employer or any other affiliated entities. Any sponsored content featured on this blog is independent and does not imply endorsement by, nor relationship with, my employer or affiliated organisations.



1. Introduction

The AI SOC market is growing fast, and some of the products in it are doing serious work: strong integration capabilities, solid reasoning engines, and response actions that actually execute in production. The market has come a long way in four years.

But there is a problem with how we evaluate these products.

When every vendor says "AI-powered response," that phrase covers everything from a fully autonomous isolation workflow to a chatbot that suggests you maybe think about resetting a password. Both get the same label in the marketing material. Both show up in the same analyst reports. And when a security team sits down to compare three products, they have no standardized way to measure the gap between "our AI handles response" and what that actually means in operational terms.

Some products are close to real autonomy in specific domains. Some are strong in analysis but thin on execution. Some have broad coverage but almost nothing runs without human approval. These are all valid positions on a maturity spectrum. The problem is that there is no shared framework to place them on that spectrum consistently.

So we built one.

We call it ARMM. And yes, the name is intentional.

A decade ago, the SOAR generation solved half the problem. We built the arms. Playbooks, integrations, automated response workflows. The execution layer was there. What was missing was the brain. Every decision tree was hand-coded. Every branch of logic was written by an engineer who had to anticipate every possible scenario. The arms moved, but only along rails that humans laid down manually. When the scenario deviated from the playbook, the arm froze.

Now the AI SOC generation has solved the other half. We built the brain. LLMs reason across alerts, correlate context, analyze logs, and make judgment calls that no static playbook could replicate. But somewhere along the way, a lot of products forgot to attach the arms. The reasoning is strong. The analysis is sharp. And then it hands you a summary and says "here is what you should probably do." The brain thinks. The arm does not move.

ARMM evaluates both. The reasoning quality, the decision-making maturity, the trust you can place in the AI's judgment. And the response capability, the execution depth, the ability to actually take action without three humans supervising. It weighs the arm heavier because that is where the industry gap is widest right now. But it does not ignore the brain, because an arm without a brain is just a SOAR playbook and we already know how that story ended.

ARMM is a structured scoring system for evaluating what an AI SOC solution can actually do in the response layer. It covers 80+ response capabilities across six domains: Identity, Network, Endpoint, Cloud, SaaS, and General Options. And it provides a common language so that when someone says "we handle response," there is a way to ask: at what level, across how many actions, and with what degree of autonomy?

The CyberSec Automation Blog has published over a dozen articles and podcast episodes covering what makes a good automation program succeed, how to evaluate tools, and how to structure decision-making around security automation purchases. We have built tool comparison lists, evaluation checklists, and decision frameworks. ARMM is the next step in that work.

2. Why Another Framework

Most existing evaluation methods for AI SOC solutions are either vendor-produced (and therefore biased toward their own capabilities) or too generic to capture the specific nuances of AI-driven response. Analyst reports compare products at a feature-list level without measuring automation depth. Vendor demos show best-case scenarios without exposing the operational friction underneath.

Our focus is narrow and deliberate: response capabilities. Most AI SOC solutions already deliver strong reporting and analysis features. They can summarize alerts, correlate indicators, and reduce false negatives in a mature environment (we emphasize mature because these solutions need access to quality logs and, in more advanced implementations, to organizational documentation and environment-specific context). Where the industry needs structured evaluation is in the response layer: the actions an AI SOC solution can take, how autonomously it can take them, and under what conditions.

We acknowledge that some of the capabilities listed in this framework may seem aspirational at this stage. That is by design. The framework is intended to serve both as a current-state evaluation tool and as a forward-looking roadmap.

We are not scoring specific vendors. The goal is to establish a shared methodology that allows security teams to answer questions such as:

  • Which solution provides more relevant response capabilities for my environment?

  • Which solution operates at a higher level of autonomy for the actions that matter to my program?

  • Which solution can help me reduce my alert backlog without requiring additional headcount?

For product managers working on AI SOC products, the framework serves as a competitive analysis baseline:

  • Where is my competition positioned, and what capabilities are driving their wins?

  • What high-value capabilities are underserved across the market?

  • Am I investing engineering resources in features that security practitioners actually prioritize?

Because this is a fast-moving space, we are starting at version 0.1. This is a living document. Version 1.0 will be designated when the framework reaches a level of stability and community validation that warrants it.

3. Scoring Methodology

ARMM supports two distinct approaches to scoring, each designed for a different operational question.

Evaluator Mode is the straightforward path. You score each capability on the 0-1-2 scale described in Section 3.1 (with the 1C, 1G, 1A sub-levels) and the framework calculates your coverage rate, automation depth, and per-plane breakdown. The tier placements come from ARMM's reference tables. You do not need to factor in your organizational context. This mode answers one question: given two or more AI SOC products, which one covers more of what I need and at what automation level? It is built for procurement teams, SOC managers running vendor evaluations, and anyone who needs a side-by-side comparison without spending weeks on it.

Builder Mode adds a second scoring layer on top. Instead of relying on fixed reference tiers, you score each action across three axes: Trust (how much confidence does your implementation warrant), Complexity (how hard is it for your specific team to build and maintain), and Impact (what is the blast radius if something goes wrong). The action score becomes T + C + I, and the tier placement shifts based on your organizational reality. The same action that scores Entry for a mature team with established automation pipelines might score Explorer for a team that is deploying its first AI SOC integration. This mode answers a different question: given my team, my environment, and my risk tolerance, where should I invest engineering effort to move up the maturity ladder? It is built for product managers, engineering leads, and internal SOC teams running their own automation programs.

Both modes evaluate the same six planes and the same 80+ response capabilities. Both produce per-plane breakdowns and a composite maturity label. The difference is whether you want a product-level comparison (Evaluator) or an environment-aware implementation roadmap (Builder). The public ARMM app at armm.secops-unpacked.ai supports both.

3.1 The Capability Scoring System (0-1-2)

Each response capability in the framework is scored on a three-level scale that measures the degree of automation available:

0 (Not Available): The feature does not exist in the product. There is no mechanism, manual or automated, to perform this action through the AI SOC solution.

1 (Available with Human Involvement): The feature exists but requires some form of human interaction before execution. Because human involvement can range from full collaboration to a simple approval click, this level is subdivided into three sub-categories:

  • 1C (Collaborator): The solution requires continuous back-and-forth interaction with an analyst to reach a response action. The AI acts as a partner, not an autonomous agent.

  • 1G (Guide): The solution generates a plan and presents options for a specific action, but it is not confident in recommending a single path. It lays out alternatives and lets the analyst choose.

  • 1A (Approver): The action is essentially ready to execute. The AI has determined the correct response and prepared the action, but requires a human to click approve before it fires. This is the closest step to full automation while still keeping a human in the loop.

2 (Fully Automated): The action is performed without any human involvement. The vendor (or internal implementation) has demonstrated that the AI SOC solution can execute this action with sufficient confidence that no human review is required. At the time of writing, level 2 is exceptionally rare for most response categories. The framework includes it to establish the target state and to differentiate products that are moving in that direction from those that are not.
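If you are tracking these scores in a script or spreadsheet rather than on paper, the scale encodes cleanly. Below is a minimal Python sketch; the class and member names are illustrative assumptions, not identifiers from the ARMM app.

```python
from enum import Enum

class CapabilityLevel(Enum):
    """Evaluator Mode scale: degree of automation for one response capability."""
    NOT_AVAILABLE = "0"    # no mechanism, manual or automated
    COLLABORATOR = "1C"    # continuous back-and-forth with an analyst
    GUIDE = "1G"           # plan with alternatives; the analyst chooses
    APPROVER = "1A"        # action prepared; one human approval to fire
    FULLY_AUTOMATED = "2"  # executes with no human involvement

    @property
    def numeric(self) -> int:
        """Collapse the sub-levels back to the coarse 0-1-2 scale."""
        return {"0": 0, "1C": 1, "1G": 1, "1A": 1, "2": 2}[self.value]
```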

3.2 The Three Scoring Axes (Builder Mode)

In Builder Mode, each response action is evaluated across three dimensions:

Axis 1: Decision Fidelity and Programmatic Trust (T)

This axis measures the confidence level warranted by the AI SOC implementation. It correlates directly with implementation quality: reasoning log depth, context-aware decision-making, and guardrails against hallucination.

  • T = 1 (Enrichment): AI output assists human-led investigations. The AI provides context and data but does not recommend or execute actions.

  • T = 2 (Validated): AI recommends a specific action. A human confirms before execution occurs.

  • T = 3 (Autonomous): AI executes without human intervention. This requires the highest level of implementation maturity and organizational trust.

Axis 2: Implementation and Maintenance Complexity (C)

This axis evaluates the technical friction in building and sustaining the automation, relative to the skills and resources of the team responsible for it. This is deliberately team-dependent. An automation rated C = 3 for a junior team may be C = 2 for a team of specialized AI engineers with established CI/CD pipelines for their playbooks.

  • C = 1 (Low): Simple API calls or native integrations with minimal configuration.

  • C = 2 (Medium): Multi-step orchestration across multiple systems requiring coordination and testing.

  • C = 3 (High): Complex behavioral baselining, legacy system integration, or custom model tuning.

Axis 3: Operational Impact and Blast Radius (I)

This axis captures the business risk associated with the action. It is typically the most stable axis across organizations, but shifts based on asset criticality. Isolating a standard employee laptop has a different blast radius than isolating a production database server.

  • I = 1 (Low): Negligible disruption. Background scans, tagging, enrichment activities.

  • I = 2 (Medium): Temporary disruption. Resetting a standard user session, blocking a non-critical port.

  • I = 3 (High): Significant downtime, data loss risk, or reputational damage. Production system changes, VIP account modifications, critical infrastructure alterations.

3.3 The Maturity Computation Logic

The scoring system builds from individual actions up to a full program assessment through five layers. Each layer uses a defined formula.

Layer 1: Action-Level Score (S)

For a single response action, the score is the sum of its three axis values:

S = T + C + I

The minimum possible score is 3 (T=1, C=1, I=1). The maximum is 9 (T=3, C=3, I=3).

Layer 2: Tier Mapping

The action score maps to one of four maturity tiers:

| Score Range  | Tier     | Description                                                                          |
|--------------|----------|--------------------------------------------------------------------------------------|
| 3.00 to 5.99 | Explorer | Foundational; low-risk quick wins with minimal blast radius                          |
| 6.00 to 6.99 | Entry    | Stabilized; moderate effort and impact, suitable for early-stage programs            |
| 7.00 to 7.99 | Advanced | Mature; requires high-fidelity reasoning and established trust                       |
| 8.00 to 9.00 | Expert   | Critical; high blast radius, autonomous VIP handling, or production-critical actions |
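Layers 1 and 2 compose mechanically, which makes them easy to sanity-check in code. Here is a hedged sketch using the thresholds from the table above; the `ActionScore` name and structure are assumptions for illustration.

```python
from dataclasses import dataclass

# (minimum score, tier), checked from the highest tier down
TIER_THRESHOLDS = [(8.0, "Expert"), (7.0, "Advanced"), (6.0, "Entry"), (3.0, "Explorer")]

@dataclass
class ActionScore:
    trust: int       # T: 1 enrichment, 2 validated, 3 autonomous
    complexity: int  # C: 1 low, 2 medium, 3 high
    impact: int      # I: 1 low, 2 medium, 3 high

    @property
    def score(self) -> int:
        # Layer 1: S = T + C + I, bounded to 3..9 by the axis values
        return self.trust + self.complexity + self.impact

def tier_for(score: float) -> str:
    # Layer 2: the same thresholds apply to action, domain, and program scores
    for minimum, tier in TIER_THRESHOLDS:
        if score >= minimum:
            return tier
    raise ValueError(f"score {score} is below the 3.0 floor")

# Example: ActionScore(trust=3, complexity=2, impact=3).score == 8 -> "Expert"
```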

Layer 3: Domain Maturity Score (D)

The maturity score for a specific domain (e.g., Endpoint, Identity) is the arithmetic mean of all action scores within that domain:

D = (S1 + S2 + ... + Sn) / n

Where n is the number of scored actions in the domain. The resulting D value maps to a tier using the same thresholds from Layer 2.

Layer 4: Program Maturity Score (P)

The overall program score is the arithmetic mean of all domain scores, with equal weighting across all six planes:

P = (D_Identity + D_Network + D_Endpoint + D_Cloud + D_SaaS + D_General) / 6

Equal plane weighting is a deliberate design choice. It prevents planes with more actions (Endpoint has 22, SaaS has 10) from dominating the evaluation. Each plane contributes exactly one-sixth of the overall score.
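In code, Layers 3 and 4 are plain arithmetic means, with the equal weighting falling out of averaging per-plane values rather than raw actions. A sketch, continuing the illustrative names from the Layer 1 example:

```python
from statistics import mean

def domain_score(action_scores: list[int]) -> float:
    # Layer 3: D is the mean of all scored actions within one domain
    return mean(action_scores)

def program_score(domain_scores: dict[str, float]) -> float:
    # Layer 4: P averages the six plane scores, so a 22-action plane
    # (Endpoint) carries the same weight as a 10-action plane (SaaS)
    return mean(domain_scores.values())
```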

Layer 5: Composite Maturity Label

The composite label is not derived from the program score directly. It uses sequential gating logic:

The composite label equals the highest tier where at least four out of six planes independently meet that tier's threshold, and the qualification chain is unbroken from Explorer upward. A product cannot be labeled Advanced if it has gaps at the Explorer tier.

The four-out-of-six rule is intentionally forgiving. A product focused on cloud-native environments may legitimately deprioritize network-level response. That should not disqualify it from a meaningful composite label. But it still needs breadth across most planes to earn a higher tier.
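The gating logic is the one place where the computation is not a simple average, so a sketch helps pin it down. This assumes the per-plane scores have already been computed; the tier floors repeat the Layer 2 thresholds.

```python
TIER_ORDER = ["Explorer", "Entry", "Advanced", "Expert"]
TIER_FLOORS = {"Explorer": 3.0, "Entry": 6.0, "Advanced": 7.0, "Expert": 8.0}

def composite_label(plane_scores: dict[str, float]) -> str | None:
    """Layer 5: the highest tier where at least 4 of the 6 planes meet that
    tier's floor, qualifying sequentially from Explorer upward."""
    label = None
    for tier in TIER_ORDER:
        qualified = sum(1 for s in plane_scores.values() if s >= TIER_FLOORS[tier])
        if qualified >= 4:
            label = tier   # chain unbroken so far; try the next tier up
        else:
            break          # a gap at this tier blocks every higher tier
    return label           # None means fewer than 4 planes even reach Explorer
```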

3.4 Context-Aware Scoring: Why Environment Matters

ARMM recognizes that the maturity level of an automated action is not a static property of the feature itself. It is an emergent property of the environment where it is applied. The three axes (T, C, I) are all subject to organizational variance, which means the same product capability produces different scores in different contexts.

Example: "Isolate Device" evaluated by three different organizations using the same AI SOC product:

| Context                                | Trust (T) | Complexity (C) | Impact (I) | Score | Tier     |
|----------------------------------------|-----------|----------------|------------|-------|----------|
| Org A: Mature Program / Expert Team    | 3         | 2              | 3          | 8     | Expert   |
| Org B: New Program / Junior Team       | 1         | 3              | 3          | 7     | Advanced |
| Org C: High-Risk Assets / Manual-First | 2         | 2              | 3          | 7     | Advanced |

The product capability is identical across all three. The scores differ because the Trust axis reflects implementation maturity, the Complexity axis reflects team capability, and the Impact axis (while stable here) can shift based on asset criticality. A vendor benchmark alone is insufficient. Builder Mode exists specifically to capture this variance.
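Expressed with the `ActionScore` and `tier_for` sketches from Section 3.3, the table above reduces to three evaluations of the same capability with different axis inputs:

```python
# "Isolate Device" scored under three organizational contexts (values from the table)
contexts = {
    "Org A (mature program / expert team)":    ActionScore(trust=3, complexity=2, impact=3),
    "Org B (new program / junior team)":       ActionScore(trust=1, complexity=3, impact=3),
    "Org C (high-risk assets / manual-first)": ActionScore(trust=2, complexity=2, impact=3),
}
for name, a in contexts.items():
    print(f"{name}: score {a.score}, tier {tier_for(a.score)}")
# -> score 8 Expert, score 7 Advanced, score 7 Advanced
```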

4. Response Capability Domains

The framework organizes response capabilities into six domains. The first five (Identity, Network, Endpoint, Cloud, SaaS) cover specific technical response planes. The sixth (General Options / Usability) covers platform-level characteristics that affect the operational quality of the solution independent of any specific response action.

For the first five domains, each capability is scored using the 0-1-2 system described in Section 3.1 (Evaluator Mode) or the T+C+I system described in Sections 3.2 and 3.3 (Builder Mode). For the General Options domain, the scoring criteria shift slightly: 0 means the feature is not available, 1 means the feature is available but limited in capability or partially implemented, and 2 means the feature is fully available, functional, and tested.

4.1 Identity Response Plane

Identity-related response actions target user accounts, service principals, groups, and access permissions. These actions are among the most commonly needed in incident response and are often the first automation candidates for SOC teams.

| Action                     | Description                                                        |
|----------------------------|--------------------------------------------------------------------|
| Reset Password             | Reset a standard user's password                                   |
| Revoke Sessions            | Terminate all active sessions for a user account                   |
| Disable User               | Disable a standard user account                                    |
| Disable Service Principals | Disable a service account, service principal, or managed identity  |
| Remove Permissions         | Remove a specific set of permissions from an account               |
| Group Adherence            | Add or remove an account from a security group                     |
| Group Creation             | Create a new security group                                        |
| Token Rotation             | Create or rotate secrets and tokens                                |
| Delete Sharing Permissions | Remove sharing permissions on resources                            |
| Label User (Tagging)       | Apply a tag or label to a user account for tracking                |

Builder Mode Reference Scoring (Mature AI SOC Program, Skilled Engineering Team):

| Action                      | T | C | I | Score | Tier     |
|-----------------------------|---|---|---|-------|----------|
| Group Adherence             | 3 | 1 | 1 | 5     | Explorer |
| Label User (Tagging)        | 3 | 1 | 1 | 5     | Explorer |
| Revoke Sessions             | 3 | 1 | 1 | 5     | Explorer |
| Reset Password (Std)        | 3 | 1 | 2 | 6     | Entry    |
| Disable Standard User       | 3 | 1 | 2 | 6     | Entry    |
| Delete Sharing Permissions  | 2 | 2 | 2 | 6     | Entry    |
| Remove Specific Permissions | 2 | 2 | 3 | 7     | Advanced |
| Group Creation              | 3 | 2 | 2 | 7     | Advanced |
| Disable Service Principals  | 2 | 2 | 3 | 7     | Advanced |
| Reset VIP Password          | 3 | 2 | 3 | 8     | Expert   |
| Rotate Secrets (Prod)       | 2 | 3 | 3 | 8     | Expert   |

4.2 Network Response Plane

Network-level response actions modify traffic flow, access control, and device connectivity. These are often high-impact actions with significant blast radius, making the Trust and Impact axes particularly important in scoring.

| Action                   | Description                                              |
|--------------------------|----------------------------------------------------------|
| ACL Creation             | Create a new access control list on the network          |
| VLAN Creation            | Create a new VLAN on the network                         |
| Firewall Rule Creation   | Create a new firewall rule                               |
| IPS Rule Creation        | Create a new IPS rule in deny mode                       |
| Network Connection Reset | Reset a network connection                               |
| DNS Entry Change         | Modify an entry in the DNS records                       |
| Routing Table Change     | Modify a routing entry                                   |
| Sinkhole Traffic         | Redirect traffic to a sinkhole                           |
| Rate Limit Traffic       | Limit traffic by a particular indicator                  |
| VLAN Modification        | Move a device to a restricted VLAN                       |
| Quarantine Device        | Quarantine a device at the network level                 |
| Quarantine Server        | Quarantine a server running an enterprise-level service  |
| Modify NAT Rules         | Change NAT rules to modify traffic patterns              |

Builder Mode Reference Scoring:

| Action                   | T | C | I | Score | Tier     |
|--------------------------|---|---|---|-------|----------|
| Network Connection Reset | 3 | 1 | 1 | 5     | Explorer |
| Sinkhole Traffic         | 3 | 1 | 1 | 5     | Explorer |
| Rate Limit Traffic       | 3 | 1 | 1 | 5     | Explorer |
| ACL Creation             | 2 | 2 | 2 | 6     | Entry    |
| Quarantine Device        | 3 | 1 | 2 | 6     | Entry    |
| Firewall Rule Creation   | 2 | 1 | 3 | 6     | Entry    |
| DNS Entry Change         | 2 | 2 | 2 | 6     | Entry    |
| Modify NAT Rules         | 2 | 2 | 2 | 6     | Entry    |
| IPS Rule Creation        | 2 | 2 | 3 | 7     | Advanced |
| VLAN Creation            | 3 | 2 | 2 | 7     | Advanced |
| VLAN Modification        | 3 | 2 | 3 | 8     | Expert   |
| Routing Table Change     | 3 | 3 | 3 | 9     | Expert   |
| Quarantine Server        | 2 | 3 | 3 | 8     | Expert   |

4.3 Endpoint Response Plane

Endpoint response actions operate directly on devices and their software environment. This domain has the largest number of capabilities because endpoint response spans file operations, process management, application control, forensics, and OS-level changes.

| Action                    | Description                                                 |
|---------------------------|-------------------------------------------------------------|
| Isolate Device            | Isolate a device from all network connectivity              |
| Initiate Malware Scan     | Start a scan on the device                                  |
| Grab File from Device     | Upload a file to a designated container                     |
| Submit File to Sandbox    | Submit a file for sandbox analysis                          |
| Lock Out User             | Lock a user out of the device                               |
| Remove User from Device   | Remove a user account from the device                       |
| Delete Files              | Delete specific files from the device                       |
| Kill Processes            | Terminate a running process                                 |
| Remove Application        | Uninstall an application                                    |
| Remove Browser Extension  | Remove a browser extension                                  |
| Modify Browser Settings   | Set, modify, or replace browser security parameters         |
| Remove Scheduled Task     | Remove a cron entry or scheduled task                       |
| Remove Startup Items      | Remove a process, agent, or file from system startup        |
| Remove Library / Package  | Remove a library from a development environment             |
| Upgrade Application       | Force an automatic update on installed software             |
| Upgrade OS                | Force an automatic OS update                                |
| Deploy Script             | Deploy a script or application needed for remediation       |
| Modify Registry Key       | Change a value or create a new registry key                 |
| Disable Service           | Change the status of or remove a service                    |
| Collect Memory Dump       | Initiate and retrieve a memory dump forensically            |
| Clear Browser Cache       | Remove all files, cookies, and data from the browser cache  |
| Remove Device from Domain | Remove a device from the domain                             |

Builder Mode Reference Scoring:

| Action                    | T | C | I | Score | Tier     |
|---------------------------|---|---|---|-------|----------|
| Initiate Malware Scan     | 3 | 1 | 1 | 5     | Explorer |
| Clear Browser Cache       | 3 | 1 | 1 | 5     | Explorer |
| Grab File from Device     | 3 | 1 | 1 | 5     | Explorer |
| Collect Memory Dump       | 2 | 2 | 1 | 5     | Explorer |
| Submit File to Sandbox    | 3 | 1 | 1 | 5     | Explorer |
| Kill Processes            | 3 | 1 | 2 | 6     | Entry    |
| Block File (via Hash)     | 3 | 1 | 2 | 6     | Entry    |
| Lock Out User             | 2 | 2 | 2 | 6     | Entry    |
| Remove Browser Extension  | 3 | 1 | 2 | 6     | Entry    |
| Remove Scheduled Task     | 2 | 2 | 2 | 6     | Entry    |
| Remove Startup Items      | 2 | 2 | 2 | 6     | Entry    |
| Disable Service           | 2 | 2 | 2 | 6     | Entry    |
| Delete Files              | 2 | 2 | 2 | 6     | Entry    |
| Modify Browser Settings   | 2 | 2 | 2 | 6     | Entry    |
| Remove Application        | 2 | 2 | 3 | 7     | Advanced |
| Remove User from Device   | 2 | 2 | 3 | 7     | Advanced |
| Remove Library / Package  | 2 | 2 | 3 | 7     | Advanced |
| Modify Registry Key       | 2 | 3 | 3 | 8     | Expert   |
| Isolate Device            | 3 | 2 | 3 | 8     | Expert   |
| Remove Device from Domain | 2 | 3 | 3 | 8     | Expert   |
| Upgrade Application       | 2 | 3 | 3 | 8     | Expert   |
| Upgrade OS                | 2 | 3 | 3 | 8     | Expert   |
| Deploy Script             | 3 | 3 | 3 | 9     | Expert   |

4.4 Cloud Response Plane

Cloud response actions target infrastructure resources, access controls, and storage in cloud environments. The blast radius of cloud actions can be particularly severe because a single misconfigured change can affect multiple dependent services.

| Action                         | Description                                                               |
|--------------------------------|---------------------------------------------------------------------------|
| Modify Security Group Rules    | Modify firewall rules on a cloud resource to restrict access              |
| Create Security Group          | Create a new security group and apply it to restrict traffic              |
| Isolate Resource               | Quarantine a cloud resource so it is unreachable                          |
| Modify Access Type             | Switch a resource from public to private or restrict anonymous access     |
| Remove Permissions to Resource | Remove a service principal or managed identity from accessing a resource  |
| Delete Resource                | Delete a resource from the cloud environment                              |
| Stop Resource                  | Stop a resource from execution                                            |
| Modify KeyVault Entries        | Add or modify resources in a KeyVault                                     |
| Use Breakglass Account         | Use a breakglass account in case of emergency                             |
| Remove Files from Storage      | Remove files from a storage bucket or storage account                     |
| Copy Storage Device            | Create a copy of a cloud storage resource for forensic investigation      |
| Mount Storage Device           | Mount a new storage capability to a VM for forensic investigation         |
| Snapshot VM                    | Create a snapshot of the current state of a virtual machine               |
| Enable Diagnostic Settings     | Alter settings that enable advanced log gathering                         |
| Apply Resource Lock            | Make the resource immutable or read-only                                  |

Builder Mode Reference Scoring:

| Action                         | T | C | I | Score | Tier     |
|--------------------------------|---|---|---|-------|----------|
| Enable Diagnostic Settings     | 3 | 1 | 1 | 5     | Explorer |
| Apply Resource Lock            | 3 | 1 | 1 | 5     | Explorer |
| Snapshot VM                    | 3 | 1 | 1 | 5     | Explorer |
| Stop Resource                  | 3 | 1 | 2 | 6     | Entry    |
| Modify Security Group Rules    | 2 | 2 | 2 | 6     | Entry    |
| Create Security Group          | 2 | 2 | 2 | 6     | Entry    |
| Remove Permissions to Resource | 2 | 2 | 2 | 6     | Entry    |
| Copy Storage Device            | 2 | 2 | 2 | 6     | Entry    |
| Mount Storage Device           | 2 | 2 | 2 | 6     | Entry    |
| Modify Access Type             | 2 | 2 | 3 | 7     | Advanced |
| Remove Files from Storage      | 2 | 2 | 3 | 7     | Advanced |
| Isolate Resource               | 3 | 2 | 3 | 8     | Expert   |
| Modify KeyVault Entries        | 2 | 3 | 3 | 8     | Expert   |
| Use Breakglass Account         | 2 | 3 | 3 | 8     | Expert   |
| Delete Resource                | 2 | 3 | 3 | 8     | Expert   |

4.5 SaaS Response Plane

SaaS response actions focus primarily on email and productivity platforms, which are among the most common attack surfaces in enterprise environments. Actions in this domain directly affect end-user workflows and communications.

| Action                       | Description                                                        |
|------------------------------|--------------------------------------------------------------------|
| Delete Email                 | Remove an email from a user's mailbox                              |
| Quarantine Email             | Move an email to the user's quarantine or junk box                 |
| Create Routing Rules         | Create rules to handle and route incoming email                    |
| Grab Email Sample            | Extract an attached file from an email                             |
| Grab Email Link              | Extract a link from inside an email message                        |
| Add / Remove Meeting Invite  | Modify a user's calendar                                           |
| Read / Modify User Status    | Read or change a user's status in the HR platform                  |
| Disable Malicious Inbox Rule | Disable a rule created by a malicious actor from a user's mailbox  |
| Block Sender                 | Block a sender from the domain                                     |
| Modify HR Records            | Modify HR records in the system beyond status                      |


4.6 General Options / Usability

This domain evaluates platform-level capabilities that are not tied to any specific response action but directly affect how useful, trustworthy, and manageable the AI SOC solution is in production. The scoring for this domain uses a modified scale: 0 means not available, 1 means available but limited, and 2 means fully available and functional.

This domain is split into two sub-categories to distinguish between operational platform features and AI-specific evaluation criteria.

Platform Operations

| Capability                | Description                                                            |
|---------------------------|------------------------------------------------------------------------|
| Close Alerts in SIEM      | The tool can close alerts in all major SIEM solutions                  |
| Logging                   | Platform logging allows identification of all actions taken            |
| Reasoning Logging         | Reasoning steps taken by the platform are logged at sufficient detail  |
| API Development           | The API is robust enough for integration with other security tools     |
| Support Level             | Support is responsive and allows for adequate issue resolution         |
| Account Management        | Account management is straightforward with SSO integration             |
| Roles and Responsibility  | Role-based access control is available with sufficient granularity     |
| Ease of Use (GUI)         | The GUI is navigable and intuitive                                     |
| Native Chat Integration   | Native integration with major communication platforms                  |
| Alerting                  | Automatic alerting when platform-level or analysis-level issues arise  |
| Stats / Health Dashboards | Dashboards showing current platform status and performance             |

AI-Specific Evaluation Criteria

| Capability                              | Description                                                                      |
|-----------------------------------------|----------------------------------------------------------------------------------|
| Bring Your Own Model                    | Ability to integrate custom models into the platform                             |
| Context Grounding                       | Ability to bring organizational data to feed into the ML model                   |
| Autonomous Action Thresholds            | Platform allows setting confidence thresholds for autonomous execution           |
| Investigation Audit Trail               | Complete, exportable record of every action (AI and human) with timestamps       |
| IR Metrics Tracking                     | Native tracking of MTTD, MTTA, MTTT, MTTI, MTTR without external tooling         |
| Feedback Loop Mechanism                 | Analysts can confirm, reject, or correct AI decisions with feedback incorporated |
| Auto-Close Reversal Tracking            | Tracks the rate at which auto-closed alerts are reopened by analysts             |
| Explainability / Decision Transparency  | AI provides clear, traceable reasoning for every decision                        |
| AI Decision Accuracy Reporting          | Tracks TP accuracy, FP accuracy, and confidence scores over time                 |
| Model Drift Detection                   | Monitors AI model performance and alerts when accuracy degrades                  |
| Adversarial Robustness Testing          | Supports or integrates with red team exercises to test AI resilience             |

5. Aggregate Maturity Scoring

Evaluating each plane individually is necessary but not sufficient. Security teams making purchasing decisions and product managers tracking competitive positioning need a consolidated view that communicates the overall picture without hiding the details.

5.1 Automation Depth Score

This is the most operationally significant metric and the one that separates real autonomous solutions from products that wrapped a chatbot interface around a set of API calls.

Across all covered capabilities, calculate the distribution:

  • What percentage is fully automated (level 2)?

  • What percentage sits at Approver level (1A)?

  • What percentage sits at Guide level (1G)?

  • What percentage sits at Collaborator level (1C)?

  • What percentage is not available at all (0)?

A product could have 80% of capabilities covered but only 5% fully automated. That is a fundamentally different product than one with 60% covered but 40% fully automated. The first is broad but shallow. The second is narrower but operates with real autonomy where it counts.

Full Automation Rate: The percentage of total capabilities at level 2. This is the true measure of how much an AI SOC solution can operate without human intervention.

Coverage Rate: The percentage of total capabilities at any level above 0. This measures breadth regardless of automation depth.

The relationship between these two numbers tells you everything about how the product actually operates. A high coverage rate with a low automation rate means the product is a guided workflow tool with AI branding. A moderate coverage rate with a high automation rate relative to coverage means the product is autonomous in its areas of focus but limited in scope.
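Both rates fall out of a straight tally over the Evaluator Mode levels. A sketch using the `CapabilityLevel` enum from Section 3.1 (the function name is an illustrative assumption):

```python
from collections import Counter

def depth_readout(levels: list[CapabilityLevel]) -> dict[str, float]:
    """Full Automation Rate, Coverage Rate, and the per-level distribution,
    each as a fraction of all evaluated capabilities."""
    counts = Counter(lvl.value for lvl in levels)
    total = len(levels)
    return {
        "full_automation_rate": counts["2"] / total,     # level 2 only
        "coverage_rate": (total - counts["0"]) / total,  # anything above 0
        **{f"share_{level}": n / total for level, n in counts.items()},
    }
```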

5.2 Combined Scoring Readout

A complete ARMM evaluation for a product produces the following consolidated output:

| Metric             | Value                                          |
|--------------------|------------------------------------------------|
| Overall Score      | 47% (equal plane weighted)                     |
| Composite Maturity | Entry (5 of 6 planes at Entry or above)        |
| Automation Depth   | 12% fully automated, 61% covered at any level  |

Per-Plane Breakdown:

| Plane           | Score | Coverage      | Fully Automated | Tier     |
|-----------------|-------|---------------|-----------------|----------|
| Identity        | 78%   | 7/9 covered   | 2 actions       | Advanced |
| Network         | 42%   | 8/13 covered  | 1 action        | Entry    |
| Endpoint        | 38%   | 12/21 covered | 1 action        | Entry    |
| Cloud           | 31%   | 6/15 covered  | 0 actions       | Explorer |
| SaaS            | 55%   | 7/10 covered  | 3 actions       | Entry    |
| General Options | 64%   | 10/14 covered | 3 actions       | Advanced |

6. Reading the Model

A product can reach Expert level on a specific plane by checking all the boxes for that domain. But it would be difficult to consider an AI SOC Response solution as Expert level overall if it lacks the ability to perform foundational actions like closing alerts in a SIEM. The tier system is designed to reward both depth within a domain and breadth across domains.

The reference maturity tables provided in Section 4 use example scores from a hypothetical mature AI SOC program with a skilled engineering team. These are illustrative, not universal benchmarks. The environmental dynamics described in Section 3.4 are not optional context; they are a core part of how the framework is intended to be used.

When comparing two products, the most informative comparison is not the aggregate score. It is the per-plane breakdown combined with the Automation Depth Score. Two products at the same composite tier can have radically different operational profiles. One may cover 80% of capabilities at the Collaborator level. The other may cover 50% but with 30% at full automation. These are different products for different buyers with different operational maturity levels.

7. Limitations and Future Work

This is version 0.1. The framework has known limitations:

  • The capability lists are not exhaustive. New response actions will emerge as AI SOC products mature and as attack surfaces expand.

  • The three-axis scoring (T, C, I) requires subjective judgment that will vary between evaluators. We plan to develop calibration guidelines to reduce inter-evaluator variance.

  • The framework does not currently weight domains differently. In practice, Identity response may be more important than Network response for a given organization. Weighted scoring is planned for a future version.

  • Detection and analysis capabilities are out of scope for this version. A separate framework or an extension to ARMM may address those in the future.

  • We have not included pricing, deployment time, or vendor lock-in considerations. These are important purchase factors but are outside the scope of a technical maturity model.

We are building a public web application where users can input their product's capabilities and generate ARMM scoring layers automatically, along with an exportable CSV. The application is available at: armm.secops-unpacked.ai

8. Conclusion

The AI SOC market is growing faster than the industry's ability to evaluate products on consistent terms. The ARMM framework provides a structured, repeatable methodology for measuring what an AI SOC solution can actually do in the response layer, how autonomously it can do it, and what it takes to deploy and maintain that capability in a specific operational environment.

The framework is built for two audiences: security teams evaluating products and product managers building them. For security teams, it provides a checklist and scoring system that cuts through marketing language and focuses on operational capability. For product teams, it provides a competitive analysis baseline and a prioritization framework for feature development.

SOAR gave us arms without brains. The first wave of AI SOC products gave us brains without arms. The products that will win this market are the ones that connect both. ARMM gives you a way to measure how far along that connection is, and where the gaps remain.

No current AI SOC solution will check every box. That is not the point. The point is to establish a common language and a common measurement system so that the conversation about AI SOC response capability is grounded in specifics rather than promises. Version 0.1 is the starting point. The framework will evolve as the market does.
