Disclaimer: Opinions expressed are solely my own and do not reflect the views or opinions of my employer or any other affiliated entities. Any sponsored content featured on this blog is independent and does not imply endorsement by, nor relationship with, my employer or affiliated organisations.
1. Introduction
The AI SOC market is growing fast, and there are products on it doing serious work. Some of them have strong integration capabilities, solid reasoning engines, and response actions that actually execute in production. The market has come a long way in four years.
But there is a problem with how we evaluate these products.
When every vendor says "AI-powered response," that phrase covers everything from a fully autonomous isolation workflow to a chatbot that suggests you maybe think about resetting a password. Both get the same label in the marketing material. Both show up in the same analyst reports. And when a security team sits down to compare three products, they have no standardized way to measure the gap between "our AI handles response" and what that actually means in operational terms.
Some products are close to real autonomy in specific domains. Some are strong in analysis but thin on execution. Some have broad coverage but almost nothing runs without human approval. These are all valid positions on a maturity spectrum. The problem is that there is no shared framework to place them on that spectrum consistently.
So we built one.
We call it ARMM. And yes, the name is intentional.
A decade ago, the SOAR generation solved half the problem. We built the arms. Playbooks, integrations, automated response workflows. The execution layer was there. What was missing was the brain. Every decision tree was hand-coded. Every branch of logic was written by an engineer who had to anticipate every possible scenario. The arms moved, but only along rails that humans laid down manually. When the scenario deviated from the playbook, the arm froze.
Now the AI SOC generation has solved the other half. We built the brain. LLMs reason across alerts, correlate context, analyze logs, and make judgment calls that no static playbook could replicate. But somewhere along the way, a lot of products forgot to attach the arms. The reasoning is strong. The analysis is sharp. And then it hands you a summary and says "here is what you should probably do." The brain thinks. The arm does not move.
ARMM evaluates both. The reasoning quality, the decision-making maturity, the trust you can place in the AI's judgment. And the response capability, the execution depth, the ability to actually take action without three humans supervising. It weighs the arm heavier because that is where the industry gap is widest right now. But it does not ignore the brain, because an arm without a brain is just a SOAR playbook and we already know how that story ended.
ARMM is a structured scoring system for evaluating what an AI SOC solution can actually do in the response layer. It covers 80+ response capabilities across six domains: Identity, Network, Endpoint, Cloud, SaaS, and General Options. And it provides a common language so that when someone says "we handle response," there is a way to ask: at what level, across how many actions, and with what degree of autonomy?
The CyberSec Automation Blog has published over a dozen articles and podcast episodes covering what makes a good automation program succeed, how to evaluate tools, and how to structure decision-making around security automation purchases. We have built tool comparison lists, evaluation checklists, and decision frameworks. ARMM is the next step in that work.
2. Why Another Framework
Most existing evaluation methods for AI SOC solutions are either vendor-produced (and therefore biased toward their own capabilities) or too generic to capture the specific nuances of AI-driven response. Analyst reports compare products at a feature-list level without measuring automation depth. Vendor demos show best-case scenarios without exposing the operational friction underneath.
Our focus is narrow and deliberate: response capabilities. Most AI SOC solutions already deliver strong reporting and analysis features. They can summarize alerts, correlate indicators, and reduce false negatives in a mature environment (we emphasize mature because these solutions need access to quality logs and, in more advanced implementations, to organizational documentation and environment-specific context). Where the industry needs structured evaluation is in the response layer: the actions an AI SOC solution can take, how autonomously it can take them, and under what conditions.
We acknowledge that some of the capabilities listed in this framework may seem aspirational at this stage. That is by design. The framework is intended to serve both as a current-state evaluation tool and as a forward-looking roadmap.
We are not scoring specific vendors. The goal is to establish a shared methodology that allows security teams to answer questions such as:
Which solution provides more relevant response capabilities for my environment?
Which solution operates at a higher level of autonomy for the actions that matter to my program?
Which solution can help me reduce my alert backlog without requiring additional headcount?
For product managers working on AI SOC products, the framework serves as a competitive analysis baseline:
Where is my competition positioned, and what capabilities are driving their wins?
What high-value capabilities are underserved across the market?
Am I investing engineering resources in features that security practitioners actually prioritize?
Because this is a fast-moving space, we are starting at version 0.1. This is a living document. Version 1.0 will be designated when the framework reaches a level of stability and community validation that warrants it.
3. Scoring Methodology
ARMM supports two distinct approaches to scoring, each designed for a different operational question.
Evaluator Mode is the straightforward path. You score each capability on the 0-1-2 scale described in Section 3.1 (with the 1C, 1G, 1A sub-levels) and the framework calculates your coverage rate, automation depth, and per-plane breakdown. The tier placements come from ARMM's reference tables. You do not need to factor in your organizational context. This mode answers one question: given two or more AI SOC products, which one covers more of what I need and at what automation level? It is built for procurement teams, SOC managers running vendor evaluations, and anyone who needs a side-by-side comparison without spending weeks on it.
Builder Mode adds a second scoring layer on top. Instead of relying on fixed reference tiers, you score each action across three axes: Trust (how much confidence does your implementation warrant), Complexity (how hard is it for your specific team to build and maintain), and Impact (what is the blast radius if something goes wrong). The action score becomes T + C + I, and the tier placement shifts based on your organizational reality. The same action that scores Entry for a mature team with established automation pipelines might score Explorer for a team that is deploying its first AI SOC integration. This mode answers a different question: given my team, my environment, and my risk tolerance, where should I invest engineering effort to move up the maturity ladder? It is built for product managers, engineering leads, and internal SOC teams running their own automation programs.
Both modes evaluate the same six planes and the same 80+ response capabilities. Both produce per-plane breakdowns and a composite maturity label. The difference is whether you want a product-level comparison (Evaluator) or an environment-aware implementation roadmap (Builder). The public ARMM app at armm.secops-unpacked.ai supports both.

3.1 The Capability Scoring System (0-1-2)
Each response capability in the framework is scored on a three-level scale that measures the degree of automation available:
0 (Not Available): The feature does not exist in the product. There is no mechanism, manual or automated, to perform this action through the AI SOC solution.
1 (Available with Human Involvement): The feature exists but requires some form of human interaction before execution. Because human involvement can range from full collaboration to a simple approval click, this level is subdivided into three sub-categories:
1C (Collaborator): The solution requires continuous back-and-forth interaction with an analyst to reach a response action. The AI acts as a partner, not an autonomous agent.
1G (Guide): The solution generates a plan and presents options for a specific action, but it is not confident in recommending a single path. It lays out alternatives and lets the analyst choose.
1A (Approver): The action is essentially ready to execute. The AI has determined the correct response and prepared the action, but requires a human to click approve before it fires. This is the closest step to full automation while still keeping a human in the loop.
2 (Fully Automated): The action is performed without any human involvement. The vendor (or internal implementation) has demonstrated that the AI SOC solution can execute this action with sufficient confidence that no human review is required. At the time of writing, level 2 is exceptionally rare for most response categories. The framework includes it to establish the target state and to differentiate products that are moving in that direction from those that are not.
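To make the scale concrete, here is a minimal Python sketch of the five levels and a coverage check; the enum names and helper function are illustrative choices, not part of the framework specification.

```python
# A minimal sketch of the Evaluator Mode scale (Section 3.1).
from enum import Enum

class CapabilityLevel(Enum):
    NOT_AVAILABLE = "0"    # no mechanism exists in the product
    COLLABORATOR = "1C"    # continuous back-and-forth with an analyst
    GUIDE = "1G"           # presents options, analyst chooses the path
    APPROVER = "1A"        # action prepared, one approval click to fire
    FULLY_AUTOMATED = "2"  # executes with no human involvement

def is_covered(level: CapabilityLevel) -> bool:
    """A capability counts toward coverage at any level above 0."""
    return level is not CapabilityLevel.NOT_AVAILABLE

# Example: two Identity-plane capabilities scored for a hypothetical product.
scores = {
    "Revoke Sessions": CapabilityLevel.APPROVER,
    "Reset Password": CapabilityLevel.FULLY_AUTOMATED,
}
```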

3.2 The Three Scoring Axes (Builder Mode)
In Builder Mode, each response action is evaluated across three dimensions:
Axis 1: Decision Fidelity and Programmatic Trust (T)
This axis measures the confidence level warranted by the AI SOC implementation. It correlates directly with implementation quality: reasoning log depth, context-aware decision-making, and guardrails against hallucination.
T = 1 (Enrichment): AI output assists human-led investigations. The AI provides context and data but does not recommend or execute actions.
T = 2 (Validated): AI recommends a specific action. A human confirms before execution occurs.
T = 3 (Autonomous): AI executes without human intervention. This requires the highest level of implementation maturity and organizational trust.
Axis 2: Implementation and Maintenance Complexity (C)
This axis evaluates the technical friction in building and sustaining the automation, relative to the skills and resources of the team responsible for it. This is deliberately team-dependent. An automation rated C = 3 for a junior team may be C = 2 for a team of specialized AI engineers with established CI/CD pipelines for their playbooks.
C = 1 (Low): Simple API calls or native integrations with minimal configuration.
C = 2 (Medium): Multi-step orchestration across multiple systems requiring coordination and testing.
C = 3 (High): Complex behavioral baselining, legacy system integration, or custom model tuning.
Axis 3: Operational Impact and Blast Radius (I)
This axis captures the business risk associated with the action. It is typically the most stable axis across organizations, but shifts based on asset criticality. Isolating a standard employee laptop has a different blast radius than isolating a production database server.
I = 1 (Low): Negligible disruption. Background scans, tagging, enrichment activities.
I = 2 (Medium): Temporary disruption. Resetting a standard user session, blocking a non-critical port.
I = 3 (High): Significant downtime, data loss risk, or reputational damage. Production system changes, VIP account modifications, critical infrastructure alterations.

3.3 The Maturity Computation Logic
The scoring system builds from individual actions up to a full program assessment through five layers. Each layer uses a defined formula.
Layer 1: Action-Level Score (S)
For a single response action, the score is the sum of its three axis values:
S = T + C + I
The minimum possible score is 3 (T=1, C=1, I=1). The maximum is 9 (T=3, C=3, I=3).
Layer 2: Tier Mapping
The action score maps to one of four maturity tiers:
Score Range | Tier | Description |
|---|---|---|
3.00 to 5.99 | Explorer | Foundational; low-risk quick wins with minimal blast radius |
6.00 to 6.99 | Entry | Stabilized; moderate effort and impact, suitable for early-stage programs |
7.00 to 7.99 | Advanced | Mature; requires high-fidelity reasoning and established trust |
8.00 to 9.00 | Expert | Critical; high blast radius, autonomous VIP handling, or production-critical actions |
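As a minimal sketch of how Layers 1 and 2 combine, the following Python maps three axis values to an action score and then to a tier using the thresholds above; the function names are placeholders.

```python
# A sketch of Builder Mode Layers 1 and 2 (Sections 3.2-3.3).

def action_score(trust: int, complexity: int, impact: int) -> int:
    """Layer 1: S = T + C + I, each axis in {1, 2, 3}."""
    for axis in (trust, complexity, impact):
        if axis not in (1, 2, 3):
            raise ValueError("each axis must be 1, 2, or 3")
    return trust + complexity + impact

def tier(score: float) -> str:
    """Layer 2: map an action or domain score to a maturity tier."""
    if score < 6:
        return "Explorer"
    if score < 7:
        return "Entry"
    if score < 8:
        return "Advanced"
    return "Expert"

# "Isolate Device" for a mature program: T=3, C=2, I=3 -> 8 -> Expert
print(tier(action_score(3, 2, 3)))
```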
Layer 3: Domain Maturity Score (D)
The maturity score for a specific domain (e.g., Endpoint, Identity) is the arithmetic mean of all action scores within that domain:
D = (S1 + S2 + ... + Sn) / n
Where n is the number of scored actions in the domain. The resulting D value maps to a tier using the same thresholds from Layer 2.
Layer 4: Program Maturity Score (P)
The overall program score is the arithmetic mean of all domain scores, with equal weighting across all six planes:
P = (D_Identity + D_Network + D_Endpoint + D_Cloud + D_SaaS + D_GeneralOptions) / 6
Equal plane weighting is a deliberate design choice. It prevents planes with more actions (Endpoint has 22, SaaS has 10) from dominating the evaluation. Each plane contributes exactly one-sixth of the overall score.
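A short sketch of Layers 3 and 4, assuming action scores are collected as plain lists per domain; the example values are the eleven Identity reference scores from Section 4.1.

```python
# A minimal sketch of Layers 3 and 4 (domain and program means).
from statistics import mean

def domain_score(action_scores: list[int]) -> float:
    """Layer 3: D is the arithmetic mean of all action scores in the domain."""
    return mean(action_scores)

def program_score(domain_scores: dict[str, float]) -> float:
    """Layer 4: P is the mean of the six domain scores, one-sixth weight each."""
    return mean(domain_scores.values())

# The eleven Identity reference scores from Section 4.1: mean ~ 6.36 -> Entry.
identity = domain_score([5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8])
print(round(identity, 2))
```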
Layer 5: Composite Maturity Label
The composite label is not derived from the program score directly. It uses sequential gating logic:
The composite label equals the highest tier where at least four out of six planes independently meet that tier's threshold, and the qualification chain is unbroken from Explorer upward. A product cannot be labeled Advanced if it has gaps at the Explorer tier.
The four-out-of-six rule is intentionally forgiving. A product focused on cloud-native environments may legitimately deprioritize network-level response. That should not disqualify it from a meaningful composite label. But it still needs breadth across most planes to earn a higher tier.
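The gating logic can be sketched as a walk up the tier thresholds, stopping at the first tier that fewer than four planes reach. The threshold values below come from the Layer 2 table; the quorum parameter is named here purely for illustration.

```python
# A sketch of the Layer 5 composite label gating (four-of-six, unbroken chain).
TIERS = [("Explorer", 3.0), ("Entry", 6.0), ("Advanced", 7.0), ("Expert", 8.0)]

def composite_label(plane_scores: dict[str, float], quorum: int = 4) -> str:
    """Highest tier where at least `quorum` planes meet the threshold,
    with the qualification chain unbroken from Explorer upward."""
    label = "None"
    for name, threshold in TIERS:
        qualifying = sum(1 for score in plane_scores.values() if score >= threshold)
        if qualifying < quorum:
            break  # chain broken: stop and keep the last earned label
        label = name
    return label

# Five planes at Entry or above, one lagging plane -> composite label "Entry".
print(composite_label({"Identity": 6.4, "Network": 6.1, "Endpoint": 6.0,
                       "Cloud": 5.2, "SaaS": 6.3, "General": 6.5}))
```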
3.4 Context-Aware Scoring: Why Environment Matters
ARMM recognizes that the maturity level of an automated action is not a static property of the feature itself. It is an emergent property of the environment where it is applied. The three axes (T, C, I) are all subject to organizational variance, which means the same product capability produces different scores in different contexts.
Example: "Isolate Device" evaluated by three different organizations using the same AI SOC product:
Context | Trust (T) | Complexity (C) | Impact (I) | Score | Tier |
|---|---|---|---|---|---|
Org A: Mature Program / Expert Team | 3 | 2 | 3 | 8 | Expert |
Org B: New Program / Junior Team | 1 | 3 | 3 | 7 | Advanced |
Org C: High-Risk Assets / Manual-First | 2 | 2 | 3 | 7 | Advanced |
The product capability is identical across all three. The scores differ because the Trust axis reflects implementation maturity, the Complexity axis reflects team capability, and the Impact axis (while stable here) can shift based on asset criticality. A vendor benchmark alone is insufficient. Builder Mode exists specifically to capture this variance.
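A compact sketch of that variance, using the axis values from the table above.

```python
# The same "Isolate Device" capability, scored as S = T + C + I
# under three organizational profiles (values from the table above).
profiles = {
    "Org A (mature program, expert team)":    (3, 2, 3),
    "Org B (new program, junior team)":       (1, 3, 3),
    "Org C (high-risk assets, manual-first)": (2, 2, 3),
}

for org, (t, c, i) in profiles.items():
    print(f"{org}: score {t + c + i}")  # 8, 7, 7 -> Expert, Advanced, Advanced
```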
4. Response Capability Domains
The framework organizes response capabilities into six domains. The first five (Identity, Network, Endpoint, Cloud, SaaS) cover specific technical response planes. The sixth (General Options / Usability) covers platform-level characteristics that affect the operational quality of the solution independent of any specific response action.
For the first five domains, each capability is scored using the 0-1-2 system described in Section 3.1 (Evaluator Mode) or the T + C + I system described in Sections 3.2 and 3.3 (Builder Mode). For the General Options domain, the scoring criteria shift slightly: 0 means the feature is not available, 1 means the feature is available but limited in capability or partially implemented, and 2 means the feature is fully available, functional, and tested.
4.1 Identity Response Plane
Identity-related response actions target user accounts, service principals, groups, and access permissions. These actions are among the most commonly needed in incident response and are often the first automation candidates for SOC teams.
Action | Description |
|---|---|
Reset Password | Reset a standard user's password |
Revoke Sessions | Terminate all active sessions for a user account |
Disable User | Disable a standard user account |
Disable Service Principals | Disable a service account, service principal, or managed identity |
Remove Permissions | Remove a specific set of permissions from an account |
Group Adherence | Add or remove an account from a security group |
Group Creation | Create a new security group |
Token Rotation | Create or rotate secrets and tokens |
Delete Sharing Permissions | Remove sharing permissions on resources |
Label User (Tagging) | Apply a tag or label to a user account for tracking |
Builder Mode Reference Scoring (Mature AI SOC Program, Skilled Engineering Team):
Action | T | C | I | Score | Tier |
|---|---|---|---|---|---|
Group Adherence | 3 | 1 | 1 | 5 | Explorer |
Label User (Tagging) | 3 | 1 | 1 | 5 | Explorer |
Revoke Sessions | 3 | 1 | 1 | 5 | Explorer |
Reset Password (Std) | 3 | 1 | 2 | 6 | Entry |
Disable Standard User | 3 | 1 | 2 | 6 | Entry |
Delete Sharing Permissions | 2 | 2 | 2 | 6 | Entry |
Remove Specific Permissions | 2 | 2 | 3 | 7 | Advanced |
Group Creation | 3 | 2 | 2 | 7 | Advanced |
Disable Service Principals | 2 | 2 | 3 | 7 | Advanced |
Reset VIP Password | 3 | 2 | 3 | 8 | Expert |
Rotate Secrets (Prod) | 2 | 3 | 3 | 8 | Expert |
4.2 Network Response Plane
Network-level response actions modify traffic flow, access control, and device connectivity. These are often high-impact actions with significant blast radius, making the Trust and Impact axes particularly important in scoring.
Action | Description |
|---|---|
ACL Creation | Create a new access control list on the network |
VLAN Creation | Create a new VLAN on the network |
Firewall Rule Creation | Create a new firewall rule |
IPS Rule Creation | Create a new IPS rule in deny mode |
Network Connection Reset | Reset a network connection |
DNS Entry Change | Modify an entry in the DNS records |
Routing Table Change | Modify a routing entry |
Sinkhole Traffic | Redirect traffic to a sinkhole |
Rate Limit Traffic | Limit traffic by a particular indicator |
VLAN Modification | Move a device to a restricted VLAN |
Quarantine Device | Quarantine a device at the network level |
Quarantine Server | Quarantine a server running an enterprise-level service |
Modify NAT Rules | Change NAT rules to modify traffic patterns |
Builder Mode Reference Scoring:
Action | T | C | I | Score | Tier |
|---|---|---|---|---|---|
Network Connection Reset | 3 | 1 | 1 | 5 | Explorer |
Sinkhole Traffic | 3 | 1 | 1 | 5 | Explorer |
Rate Limit Traffic | 3 | 1 | 1 | 5 | Explorer |
ACL Creation | 2 | 2 | 2 | 6 | Entry |
Quarantine Device | 3 | 1 | 2 | 6 | Entry |
Firewall Rule Creation | 2 | 1 | 3 | 6 | Entry |
DNS Entry Change | 2 | 2 | 2 | 6 | Entry |
Modify NAT Rules | 2 | 2 | 2 | 6 | Entry |
IPS Rule Creation | 2 | 2 | 3 | 7 | Advanced |
VLAN Creation | 3 | 2 | 2 | 7 | Advanced |
VLAN Modification | 3 | 2 | 3 | 8 | Expert |
Routing Table Change | 3 | 3 | 3 | 9 | Expert |
Quarantine Server | 2 | 3 | 3 | 8 | Expert |
4.3 Endpoint Response Plane
Endpoint response actions operate directly on devices and their software environment. This domain has the largest number of capabilities because endpoint response spans file operations, process management, application control, forensics, and OS-level changes.
Action | Description |
|---|---|
Isolate Device | Isolate a device from all network connectivity |
Initiate Malware Scan | Start a scan on the device |
Grab File from Device | Upload a file to a designated container |
Submit File to Sandbox | Submit a file for sandbox analysis |
Lock Out User | Lock a user out of the device |
Remove User from Device | Remove a user account from the device |
Delete Files | Delete specific files from the device |
Kill Processes | Terminate a running process |
Remove Application | Uninstall an application |
Remove Browser Extension | Remove a browser extension |
Modify Browser Settings | Set, modify, or replace browser security parameters |
Remove Scheduled Task | Remove a cron entry or scheduled task |
Remove Startup Items | Remove a process, agent, or file from system startup |
Remove Library / Package | Remove a library from a development environment |
Upgrade Application | Force an automatic update on installed software |
Upgrade OS | Force an automatic OS update |
Deploy Script | Deploy a script or application needed for remediation |
Modify Registry Key | Change a value or create a new registry key |
Disable Service | Change the status of or remove a service |
Collect Memory Dump | Initiate and retrieve a memory dump forensically |
Clear Browser Cache | Remove all files, cookies, and data from the browser cache |
Remove Device from Domain | Remove a device from the domain |
Builder Mode Reference Scoring:
Action | T | C | I | Score | Tier |
|---|---|---|---|---|---|
Initiate Malware Scan | 3 | 1 | 1 | 5 | Explorer |
Clear Browser Cache | 3 | 1 | 1 | 5 | Explorer |
Grab File from Device | 3 | 1 | 1 | 5 | Explorer |
Collect Memory Dump | 2 | 2 | 1 | 5 | Explorer |
Submit File to Sandbox | 3 | 1 | 1 | 5 | Explorer |
Kill Processes | 3 | 1 | 2 | 6 | Entry |
Block File (via Hash) | 3 | 1 | 2 | 6 | Entry |
Lock Out User | 2 | 2 | 2 | 6 | Entry |
Remove Browser Extension | 3 | 1 | 2 | 6 | Entry |
Remove Scheduled Task | 2 | 2 | 2 | 6 | Entry |
Remove Startup Items | 2 | 2 | 2 | 6 | Entry |
Disable Service | 2 | 2 | 2 | 6 | Entry |
Delete Files | 2 | 2 | 2 | 6 | Entry |
Modify Browser Settings | 2 | 2 | 2 | 6 | Entry |
Remove Application | 2 | 2 | 3 | 7 | Advanced |
Remove User from Device | 2 | 2 | 3 | 7 | Advanced |
Remove Library / Package | 2 | 2 | 3 | 7 | Advanced |
Modify Registry Key | 2 | 3 | 3 | 8 | Expert |
Isolate Device | 3 | 2 | 3 | 8 | Expert |
Remove Device from Domain | 2 | 3 | 3 | 8 | Expert |
Upgrade Application | 2 | 3 | 3 | 8 | Expert |
Upgrade OS | 2 | 3 | 3 | 8 | Expert |
Deploy Script | 3 | 3 | 3 | 9 | Expert |
4.4 Cloud Response Plane
Cloud response actions target infrastructure resources, access controls, and storage in cloud environments. The blast radius of cloud actions can be particularly severe because a single misconfigured change can affect multiple dependent services.
Action | Description |
|---|---|
Modify Security Group Rules | Modify firewall rules on a cloud resource to restrict access |
Create Security Group | Create a new security group and apply it to restrict traffic |
Isolate Resource | Quarantine a cloud resource so it is unreachable |
Modify Access Type | Switch a resource from public to private or restrict anonymous access |
Remove Permissions to Resource | Remove a service principal or managed identity from accessing a resource |
Delete Resource | Delete a resource from the cloud environment |
Stop Resource | Stop a resource from execution |
Modify KeyVault Entries | Add or modify resources in a KeyVault |
Use Breakglass Account | Use a breakglass account in case of emergency |
Remove Files from Storage | Remove files from a storage bucket or storage account |
Copy Storage Device | Create a copy of a cloud storage resource for forensic investigation |
Mount Storage Device | Mount a storage volume to a VM for forensic investigation |
Snapshot VM | Create a snapshot of the current state of a virtual machine |
Enable Diagnostic Settings | Alter settings that enable advanced log gathering |
Apply Resource Lock | Make the resource immutable or read-only |
Builder Mode Reference Scoring:
Action | T | C | I | Score | Tier |
|---|---|---|---|---|---|
Enable Diagnostic Settings | 3 | 1 | 1 | 5 | Explorer |
Apply Resource Lock | 3 | 1 | 1 | 5 | Explorer |
Snapshot VM | 3 | 1 | 1 | 5 | Explorer |
Stop Resource | 3 | 1 | 2 | 6 | Entry |
Modify Security Group Rules | 2 | 2 | 2 | 6 | Entry |
Create Security Group | 2 | 2 | 2 | 6 | Entry |
Remove Permissions to Resource | 2 | 2 | 2 | 6 | Entry |
Copy Storage Device | 2 | 2 | 2 | 6 | Entry |
Mount Storage Device | 2 | 2 | 2 | 6 | Entry |
Modify Access Type | 2 | 2 | 3 | 7 | Advanced |
Isolate Resource | 3 | 2 | 3 | 8 | Expert |
Remove Files from Storage | 2 | 2 | 3 | 7 | Advanced |
Modify KeyVault Entries | 2 | 3 | 3 | 8 | Expert |
Use Breakglass Account | 2 | 3 | 3 | 8 | Expert |
Delete Resource | 2 | 3 | 3 | 8 | Expert |
4.5 SaaS Response Plane
SaaS response actions focus primarily on email and productivity platforms, which are among the most common attack surfaces in enterprise environments. Actions in this domain directly affect end-user workflows and communications.
Action | Description |
|---|---|
Delete Email | Remove an email from a user's mailbox |
Quarantine Email | Move an email to the user's quarantine or junk box |
Create Routing Rules | Create rules to handle and route incoming email |
Grab Email Sample | Extract an attached file from an email |
Grab Email Link | Extract a link from inside an email message |
Add / Remove Meeting Invite | Modify a user's calendar |
Read / Modify User Status | Read or change a user's status in the HR platform |
Disable Malicious Inbox Rule | Disable a rule created by a malicious actor from a user's mailbox |
Block Sender | Block a sender from the domain |
Modify HR Records | Modify HR records in the system beyond status |
4.6 General Options / Usability
This domain evaluates platform-level capabilities that are not tied to any specific response action but directly affect how useful, trustworthy, and manageable the AI SOC solution is in production. The scoring for this domain uses a modified scale: 0 means not available, 1 means available but limited, and 2 means fully available and functional.
This domain is split into two sub-categories to distinguish between operational platform features and AI-specific evaluation criteria.
Platform Operations
Action | Description |
|---|---|
Close Alerts in SIEM | The tool can close alerts in all major SIEM solutions |
Logging | Platform logging allows identification of all actions taken |
Reasoning Logging | Reasoning steps taken by the platform are logged at sufficient detail |
API Development | The API is robust enough for integration with other security tools |
Support Level | Support is responsive and allows for adequate issue resolution |
Account Management | Account management is straightforward with SSO integration |
Roles and Responsibility | Role-based access control is available with sufficient granularity |
Ease of Use (GUI) | The GUI is navigable and intuitive |
Native Chat Integration | Native integration with major communication platforms |
Alerting | Automatic alerting when platform-level or analysis-level issues arise |
Stats / Health Dashboards | Dashboards showing current platform status and performance |
AI-Specific Evaluation Criteria
Action | Description |
|---|---|
Bring Your Own Model | Ability to integrate custom models into the platform |
Context Grounding | Ability to bring organizational data to feed into the ML model |
Autonomous Action Thresholds | Platform allows setting confidence thresholds for autonomous execution |
Investigation Audit Trail | Complete, exportable record of every action (AI and human) with timestamps |
IR Metrics Tracking | Native tracking of MTTD, MTTA, MTTT, MTTI, MTTR without external tooling |
Feedback Loop Mechanism | Analysts can confirm, reject, or correct AI decisions with feedback incorporated |
Auto-Close Reversal Tracking | Tracks the rate at which auto-closed alerts are reopened by analysts |
Explainability / Decision Transparency | AI provides clear, traceable reasoning for every decision |
AI Decision Accuracy Reporting | Tracks TP accuracy, FP accuracy, and confidence scores over time |
Model Drift Detection | Monitors AI model performance and alerts when accuracy degrades |
Adversarial Robustness Testing | Supports or integrates with red team exercises to test AI resilience |
5. Aggregate Maturity Scoring
Evaluating each plane individually is necessary but not sufficient. Security teams making purchasing decisions and product managers tracking competitive positioning need a consolidated view that communicates the overall picture without hiding the details.
5.1 Automation Depth Score
This is the most operationally significant metric and the one that separates real autonomous solutions from products that wrapped a chatbot interface around a set of API calls.
Across all covered capabilities, calculate the distribution:
What percentage is fully automated (level 2)?
What percentage sits at Approver level (1A)?
What percentage sits at Guide level (1G)?
What percentage sits at Collaborator level (1C)?
What percentage is not available at all (0)?
A product could have 80% of capabilities covered but only 5% fully automated. That is a fundamentally different product than one with 60% covered but 40% fully automated. The first is broad but shallow. The second is narrower but operates with real autonomy where it counts.
Full Automation Rate: The percentage of total capabilities at level 2. This is the true measure of how much an AI SOC solution can operate without human intervention.
Coverage Rate: The percentage of total capabilities at any level above 0. This measures breadth regardless of automation depth.
The relationship between these two numbers tells you everything about how the product actually operates. A high coverage rate with a low automation rate means the product is a guided workflow tool with AI branding. A moderate coverage rate with a high automation rate relative to coverage means the product is autonomous in its areas of focus but limited in scope.
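Both rates, along with the sub-level distribution, fall out of a simple count over the Evaluator Mode scores. This is a sketch; the string encoding of levels ("0", "1C", "1G", "1A", "2") is an assumption, not a prescribed data format.

```python
# A minimal sketch of the Section 5.1 metrics over Evaluator Mode scores.
from collections import Counter

def depth_metrics(levels: list[str]) -> dict[str, float]:
    counts = Counter(levels)
    total = len(levels)
    return {
        "coverage_rate": (total - counts["0"]) / total,   # any level above 0
        "full_automation_rate": counts["2"] / total,      # level 2 only
        "approver_share": counts["1A"] / total,
        "guide_share": counts["1G"] / total,
        "collaborator_share": counts["1C"] / total,
    }

# Broad but shallow: 80% covered, only 5% fully automated.
example = ["2"] * 5 + ["1A"] * 25 + ["1G"] * 30 + ["1C"] * 20 + ["0"] * 20
print(depth_metrics(example))
```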
5.2 Combined Scoring Readout
A complete ARMM evaluation for a product produces the following consolidated output:
Metric | Value |
|---|---|
Overall Score | 47% (equal plane weighted) |
Composite Maturity | Entry (5 of 6 planes at Entry or above) |
Automation Depth | 12% fully automated, 61% covered at any level |
Per-Plane Breakdown:
Plane | Score | Coverage | Fully Automated | Tier |
|---|---|---|---|---|
Identity | 78% | 7/9 covered | 2 actions | Advanced |
Network | 42% | 8/13 covered | 1 action | Entry |
Endpoint | 38% | 12/21 covered | 1 action | Entry |
Cloud | 31% | 6/15 covered | 0 actions | Explorer |
SaaS | 55% | 7/10 covered | 3 actions | Entry |
General Options | 64% | 10/14 covered | 3 actions | Advanced |
6. Reading the Model
A product can reach Expert level on a specific plane by checking all the boxes for that domain. But it would be difficult to consider an AI SOC Response solution as Expert level overall if it lacks the ability to perform foundational actions like closing alerts in a SIEM. The tier system is designed to reward both depth within a domain and breadth across domains.
The reference maturity tables provided in Section 4 use example scores from a hypothetical mature AI SOC program with a skilled engineering team. These are illustrative, not universal benchmarks. The environmental dynamics described in Section 3.4 are not optional context; they are a core part of how the framework is intended to be used.
When comparing two products, the most informative comparison is not the aggregate score. It is the per-plane breakdown combined with the Automation Depth Score. Two products at the same composite tier can have radically different operational profiles. One may cover 80% of capabilities at the Collaborator level. The other may cover 50% but with 30% at full automation. These are different products for different buyers with different operational maturity levels.
7. Limitations and Future Work
This is version 0.1. The framework has known limitations:
The capability lists are not exhaustive. New response actions will emerge as AI SOC products mature and as attack surfaces expand.
The three-axis scoring (T, C, I) requires subjective judgment that will vary between evaluators. We plan to develop calibration guidelines to reduce inter-evaluator variance.
The framework does not currently weight domains differently. In practice, Identity response may be more important than Network response for a given organization. Weighted scoring is planned for a future version.
Detection and analysis capabilities are out of scope for this version. A separate framework or an extension to ARMM may address those in the future.
We have not included pricing, deployment time, or vendor lock-in considerations. These are important purchase factors but are outside the scope of a technical maturity model.
We are building a public web application where users can input their product's capabilities and generate ARMM scoring layers automatically, along with an exportable CSV. The application is available at: armm.secops-unpacked.ai
8. Conclusion
The AI SOC market is growing faster than the industry's ability to evaluate products on consistent terms. The ARMM framework provides a structured, repeatable methodology for measuring what an AI SOC solution can actually do in the response layer, how autonomously it can do it, and what it takes to deploy and maintain that capability in a specific operational environment.
The framework is built for two audiences: security teams evaluating products and product managers building them. For security teams, it provides a checklist and scoring system that cuts through marketing language and focuses on operational capability. For product teams, it provides a competitive analysis baseline and a prioritization framework for feature development.
SOAR gave us arms without brains. The first wave of AI SOC products gave us brains without arms. The products that will win this market are the ones that connect both. ARMM gives you a way to measure how far along that connection is, and where the gaps remain.
No current AI SOC solution will check every box. That is not the point. The point is to establish a common language and a common measurement system so that the conversation about AI SOC response capability is grounded in specifics rather than promises. Version 0.1 is the starting point. The framework will evolve as the market does.
