Disclaimer: Opinions expressed are solely my own and do not reflect the views or opinions of my employer or any other affiliated entities. Any sponsored content featured on this blog is independent and does not imply endorsement by, nor relationship with, my employer or affiliated organisations.

Article written in collaboration with Andrew Green

We often argue over semantics, but your agents shouldn't. They need explicit definitions of what data and events mean for your business. Otherwise, they will fall back on their training data, which is often not applicable.

For example, a log from an on-prem Cisco router and an AWS VPC flow log are structured differently, but they both contain IP addresses, ports, and protocols. An LLM can understand what these network elements mean and make general inferences about network requests. But how will it determine which instances of traffic between your on-prem environment and cloud are expected versus suspicious?

You've got three options here.

The first one is to ask the LLM: 'Hey, is this suspicious?', at which point the LLM will use its training data and prompt context to give a statistically likely interpretation. This is the neural approach.

The second one is to define an explicit rule, such as alert if request.source_ip in "10.0.0.0/8" and request.destination_ip in "52.0.0.0/8". This is the symbolic approach.
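To make the symbolic approach concrete, here is a minimal sketch of that rule in Python using the standard ipaddress module (the CIDR ranges are the illustrative ones from the rule, not real allocations):

```python
from ipaddress import ip_address, ip_network

ON_PREM = ip_network("10.0.0.0/8")   # illustrative on-prem range
CLOUD = ip_network("52.0.0.0/8")     # illustrative cloud-facing range

def should_alert(source_ip: str, destination_ip: str) -> bool:
    """Deterministic, auditable check: alert on on-prem -> cloud traffic."""
    return (ip_address(source_ip) in ON_PREM
            and ip_address(destination_ip) in CLOUD)

print(should_alert("10.1.2.3", "52.4.5.6"))     # True
print(should_alert("192.168.0.1", "52.4.5.6"))  # False
```

The rule fires every time, fires the same way every time, and its logic can be read by anyone.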

The third one is to tell the LLM: 'Hey, our dev Jerry had leg surgery and he won't leave the house while he heals, but he'll be working from home'. With the right tools, the LLM will interpret the ambiguities of this message and combine them with hard rules to produce an explicit instruction: all of Jerry's access requests must come from his VPN for the next 6 months, and anything else is suspicious. This is the neurosymbolic approach.
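As a sketch of how this might work in practice: the LLM's job is to turn the ambiguous message into a structured policy, and a deterministic evaluator then enforces it. The policy fields and event shape below are hypothetical, purely for illustration:

```python
from datetime import date, timedelta

# Hypothetical structured policy an LLM tool call might emit after reading
# "Jerry had leg surgery and will be working from home for 6 months."
policy = {
    "user": "jerry",
    "allowed_source": "vpn",  # all of Jerry's access must come via his VPN
    "valid_until": date.today() + timedelta(days=180),
}

def is_suspicious(event: dict, policy: dict) -> bool:
    """Deterministic evaluation of the LLM-derived policy (the symbolic half)."""
    if event["user"] != policy["user"]:
        return False  # policy does not apply to this user
    if date.today() > policy["valid_until"]:
        return False  # policy has expired
    return event["source"] != policy["allowed_source"]

print(is_suspicious({"user": "jerry", "source": "office_lan"}, policy))  # True
```

The neural half handles the ambiguity once, at policy-creation time; every subsequent access check is cheap, deterministic, and auditable.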

To get to this third option, we need a semantic layer.

The Fundamental Tension: Why We Need Semantic Layers

Before we define what semantic layers are, let's understand the problem they're solving. There's a fundamental tension in how we approach automated reasoning in security:

Pure Neural (LLM) Approach:

  • Excellent at pattern recognition and handling ambiguity

  • Can process natural language queries

  • Adaptable to new situations

  • BUT: Non-deterministic, expensive at scale, can't explain decisions

Pure Symbolic Approach:

  • Deterministic and auditable

  • Fast and efficient at scale

  • Provides clear reasoning chains

  • BUT: Brittle with edge cases, requires extensive rule maintenance

The semantic layer provides a foundation for hybrid approaches where symbolic reasoning handles the facts and constraints while neural networks handle the ambiguity and interface. It's the bridge between "Jerry is working from home while recovering" (natural language) and "source_ip = Jerry_VPN_IP AND time_range = next_6_months" (executable logic).

But What Is a Semantic Layer, Exactly?

A semantic layer is an LLM-queryable abstraction layer that pulls and correlates information about entities, their relationships, and the deterministic rules that govern their interactions.

It correlates raw technical data with business meaning through symbolic reasoning.

But what is symbolic reasoning?

Actually, what is a symbol?

A symbol is a notation that captures the meaning of a concept; for example, 'putting two things together' is expressed by the symbol '+'. Humans and LLMs can use these symbols to read and define rules and instructions.

Symbolic reasoning therefore uses these notations to work out a conclusion.

At its core, a semantic layer must define the following three concepts:

  1. Entities: identities (both human and workload), assets such as servers and applications, events like a GET request, and behaviors such as Dev (identity) makes a POST request (event) to a database (asset).

  2. Relationships: also known as relationship graphs or knowledge graphs, these map things like User X manages Asset Y, Asset Y hosts Application Z, Application Z processes Data Type W, and Data Type W is subject to Regulation V.

  3. Symbolic Reasoning: applying logic-based rules so that you can trace exactly why a decision was made. Every step is auditable, testable, and deterministic. For example:

    a. IF user.role = "contractor"

    b. AND access.time NOT IN business_hours

    c. AND asset.classification = "confidential"

    d. THEN risk_score = HIGH
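A minimal sketch of concepts 2 and 3 in Python (the entity names, relationship labels, and assumed business hours are all illustrative):

```python
# Relationships (concept 2): a tiny knowledge graph as adjacency lists.
GRAPH = {
    ("user_x", "manages"): ["asset_y"],
    ("asset_y", "hosts"): ["app_z"],
    ("app_z", "processes"): ["pii"],
    ("pii", "subject_to"): ["gdpr"],
}

def follow(entity, *edges):
    """Walk a chain of relationships starting from an entity."""
    frontier = [entity]
    for edge in edges:
        frontier = [t for e in frontier for t in GRAPH.get((e, edge), [])]
    return frontier

# Which regulations apply to the asset that User X manages?
print(follow("user_x", "manages", "hosts", "processes", "subject_to"))
# ['gdpr']

# Symbolic reasoning (concept 3): the contractor rule from steps a-d.
BUSINESS_HOURS = range(9, 18)  # assumed 09:00-17:59

def risk_score(user_role: str, access_hour: int, classification: str) -> str:
    """Every condition is explicit, testable, and auditable."""
    if (user_role == "contractor"
            and access_hour not in BUSINESS_HOURS
            and classification == "confidential"):
        return "HIGH"
    return "LOW"

print(risk_score("contractor", 23, "confidential"))  # HIGH
```

A real implementation would likely use a graph database rather than in-memory dicts, but the reasoning pattern is the same: traverse relationships, then apply deterministic rules to what you find.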

Where Should Semantic Layers Live?

Considering most security activities revolve around gathering accurate and relevant data, where in the modern SecOps stack should this semantic layer be inserted? This is where theory meets painful reality, and where practitioners diverge sharply on the right approach.

There are four main architectural positions, each with compelling arguments and serious tradeoffs:

Option 1: In the Data Pipeline (Pre-SIEM)

The argument here is to do semantic processing before data ever reaches your SIEM or data lake.

Advantages:

  • Reduces data volume through intelligent filtering

  • Normalizes entities at the source

  • Cheaper than processing in expensive SIEM platforms

  • Can route data based on semantic understanding

Challenges:

  • Requires real-time processing at massive scale

  • Changes to semantic models require pipeline updates

  • Limited context (can't look at historical data easily)

  • Becomes another system to maintain

Verdict: This works well for basic entity resolution and enrichment, but struggles with complex relationship modeling that requires historical context.
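To illustrate what that "basic entity resolution and enrichment" might look like pre-SIEM, here is a minimal sketch; the inventory, IPs, and field names are hypothetical, and in practice the lookup would hit a CMDB or asset database rather than a dict:

```python
# Hypothetical asset inventory the pipeline consults before SIEM ingestion.
ASSET_INVENTORY = {
    "10.0.4.17": {"hostname": "db-prod-01", "owner": "payments-team",
                  "classification": "confidential"},
}

def enrich(raw_event: dict) -> dict:
    """Pre-SIEM enrichment: attach business meaning at the source."""
    asset = ASSET_INVENTORY.get(raw_event.get("dest_ip"), {})
    return {**raw_event, "asset": asset}

event = enrich({"src_ip": "10.1.2.3", "dest_ip": "10.0.4.17", "action": "GET"})
print(event["asset"]["owner"])  # payments-team
```

This is exactly the kind of stateless, per-event lookup that works well in a pipeline; anything requiring historical context does not fit here.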

Option 2: Within the SIEM Platform

Modern SIEMs increasingly claim to embed semantic capabilities directly.

Advantages:

  • Integrated with detection and investigation workflows

  • Access to all historical data for context

  • Single platform to manage

  • Vendor support (in theory)

Challenges:

  • Vendor lock-in to proprietary semantic models

  • Performance impacts on already-stressed SIEMs

  • Limited flexibility in modeling

  • Expensive processing at SIEM rates

Verdict: Most SIEM vendors are adding "semantic" features, but they're often just rebranded lookup tables and data models. True semantic reasoning with relationship graphs and symbolic logic remains limited. XDR vendors have marketed this capability more aggressively, positioning themselves as solving the "semantic gap" problem, but implementation depth varies significantly.

Option 3: As a Separate Analytics Layer

Building a semantic layer that operates alongside or on top of your security data infrastructure.

Advantages:

  • Flexibility to model your specific environment

  • Can work across multiple data sources

  • Optimized for complex reasoning

  • Natural fit for AI/ML integration

Challenges:

  • Another platform to integrate and maintain

  • Potential latency for real-time use cases

  • Requires data federation or complex APIs

  • Organizational boundaries (who owns it?)

Verdict: This is where most successful implementations end up, but it requires significant investment and organizational commitment. Some SOAR platforms have evolved toward this model, building semantic layers with automation workflows and embedding business context in case management. AI SOC platforms are also exploring this space, though many are still in early stages of true semantic reasoning versus simple enrichment.

Option 4: Distributed Semantic Processing

The most ambitious approach: semantic capabilities distributed throughout your stack, with processing happening wherever it makes the most sense.

Advantages:

  • Process data where it makes most sense

  • No single point of failure

  • Can optimize for different use cases

  • Scales naturally with infrastructure

Challenges:

  • Consistency across distributed models

  • Complex orchestration and governance

  • Difficult to maintain and update

  • Requires sophisticated engineering

Verdict: This is the "microservices of semantic layers" - sounds great in theory, nightmarish in practice for most organizations. You need mature DevOps practices and significant engineering resources to make this work.

How Semantic Layer Projects Fail

Let's be honest about the failure modes, because understanding these is more valuable than any architecture diagram:

1. Trying to Model Everything

Organizations attempt to create comprehensive ontologies of their entire environment. This is impossibly complex and never finishes. You don't need a complete knowledge graph of your infrastructure to get value from semantic layers. But you do need explicit information about what is modeled and what is not.

2. Ignoring Organizational Reality

Semantic models require agreement on basic concepts like "what is a critical asset?" Most organizations can't even agree on this informally, much less formally model it. The technical challenge is often easier than the political one.

3. Underestimating Maintenance

Semantic models aren't write-once. They require constant updates as your environment evolves. Without dedicated resources, they become stale and useless within months.

4. Over-Engineering the Solution

Building elaborate graph databases and reasoning engines when simple lookup tables would suffice for 80% of use cases. Start with the minimum viable semantic layer that solves your specific problems.

5. Lacking Clear Use Cases

Implementing semantic layers because they sound advanced, not because they solve specific problems. "We need better context" isn't a use case - "We need to automatically identify which database access is anomalous based on team structure and data classification" is.

The Practical Path Forward

If you're considering implementing semantic capabilities, here's what actually works:

Start Small and Specific

Pick one use case with clear business value: user-to-asset relationship modeling for access analysis, application dependency mapping for incident scope, or data classification for DLP prioritization. Prove value before expanding.
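The first of those use cases can start as small as a lookup table. A minimal sketch with hypothetical users and assets:

```python
# Minimal "start small" semantic model: who owns which assets (assumed data).
OWNERSHIP = {
    "alice": {"billing-db", "billing-api"},
    "bob": {"web-frontend"},
}

def access_is_expected(user: str, asset: str) -> bool:
    """Flag access to assets outside the user's modeled ownership."""
    return asset in OWNERSHIP.get(user, set())

print(access_is_expected("bob", "billing-db"))  # False -> worth a look
```

No graph database, no reasoning engine; just enough explicit business context to separate expected access from anomalous access for one question.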

Embrace Imperfect Models

Your semantic layer doesn't need to be complete or perfect. Even modeling 60% of your critical relationships provides massive value over having none. Accept that edge cases will exist.

Build for Maintenance

Plan for model updates from day one. Who owns entity definitions? How do relationships get updated? What's the approval process for changes? The technical implementation is secondary to the governance model.

Leverage Standards Where Possible

Don't reinvent entity definitions that already exist. Frameworks like OCSF provide common semantic models that reduce custom development. Think of these as starting points, not constraints.

Consider Buy vs Build Carefully

Building custom semantic layers requires significant engineering investment. Many organizations would be better served by vendor solutions, even with their limitations. Be honest about your capabilities and resources.

Also, some great resources:
https://www.holistics.io/books/setup-analytics/data-modeling-layer-and-concepts/

Speculation on the Future of Semantic Layers in Security

Looking ahead, semantic layers will evolve in three key directions:

Standardization

Common semantic models will reduce the need for custom development. We'll share entity definitions and relationship structures like we share threat intelligence today. OCSF and similar frameworks are laying this groundwork.

Automation

ML will help maintain semantic models by learning from patterns and proposing updates. Instead of manually defining every relationship, you'll validate what the system discovers. Humans will shift from creation to curation.

Distributed Intelligence

Rather than centralizing all semantic reasoning in one layer, we'll see semantic capabilities embedded throughout security infrastructure. Your EDR will understand user context, your SIEM will reason about asset relationships, your SOAR will make decisions based on business impact. The semantic layer becomes infrastructure, not a product.

The organizations that succeed won't be those with the most sophisticated semantic models, but those that solve specific problems with appropriately scoped semantic reasoning. Start with Jerry and his VPN, not with a complete ontology of your enterprise.

Join as a top supporter of our blog to get special access to the latest content and help keep our community going.

As an added benefit, each Ultimate Supporter will receive a link to the editable versions of the visuals used in our blog posts. This exclusive access allows you to customize and utilize these resources for your own projects and presentations.
