Log Curation 101

A guardian dragon curls around nine fragile eggs, each one a vital source of security telemetry.

If you are building or improving a SIEM, start with the logs before you start with the rules. A detection rule is a query over stored events. It works only when the SIEM receives the events the rule expects, parses the fields analysts need, and keeps the data long enough for an investigation.

In a SIEM review, check the data requirement for each detection before judging the rule library. The event must exist, the relevant fields must parse cleanly, and the data must remain searchable long enough for analysts to use it. A good rule can still fail when the required source is absent or poorly parsed.

Use the nine categories below as a review checklist. For each one, decide whether the source is present, whether analysts can join it to identity and asset context, and whether retention is long enough to answer incident questions after the first alert appears.

Tier The Sources By Investigation Value

The tiers below rank where engineering time should go first. They guide parser work, retention spend, normalization, and detection effort. A lower-tier source can still forward events to the SIEM; the tier only describes how much tuning budget it should receive by default.

Tier 1 sources answer the two questions analysts ask first: who accessed the environment, and what ran on managed systems. Tier 2 sources cover cloud and network control points. Tier 3 sources add perimeter evidence and vendor findings. Tier 4 sources cover business systems, specialized infrastructure, and managed devices. In some environments, a Tier 4 source deserves Tier 1 treatment because it holds regulated data or controls production deployments.

#	Category	Tier	Volume	Signal density	Primary detection value
1	Identity and Access	1	Medium	High	Credential abuse, Kerberos, OAuth, MFA bypass
2	Endpoint	1	High	High	Execution, persistence, credential access, lateral movement
3	Cloud Platform	2	Medium	High	IAM abuse, cloud persistence, storage exfiltration
4	Network Infrastructure	2	Very high	Medium	DNS C2, IP-to-host binding, lateral reconnaissance
5	Network Perimeter	3	Very high	Low to medium	Inbound exploitation, web attacks, egress C2
6	Detection Findings	3	Low	High for corroboration	Vendor alerts, threat-intel matches, deception trips
7	SaaS and Productivity	4	Medium	High	BEC, OAuth app abuse, document-store exfiltration
8	Applications and Data	4	Variable	High	Database exfiltration, app auth abuse, supply-chain compromise
9	Containers, VDI, and MDM	4	Medium	Medium	Kubernetes RBAC abuse, VDI abuse, MDM drift

Identity And Access

Identity logs answer three questions. Who signed in? How was access approved? What changed after access was granted? Collection should cover authentication, session and token use, authorization changes, and trust relationships. Those groups matter because identity attacks usually move from a successful sign-in to a change in access.

Failed logins help find password attacks, but successful sign-ins carry the incident story. A spray becomes a compromise when one account signs in and starts using access from an unusual device or access path. Directory and authorization changes need strong retention because attackers can keep access through role changes, application grants, or federation settings.
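The spray-to-compromise pattern above can be sketched as a small correlation over parsed sign-in events. This is a minimal illustration, not a production rule: the field names (`ts`, `user`, `src_ip`, `outcome`), the threshold of three accounts, and the 30-minute window are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def spray_then_success(events, min_failed_users=3, window=timedelta(minutes=30)):
    """Return (src_ip, user, ts) for successes that follow a spray from the same source."""
    failures = defaultdict(list)  # src_ip -> [(ts, user)] for failed sign-ins
    hits = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        ip = ev["src_ip"]
        if ev["outcome"] == "failure":
            failures[ip].append((ev["ts"], ev["user"]))
            continue
        # Success: count distinct accounts that recently failed from this source.
        recent_users = {u for ts, u in failures[ip] if ev["ts"] - ts <= window}
        if len(recent_users) >= min_failed_users:
            hits.append((ip, ev["user"], ev["ts"]))
    return hits

# Hypothetical sample data; the addresses come from documentation ranges.
t0 = datetime(2024, 5, 1, 9, 0)
sample_events = [
    {"ts": t0, "user": "alice", "src_ip": "203.0.113.7", "outcome": "failure"},
    {"ts": t0 + timedelta(minutes=1), "user": "bob", "src_ip": "203.0.113.7", "outcome": "failure"},
    {"ts": t0 + timedelta(minutes=2), "user": "carol", "src_ip": "203.0.113.7", "outcome": "failure"},
    {"ts": t0 + timedelta(minutes=10), "user": "dave", "src_ip": "203.0.113.7", "outcome": "success"},
    {"ts": t0 + timedelta(minutes=11), "user": "erin", "src_ip": "198.51.100.4", "outcome": "success"},
]
spray_hits = spray_then_success(sample_events)
```

The sketch only works if both failed and successful sign-ins reach the SIEM with a parsed source address, which is the collection requirement this section describes.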

This category supports detection of credential and token abuse, MFA bypass, service-account misuse, and durable identity persistence. It also gives analysts the account context needed to interpret activity in other sources.

References:

NIST SP 800-63B-4: Authentication and Authenticator Management gives current terminology for authentication, assurance, and session handling. Use it to decide which authentication context your SIEM needs to preserve.
MITRE ATT&CK User Account Authentication connects authentication events to adversary behavior. Use it when deciding which sign-in events need to exist before writing credential-attack detections.
RFC 9700: Best Current Practice for OAuth 2.0 Security explains current OAuth threats and mitigations. It supports the recommendation to collect OAuth consent, token activity, and application-related identity events.

Tool I like: BloodHound Community Edition shows identity relationships as attack paths. It helps teams see which privileges and trust relationships matter enough to monitor closely.

Endpoint

Endpoint logs tell the SOC what happened on a workstation or server. Mature collection shows process execution with command context. It should also tie process network activity, scripts, local authentication, security controls, and persistence changes back to the host.

EDR alerts help triage. Raw endpoint telemetry should sit beside them so analysts can rebuild execution chains and tune detections after the first alert. It also supports historical hunting when a new indicator, toolmark, or technique becomes relevant.
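Rebuilding an execution chain from raw telemetry can be as simple as walking parent process IDs. The sketch below assumes process-creation events already parsed into dicts with hypothetical fields `pid`, `ppid`, and `image`. It ignores PID reuse, which a real implementation would handle with timestamps or a unique process identifier.

```python
def execution_chain(events, pid):
    """Walk parent links from `pid` upward and return image names, oldest ancestor first."""
    by_pid = {e["pid"]: e for e in events}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:  # `seen` guards against loops in bad data
        seen.add(pid)
        ev = by_pid[pid]
        chain.append(ev["image"])
        pid = ev["ppid"]
    return list(reversed(chain))

# Hypothetical process-creation events from one host.
process_events = [
    {"pid": 100, "ppid": 4, "image": "winword.exe"},
    {"pid": 200, "ppid": 100, "image": "powershell.exe"},
    {"pid": 300, "ppid": 200, "image": "rundll32.exe"},
]
chain = execution_chain(process_events, 300)
```

If the parent ID is not parsed, the walk stops at the first process, which is the kind of parser gap worth catching before writing detections.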

Endpoint telemetry helps answer direct questions. What ran? What changed? What did the process contact? That evidence explains execution and follow-on behavior. It supports detection across credential access, persistence, lateral movement, evasion, and ransomware staging.

References:

MITRE ATT&CK Process Creation defines the process details defenders need for execution detection. Use it as a checklist for endpoint parser requirements.
Open Source Security Events Metadata (OSSEM) models security event fields across operating systems and providers. Use it to normalize endpoint events into consistent names.
Sigma Specification provides a vendor-agnostic detection rule format for log analytics. It helps separate the detection idea from the query language of a specific SIEM.

Tool I like: osquery exposes endpoint state through SQL. It lets analysts ask one style of question across Windows, macOS, and Linux instead of switching models for each operating system.

Cloud Platform

Cloud-platform logs record API calls against cloud resources. The SIEM should receive management-plane activity and data-plane access for sensitive services. It should also capture identity changes, trust changes, policy changes, and logging or detection changes. Read operations deserve collection because cloud reconnaissance often happens through normal API calls before privilege changes or data access.

Cloud audit records need to identify the caller and the target resource. They should also preserve the source location, the request result, the account context, and the request ID. That record lets the SOC detect credential abuse, IAM escalation, persistence, data exposure, and attempts to tamper with logging.
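One way to test whether a parser preserves that record is a completeness check on normalized events. This is a sketch under stated assumptions: the field names (`actor`, `resource`, `src_ip`, `result`, `account_id`, `request_id`) are hypothetical stand-ins for whatever schema the SIEM uses, not any provider's native names.

```python
# Hypothetical normalized field names for the attributes listed above.
REQUIRED_FIELDS = ("actor", "resource", "src_ip", "result", "account_id", "request_id")

def missing_fields(event, required=REQUIRED_FIELDS):
    """List required fields that are absent or empty in a normalized audit event."""
    return [f for f in required if event.get(f) in (None, "")]

complete_event = {
    "actor": "role/deploy",
    "resource": "bucket/finance-exports",
    "src_ip": "198.51.100.10",
    "result": "success",
    "account_id": "111122223333",
    "request_id": "req-0001",
}
partial_event = {"actor": "role/deploy", "resource": "bucket/finance-exports", "result": ""}

gaps = missing_fields(partial_event)
```

Running a check like this over a sample of each cloud source gives a quick measure of whether the source is present or only partial.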

References:

CSA Cloud Controls Matrix is a vendor-neutral cloud control framework. Use it to tie cloud logging requirements to IAM, data security, monitoring, and change-control objectives.
MITRE ATT&CK Cloud Matrix lists cloud techniques across provider and SaaS platforms. It helps readers connect cloud audit sources to concrete attacker actions.
OCSF gives a vendor-agnostic schema for security telemetry. Use it when cloud logs from multiple providers need a common event shape.

Tool I like: Prowler audits cloud environments for posture, compliance, and forensics readiness. Its findings help teams identify which services and misconfigurations deserve audit coverage first.

Network Infrastructure

Network-infrastructure logs identify the systems behind network activity. DNS and DHCP resolve names and addresses to assets. Flow records show communication patterns. Routing, switching, NAC, and wireless logs add the network location and access context.

The SIEM should receive resolver and lease history, boundary flow, network-device changes, and access outcomes. Wireless association events matter in wireless-heavy environments. Parsing should make IP ownership clear within a time window. Without that, investigations can stop at an address whose owner changed during the incident.
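The time-window requirement can be illustrated with a lease lookup. The sketch below assumes DHCP lease history is available as tuples of address, hostname, lease start, and lease end. The data and the `owner_at` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical DHCP lease history: (ip, hostname, lease_start, lease_end)
LEASES = [
    ("10.0.5.20", "laptop-ana", datetime(2024, 5, 1, 8, 0), datetime(2024, 5, 1, 12, 0)),
    ("10.0.5.20", "kiosk-03", datetime(2024, 5, 1, 12, 0), datetime(2024, 5, 1, 20, 0)),
]

def owner_at(ip, when, leases=LEASES):
    """Return the hostname that held `ip` at time `when`, or None if no lease covers it."""
    for lease_ip, host, start, end in leases:
        if lease_ip == ip and start <= when < end:
            return host
    return None

morning_owner = owner_at("10.0.5.20", datetime(2024, 5, 1, 9, 30))
afternoon_owner = owner_at("10.0.5.20", datetime(2024, 5, 1, 13, 0))
```

The same address resolves to two different hosts on the same day, so an alert keyed only on the IP would point at the wrong asset for half of it.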

This evidence supports detection of DNS-based command and control and domain enumeration. It also surfaces unmanaged devices, unauthorized network changes, unusual egress, and suspicious physical-edge access.

References:

Zeek documentation is a reference for network security monitoring logs. Even if an organization does not deploy Zeek, its log model helps define which network metadata fields analysts need.
MITRE ATT&CK Network Traffic Flow describes the session-level fields used for traffic analysis. Use it to validate flow logging and egress-detection requirements.
NIST SP 800-207: Zero Trust Architecture explains why network location alone is no longer enough to make access decisions. It supports joining network telemetry with identity, device, application, and resource context.

Tool I like: Zeek produces structured network metadata from traffic. It gives analysts searchable session records before they need to reach for packet capture.

Network Perimeter

Perimeter logs describe traffic through boundary controls and the decisions those controls made. At minimum, collect boundary traffic and inspection decisions. Include remote-access and posture data. Add user-to-IP mapping, NDR metadata, and email gateway verdicts where those controls exist.

This category can produce very high volume. Field selection matters. Retain what analysts use to explain a session: initiator, destination, policy, volume, and inspection result. Egress needs the same care as ingress because compromised hosts usually communicate outward for command and control or data movement.
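Field selection can be expressed as a projection applied before storage. The following is a minimal sketch: the field names are hypothetical, and the list maps loosely to initiator, destination, policy, volume, and inspection result as described above.

```python
# Hypothetical names for the session fields worth retaining.
SESSION_FIELDS = ("src_ip", "dst_ip", "dst_port", "policy", "bytes_sent", "bytes_received", "action")

def slim_session(record, keep=SESSION_FIELDS):
    """Keep only the fields analysts use to explain a session; drop the rest."""
    return {k: record[k] for k in keep if k in record}

raw_record = {
    "src_ip": "10.0.5.20",
    "dst_ip": "203.0.113.50",
    "dst_port": 443,
    "policy": "egress-web",
    "bytes_sent": 1048576,
    "bytes_received": 2048,
    "action": "allow",
    "vendor_debug": "verbose internal state",
    "interface_index": 7,
}
slim_record = slim_session(raw_record)
```

Which fields to keep is a local decision. The point of the sketch is that the decision is explicit and reviewable rather than left to vendor defaults.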

Perimeter telemetry helps detect exposed-service exploitation, web attacks, remote-access abuse, and outbound activity such as command and control or exfiltration. It improves triage when it joins cleanly to the rest of the asset and identity record.

References:

NIST SP 800-41 Rev. 1: Guidelines on Firewalls and Firewall Policy explains firewall policy and management in vendor-neutral terms. It gives readers a baseline for what perimeter controls are expected to record and enforce.
OWASP Core Rule Set provides generic web attack detection rules for compatible WAF engines. It helps readers understand WAF findings as rule-driven evidence that needs request context and tuning.
MITRE ATT&CK Network Traffic Content and Network Traffic Flow distinguish payload content from session metadata. Use that distinction when deciding what perimeter data to retain.

Tool I like: Security Onion packages network detection, metadata, packet capture, hunting, and case management. It helps a team validate perimeter telemetry and pivot into packets without building the whole stack by hand.

Detection Findings

Detection findings are alerts created by another analytics engine. They come from the other security tools the SOC already runs. A good finding record explains severity, affected entities, supporting evidence, investigation links, and disposition changes.

Findings help the SOC prioritize work. Analysts still need the underlying evidence. An alert that points to the raw event can be tuned and reproduced. An alert without evidence can be routed and tracked, but it leaves little room for tuning.

Use findings as corroboration. A vendor finding aligned with primary telemetry should raise confidence and speed triage.
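Corroboration can be sketched as a lookup from a finding into primary telemetry. The example assumes findings and raw events share a parsed `host` field and a `ts` timestamp. Those names and the 15-minute window are illustrative assumptions.

```python
from datetime import datetime, timedelta

def supporting_events(finding, raw_events, window=timedelta(minutes=15)):
    """Return raw events on the same host within `window` of the finding."""
    return [
        e for e in raw_events
        if e["host"] == finding["host"] and abs(e["ts"] - finding["ts"]) <= window
    ]

def triage_label(finding, raw_events):
    """Label a finding by whether primary telemetry backs it up."""
    return "corroborated" if supporting_events(finding, raw_events) else "uncorroborated"

# Hypothetical vendor finding and raw endpoint events.
alert_time = datetime(2024, 5, 1, 14, 0)
vendor_finding = {"host": "srv-web-01", "ts": alert_time, "title": "Suspicious script"}
endpoint_events = [
    {"host": "srv-web-01", "ts": alert_time - timedelta(minutes=3), "image": "powershell.exe"},
    {"host": "srv-db-02", "ts": alert_time, "image": "sqlservr.exe"},
]
label = triage_label(vendor_finding, endpoint_events)
```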

References:

OCSF Detection Finding defines normalized fields for alerts and detections. It is directly relevant when designing a finding schema for SIEM ingestion.
STIX 2.1 and TAXII 2.1 are OASIS standards for threat-intelligence representation and exchange. They help readers understand how threat-intel matches should carry structured context.
MISP Standard documents an open format and platform ecosystem for sharing threat intelligence and incident context. It helps teams keep threat-intel findings traceable before they become SIEM signals.

Tool I like: MISP gives analysts a place to store, enrich, correlate, and share threat intelligence. It keeps indicators tied to context instead of turning them into anonymous strings in a lookup table.

SaaS And Productivity

SaaS logs show activity inside the platforms where users communicate, store documents, track work, and manage business records. These systems hold data attackers want, and a compromised account can use them without creating traditional endpoint evidence.

The SIEM should collect mail and file activity, OAuth activity, administrative activity, and audit-retention changes. The same feed should preserve guest access and device context. For Microsoft 365, include mailbox access and send events.

SaaS telemetry helps detect business email compromise, OAuth app abuse, data exfiltration, insider access, and tenant-admin abuse. It answers a question identity logs cannot answer alone: which business data did the account touch?

References:

MITRE ATT&CK Cloud Matrix includes SaaS, Office Suite, and Identity Provider platforms. It helps readers tie SaaS audit events to attacker behavior.
RFC 9700: Best Current Practice for OAuth 2.0 Security gives the security model behind OAuth abuse patterns that appear in SaaS tenants. It supports collecting OAuth consent, token use, and application registration events.
NIST SP 800-53 Rev. 5 Audit and Accountability controls provide control language for audit event content, storage, review, and protection. They help justify SaaS audit retention and administrative-action logging in governance terms.

Tool I like: Maester turns Microsoft 365 and Entra security baselines into runnable tests. It helps teams catch configuration drift and missing audit controls before those gaps appear during an incident.

Applications And Data

Application and data logs show what happened inside the systems that run the business. They identify affected records and business transactions. In engineering systems, they also identify repositories, pipelines, artifacts, and backup jobs.

Prefer structured audit logs with actor, action, resource, and request context. Web logs should preserve authenticated user and forwarded client context where available. Database audit should cover privileged activity and access to sensitive data. Source-control and pipeline systems should record integrity changes. Backup systems should record destructive administrative actions.

Application and data logs help detect application-layer abuse, database exfiltration, repository compromise, pipeline tampering, and backup destruction. They also give the incident team the evidence needed to describe data access accurately.

References:

OWASP Logging Cheat Sheet gives concrete guidance on what application security logs should contain and what sensitive data should be excluded. Start here when designing first-party application audit events.
OWASP ASVS provides testable application security requirements. It helps turn application logging from an engineering preference into a verifiable requirement.
SLSA and in-toto give vendor-neutral models for software supply-chain integrity and provenance. Use them when deciding what CI/CD, artifact, and release events need to be logged.

Tool I like: OpenTelemetry provides vendor-neutral collection and correlation for logs, traces, and metrics. It helps application teams carry request context through services without inventing a new logging pattern for each system.

Containers, VDI, And MDM

Containers, VDI, and managed mobile devices need their own control-plane telemetry. A container workload may be short-lived. A VDI session may move through pooled infrastructure. A mobile device may expose only managed application, compliance, and configuration state.

Kubernetes audit should cover API server activity, sensitive resource access, identity and RBAC changes, and interactive operations such as exec or port forwarding. Admission decisions and runtime alerts add workload context.
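As an illustration of surfacing interactive operations, the sketch below filters kube-apiserver audit events (JSON lines) for pod `exec` and `portforward` subresources. It relies on the `objectRef`, `user`, and `sourceIPs` fields of the audit event format. The sample events are fabricated for the example, and the output field names are assumptions.

```python
import json

INTERACTIVE = {"exec", "portforward"}

def interactive_ops(audit_lines):
    """Yield a compact record for each audit event that targets an interactive pod subresource."""
    for line in audit_lines:
        ev = json.loads(line)
        ref = ev.get("objectRef") or {}
        if ref.get("resource") == "pods" and ref.get("subresource") in INTERACTIVE:
            yield {
                "user": (ev.get("user") or {}).get("username"),
                "namespace": ref.get("namespace"),
                "pod": ref.get("name"),
                "op": ref.get("subresource"),
                "source_ips": ev.get("sourceIPs", []),
            }

# Fabricated sample audit events, one JSON document per line.
sample_audit = [
    json.dumps({
        "verb": "create",
        "user": {"username": "dev-user"},
        "sourceIPs": ["10.0.5.20"],
        "objectRef": {"resource": "pods", "namespace": "prod", "name": "api-7d9f", "subresource": "exec"},
    }),
    json.dumps({
        "verb": "get",
        "user": {"username": "ci-bot"},
        "sourceIPs": ["10.0.9.2"],
        "objectRef": {"resource": "secrets", "namespace": "prod", "name": "db-creds"},
    }),
]
interactive = list(interactive_ops(sample_audit))
```

The filter returns nothing useful unless the audit policy records these requests in the first place, so policy coverage comes before detection logic.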

VDI platforms should record session lifecycle, broker activity, and data redirection. MDM should record enrollment, compliance state, and device-to-user binding.

This telemetry helps detect Kubernetes privilege abuse, secret access, admission-policy bypass, VDI session abuse, and connection-broker changes. It also connects mobile compliance drift to identity activity.

References:

Kubernetes Auditing defines how kube-apiserver audit records answer who did what, when, against which resource, and from where. It is the primary reference for building Kubernetes control-plane audit policy.
NIST SP 800-190: Application Container Security Guide covers container risks across images, registries, orchestrators, hosts, and runtime. It helps readers decide which container events belong in the SIEM beyond application stdout.
NSA/CISA Kubernetes Hardening Guidance and NIST SP 800-124 Rev. 2 cover hardening and audit considerations for Kubernetes and enterprise mobile devices. They support the recommendation to collect control-plane, compliance, configuration, and device-to-user binding events.

Tool I like: Falco detects suspicious runtime behavior across hosts, containers, Kubernetes, and cloud environments. It gives teams readable alerts for workload behavior without requiring them to write low-level syscall collection code.

Turn The Inventory Into Work

Build a simple source inventory. Use the same three states for every category: present, partial, or absent.

For every partial source, name the blocker. Typical blockers are missing events, parser gaps, weak fields, or retention gaps.

Then connect detection work to collection, parsing, and retention. A detection backlog should say which log source it depends on and which parsed fields it needs.

If the source is partial, fix collection, parsing, or retention before expecting the rule to perform.

The backlog becomes concrete: collect the missing events, parse the fields analysts use, preserve the data long enough to investigate, and write detections that match the telemetry available in the SIEM.
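The inventory and its link to the detection backlog can be sketched as two small record types. This is one possible shape, not a prescribed format: the class names, the example rule names, and the field lists are hypothetical.

```python
from dataclasses import dataclass, field

STATES = ("present", "partial", "absent")

@dataclass
class Source:
    category: str
    state: str                      # one of STATES
    blockers: list = field(default_factory=list)

    def __post_init__(self):
        if self.state not in STATES:
            raise ValueError(f"unknown state: {self.state}")
        if self.state == "partial" and not self.blockers:
            raise ValueError(f"{self.category}: a partial source must name its blocker")

@dataclass
class Detection:
    name: str
    source: str                     # category the rule depends on
    fields: list                    # parsed fields the rule needs

def blocked_detections(detections, inventory):
    """Return (rule, source, state, blockers) for rules whose source is not fully present."""
    by_category = {s.category: s for s in inventory}
    blocked = []
    for d in detections:
        src = by_category.get(d.source)
        state = src.state if src else "absent"
        if state != "present":
            blocked.append((d.name, d.source, state, src.blockers if src else []))
    return blocked

inventory = [
    Source("Identity and Access", "present"),
    Source("Endpoint", "partial", ["parser gaps"]),
]
backlog = [
    Detection("Password spray then success", "Identity and Access", ["user", "src_ip", "outcome"]),
    Detection("Office app spawns shell", "Endpoint", ["image", "parent_image", "cmdline"]),
    Detection("Pod exec by unusual user", "Containers, VDI, and MDM", ["user", "subresource"]),
]
blocked_rules = blocked_detections(backlog, inventory)
```

Here two of the three rules are blocked: one waits on an Endpoint parser fix, and one depends on a source that is not in the inventory at all.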

EXTRA:

Log Collection and Correlation:

Once the inventory names which sources are present, partial, or absent, the next task is moving those events into the SIEM. Wazuh is one open-source way to do that across the categories above. It runs a lightweight agent on endpoints and servers, collects host and application logs, and forwards normalized events to a central manager.

Wazuh covers more than one category at a time. The agent reads endpoint telemetry such as process activity, file integrity changes, and local authentication. It also collects operating-system and application log files, and it can pull cloud-platform audit trails through provider integrations. For systems that cannot run an agent, the manager accepts syslog, so network and perimeter devices still reach the SIEM.

Wazuh decodes raw events into named fields and tags them with rule and MITRE ATT&CK context before they leave the pipeline. Send that normalized stream to your SIEM so analysts inherit consistent fields instead of re-parsing each source. Confirm two things for every category you route through it: that the events arrive with the fields analysts need, and that retention downstream is long enough to investigate after the first alert.

The collection priorities from the inventory still apply. Point the agent at the high-tier sources first, keep file integrity monitoring on the assets that hold regulated data, and verify that identity and asset context survives the trip into the SIEM. A collection layer earns its place only when the events it forwards can be joined to the rest of the record.

This piece was researched and written by Arbnor Mustafa, SOC Team Lead at Sentry. Arbnor leads Sentry's Security Operations Center, where he oversees threat detection, incident response, and the continuous monitoring that keeps applications secure at scale.
