
What is Behavioral Analytics in Observability?

Static thresholds cannot keep up with dynamic systems. Behavioral analytics learns what "normal" looks like and alerts on meaningful deviations. Here is how it works and why data completeness matters.

The problem with static thresholds

Traditional alerting uses static thresholds: alert when CPU exceeds 80%, when error rate exceeds 1%, when latency exceeds 200ms.

This approach fails at scale for predictable reasons:

  • No seasonality awareness: 80% CPU at 2PM on Black Friday is normal. 80% CPU at 3AM on Tuesday is not.
  • Context blindness: A new deployment changes normal behavior. Static thresholds do not adapt.
  • Workload variance: Different services have different baselines. One threshold does not fit all.
  • Constant tuning: Teams spend hours adjusting thresholds that become stale within weeks.

The alert fatigue statistics

The consequences are measurable:

  • SOC teams receive an average of 4,484 alerts per day (Vectra 2023)
  • 67% of alerts are ignored due to false positives
  • Companies with 500-1,499 employees ignore 27% of all alerts (IDC)

When most alerts are noise, real problems get missed.

How behavioral baselines work

Behavioral analytics replaces static thresholds with learned baselines:

  1. Collect historical data: 2-6 weeks minimum for accurate patterns
  2. Analyze patterns: Daily cycles, weekly trends, seasonal variations
  3. Build statistical model: What is "normal" for this metric, at this time, for this entity?
  4. Alert on deviations: Flag significant departures from predicted baseline
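The four steps above can be sketched as a minimal hour-of-week baseline: learn a mean and spread per time bucket from historical data, then flag values that are unusual for that context. The bucket granularity and the 3-sigma threshold are illustrative choices, not a specific vendor's algorithm.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_baseline(samples):
    """samples: list of (hour_of_week, value) pairs from 2-6 weeks of history."""
    buckets = defaultdict(list)
    for hour_of_week, value in samples:
        buckets[hour_of_week].append(value)
    # Baseline per bucket: expected value and spread at that time of week.
    return {h: (mean(vs), stdev(vs)) for h, vs in buckets.items() if len(vs) >= 2}

def is_anomalous(baseline, hour_of_week, value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the learned mean."""
    mu, sigma = baseline[hour_of_week]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold
```

The same 80% CPU reading can then be normal in one bucket (Black Friday afternoon) and anomalous in another (3AM Tuesday), because each bucket carries its own learned mean and spread.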

Datadog describes their approach: "Anomaly detection monitors... account for seasonality by firing alerts according to a set amount of deviation from the observed pattern, rather than alerting on a fixed threshold."

The key insight: what matters is not the absolute value, but whether the value is unusual for this context.

The AIOps landscape

The AIOps market is growing rapidly, with varying estimates depending on definition:

  • $1.87B (2024) to $8.64B (2032) at 21.4% CAGR (Fortune Business Insights)
  • Gartner predicted that the share of large enterprises relying exclusively on AIOps for monitoring would rise from 5% (2018) to 30% (2024)
  • 80% of AIOps vendors are expected to implement generative AI by 2024-2025

Vendor capabilities comparison

Major observability vendors have built ML-powered analytics:

| Vendor | Product | Key Capabilities |
| --- | --- | --- |
| Datadog | Watchdog | Anomaly detection, root cause analysis, faulty deployment detection. Requires 2 weeks of metric history. |
| Dynatrace | Davis AI | Predictive AI (forecasting), Causal AI (root cause via Smartscape topology), Davis CoPilot (natural language). |
| Splunk | ITSI | Adaptive Thresholding, Trending (historical comparison), Entity Cohesion (peer-group deviation). |
| Moogsoft | AIOps | Event correlation, noise reduction. Claims 85-99.7% noise reduction. |

Entity-level scoring: moving beyond metrics

Individual metric alerting has a fundamental problem: one unhealthy service generates multiple separate alerts.

When a service degrades, you might see alerts for:

  • CPU utilization increased
  • Memory pressure elevated
  • Error rate exceeded threshold
  • Latency p99 degraded
  • Throughput decreased

These are five alerts for one problem. Entity-level scoring aggregates signals into a single health indicator:

Service_Health = f(error_rate, latency_p99, throughput_change, CPU, memory)

Instead of five alerts, you get one: "Service X is unhealthy."
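A minimal sketch of that aggregation, assuming each metric has already been converted into a z-score against its own baseline. The equal default weights and the 3-sigma cap are hypothetical choices; real systems would tune both per entity type.

```python
def entity_health(metric_zscores, weights=None):
    """Aggregate per-metric anomaly z-scores into one health score in [0, 1].

    1.0 = healthy; lower = less healthy. metric_zscores maps metric name
    to its deviation from baseline in standard deviations.
    """
    if weights is None:
        weights = {m: 1.0 for m in metric_zscores}  # equal weighting (assumption)
    total = sum(weights.values())
    # Turn each |z| into a penalty capped at 1.0 (reached at 3 sigma),
    # then take the weighted average across metrics.
    penalty = sum(
        weights[m] * min(abs(z) / 3.0, 1.0) for m, z in metric_zscores.items()
    ) / total
    return 1.0 - penalty
```

A service with all five signals slightly elevated and a service with one signal wildly off can both score as unhealthy, which is the point: one indicator per entity, not one alert per metric.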

What is an "entity"?

An entity is any logical component you want to track as a unit:

  • Service
  • Host
  • Container
  • Endpoint
  • Pod
  • Database
  • Network device

Entity-level scoring lets you ask "Is this thing healthy?" instead of "Are any of these 47 metrics outside their thresholds?"

Anomaly score vs risk score

Advanced behavioral analytics uses dual scoring systems:

| Score Type | Question | Factors |
| --- | --- | --- |
| Anomaly Score | How unusual is this? | Statistical deviation from baseline, ML confidence |
| Risk Score | How bad could this be? | Business impact, blast radius, SLA implications |

Combined prioritization

The real power comes from combining both scores:

Priority = Anomaly_Score × Risk_Score

| Scenario | Anomaly | Risk | Priority |
| --- | --- | --- | --- |
| High anomaly on internal dev tool | High | Low | Low priority |
| Moderate anomaly on payment service | Moderate | High | High priority |
| High anomaly on payment service | High | High | Urgent |

This approach surfaces what matters: unusual behavior on critical systems. Not every anomaly deserves attention—only anomalies with impact potential.

Anomaly detection methods

Different algorithms suit different use cases:

| Method | Data Needed | Compute | Interpretability | Temporal |
| --- | --- | --- | --- | --- |
| Z-Score | Low | Minimal | High | No |
| Holt-Winters | Moderate | Low | High | Yes (seasonal) |
| Isolation Forest | Moderate | Low | Medium | Limited |
| Prophet | High (1yr+) | Moderate | High | Yes |
| LSTM | High | High | Low | Yes |
| Autoencoder | High | High | Low | Limited |

No single method is best. Production systems often combine multiple approaches: fast statistical methods for real-time alerting, more sophisticated ML for deeper analysis.
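As one example of a fast statistical method suited to real-time alerting, here is a streaming detector based on an exponentially weighted moving average (EWMA) of mean and variance, a close cousin of the Z-score row in the table. The alpha and threshold values are illustrative choices, not recommendations.

```python
class EwmaDetector:
    """Streaming anomaly detector: exponentially weighted mean and variance.

    Constant memory per metric and O(1) per data point, which is what makes
    this class of method cheap enough for real-time alerting.
    """

    def __init__(self, alpha=0.1, threshold=3.0):
        self.alpha = alpha          # adaptation speed (assumption: 0.1)
        self.threshold = threshold  # sigmas before flagging (assumption: 3.0)
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Score x against the current estimate, then fold it in."""
        if self.mean is None:
            self.mean = x
            return False
        diff = x - self.mean
        anomalous = self.var > 0 and abs(diff) / self.var ** 0.5 > self.threshold
        # Update estimates after scoring so the anomaly does not mask itself.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous
```

Note what this simple method gives up: no seasonality (a daily traffic peak looks anomalous every day), which is exactly the gap Holt-Winters and Prophet exist to fill.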

Why complete data matters for ML

Here is the core problem with sampling for behavioral analytics:

ML models learn from training data. If you sample during the baseline period, the model learns from an incomplete picture.

Consider what 1% sampling misses:

  • Rare but important error classes that occur less than your sampling rate
  • Latency spikes that happen to not get sampled
  • Entire user segments with low traffic
  • Intermittent failures affecting specific request patterns

The model cannot learn what it never sees. Anomaly detection trained on sampled data has blind spots that mirror your sampling gaps.
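A quick simulation makes the blind spot concrete: with an error class that occurs in 0.05% of requests, 1% random sampling typically captures one or zero examples, leaving a baseline model nothing to learn from. The traffic numbers here are synthetic.

```python
import random

random.seed(42)

# Synthetic request log: 200,000 requests, of which a rare but important
# error class appears only 100 times (0.05% of traffic).
requests = ["ok"] * 199_900 + ["rare_error"] * 100
random.shuffle(requests)

# 1% uniform random sampling, as a sampling-based pipeline might apply.
sampled = [r for r in requests if random.random() < 0.01]

rare_seen = sampled.count("rare_error")
# Expected count of the rare error in the sample is ~1 (100 x 0.01); many
# runs see zero. A baseline trained on `sampled` never learns this error
# class exists, so its later occurrence cannot be scored as anomalous.
```

The population contains a clear signal; the sample usually does not. That asymmetry is the sampling gap the baseline inherits.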

Datadog Watchdog requires 2 weeks of historical data for metric baselines. If that data is 1% sampled, the baselines are biased toward high-volume patterns.

Security detection through behavioral analytics

Behavioral baselines are not just for performance—they detect security threats too:

  • Lateral Movement: User accounts accessing systems they have never touched before
  • Data Exfiltration: Outbound traffic patterns that deviate from baseline
  • Credential Misuse: Authentication patterns that do not match normal behavior

NETSCOUT notes: "Anomaly Detection: Tools that recognize unusual behavior, like an employee's account accessing systems they have never touched before, can flag potential lateral movement."

The same behavioral model that detects performance anomalies can surface security concerns. The difference is in the interpretation, not the detection.
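As a sketch of the lateral-movement signal described above, the detection itself can be as simple as recording which systems each account touched during the baseline period, then flagging first-time access afterward. The class and method names are illustrative, not part of any product's API.

```python
from collections import defaultdict

class AccessBaseline:
    """Track which systems each account normally touches; flag novel access."""

    def __init__(self):
        self.seen = defaultdict(set)  # account -> set of systems accessed

    def observe(self, account, system):
        """Record an access event. Returns True if this account has never
        accessed this system before (a potential lateral-movement signal)."""
        novel = system not in self.seen[account]
        self.seen[account].add(system)
        return novel
```

During the learning window, `observe` is called and its return value ignored; after that, a True result feeds the anomaly score, while the risk score (is the newly touched system a production database or a wiki?) decides whether anyone gets paged.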

Introducing ALBA

ALBA—Adaptive Learning Behavioral Analytics—is Sampleless's approach to entity-level scoring. Built to take advantage of complete data, ALBA provides:

  • Entity health scores: Aggregate multiple signals into a single indicator per service, host, or endpoint
  • Dual scoring: Anomaly scores for "how unusual" and risk scores for "how impactful"
  • Context-aware baselines: Time-of-day, day-of-week, and seasonal patterns
  • Continuous learning: Baselines adapt as your system evolves

Because Sampleless collects 100% of telemetry, ALBA trains on complete data. No sampling gaps. No biased baselines. The full picture.

OpenALBA

ALBA is built on OpenALBA—an open specification for entity-level behavioral scoring. We believe the industry needs shared standards for behavioral analytics, not another proprietary lock-in vector.

OpenALBA defines entity schemas, scoring algorithms, and export formats. Your behavioral data is yours, portable to any compatible system.

Frequently asked questions

What is the difference between anomaly score and risk score?

Anomaly score measures how unusual current behavior is compared to the baseline (statistical deviation). Risk score measures potential business impact (blast radius, SLA implications, revenue exposure). Combined priority = Anomaly × Risk. A high anomaly on a low-risk service is low priority; a moderate anomaly on a critical service is high priority.

How long does it take to train behavioral baselines?

Minimum 2-6 weeks of historical data for accurate baselines. Datadog Watchdog requires 2 weeks for metric baselines and 24 hours for logs. Shorter training periods miss weekly patterns and edge cases. This is why sampled data degrades ML accuracy—incomplete training data means biased baselines.

Can behavioral analytics detect security threats?

Yes. Observability signals can detect lateral movement (accounts accessing unusual systems), data exfiltration (outbound traffic anomalies), and credential misuse (authentication pattern anomalies). Behavioral baselines flag deviations regardless of whether the cause is operational or security-related.

See ALBA in action

Request a demo to see how behavioral analytics works with complete data.