Why Sampling Creates Observability Blind Spots
At 1% sampling with a 0.1% error rate, you need 230,000 requests to achieve 90% confidence of capturing at least one error. Here is the math, the tradeoffs, and why we built Sampleless to avoid sampling entirely.
The mathematics of missed events
The probability of detecting at least one occurrence of an event follows the formula:

P(at least one detection) = 1 − (1 − p × s)^n

where p is the event's base rate, s is the sampling rate, and n is the number of requests.

Let's apply this to a real scenario: a 0.1% error rate (1 in 1,000 requests fail) with 1% sampling.
| Requests | Errors occurring (at 0.1%) | Expected sampled errors | P(sampling at least 1) |
|---|---|---|---|
| 23,000 | 23 | 0.23 | ~21% |
| 100,000 | 100 | 1.0 | ~63% |
| 230,000 | 230 | 2.3 | ~90% |
| 460,000 | 460 | 4.6 | ~99% |
At 1% sampling, you need approximately 230,000 requests to achieve 90% confidence of capturing at least one 0.1% error.
For a service handling 100 requests per second, that is 38 minutes of traffic before you have reasonable confidence of seeing an error that is already affecting 1 in 1,000 users.
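A few lines of Python reproduce the table directly from the formula:

```python
# P(at least one error is sampled) = 1 - (1 - p*s)^n
# p = error rate, s = sampling rate, n = total requests.
def p_detect(n: int, p: float = 0.001, s: float = 0.01) -> float:
    return 1 - (1 - p * s) ** n

for n in (23_000, 100_000, 230_000, 460_000):
    print(f"{n:>7,} requests -> {p_detect(n):.0%}")
#  23,000 requests -> 21%
# 100,000 requests -> 63%
# 230,000 requests -> 90%
# 460,000 requests -> 99%
```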
Sampling techniques explained
Head-based sampling
Decision made at trace start using deterministic trace ID hashing.
- Pros: Simple, efficient, guarantees complete traces
- Cons: "Blind" to trace content. Cannot prioritize errors or high latency.
Most agents default to 10% or 10 traces/second head-based sampling.
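For intuition, here is a minimal sketch of that hashing decision, modeled loosely on OpenTelemetry's TraceIdRatioBased sampler (the 63-bit detail is an implementation assumption):

```python
# Minimal sketch of head-based sampling via deterministic trace-ID hashing.
import random

SAMPLE_RATE = 0.10                    # keep 10% of traces
BOUND = int(SAMPLE_RATE * (1 << 63))  # threshold over a 63-bit ID space

def should_sample(trace_id: int) -> bool:
    # Every service derives the same verdict from the trace ID alone,
    # so traces are kept or dropped whole -- but at this point nothing
    # is known yet about errors or latency.
    return (trace_id & ((1 << 63) - 1)) < BOUND

trace_id = random.getrandbits(128)    # assigned once, at trace start
print(should_sample(trace_id))        # True for ~10% of trace IDs
```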
Tail-based sampling
Decision deferred until trace completes. Evaluates full context before deciding.
- Pros: Can always sample errors and high-latency traces
- Cons: Requires stateful infrastructure, buffering, and routing by trace ID
Tail-based sampling is difficult to operate at scale. High-volume services may need dozens or hundreds of compute nodes just for sampling decisions.
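A minimal sketch of the decision logic shows where the state comes from (the Span type and thresholds here are illustrative; they mirror the collector policies configured later in this post):

```python
# Minimal sketch of a tail-based decision: evaluate the complete trace,
# which means buffering every span for that trace first.
import random
from dataclasses import dataclass

@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool

def keep_trace(spans: list[Span], base_rate: float = 0.05) -> bool:
    if any(s.is_error for s in spans):            # always keep errors
        return True
    if max(s.duration_ms for s in spans) > 200:   # always keep slow traces
        return True
    return random.random() < base_rate            # 5% of everything else

# The hard part at scale: spans for one trace land on different collector
# nodes, so they must be routed by trace ID into a shared buffer and held
# in memory until the trace completes.
trace = [Span("abc", 35.0, False), Span("abc", 240.0, False)]
print(keep_trace(trace))  # True -- exceeds the 200 ms latency policy
```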
Adaptive/dynamic sampling (Google Dapper approach)
Adjusts rate based on system load:
- High-load services: 0.01% (1 in 10,000)
- Moderate services: ~0.1%
- Errors and rare endpoints: "Dynamically cranks sampling to 100%"
The Google Dapper paper notes: "For high-throughput services, aggressive sampling does not hinder most important analyses." But Google's scale is unusual. Most companies are not processing enough traffic for statistical sufficiency at 0.01% sampling.
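A hedged sketch of the idea, using the rate tiers from the list above; the throughput breakpoints are assumptions for demonstration:

```python
# Illustrative adaptive sampling: pick the rate from recent throughput,
# and force errors to 100%.
def adaptive_rate(requests_per_sec: float, is_error: bool) -> float:
    if is_error:
        return 1.0        # errors: crank sampling to 100%
    if requests_per_sec > 10_000:
        return 0.0001     # high-load services: 1 in 10,000
    if requests_per_sec > 100:
        return 0.001      # moderate services: ~0.1%
    return 1.0            # low-traffic endpoints can afford everything

print(adaptive_rate(50_000, is_error=False))  # 0.0001
print(adaptive_rate(50_000, is_error=True))   # 1.0
```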
Industry sampling statistics
| Traffic Level | Typical Rate | Notes |
|---|---|---|
| Development | 100% | Full visibility, low volume |
| Low volume (<100 req/s) | 25-100% | Can often afford full traces |
| Medium volume | 10-20% | Balance cost and visibility |
| High volume (>1000 req/s) | 1-5% | Cost-driven |
| Ultra-high (Google scale) | 0.01-0.1% | Statistical sufficiency at volume |
Alibaba's 2025 research reveals the scale of modern tracing challenges: they generate 18.6-20.5 PB of trace data per day. Even with aggressive tail-based sampling, query miss rates for normal traces can reach 27.17%.
Impact on ML and anomaly detection
This is where sampling causes the most insidious problems.
ML models require representative training data to establish accurate behavioral baselines. Datadog's Watchdog requires a minimum of 2 weeks of historical data to train metric baselines. Netdata's ML trains on 6 hours of data and retrains every 3 hours.
If you sample during the baseline period, the ML model learns from an incomplete picture.
Consider what sampling misses:
- Rare but important error classes whose occurrence rate falls below your sampling rate
- Latency spikes that happen to not get sampled
- Entire user segments with low traffic
- Intermittent failures that only affect certain request patterns
The model cannot learn what it never sees. Anomaly detection trained on sampled data will have blind spots that mirror your sampling gaps.
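A toy simulation makes the blind spot concrete. The traffic volume below matches the earlier 100 req/s example over a 6-hour training window; the 1-in-100,000 failure mode is an assumption:

```python
# Toy illustration of a baseline blind spot under 1% sampling.
import random

requests = 100 * 60 * 60 * 6                # 100 req/s over a 6-hour window
rare_failures = round(requests * 0.00001)   # ~22 occurrences of a rare class
sampled = sum(random.random() < 0.01 for _ in range(rare_failures))
print(f"{rare_failures} failures occurred, {sampled} reached the training set")
# At 1% sampling, the most likely outcome is 0: the baseline never
# learns that this failure mode exists.
```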
OpenTelemetry sampling configuration
If you must sample, here is how to configure it properly in OpenTelemetry.
TraceIdRatioBased with ParentBased
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# 10% sampling: the root span makes the decision; children inherit it
sampler = ParentBased(root=TraceIdRatioBased(0.1))
provider = TracerProvider(sampler=sampler)
```

Environment variables
```bash
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"
```

OTel Collector tail sampling
```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: {status_codes: ["ERROR"]}
      - name: latency-policy
        type: latency
        latency: {threshold_ms: 200}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```

This configuration always keeps errors and high-latency traces while sampling 5% of everything else. Better than pure head-based sampling, but still misses events that do not match your policies.
The alternative: eliminate sampling entirely
Sampling exists because of economics. SaaS observability vendors charge per GB ingested, and cloud egress adds at least $0.135/GB on top. At scale, full-fidelity collection becomes prohibitively expensive.
BYOC architecture changes the equation:
- Data stays in your cloud. Zero egress costs.
- No per-GB charges from the observability vendor.
- Full-fidelity collection becomes economically viable.
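A back-of-the-envelope sketch using the $0.135/GB figure above (the daily telemetry volume is an assumption for illustration):

```python
# Monthly egress cost at full fidelity, before any per-GB vendor charges.
gb_per_day = 500        # assumed telemetry volume
egress_per_gb = 0.135   # the egress figure cited above
print(f"${gb_per_day * 30 * egress_per_gb:,.0f}/month in egress alone")
# $2,025/month -- under BYOC this line item is zero, because the data
# never leaves your cloud.
```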
Sampleless collects 100% of your telemetry because BYOC makes it cost-effective. Complete data means:
- ML baselines trained on representative data
- The trace you need is always there
- No blind spots from sampling gaps
- Accurate anomaly detection without data bias
Frequently asked questions
What sampling rate should I use?
There is no universal answer. Higher traffic requires lower rates to control costs: low volume (<100 req/s) can often use 25-100%, medium volume typically uses 10-20%, high volume (>1000 req/s) often drops to 1-5%. But every reduction increases the risk of missing important events.
Does tail-based sampling solve the problem?
Tail-based sampling helps by always capturing errors and high-latency traces, but it requires stateful infrastructure, buffering, and routing by trace ID. It is difficult to operate at scale and still misses events that do not match your defined policies.
How does sampling affect ML-based anomaly detection?
ML models require representative training data. If you sample during the baseline period, the model learns from an incomplete picture and may miss entire classes of normal or anomalous behavior. Datadog Watchdog requires 2 weeks of data; if that data is sampled, baselines are biased.
Conclusion
Sampling is a necessary compromise when economics force the choice between visibility and cost. But every reduction in sampling rate increases the probability of missing the exact event you need to debug a production issue.
The question is not whether to sample 1% or 5%. The question is whether you can afford to miss 99% or 95% of your data.
If you cannot, BYOC architecture makes full-fidelity collection economically viable. Sampleless collects everything because we believe observability should not require gambling on which data to keep.
Stop gambling on which data to keep
See how Sampleless collects 100% of your telemetry with predictable pricing.