Anomaly history (since v2.x / Phase F15)¶

The gateway's deterministic anomaly detector (MAD + seasonality + correlator — implemented in mcp-server/src/analysis/) produces scores live. By default those scores live in process memory only — restart the gateway and the trail is gone. F15 adds an opt-in TSDB sink that mirrors every score to a Prometheus remote-write endpoint so post-mortem reconstruction can pull "what did the gateway see at 03:42?" via a normal PromQL query.

Enable¶

```bash export OMCP_ANOMALY_HISTORY_REMOTE_WRITE=https://tsdb.internal/api/v1/write

Optional auth:¶

export OMCP_ANOMALY_HISTORY_TOKEN=$BEARER_TOKEN

Optional extra headers:¶

export OMCP_ANOMALY_HISTORY_HEADERS="x-scope-org-id=tenant-a,x-extra=foo=bar" ```

In Helm:

yaml anomalyHistory: enabled: true remoteWriteUrl: https://tsdb.internal/api/v1/write token: "..." # or existingSecret: my-tsdb-token headers: "x-scope-org-id=tenant-a"

Wire format¶

One time-series sample per anomaly:

text omcp_anomaly_score{ service="payment", tenant="default", method="mad", # mad | seasonality | correlator severity="warn", # info | warn | critical signal="request_latency" # optional }

The sample value is the anomaly score (typically 0..1). Samples are batched in-process and flushed every 10 seconds; a buffer over 500 entries triggers a synchronous flush. The sink is best-effort — a sick TSDB never blocks the detector and never crashes the gateway. Failed flushes log once and drop the batch.

Query it back via `get_anomaly_history`¶

text get_anomaly_history(service="payment", duration="6h", method="mad")

The tool runs omcp_anomaly_score{service="payment",method="mad"} over the configured window against any Prometheus source the gateway already knows about. The operator must wire the round-trip: point a Prometheus instance at the same TSDB the writer pushes to, then add that Prometheus as an MCP source.

Why JSON, not the Prometheus protobuf?¶

The on-the-wire payload is currently JSON shaped like the Prometheus WriteRequest (labels + samples). A real Snappy-compressed protobuf client is a follow-up; until it lands, operators using TSDBs that only accept the protobuf form should front the gateway with a tiny shim (prom-aggregation-gateway, vmagent, or a custom 50-line collector). The JSON path is portable and any new TSDB that accepts the same shape (Mimir, VictoriaMetrics, Thanos, custom collector) works without code changes.

Operational notes¶

Detector hook is wired. The AnomalyHistory writer is fed from the live detect_anomalies path — every scan records its scores via the record() hook (detect-anomalies.ts, wired in index.ts), and they are queryable through get_anomaly_history. Externally-written omcp_anomaly_score metrics (e.g. from a sibling tool that produces them) are queryable too.
Retention is the TSDB's job. The gateway never deletes — configure the receiver's retention to match your post-mortem window (e.g. 30 days).
Health-tab sparkline (since v3.1). Each Health card renders an inline sparkline of the last hour of omcp_anomaly_score for that service, served from GET /api/health/anomaly-sparklines. The data comes from an in-process ring the AnomalyHistory sink keeps alongside the remote-write tier, so the sparkline works as soon as the gateway records anomalies — it does not require the remote-write TSDB round-trip. The series is tenant-scoped and capped to a one-hour window. When no scores have been recorded yet the card falls back to a short-lived client-side trend of the live health score. The remote-write sink remains the durable, queryable store (get_anomaly_history, Grafana, any PromQL client) for longer windows and post-mortems.