Prometheus connector¶

Default queries target prom-client conventions — the de-facto standard for Node.js/Express instrumentation. Most apps that expose /metrics via prom-client work out of the box without any source-level overrides.

Default metrics¶

For each synthetic metric the connector probes a list of candidate series and picks the first one that actually exists in the backend. This makes the same MCP work for prom-client apps and node_exporter hosts without per-source configuration.

Metric	First candidate (prom-client)	Fallback (node_exporter)
`cpu`	`rate(process_cpu_seconds_total[1m]) * 100`	`100 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100`
`memory`	`process_resident_memory_bytes`	`node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes`
`request_rate`	`sum(rate(http_requests_total[1m]))`	— (HTTP-app concept)
`error_rate`	`sum(rate(http_requests_total{status=~"5.."}[1m]))`	—
`latency_p99`	`histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le))`	—
`latency_p50`	same with quantile `0.50`	—
`latency_avg`	`sum(rate(_sum[1m])) / sum(rate(_count[1m]))`	—

The probe queries /api/v1/series?match[]=<seriesName>{<label>="<service>"} for each candidate per service (cached 60 s), so a Prometheus that holds both prom-client apps and node_exporter hosts in the same instance still picks the right candidate per service. The selected candidate is reflected in the response's resolvedSeries field.

Dynamic label resolution¶

The {{selector}} placeholder is resolved at query time. The connector probes a list of labels and uses the first one that contains the requested service name as a value:

job
service
app
service_name

So query_metrics(service="my-app") issues /api/v1/label/job/values, finds my-app, then runs ... process_cpu_seconds_total{job="my-app"} ....

If no label matches, the first label in the list is used as a fallback. Override the order via PROMETHEUS_SERVICE_LABELS:

bash PROMETHEUS_SERVICE_LABELS=service,job npx @thotischner/observability-mcp

Label values are cached per-label for 60 seconds.

Per-instance breakdown (`groupBy`)¶

When a service is scraped on multiple targets (dev + prod, k8s replicas, etc.), the default queries collapse all of them into one number. Pass groupBy to break the result down by any label:

query_metrics(service="api", metric="cpu", groupBy="instance")

The connector swaps the synthetic-metric template for the groupedQuery variant — e.g. sum(rate(...)) becomes sum by(instance) (rate(...)), and histogram_quantile(... by (le)) becomes ... by (le, instance). The response shape becomes:

json { "metric": "cpu", "groupBy": "instance", "groups": [ { "key": "prod-vm-1:9100", "values": [...], "summary": {...} }, { "key": "dev-vm-1:9100", "values": [...], "summary": {...} } ], "values": [...], // first group's series, kept for back-compat "summary": {...} }

If only one group exists, the groups array is omitted and the result keeps the single-series shape.

When you call query_metrics without groupBy and the underlying series has more than one distinct instance (or pod) value for that service, the response includes a hint field telling you the breakdown is available:

json "hint": "2 distinct instances exist for this service. Pass groupBy=\"instance\" to break the result down."

`resolvedSeries` and `resolvedLabel`¶

Every query_metrics response includes the actual PromQL executed and the label that was matched. When results look surprising, check these first.

json { "metric": "cpu", "values": [...], "resolvedSeries": "rate(process_cpu_seconds_total{ job=\"my-app\" }[1m]) * 100", "resolvedLabel": "job" }

Overriding a single metric¶

Source-level metrics entries merge with defaults by name. Pin one metric without re-listing the rest:

yaml sources: - name: prometheus type: prometheus url: http://prometheus:9090 enabled: true metrics: - name: cpu query: 'my_custom_cpu_gauge{job="{{service}}"}' unit: percent description: Project-specific CPU metric

{{service}} (literal name) and {{selector}} (full label="value" pair) are both supported in custom queries.

Compatibility with managed Prometheus¶

The connector works with Grafana Cloud Mimir, AWS Managed Prometheus, and Chronosphere without flags. Health checks probe /api/v1/query?query=up and service discovery falls back to /api/v1/label/job/values when /api/v1/targets is unavailable.