RBAC policy engines

OMCP ships with three interchangeable RBAC backends. All three honour the same PolicyEngine interface (evaluate, list, roles, kind) so the UI, the audit chain, and the per-route need() gate work identically regardless of which engine is active.

| Engine | Selected by | Source of truth | Hot reload | Best for |
|---|---|---|---|---|
| Built-in | (default) | `mcp-server/src/auth/rbac.ts` `DEFAULT_POLICY` | no (code change) | demo, single-user, small teams |
| File | `OMCP_RBAC_POLICY_FILE=path` | YAML/JSON on disk | no (restart) | teams that version-control their policy |
| OPA | `OMCP_OPA_URL=http://opa:8181` | Rego in OPA | yes (5s cache TTL) | enterprise / multi-product with central policy |

OPA takes precedence over a file when both are set, unless OMCP_POLICY_ENGINE=builtin opts back into the built-in.
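The precedence rule above can be sketched as a small pure function. This is a minimal illustration, not OMCP's actual selection code: `selectEngine`, `EngineKind`, and `Env` are hypothetical names, and only the three environment variables mentioned on this page are consulted.

```typescript
// Hypothetical helper: the real selection logic in OMCP may differ.
type EngineKind = "builtin" | "file" | "opa";
type Env = Record<string, string | undefined>;

function selectEngine(env: Env): EngineKind {
  // An explicit opt-back wins over everything else.
  if (env["OMCP_POLICY_ENGINE"] === "builtin") return "builtin";
  // OPA takes precedence over a policy file when both are set.
  if (env["OMCP_OPA_URL"]) return "opa";
  if (env["OMCP_RBAC_POLICY_FILE"]) return "file";
  // Zero config falls through to the built-in policy.
  return "builtin";
}
```

Keeping the decision in one function that takes the environment as an argument makes the precedence easy to unit-test without touching the process environment.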

Built-in (default)

Zero config. The policy is the DEFAULT_POLICY map shipped in source. Visible at GET /api/policy (admin-gated) — the engine kind is builtin.

Use when:

- You're running the demo or a single-operator setup.
- You don't need to deviate from the viewer / operator / admin / redaction:bypass shape.

File-backed policy

```bash
OMCP_RBAC_POLICY_FILE=/etc/observability-mcp/policy.yaml
```

File format:

```yaml
roles:
  viewer:
    - { resource: sources, action: read }
    - { resource: services, action: read }
  operator:
    - { resource: sources, action: write }
    - { resource: settings, action: write }
  admin:
    - { resource: users, action: delete }
    - { resource: redaction, action: bypass }
```

Strict validation: unknown resources, unknown actions, AND unexpected object keys all reject loudly at boot. A typo like tesource: doesn't silently produce an empty grant — the process exits with the typo identified.
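To make the "reject loudly" behaviour concrete, here is a minimal sketch of a strict grant validator. It is an illustration under assumptions: `validateRole` is a hypothetical name, and the `RESOURCES` and `ACTIONS` sets contain only the values that appear on this page, not OMCP's full catalogue.

```typescript
// Illustrative subsets only: the real resource and action catalogues live in OMCP.
const RESOURCES = new Set(["sources", "services", "settings", "users", "redaction"]);
const ACTIONS = new Set(["read", "write", "delete", "bypass"]);

interface Grant { resource: string; action: string; }

// Returns the validated grants, or throws an error naming the offending key or value.
function validateRole(role: string, raw: unknown): Grant[] {
  if (!Array.isArray(raw)) throw new Error(`role "${role}": expected a list of grants`);
  return raw.map((entry: unknown, i: number): Grant => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`role "${role}" grant #${i}: expected an object`);
    }
    const rec = entry as Record<string, unknown>;
    // Unexpected keys are checked first so a typo such as "tesource" is named directly.
    for (const key of Object.keys(rec)) {
      if (key !== "resource" && key !== "action") {
        throw new Error(`role "${role}" grant #${i}: unexpected key "${key}"`);
      }
    }
    const { resource, action } = rec;
    if (typeof resource !== "string" || !RESOURCES.has(resource)) {
      throw new Error(`role "${role}" grant #${i}: unknown resource "${String(resource)}"`);
    }
    if (typeof action !== "string" || !ACTIONS.has(action)) {
      throw new Error(`role "${role}" grant #${i}: unknown action "${String(action)}"`);
    }
    return { resource, action };
  });
}
```

A loader built this way would let the thrown error propagate at boot, so a misspelled key stops the process instead of yielding an empty grant.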

File-supplied roles REPLACE the built-in of the same name. A custom admin does not inherit redaction:bypass from the built-in; you must re-grant it explicitly if you want it. This is intentional: an operator deploying a restrictive policy shouldn't silently inherit broader defaults.
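The replace-not-merge semantics can be illustrated with a shallow merge keyed by role name. This sketch assumes that roles absent from the file keep their built-in grants; `applyFilePolicy`, `hasGrant`, and the sample policies are hypothetical and do not reproduce the real `DEFAULT_POLICY`.

```typescript
interface Grant { resource: string; action: string; }
type Policy = Record<string, Grant[]>;

// A role present in the file replaces the built-in role of the same name wholesale.
// Grant lists are never concatenated, so nothing is inherited implicitly.
function applyFilePolicy(builtin: Policy, fromFile: Policy): Policy {
  return { ...builtin, ...fromFile };
}

function hasGrant(policy: Policy, role: string, resource: string, action: string): boolean {
  return (policy[role] ?? []).some((g) => g.resource === resource && g.action === action);
}

// Illustrative data only.
const builtinSample: Policy = {
  viewer: [{ resource: "sources", action: "read" }],
  admin: [
    { resource: "users", action: "delete" },
    { resource: "redaction", action: "bypass" },
  ],
};
const fileSample: Policy = {
  admin: [{ resource: "users", action: "delete" }],
};
const effective = applyFilePolicy(builtinSample, fileSample);
```

In this example `hasGrant(effective, "admin", "redaction", "bypass")` is `false`: the custom admin lost the bypass grant because the file did not re-grant it.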

Fail-closed: a malformed policy file aborts the boot regardless of OMCP_AUTH_ALLOW_FALLBACK. The alternative — silently reverting to the broader built-in — would defeat the purpose of a tightening override.

OPA engine

```bash
OMCP_OPA_URL=http://opa:8181
OMCP_OPA_PACKAGE=observability/authz   # default
OMCP_OPA_ROLES=admin,operator,viewer   # for the Policy UI catalogue
OMCP_OPA_TOKEN=<bearer>                # optional, OPA --authentication=token
```

Wire format: OMCP POSTs /v1/data/${OMCP_OPA_PACKAGE} with

```json
{ "input": { "roles": ["admin"], "resource": "sources", "action": "delete" } }
```

OPA must reply with either of:

```json
{ "result": true }
```

```json
{
  "result": {
    "allowed": true,
    "reason": "granted by role admin",
    "permissions": [
      { "resource": "sources", "action": "read" }
    ]
  }
}
```

The rich shape lets the Policy UI render full per-role grant tables without OMCP needing to know the Rego internals. Plain boolean responses also work; the UI just shows an empty grant table.
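A client that accepts both response shapes can normalise them into one verdict type. The following is a sketch, not OMCP's parser: `normalizeOpaResult` and `Verdict` are hypothetical names, and treating a missing `result` as a deny is an assumption made here for safety.

```typescript
interface Grant { resource: string; action: string; }
interface Verdict { allowed: boolean; reason: string | undefined; permissions: Grant[]; }

function normalizeOpaResult(body: unknown): Verdict {
  const result = (body as { result?: unknown } | null | undefined)?.result;
  // Plain boolean shape: no grant table is available for the UI.
  if (typeof result === "boolean") {
    return { allowed: result, reason: undefined, permissions: [] };
  }
  // Rich shape: pick out the documented fields and ignore anything else.
  if (typeof result === "object" && result !== null) {
    const rich = result as { allowed?: unknown; reason?: unknown; permissions?: unknown };
    return {
      allowed: rich.allowed === true,
      reason: typeof rich.reason === "string" ? rich.reason : undefined,
      permissions: Array.isArray(rich.permissions) ? (rich.permissions as Grant[]) : [],
    };
  }
  // Assumption: a missing or unexpected result is treated as a deny.
  return { allowed: false, reason: undefined, permissions: [] };
}
```

Only a literal `true` grants access in this sketch, so a malformed rich response such as `"allowed": "yes"` fails closed.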

Boot pre-warm

PolicyEngine.evaluate() is synchronous; OPA HTTP is not. The engine ships a 5s per-(roles, resource, action, tenant) cache. On a cache miss, evaluate() returns a conservative deny + async-fires a warm. To avoid that "warming-deny" for the very first user request, OMCP hits every (declared role × valid resource × valid action × known tenant) combo at boot.
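The interplay between a synchronous gate and an asynchronous backend can be sketched as a TTL cache that denies on a miss and warms in the background. This is a minimal illustration, not OMCP's engine: `createWarmingCache`, `Query`, and `Decision` are hypothetical names, and `fetchDecision` stands in for the HTTP call to OPA.

```typescript
interface Decision { allowed: boolean; reason: string; }
interface Query { roles: string[]; resource: string; action: string; tenant: string; }

function createWarmingCache(
  fetchDecision: (q: Query) => Promise<Decision>, // stand-in for the OPA HTTP query
  ttlMs: number = 5000,
  now: () => number = () => Date.now(),
) {
  const entries = new Map<string, { decision: Decision; expires: number }>();
  // Sort roles so ["a","b"] and ["b","a"] share one cache entry.
  const keyOf = (q: Query): string =>
    JSON.stringify([q.roles.slice().sort(), q.resource, q.action, q.tenant]);

  // Fetch asynchronously and store the result; usable both on a miss and for a boot pre-warm.
  async function warm(q: Query): Promise<void> {
    const decision = await fetchDecision(q);
    entries.set(keyOf(q), { decision, expires: now() + ttlMs });
  }

  // Synchronous gate: never awaits. A miss or an expired entry yields a conservative deny.
  function evaluate(q: Query): Decision {
    const hit = entries.get(keyOf(q));
    if (hit !== undefined && hit.expires > now()) return hit.decision;
    // Fire-and-forget; a failed warm leaves the entry cold so the next call retries.
    void warm(q).catch(() => undefined);
    return { allowed: false, reason: "OPA decision pending (warming cache); request again" };
  }

  return { evaluate, warm };
}
```

Calling `warm` for each expected query at boot is what removes the initial deny: by the time the first user request reaches `evaluate`, the entry is already cached.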

Known tenants = "default" plus every value parsed from OMCP_KEY_TENANTS. With 3 roles × 10 resources × 4 actions × N tenants, the warm count scales linearly in N; for the typical case of 1–5 tenants OPA handles it in well under a second.

OIDC tenants only become known at session time, so the very first request from a brand-new OIDC tenant still pays one warming-deny per (role, resource, action). Operators who want zero warming-denies for OIDC-only deployments can list the expected tenant names in OMCP_KEY_TENANTS even if no MCP credentials use them: the parser treats every value as an additional tenant to pre-warm.
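The pre-warm set described above is a Cartesian product, which the following sketch enumerates. `enumerateWarmKeys` and `WarmKey` are hypothetical names, and the function assumes the tenant names have already been parsed out of OMCP_KEY_TENANTS (the variable's exact format is not covered here).

```typescript
interface WarmKey { role: string; resource: string; action: string; tenant: string; }

function enumerateWarmKeys(
  roles: string[],
  resources: string[],
  actions: string[],
  configuredTenants: string[],
): WarmKey[] {
  // "default" is always known; the Set removes duplicates if it is also configured.
  const tenants = Array.from(new Set(["default", ...configuredTenants]));
  const keys: WarmKey[] = [];
  for (const tenant of tenants) {
    for (const role of roles) {
      for (const resource of resources) {
        for (const action of actions) {
          keys.push({ role, resource, action, tenant });
        }
      }
    }
  }
  return keys;
}
```

The length of the returned list is `roles × resources × actions × tenants`, so adding one tenant name adds exactly one more `roles × resources × actions` batch of queries at boot.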

The boot log reports:

[auth] OPA cache pre-warmed: 372 decisions cached for 3 role(s) × 3 tenants

A partial warm (e.g. transient OPA hiccup) logs the count + failure tally; gates retry on the first user-facing call anyway.

Try it locally

```bash
make demo-opa
# OPA at http://localhost:8181
# OMCP at http://localhost:3002 (separate port from the default mcp-server
# so you can run both side by side)
```

The example Rego at examples/opa/policy.rego reproduces the built-in DEFAULT_POLICY exactly so you can swap engines without losing access.

Redaction-bypass (cross-engine)

The two-gate redaction:bypass design (RBAC permission + OMCP_KEY_BYPASS_REDACTION credential allow-list + per-call arg) is identical across engines. The Rego file just needs to grant the admin role:

```rego
admin_grants := [..., {"resource": "redaction", "action": "bypass"}]
```

See docs/access-control.md for the full design.
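As a rough illustration of why the choice of engine does not change the bypass behaviour, the checks listed above can be modelled as a conjunction in which the engine only supplies the first input. Everything here is hypothetical (`mayBypassRedaction`, `BypassContext`, and the parsed allow-list form of OMCP_KEY_BYPASS_REDACTION); the authoritative design is in docs/access-control.md.

```typescript
interface BypassContext {
  hasPermission: boolean;          // RBAC verdict for redaction:bypass from whichever engine is active
  credentialId: string;            // identifier of the calling credential (assumed)
  allowList: ReadonlySet<string>;  // assumed parsed form of OMCP_KEY_BYPASS_REDACTION
  requestedInCall: boolean;        // the per-call argument asking for unredacted output
}

// All conditions must hold; any single missing piece keeps redaction on.
function mayBypassRedaction(ctx: BypassContext): boolean {
  return ctx.hasPermission && ctx.allowList.has(ctx.credentialId) && ctx.requestedInCall;
}
```

Because only `hasPermission` depends on the policy engine, swapping built-in, file, or OPA leaves the other checks untouched.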

Probing the live engine

GET /api/policy (admin-gated, users:delete) reflects the active engine and supports a dry-run for ad-hoc verdict probes — useful for debugging "why doesn't my tenant-conditional Rego rule fire?".

Snapshot:

```bash
curl -s -b "omcp_session=$ADMIN_COOKIE" "$URL/api/policy" | jq '{engine, tenantAware}'
# { "engine": "opa:http://opa:8181", "tenantAware": true }
```

tenantAware reflects whether the active engine honours session.tenant on .evaluate(). The built-in / file-loaded engines ignore it (false); OPA threads it into the Rego input (true).

Dry-run a single verdict — tenant defaults to the caller's session tenant, an explicit ?tenant= override probes any tenant:

```bash
# As tenant Acme, what does the engine say about sources:delete for the admin role?
curl -s -b "omcp_session=$ADMIN_COOKIE" \
  "$URL/api/policy?roles=admin&resource=sources&action=delete&tenant=acme" | jq .
# { "dryRun": { "roles": ["admin"], "resource": "sources", "action": "delete",
#   "tenant": "acme", "allowed": true, "reason": "allowed by OPA" } }
```
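For scripted probes, the dry-run URL can be assembled programmatically. This is a sketch with hypothetical names (`buildDryRunUrl`, `DryRunQuery`); the query parameters are the ones shown in the curl example, and joining multiple roles with a comma is an assumption, since only a single role is demonstrated above.

```typescript
interface DryRunQuery { roles: string[]; resource: string; action: string; tenant?: string; }

function buildDryRunUrl(baseUrl: string, q: DryRunQuery): string {
  const params: Array<[string, string]> = [
    ["roles", q.roles.join(",")], // assumption: multiple roles are comma-separated
    ["resource", q.resource],
    ["action", q.action],
  ];
  // Omitting tenant lets the server default to the caller's session tenant.
  if (q.tenant !== undefined) params.push(["tenant", q.tenant]);
  const qs = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  // Strip trailing slashes so the path is not doubled.
  return `${baseUrl.replace(/\/+$/, "")}/api/policy?${qs}`;
}
```

Percent-encoding each value keeps tenant names with spaces or reserved characters from corrupting the query string.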

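If tenantAware is false and a Rego rule keyed on input.tenant isn't firing, the engine kind is the diagnostic: the built-in and file-loaded engines ignore the tenant, so the deployment has to be switched to OPA mode before that rule can take effect.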

Troubleshooting

"OPA decision pending (warming cache); request again"

The synchronous gate hit a cache miss. The first user call after a fresh OPA-mode boot can see this; subsequent calls within 5s use the warmed result. If it persists, the pre-warm at boot didn't reach OPA: check the boot log for [auth] OPA cache pre-warmed: and verify network connectivity and authentication between OMCP and the OPA container.

"OPA query failed: HTTP 503 from ..."

OPA is down or OMCP_OPA_URL points at the wrong address. The engine caches the denial for ~1s so OMCP doesn't hammer a flapping OPA, then retries on the next gate. Once OPA recovers, the cache repopulates naturally; no manual intervention is needed.

"RBAC policy loaded from (...)" missing on boot

The file path didn't resolve. Check that OMCP_RBAC_POLICY_FILE is an absolute path, that the volume mount exposes the file to the process (a read-only mount is fine), and that the YAML is valid (run yq . <file> to confirm).

Policy UI shows "Policy view requires the users:delete permission"

You're signed in as a non-admin. The Policy tab is admin-only by design (it would otherwise reveal the full grant matrix to a viewer).