Runbook: PostgreSQL Cluster Unhealthy

Alert name: PostgresClusterUnhealthy

This alert fires when the CNPG collector metrics endpoint is down, which indicates that the PostgreSQL cluster is not responding.

Affected Services

The blumeops-pg CNPG cluster on indri’s minikube runs databases for:

TeslaMate
Authentik (cross-cluster from ringtail)
Immich
Grafana dashboards (TeslaMate datasource)

Diagnostic Steps

Check CNPG cluster status:

```
kubectl get cluster blumeops-pg -n databases --context=minikube-indri
kubectl get pods -n databases -l cnpg.io/cluster=blumeops-pg --context=minikube-indri
```
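For a quicker read on cluster state, the status fields can be pulled out directly. The following is a minimal sketch, assuming the CNPG Cluster resource exposes `.status.phase`, `.status.readyInstances`, and `.status.instances`; the function name `cnpg_status` is hypothetical and not part of any existing tooling here.

```shell
# Hypothetical helper: print the cluster phase and ready/total instance counts.
# Assumes the Cluster status fields named below exist in the installed CNPG version.
cnpg_status() {
  local cluster="${1:-blumeops-pg}"
  local ns="${2:-databases}"
  local ctx="${3:-minikube-indri}"
  kubectl get cluster "$cluster" -n "$ns" --context="$ctx" \
    -o jsonpath='{.status.phase}{"\n"}{.status.readyInstances}/{.status.instances} instances ready{"\n"}'
}
```

Calling `cnpg_status` with no arguments queries blumeops-pg in the databases namespace. A ready count lower than the instance count suggests looking at the individual pods next.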

Check pod logs:

```
kubectl logs -n databases -l cnpg.io/cluster=blumeops-pg --context=minikube-indri --tail=30
```

Check whether PostgreSQL is accepting connections with pg_isready:
```
pg_isready -h pg.ops.eblu.me -p 5432
```
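The exit status of pg_isready carries more detail than its one-line output. The sketch below maps the documented exit codes (0 to 3) to short messages; the function name `describe_pg_isready` is hypothetical.

```shell
# Hypothetical helper: translate a pg_isready exit status into a message.
# Exit codes 0-3 follow the pg_isready manual.
describe_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (for example, still starting up)" ;;
    2) echo "no response to the connection attempt" ;;
    3) echo "no attempt made (for example, invalid parameters)" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Usage: run the check, then explain the result.
# pg_isready -h pg.ops.eblu.me -p 5432; describe_pg_isready "$?"
```

A status of 2 through pg.ops.eblu.me while the pods report healthy may point toward the proxy path rather than PostgreSQL itself.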

Check PVC storage:

```
kubectl get pvc -n databases --context=minikube-indri
```

Common Causes

Pod crash: OOM kill, full disk, or configuration error
PVC storage full: run df -h inside the pod (via kubectl exec) to check usage
Minikube issue: if the node is under memory pressure, CNPG pods may be evicted
Network: the Caddy L4 proxy (pg.ops.eblu.me) may be misconfigured
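The causes above can be narrowed down with a few targeted queries. This is a sketch under assumptions: the helper names are hypothetical, and the data mount path `/var/lib/postgresql/data` is assumed to be where CNPG mounts the PVC in this cluster.

```shell
# Hypothetical helpers for narrowing down the causes listed above.

# Name of the first pod carrying the CNPG cluster label.
pg_first_pod() {
  kubectl get pods -n databases -l cnpg.io/cluster=blumeops-pg \
    --context=minikube-indri -o jsonpath='{.items[0].metadata.name}'
}

# Disk usage inside a pod. The mount path is an assumption; adjust if it differs.
pg_disk_usage() {
  kubectl exec -n databases "$1" --context=minikube-indri -- \
    df -h /var/lib/postgresql/data
}

# Last termination reason of each container (for example OOMKilled), if any.
pg_last_termination() {
  kubectl get pod "$1" -n databases --context=minikube-indri \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Node conditions such as MemoryPressure and DiskPressure, one per line.
node_conditions() {
  kubectl get nodes --context=minikube-indri \
    -o jsonpath='{range .items[*].status.conditions[*]}{.type}={.status}{"\n"}{end}'
}
```

A typical sequence would be `pod="$(pg_first_pod)"`, then `pg_disk_usage "$pod"`, `pg_last_termination "$pod"`, and `node_conditions`. Note that `kubectl exec` only works while the container is running, so for a crash-looping pod the termination reason and node conditions are the more useful checks.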

Silencing

For planned database maintenance:

Grafana → Alerting → Silences → Create Silence
Match alertname = PostgresClusterUnhealthy

Related

postgresql: CNPG cluster reference
deploy-infra-alerting: Alerting pipeline overview
