Troubleshooting Common Issues

Quick reference for diagnosing and fixing common BlumeOps issues.

General Health Check

Run the comprehensive service health check:

mise run services-check

This checks all services on indri and in Kubernetes.

Kubernetes Issues

Pod not starting

# Check pod status
kubectl --context=minikube-indri -n <namespace> get pods
 
# Describe pod for events
kubectl --context=minikube-indri -n <namespace> describe pod <pod>
 
# Check logs
kubectl --context=minikube-indri -n <namespace> logs <pod>
 
# Previous container logs (if restarting)
kubectl --context=minikube-indri -n <namespace> logs <pod> --previous

Common causes:

ImagePullBackOff - Image doesn’t exist or registry unreachable
CrashLoopBackOff - Application crashing; check logs
Pending - Insufficient resources or node issues
ContainerCreating - Waiting for volumes or secrets
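
The list above can be kept at hand as a small lookup function. This is a minimal sketch, not part of BlumeOps: the name `pod_hint` is hypothetical, and the hints only restate the causes listed above (plus `ErrImagePull`, the status Kubernetes reports just before `ImagePullBackOff`).

```shell
# pod_hint STATUS: print the likely cause for a value seen in the STATUS
# column of `kubectl get pods`. Hypothetical helper for illustration.
pod_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "image missing or registry unreachable" ;;
    CrashLoopBackOff)
      echo "application crashing; check logs (try --previous)" ;;
    Pending)
      echo "insufficient resources or node issues; check describe events" ;;
    ContainerCreating)
      echo "waiting for volumes or secrets" ;;
    *)
      echo "no hint; describe the pod and read its events" ;;
  esac
}
```

For example, `pod_hint CrashLoopBackOff` prints the crash hint.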

ArgoCD sync issues

# Check app status
argocd app get <app>
 
# See what will change
argocd app diff <app>
 
# Force sync (force apply; may delete and recreate resources)
argocd app sync <app> --force
 
# Sync with prune (removes deleted resources)
argocd app sync <app> --prune

App stuck in “Syncing”: Check if there are failed hooks or jobs:

kubectl --context=minikube-indri -n <namespace> get jobs
kubectl --context=minikube-indri -n <namespace> get pods --field-selector=status.phase=Failed
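
The field selector above matches only pods whose phase is Failed. A hook pod stuck in ImagePullBackOff or CrashLoopBackOff reports phase Pending or Running, so it is not listed. As a sketch, the hypothetical filter below reads `kubectl get pods --no-headers` output on stdin and prints every pod whose STATUS column is neither Running nor Completed.

```shell
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print "NAME STATUS" for pods that are neither Running nor Completed.
# Hypothetical helper; column 3 is STATUS in the default output format.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (placeholders as in the commands above):
#   kubectl --context=minikube-indri -n <namespace> get pods --no-headers | unhealthy_pods
```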

ArgoCD login expired:

# Log in again with the admin password stored in 1Password
argocd login argocd.ops.eblu.me --username admin \
  --password "$(op item get srogeebssulhtb6tnqd7ls6qey --vault vg6xf6vvfmoh5hqjjhlhbeoaie --fields password --reveal)"

kubectl connection refused

# Check if minikube is running (on indri)
ssh indri 'minikube status'
 
# Restart if needed
ssh indri 'minikube start'
 
# Verify tailscale is serving the API
ssh indri 'tailscale serve status --json'
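
After `minikube start`, the API server can take a while to accept connections. A generic retry loop avoids rerunning the check by hand. This is a sketch with a hypothetical function name (`retry`); the attempt count and delay are whatever you pass in.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to about 30 seconds for the API to answer.
#   retry 10 3 kubectl --context=minikube-indri get nodes
```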

Indri Service Issues

Service not responding

# Check LaunchAgent status
ssh indri 'launchctl list | grep mcquack'
 
# Restart a LaunchAgent
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.<service>.plist'
ssh indri 'launchctl load ~/Library/LaunchAgents/mcquack.<service>.plist'
 
# Check service logs
ssh indri 'tail -50 ~/Library/Logs/mcquack.<service>.err.log'
ssh indri 'tail -50 ~/Library/Logs/mcquack.<service>.out.log'
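
`launchctl list` prints three columns: PID, last exit status, and label. A `-` in the PID column means the job is loaded but not currently running. The hypothetical filter below is a sketch that turns that output into a readable summary.

```shell
# agent_state: read `launchctl list` rows (PID, status, label) on stdin and
# print one summary line per job. Hypothetical helper for illustration.
agent_state() {
  awk '{
    if ($1 == "-")
      printf "%s: not running (last exit status %s)\n", $3, $2
    else
      printf "%s: running (pid %s)\n", $3, $1
  }'
}

# Usage:
#   ssh indri 'launchctl list | grep mcquack' | agent_state
```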

Forgejo not accessible

# Check if forgejo is running
ssh indri 'lsof -nP -iTCP:3001 -sTCP:LISTEN'
 
# Check logs
ssh indri 'tail -50 ~/Library/Logs/mcquack.forgejo.err.log'
 
# Restart forgejo
ssh indri 'launchctl kickstart -k gui/$(id -u)/mcquack.forgejo'

Registry (Zot) issues

# Test registry API
ssh indri 'curl -s http://localhost:5050/v2/_catalog | jq'
 
# Check if zot is running
ssh indri 'lsof -nP -iTCP:5050 -sTCP:LISTEN'
 
# Restart zot
ssh indri 'launchctl kickstart -k gui/$(id -u)/mcquack.zot'

Network Issues

Service unreachable via *.ops.eblu.me

Caddy handles routing for *.ops.eblu.me:

# Check if Caddy is running
ssh indri 'launchctl list | grep caddy'
 
# View Caddy logs
ssh indri 'tail -50 ~/Library/Logs/caddy/access.log'
ssh indri 'tail -50 ~/Library/Logs/caddy/error.log'
 
# Restart Caddy
ssh indri 'launchctl kickstart -k gui/$(id -u)/homebrew.mxcl.caddy'
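
Before restarting anything, the HTTP status of a request often shows which layer is failing. The sketch below assumes Caddy is acting as a reverse proxy in front of each service; `route_hint` is a hypothetical name, and the hints are rules of thumb rather than a definitive diagnosis.

```shell
# route_hint HTTP_CODE: suggest where to look, given the status code that
# curl reports for a *.ops.eblu.me URL. curl prints 000 when it received
# no HTTP response at all. Hypothetical helper; assumes Caddy reverse-proxies.
route_hint() {
  case "$1" in
    000)
      echo "no HTTP response: check DNS, Tailscale, and whether Caddy is running" ;;
    502|503|504)
      echo "proxy answered but the upstream did not: check the backing service" ;;
    2??|3??)
      echo "route is serving" ;;
    *)
      echo "got status $1: check the Caddy and service logs" ;;
  esac
}

# Usage:
#   route_hint "$(curl -s -o /dev/null -w '%{http_code}' https://grafana.ops.eblu.me)"
```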

Tailscale MagicDNS not resolving

# Check tailscale serve status
ssh indri 'tailscale serve status --json'
 
# Restart tailscale if needed (if you reach indri over Tailscale, this SSH session may drop when tailscale goes down)
ssh indri 'tailscale down && tailscale up'

Observability

Check metrics

# Open Grafana
open https://grafana.ops.eblu.me
 
# Check Prometheus directly
open https://prometheus.ops.eblu.me

Check logs

# Open Grafana Explore
open https://grafana.ops.eblu.me/explore
 
# Query Loki directly
curl -G 'https://loki.ops.eblu.me/loki/api/v1/query_range' \
  --data-urlencode 'query={service="<service>"}' \
  --data-urlencode 'limit=100'
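
Without `start` and `end` parameters, Loki's range query covers the last hour. To look at a narrower or wider window, pass both as Unix epoch nanoseconds. The following is a sketch; `loki_window` is a hypothetical helper name.

```shell
# loki_window MINUTES: print "START END" as Unix epoch nanoseconds, covering
# the last MINUTES minutes. Hypothetical helper for illustration.
loki_window() {
  local minutes=$1 now
  now=$(date +%s)
  # Appending nine zeros converts seconds to nanoseconds without large arithmetic.
  printf '%s000000000 %s000000000\n' "$((now - minutes * 60))" "$now"
}

# Usage: query the last 15 minutes.
#   set -- $(loki_window 15)
#   curl -G 'https://loki.ops.eblu.me/loki/api/v1/query_range' \
#     --data-urlencode 'query={service="<service>"}' \
#     --data-urlencode "start=$1" \
#     --data-urlencode "end=$2" \
#     --data-urlencode 'limit=100'
```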

Alloy (metrics/logs collector) issues

# Indri alloy (host metrics)
ssh indri 'launchctl list | grep alloy'
ssh indri 'tail -50 ~/Library/Logs/alloy/alloy.log'
 
# K8s alloy (pod logs)
kubectl --context=minikube-indri -n monitoring logs -l app=alloy

Database Issues

PostgreSQL connection failed

# Check CNPG cluster status
kubectl --context=minikube-indri -n databases get cluster
 
# Check PostgreSQL pods
kubectl --context=minikube-indri -n databases get pods -l cnpg.io/cluster=blumeops-pg
 
# Connect to database
kubectl --context=minikube-indri -n databases exec -it blumeops-pg-1 -- psql -U postgres

Backup Issues

Check backup status

# View latest backup info
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom'
 
# Run backup manually
ssh indri 'borgmatic --verbosity 1'
 
# Check backup logs
ssh indri 'tail -100 /opt/homebrew/var/log/borgmatic/borgmatic.log'
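
A quick staleness check on the metrics file can flag a backup that has stopped running. This sketch assumes the `.prom` file is rewritten on every backup run, which the source does not confirm; `is_stale` is a hypothetical helper, and the threshold is yours to choose.

```shell
# is_stale FILE MAX_AGE_MINUTES: succeed (exit 0) if FILE is missing or was
# last modified more than MAX_AGE_MINUTES ago. Hypothetical helper; relies on
# `find -mmin`, which both GNU and BSD (macOS) find support.
is_stale() {
  local file=$1 max_minutes=$2
  [ -e "$file" ] || return 0
  [ -n "$(find "$file" -mmin +"$max_minutes" -print)" ]
}

# Usage, run on indri (1500 minutes = 25 hours, an assumed threshold):
#   is_stale /opt/homebrew/var/node_exporter/textfile/borgmatic.prom 1500 \
#     && echo 'backup metrics look stale'
```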

Related

observability - Metrics and logs
argocd - GitOps platform
cluster - Kubernetes cluster
routing - Service routing
