Building the Observability Stack

Audiences: Replicator

This tutorial walks through deploying metrics, logs, and dashboards for your homelab - because you can’t fix what you can’t see.

The Stack

A complete observability solution has three pillars:

ComponentPurposeBlumeOps Uses
MetricsNumeric measurements over timeprometheus
LogsText output from applicationsloki
DashboardsVisualization and alertinggrafana
CollectionGathering and forwarding dataalloy

For BlumeOps specifics, see Observability Reference.

Step 1: Create Monitoring Namespace

kubectl create namespace monitoring

Step 2: Deploy Prometheus

Prometheus collects and stores metrics.

Using Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --set server.persistentVolume.size=10Gi

Or via ArgoCD

Create an Application pointing to a values file in your repo:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: prometheus
    targetRevision: 25.0.0
    helm:
      values: |
        server:
          persistentVolume:
            size: 10Gi
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring

Verify

kubectl -n monitoring get pods -l app.kubernetes.io/name=prometheus

Step 3: Deploy Loki

Loki aggregates logs (like Prometheus but for logs).

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=10Gi

This also installs Promtail for log collection from pods.

Step 4: Deploy Grafana

Grafana provides dashboards and visualization.

helm install grafana grafana/grafana \
  --namespace monitoring \
  --set persistence.enabled=true \
  --set persistence.size=1Gi \
  --set adminPassword=admin  # Change this!

Configure Data Sources

After installation, add data sources in Grafana UI or via ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.monitoring.svc:80
      isDefault: true
    - name: Loki
      type: loki
      url: http://loki.monitoring.svc:3100

Step 5: Access Grafana

Expose via Tailscale:

kubectl -n monitoring port-forward svc/grafana 3000:80 &
tailscale serve --bg --https 3000 http://localhost:3000

Or create an Ingress.

Default credentials: admin / (password you set or retrieve from secret)

Step 6: Add Dashboards

Import community dashboards from grafana.com/grafana/dashboards:

DashboardIDShows
Node Exporter Full1860Host metrics
Kubernetes Cluster7249Cluster overview
Loki Logs13639Log exploration

In Grafana: Dashboards > Import > Enter ID

Step 7: Deploy Alloy (Optional)

Grafana Alloy is a unified collector that replaces multiple agents (Promtail, node_exporter, etc.).

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: alloy
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: alloy
    targetRevision: 0.1.0
    helm:
      values: |
        alloy:
          configMap:
            content: |
              // Alloy configuration here
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring

BluemeOps uses Alloy on both indri (for host metrics, via Ansible role) and in the cluster (for pod logs and service probes).

What You Now Have

  • Metrics collection and storage (Prometheus)
  • Log aggregation (Loki)
  • Dashboards and visualization (Grafana)
  • Foundation for alerting

Adding Alerts

Configure alerting rules in Prometheus:

groups:
- name: example
  rules:
  - alert: HighMemoryUsage
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage detected"

And notification channels in Grafana (email, Slack, PagerDuty, etc.).

Next Steps

  • Create custom dashboards for your services
  • Set up alerting for critical conditions
  • Add service-specific metrics exporters

BluemeOps Specifics

BlumeOps’ observability setup includes:

  • Prometheus scraping all services via annotations
  • Loki collecting logs from all pods and indri services
  • Custom dashboards for jellyfin, teslamate, and cluster health
  • alloy running on both host and in-cluster

See Observability Reference for full details.

Troubleshooting

ProblemSolution
No metrics appearingCheck Prometheus targets (/targets endpoint)
No logs in LokiVerify Promtail/Alloy is collecting (/ready endpoint)
Dashboard shows no dataCheck data source configuration and time range
High storage usageAdjust retention settings in Prometheus/Loki