Ollama

LLM inference server with GPU acceleration. Runs on ringtail with declarative model management via a sidecar.

Quick Reference

Property	Value
URL	https://ollama.ops.eblu.me
Tailscale URL	https://ollama.tail8d86e.ts.net
Namespace	`ollama`
Cluster	ringtail k3s
Image	`ollama/ollama:0.17.5`
Upstream	https://github.com/ollama/ollama
Manifests	`argocd/manifests/ollama/`
API Port	11434

Architecture

models.txt (ConfigMap, declarative)
    │
    ▼
model-sync sidecar ──ollama pull──► Ollama server (GPU)
    │                                    │
    │ reads /config/models.txt           │ serves /api/*
    │ polls every 30 min                 │ NVIDIA runtime (RTX 4080, time-sliced)
    │                                    │
    └────────────────────────────────────┘
                     │
                /models (200 Gi hostPath PV)
                /mnt/storage1/ollama on ringtail

Models

Models are declared in `argocd/manifests/ollama/models.txt`. The model-sync sidecar pulls missing models on startup and every 30 minutes.

Model	Parameters
`qwen2.5:14b`	14B
`deepseek-r1:14b`	14B
`phi4:14b`	14B
`gemma3:12b`	12B

To add or remove models, edit `models.txt` and sync via ArgoCD.
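The "pull what is missing" step can be sketched as a pure function. This is a minimal illustration, not the sidecar's actual code: the file format (one tag per line, `#` comments allowed) and the names `parse_models` and `plan_sync` are assumptions made for the example.

```python
from typing import Iterable


def parse_models(text: str) -> list[str]:
    """Return model tags from a models.txt-style listing, in file order.

    Assumption: one tag per line; blank lines and '#' comments are ignored.
    """
    models: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and line not in models:  # skip blanks and duplicates
            models.append(line)
    return models


def plan_sync(declared: Iterable[str], installed: Iterable[str]) -> list[str]:
    """Return the declared models that are not installed yet (the ones to pull)."""
    have = set(installed)
    return [m for m in declared if m not in have]


# Hypothetical file contents, using the models listed above.
SAMPLE = """\
# models served by ollama
qwen2.5:14b
deepseek-r1:14b

phi4:14b
gemma3:12b
"""

declared = parse_models(SAMPLE)
to_pull = plan_sync(declared, installed=["qwen2.5:14b", "gemma3:12b"])
print(to_pull)  # ['deepseek-r1:14b', 'phi4:14b']
```

In a real loop, each entry in `to_pull` would be handed to `ollama pull`, and the whole check would repeat on the 30-minute poll interval.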

GPU

Shares ringtail’s RTX 4080 with frigate via NVIDIA device plugin time-slicing (2 virtual slots). Constrained to one loaded model and one parallel request to avoid VRAM contention.

Setting	Value
`OLLAMA_MAX_LOADED_MODELS`	1
`OLLAMA_NUM_PARALLEL`	1
GPU limit	`nvidia.com/gpu: "1"` (time-sliced)
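
One way to confirm the single-model constraint from outside the pod is to inspect the body returned by Ollama's `GET /api/ps` endpoint, which lists the models currently loaded. The sketch below only parses such a body; the sample payload is hypothetical and trimmed to the `name` field, and the helper names are assumptions made for the example.

```python
import json

MAX_LOADED_MODELS = 1  # mirrors OLLAMA_MAX_LOADED_MODELS in the table above


def loaded_models(ps_payload: str) -> list[str]:
    """Extract model names from the JSON body of GET /api/ps."""
    return [m["name"] for m in json.loads(ps_payload).get("models", [])]


def within_limit(ps_payload: str, limit: int = MAX_LOADED_MODELS) -> bool:
    """True when the number of loaded models does not exceed the limit."""
    return len(loaded_models(ps_payload)) <= limit


# Hypothetical response body, reduced to the one field used here.
SAMPLE_PS = '{"models": [{"name": "phi4:14b"}]}'
print(loaded_models(SAMPLE_PS), within_limit(SAMPLE_PS))
```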

Storage

Mount	Backend	Size
`/models`	hostPath PV (`/mnt/storage1/ollama`)	200 Gi

The PV reclaim policy is `Retain`, so downloaded models survive PV deletion.

Networking

Endpoint	Reachable from
`https://ollama.ops.eblu.me`	Public internet (Fly.io → Caddy)
`https://ollama.tail8d86e.ts.net`	Tailnet clients
`http://ollama.ollama.svc.cluster.local:11434`	In-cluster (ringtail)

The Tailscale Ingress is served through a ProxyGroup and sets no explicit `host:` field (see tailscale-operator).
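
As a rough sketch of calling the server from another pod on ringtail, the snippet below builds a non-streaming request to Ollama's `/api/generate` endpoint using only the standard library. The helper names, the default model, and the timeout are assumptions for illustration; any of the three endpoints in the table could be passed as `base_url` from a client that can reach it.

```python
import json
import urllib.request

# In-cluster endpoint from the table above.
BASE_URL = "http://ollama.ollama.svc.cluster.local:11434"


def build_generate_request(prompt: str, model: str = "qwen2.5:14b",
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5:14b", timeout: float = 300.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Because the server handles one request at a time and keeps one model loaded, a call that targets a different model than the previous one may wait for a model swap, which is why the sketch uses a generous timeout.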

Related

frigate — Shares GPU via time-slicing
ringtail — Host node
apps — ArgoCD application registry
tailscale-operator — Tailscale ingress
