A backend service is down

Runbook for a non-responding backend — the finance :3000 API timing out, the RunPod pod refusing connections, or a worker that's stopped. Per-host diagnosis and restart.

Symptom: A tool that talks to a specific backend keeps failing while the rest of Leif works fine — finance_health_check times out, runpod_* returns Connection refused, or a host’s worker won’t respond.

Likely cause

A single backend service is stopped (or, for RunPod, the pod is down / rotated). This is a one-service outage, not a Leif outage — if other namespaces work, Leif itself is fine. Several of these have a benign explanation that looks like a failure:

Backend	“Down” signal	Usually means
Finance API (`:3000` on nvrbackup)	`finance_health_check` connect timeout	The backend is stopped, not broken
RunPod GPU pod	`runpod_*` `Connection refused`	Pod is stopped or its SSH endpoint rotated
Pricing worker	imports stuck `pending`	Worker process died

Diagnose

Finance API (`:3000`)

finance_health_check()
finance_app_execute_command(command="systemctl status <finance-service> --no-pager")

A connect timeout from finance_health_check means the :3000 backend isn’t running — confirm with the service status via the finance_app_* tools (they’re scoped to the finance tree; see finance).

RunPod pod

runpod_get_system_info()

Connection refused means the pod is stopped or RunPod assigned a new IP / port since you last used it — the endpoint is ephemeral. This is not a tool fault. Start the pod and confirm the current endpoint before relying on runpod_* again (see Hosts).
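The two signals above (a connect timeout for the finance API, Connection refused for RunPod) can be told apart with a plain TCP probe. The following is a minimal sketch, not part of Leif: `probe_tcp` is a hypothetical helper, and it assumes the machine running it can reach the target host directly. The host name and port in the example come from the table above; whether `nvrbackup` resolves from where you run this is an assumption.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt (hypothetical helper, not a Leif tool).

    Returns "open", "refused", "timeout", or "error".
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout window.
        return "timeout"
    except OSError:
        # DNS failure, unreachable network, and similar.
        return "error"


if __name__ == "__main__":
    # Host and port taken from the table above; adjust for your environment.
    print(probe_tcp("nvrbackup", 3000))
```

A result of "refused" or "timeout" only says the port is not answering; the service status and pod state checks above are still what confirm the cause.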

Pricing worker

See Pricing import landed nothing — the stuck-pending path covers the worker restart.

Fix

Finance API stopped: start or restart the service with the finance_app_* tools (the ones scoped to the finance source tree), then re-check health.

finance_app_execute_command(command="sudo systemctl restart <finance-service>")
finance_health_check()

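RunPod stopped/rotated: start the pod from the RunPod console, then confirm the current endpoint (IP and port), since it may have changed while the pod was down. Once the pod is up and the endpoint is current, runpod_get_system_info() should respond.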
Generic host service: use the host’s own tool family (local_restart_service on Leif, shtops_restart_service on SHTops, pve_lxc_exec for a container) rather than the generic remote_* shell. See Shell & file tools.
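The per-host routing in the last item can be written down as a small lookup. This is an illustrative sketch only: the tool names come from this runbook, but `RESTART_TOOL_BY_HOST`, `restart_tool_for`, and the lowercase host keys are hypothetical and not part of Leif.

```python
# Tool names are from this runbook; the mapping itself is a hypothetical aid.
RESTART_TOOL_BY_HOST = {
    "leif": "local_restart_service",
    "shtops": "shtops_restart_service",
}


def restart_tool_for(host: str, is_container: bool = False) -> str:
    """Return the name of the tool family to use for a restart on `host`."""
    if is_container:
        # Containers are handled through pve_lxc_exec regardless of host.
        return "pve_lxc_exec"
    try:
        return RESTART_TOOL_BY_HOST[host.lower()]
    except KeyError:
        # Refuse to guess instead of falling back to the generic remote_* shell.
        raise ValueError(f"no host-specific restart tool known for {host!r}")
```

Raising on an unknown host, rather than defaulting to remote_*, mirrors the guidance above to prefer the host’s own family.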

Verify

Re-run the health read for the specific backend and confirm a clean response:

finance_health_check()      # finance
runpod_get_system_info()    # runpod
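A service can take a few seconds to come back after a restart, so a single failed check right afterwards is not conclusive. The sketch below retries a health check a few times before giving up. It assumes the check is available as a Python callable that raises an exception on failure, which is an assumption about how the tool is invoked and not something this runbook specifies; `wait_until_healthy` is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], object],
    attempts: int = 5,
    delay: float = 2.0,
) -> bool:
    """Call `check` up to `attempts` times; return True on the first success.

    Assumes `check` raises an exception when the backend is unhealthy.
    """
    for attempt in range(attempts):
        try:
            check()
            return True
        except Exception:
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For example, `wait_until_healthy(finance_health_check)` would apply here if finance_health_check is exposed as such a callable in your environment.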

Hosts — the finance API, RunPod, and per-host routing
finance — finance_* (data) vs finance_app_* (source tree)
Service Map — services and the tools that manage each
Pricing import landed nothing — the worker case