A backend service is down
Runbook for a non-responding backend — the finance :3000 API timing out, the RunPod pod refusing connections, or a worker that's stopped. Per-host diagnosis and restart.
Symptom: A tool that talks to a specific backend keeps failing while the rest of Leif works fine. Typical signs: finance_health_check times out, runpod_* returns Connection refused, or a host’s worker won’t respond.
Likely cause
A single backend service is stopped (or, for RunPod, the pod is stopped or its endpoint has rotated). This is a one-service outage, not a Leif outage: if other namespaces work, Leif itself is fine. Several of these have a benign explanation that looks like a failure:
| Backend | "Down" signal | Usually means |
|---|---|---|
| Finance API (:3000 on nvrbackup) | finance_health_check connect timeout | The backend is stopped, not broken |
| RunPod GPU pod | runpod_* Connection refused | Pod is stopped or its SSH endpoint rotated |
| Pricing worker | imports stuck pending | Worker process died |
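The scoping rule and the table above can be sketched as a small triage helper. This is a minimal Python sketch, not part of Leif: it assumes you have already collected a pass/fail result per namespace yourself, and the namespace names and error substrings it matches are illustrative assumptions rather than Leif's actual error format.

```python
from __future__ import annotations

# Rows from the table above: (backend, substrings that signal it, usual meaning).
# The substrings are assumptions about how the failure text reads.
SIGNALS = [
    ("Finance API (:3000 on nvrbackup)", ("timeout", "timed out"),
     "The backend is stopped, not broken"),
    ("RunPod GPU pod", ("connection refused",),
     "Pod is stopped or its SSH endpoint rotated"),
    ("Pricing worker", ("stuck pending",),
     "Worker process died"),
]


def scope(results: dict[str, bool]) -> str:
    """Classify the outage from per-namespace health results (True = healthy)."""
    failing = sorted(ns for ns, ok in results.items() if not ok)
    if not failing:
        return "no outage"
    if len(failing) == len(results):
        return "everything failing: suspect Leif itself, not one backend"
    return "one-service outage: " + ", ".join(failing)


def likely_cause(error_text: str) -> str | None:
    """Return 'backend: usual meaning' for the first row whose signal matches."""
    lowered = error_text.lower()
    for backend, needles, meaning in SIGNALS:
        if any(needle in lowered for needle in needles):
            return f"{backend}: {meaning}"
    return None
```

For example, scope({"finance": False, "runpod": True}) returns "one-service outage: finance", which is the case this runbook covers; if every namespace fails, the problem is wider than one backend.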
Diagnose
Finance API (:3000)
```
finance_health_check()
finance_app_execute_command(command="systemctl status <finance-service> --no-pager")
```
A connect timeout from finance_health_check means the :3000 backend isn’t running. Confirm with the service status through the finance_app_* tools, which are scoped to the finance source tree (see finance).
RunPod pod
```
runpod_get_system_info()
```
Connection refused means the pod is stopped, or RunPod has assigned it a new IP or port since you last used it; the endpoint is ephemeral. This is not a tool fault. Start the pod and confirm the current endpoint before relying on runpod_* again (see Hosts).
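The two signals in this section differ at the TCP level: "Connection refused" means the host answered but nothing is listening on that port, while a connect timeout means nothing answered before the deadline. As a rough cross-check outside the Leif tools, a plain socket probe can tell them apart. This is a hypothetical sketch using only Python's standard socket module; the probe function is not a Leif tool.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', or 'timeout'.

    'refused': the host answered, but nothing is listening on that port.
    'timeout': nothing answered before the deadline.
    Any other socket error (e.g. a DNS failure) is returned as 'error: ...'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, probe("nvrbackup", 3000) checks the finance API from a machine that can reach that host (an assumption about your network). For RunPod, pass the current host and port from the RunPod console rather than a cached value, since the endpoint rotates.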
Pricing worker
See Pricing import landed nothing; its stuck-pending path covers restarting the worker.
Fix
- Finance API stopped: start or restart it via the finance source tree, then re-check health.
  ```
  finance_app_execute_command(command="sudo systemctl restart <finance-service>")
  finance_health_check()
  ```
- RunPod stopped or rotated: start the pod from the RunPod console, then re-establish the current endpoint. Once it’s up, runpod_get_system_info() should respond.
- Generic host service: use the host’s own tool family (local_restart_service on Leif, shtops_restart_service on SHTops, pve_lxc_exec for a container) rather than the generic remote_* shell. See Shell & file tools.
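The "host's own tool family" rule amounts to a lookup from host to restart tool. A minimal sketch, assuming illustrative host labels (leif, shtops, lxc) that are not Leif identifiers; the tool names are the ones this runbook lists. Raising on an unknown host, instead of falling back to remote_*, mirrors the advice to avoid the generic shell.

```python
# Hypothetical routing table: which tool family restarts a service on which host.
# Tool names come from this runbook; the host keys are illustrative labels.
RESTART_TOOL = {
    "leif": "local_restart_service",
    "shtops": "shtops_restart_service",
    "lxc": "pve_lxc_exec",
}


def restart_tool_for(host: str) -> str:
    """Pick the host's own tool family; never fall back to remote_* silently."""
    try:
        return RESTART_TOOL[host.lower()]
    except KeyError:
        raise ValueError(
            f"no dedicated restart tool for {host!r}; check Shell & file tools "
            "before reaching for the generic remote_* shell"
        ) from None
```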
Verify
Re-run the health read for the specific backend and confirm a clean response:
```
finance_health_check()     # finance
runpod_get_system_info()   # runpod
```
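A restarted backend may need a few seconds before its health read comes back clean, so a single failed check right after a restart is not conclusive. A minimal retry-with-backoff sketch, assuming you wrap the health read in a hypothetical check callable that returns True on a clean response (for example, an adapter around finance_health_check() or runpod_get_system_info()); the attempt count and delays are arbitrary defaults, not values from this runbook.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or the attempts run out.

    Sleeps initial_delay, then initial_delay * backoff, and so on between
    attempts; no sleep follows the final attempt.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass  # treat a raised error (e.g. a connect timeout) as "not healthy yet"
        if attempt < attempts - 1:
            sleep(delay)
            delay *= backoff
    return False
```

The sleep parameter is injectable so the loop can be exercised without real waiting; if it still returns False after the last attempt, go back to Diagnose rather than retrying indefinitely.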
Related pages
- Hosts: the finance API, RunPod, and per-host routing
- finance: finance_* (data) vs finance_app_* (source tree)
- Service Map: services and the tools that manage each
- Pricing import landed nothing: the worker case