Leif / MCP unreachable

Runbook for when Claude can't reach Leif at all — tool calls time out across the board. Covers the Cloudflare Tunnel and the FastMCP service on the Leif host.

Symptom

Claude can’t call Leif tools at all: every tool times out or errors, not just one namespace. The MCP integration looks dead.

Likely cause

The path is Claude → Cloudflare Tunnel (leif.super-ht.com / mcp.super-ht.com) → FastMCP server on the Leif host (10.10.0.25). A total outage is almost always one of:

The FastMCP service on the Leif host stopped or crashed.
The Cloudflare Tunnel (cloudflared) is down or lost its connection.
The Leif host itself is down (rare — it’s a Proxmox guest).
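
The order in which to rule these out can be sketched as a small decision function. This is only an illustration of the triage logic, not part of Leif: the `PathState` type and its field names are hypothetical, and the DNS field anticipates the hostname check described under Diagnose.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Observed state of each hop. All field names are illustrative."""
    host_up: bool         # the Leif host (10.10.0.25) answers at all
    fastmcp_active: bool  # the leif-mcp unit is running
    tunnel_active: bool   # the cloudflared unit is running and connected
    dns_ok: bool          # the leif / mcp records still point at the tunnel


def first_broken_hop(state: PathState) -> str:
    """Return the first failing hop, checking from the host outward."""
    if not state.host_up:
        return "host down"
    if not state.fastmcp_active:
        return "FastMCP stopped"
    if not state.tunnel_active:
        return "tunnel down"
    if not state.dns_ok:
        return "DNS repointed"
    return "path healthy"
```

The returned labels match the entries in the Fix section, so the result points straight at the remedy to apply.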

Diagnose

Work the path from the host outward.

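1. Is the Leif host up and is FastMCP running? If any Leif tool still responds, run the checks through it; otherwise SSH to 10.10.0.25 and run the same systemctl commands in a shell. If SSH fails too, treat the host itself as down.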

local_execute_command(command="systemctl status leif-mcp --no-pager")
local_execute_command(command="systemctl status cloudflared --no-pager")
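
When working over SSH instead of through a Leif tool, the same check can be scripted. The sketch below is an assumption about how one might do it, not an existing Leif helper: it shells out to `systemctl is-active`, which prints a one-word state for a unit, and the function names are made up for illustration.

```python
import subprocess

UNITS = ("leif-mcp", "cloudflared")  # unit names taken from the commands above


def unit_state(unit, run=subprocess.run):
    """Return the state word printed by `systemctl is-active` for one unit."""
    result = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for inactive or failed units
    )
    return result.stdout.strip() or "unknown"


def stopped_units(units=UNITS, run=subprocess.run):
    """List the units whose state is anything other than 'active'."""
    return [unit for unit in units if unit_state(unit, run) != "active"]
```

Calling `stopped_units()` on the Leif host returns an empty list when both services are running; the `run` parameter exists only so the logic can be exercised without systemd.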

2. Is the tunnel connected? Check cloudflared’s recent log for the connection state:

local_execute_command(command="journalctl -u cloudflared --no-pager -n 50")
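
Reading 50 log lines by eye is usually enough, but a rough summary can help when the log is noisy. The sketch below assumes that cloudflared logs a line containing "Registered tunnel connection" for each edge connection it establishes and marks error lines with an `ERR` level; both marker strings are assumptions to verify against the output of the installed cloudflared version.

```python
# Assumed markers: confirm them against your cloudflared version's real output.
CONNECTED_MARKER = "Registered tunnel connection"
ERROR_MARKER = " ERR "


def summarize_tunnel_log(lines):
    """Count connection registrations and error-level lines in a log excerpt."""
    registered = sum(1 for line in lines if CONNECTED_MARKER in line)
    errors = [line for line in lines if ERROR_MARKER in line]
    return {
        "registered": registered,
        "errors": len(errors),
        "last_error": errors[-1] if errors else None,
    }
```

A registration count only shows that connections were made somewhere in the excerpt, not that they are still up, so errors that appear after the last registration deserve a closer look.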

3. Is it DNS / the hostname? leif.super-ht.com and mcp.super-ht.com are load-bearing: Claude.ai’s MCP integration reaches Leif through them. If someone repointed or deleted a record, requests for that hostname never reach the tunnel, even when cloudflared itself is healthy. Check the zone, and repeat the lookup for mcp:

cf_list_dns_records(params={"zone_id": "<super-ht.com zone id>", "name": "leif"})

(If Leif is fully down, do this from the Cloudflare dashboard instead.)
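
What "still points at the tunnel" means can be made concrete. Hostnames routed through a Cloudflare Tunnel are normally CNAME records whose target ends in `.cfargotunnel.com`. The sketch below checks for that, assuming the records arrive as dicts with the Cloudflare API field names `type`, `name`, and `content`; the actual return shape of cf_list_dns_records is not documented here, so treat that as an assumption.

```python
TUNNEL_SUFFIX = ".cfargotunnel.com"
EXPECTED = ("leif.super-ht.com", "mcp.super-ht.com")


def points_at_tunnel(record):
    """True if the record is a CNAME whose target is a tunnel hostname."""
    target = record.get("content", "").lower()
    return record.get("type") == "CNAME" and target.endswith(TUNNEL_SUFFIX)


def misrouted_hostnames(records, expected=EXPECTED):
    """Return expected hostnames that are missing or not routed to a tunnel."""
    good = {r.get("name", "").lower() for r in records if points_at_tunnel(r)}
    return [host for host in expected if host not in good]
```

This confirms only that each hostname targets some tunnel. Whether it is the right tunnel still has to be compared against the tunnel ID in the Tunnel configuration.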

Fix

FastMCP stopped: restart it on the Leif host. Existing Claude sessions reconnect on their own — old mcp-session-ids get a 404 with X-MCP-Reinitialize: true until the client re-initializes.
```
local_restart_service(service_name="leif-mcp")
```
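
The reconnect behaviour described above comes down to one check on the client side. The sketch below shows that check in isolation; the function name is hypothetical and this is not the code Claude's client actually runs, only an illustration of the 404 plus X-MCP-Reinitialize signal.

```python
def needs_reinitialize(status_code, headers):
    """True when the response signals that the session id is stale."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): str(v).strip().lower() for k, v in headers.items()}
    return status_code == 404 and normalized.get("x-mcp-reinitialize") == "true"
```

A 404 without the header, or any other status, is a different failure and should not trigger a silent re-initialize.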

Tunnel down: restart cloudflared.

local_restart_service(service_name="cloudflared")

DNS repointed: restore the leif / mcp records to the tunnel target. Don’t repoint these without updating the Tunnel configuration first (see Hosts).

Host down: start the guest from Proxmox with pve_lxc_start / pve_vm_list against node pve (see pve), or from the Proxmox UI if Leif itself is the thing that’s down.

Verify

Call any cheap Leif tool and confirm a clean response:

get_time()

If that returns, the path is healthy end to end. For a deeper look, service_health() reports per-integration init status and recent tool-call error rates.
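
Right after a restart the first call can still fail while the service comes up, so it can help to retry the cheap probe a few times before declaring the fix unsuccessful. The helper below is a minimal sketch of that idea: `wait_until_healthy` and its parameters are hypothetical, and `probe` stands in for whatever callable invokes get_time in your environment.

```python
import time


def wait_until_healthy(probe, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Call `probe` until it stops raising; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        try:
            probe()
            return attempt
        except Exception:
            # The service may still be starting; pause before the next try.
            if attempt < attempts:
                sleep(delay_s)
    return None
```

A return value of `None` means every attempt failed, which is the cue to go back to the Diagnose steps instead of restarting again.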

Related

MCP Server Internals: sessions, restarts, and the service factory table
Architecture: the Claude → Tunnel → FastMCP path
Hosts: the Leif host and the load-bearing hostnames
cloudflare: the cf_* tools for the DNS/tunnel side
pve: bringing the guest back up if the host is down