Leif / MCP unreachable
Runbook for when Claude can't reach Leif at all — tool calls time out across the board. Covers the Cloudflare Tunnel and the FastMCP service on the Leif host.
Symptom: Claude can’t call Leif tools at all — every tool times out or errors, not just one namespace. The MCP integration looks dead.
Likely cause
The path is Claude → Cloudflare Tunnel (leif.super-ht.com /
mcp.super-ht.com) → FastMCP server on the Leif host (10.10.0.25). A total
outage is almost always one of:
- The FastMCP service on the Leif host stopped or crashed.
- The Cloudflare Tunnel (cloudflared) is down or lost its connection.
- The Leif host itself is down (rare; it’s a Proxmox guest).
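The causes above can be told apart by checking each hop in order. The following is a minimal Python sketch, assuming the three observations have already been collected by hand or by script. `PathStatus` and `likely_cause` are hypothetical names for illustration, not part of Leif.

```python
from dataclasses import dataclass


@dataclass
class PathStatus:
    """Observed state of each hop. All field names are illustrative."""
    host_up: bool           # the Proxmox guest at 10.10.0.25 answers
    fastmcp_active: bool    # the FastMCP service is running
    tunnel_connected: bool  # cloudflared reports a live connection


def likely_cause(status: PathStatus) -> str:
    """Return the first failing hop, working from the host outward."""
    if not status.host_up:
        return "host down"
    if not status.fastmcp_active:
        return "FastMCP stopped"
    if not status.tunnel_connected:
        return "tunnel down"
    # Every hop on the host side looks healthy, so the problem lies elsewhere.
    return "not a listed cause"
```

Checking the host first avoids chasing a service failure that is really a stopped guest.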
Diagnose
Work the path from the host outward.
1. Is the Leif host up and is FastMCP running? If any Leif tool works, use
it; otherwise SSH to 10.10.0.25 directly.
local_execute_command(command="systemctl status leif-mcp --no-pager")
local_execute_command(command="systemctl status cloudflared --no-pager")
2. Is the tunnel connected? Check cloudflared’s recent log for the
connection state:
local_execute_command(command="journalctl -u cloudflared --no-pager -n 50")
3. Is it DNS / the hostname? leif.super-ht.com and mcp.super-ht.com are
load-bearing: Claude.ai’s MCP integration reaches Leif through them. If
someone repointed or deleted a record, that hostname stops routing to the
tunnel even while cloudflared itself stays connected. Check the zone:
cf_list_dns_records(params={"zone_id": "<super-ht.com zone id>", "name": "leif"})
(If Leif is fully down, do this from the Cloudflare dashboard instead.)
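When you are on the Leif host over SSH, steps 1 and 2 can be scripted. The sketch below is one hedged way to do it, not Leif's own tooling: `unit_is_active` and `tunnel_state` are hypothetical helpers, and the log marker strings are assumptions about cloudflared's wording, which varies by version. Adjust them to what your journal actually shows.

```python
import subprocess
from typing import Callable, Iterable, Sequence

# Assumed marker substrings; verify against your own cloudflared journal.
ASSUMED_UP_MARKERS = ("Registered tunnel connection",)
ASSUMED_DOWN_MARKERS = ("Unable to establish connection", "Connection terminated")


def unit_is_active(
    unit: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """True when `systemctl is-active --quiet <unit>` exits with status 0."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def tunnel_state(
    log_lines: Iterable[str],
    up_markers: Sequence[str] = ASSUMED_UP_MARKERS,
    down_markers: Sequence[str] = ASSUMED_DOWN_MARKERS,
) -> str:
    """Return 'up', 'down', or 'unknown' based on the most recent marker seen."""
    state = "unknown"
    for line in log_lines:  # oldest to newest, as journalctl prints them
        if any(marker in line for marker in up_markers):
            state = "up"
        elif any(marker in line for marker in down_markers):
            state = "down"
    return state
```

The `run` parameter exists only so the check can be exercised without systemd; in normal use, call `unit_is_active("leif-mcp")` and `unit_is_active("cloudflared")` directly, and feed `tunnel_state` the lines from the journalctl command in step 2.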
Fix
- FastMCP stopped: restart it on the Leif host. Existing Claude sessions reconnect on their own: old mcp-session-ids get a 404 with X-MCP-Reinitialize: true until the client re-initializes.
local_restart_service(service_name="leif-mcp")
- Tunnel down: restart cloudflared.
local_restart_service(service_name="cloudflared")
- DNS repointed: restore the leif / mcp records to the tunnel target. Don’t repoint these without updating the Tunnel configuration first (see Hosts).
- Host down: start the guest from Proxmox with pve_lxc_start / pve_vm_list against node pve (see pve), or from the Proxmox UI if Leif itself is the thing that’s down.
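The session behaviour after a FastMCP restart (a 404 carrying X-MCP-Reinitialize: true) can be recognised on the client side roughly as follows. This is a sketch under assumptions: `needs_reinitialize` is a hypothetical helper, and it presumes the client can see the raw HTTP status and headers. Only the status code and header name come from the description above.

```python
from typing import Mapping


def needs_reinitialize(status_code: int, headers: Mapping[str, str]) -> bool:
    """True when a stale mcp-session-id was rejected and the client should re-initialize."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    flag = lowered.get("x-mcp-reinitialize", "").strip().lower()
    return status_code == 404 and flag == "true"
```

A plain 404 without the header is deliberately not treated as a re-initialize signal, since it may indicate a wrong path or hostname instead of a stale session.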
Verify
Call any cheap Leif tool and confirm a clean response:
get_time()
If that returns, the path is healthy end to end. For a deeper look,
service_health() reports per-integration init status and recent
tool-call error rates.
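If the first call after a fix still fails, it may be worth retrying a few times before concluding the fix did not work. A minimal sketch, assuming `call_tool` is any zero-argument callable that wraps a cheap Leif tool such as get_time() and raises on failure; the helper name, attempt count, and delay are illustrative choices, not documented values.

```python
import time
from typing import Callable


def wait_until_healthy(
    call_tool: Callable[[], object],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `call_tool` up to `attempts` times; True as soon as one call succeeds."""
    for attempt in range(attempts):
        try:
            call_tool()
            return True
        except Exception:
            # Pause between attempts, but not after the final failure.
            if attempt < attempts - 1:
                sleep(delay_s)
    return False
```

A `False` result after all attempts suggests going back to the Diagnose steps before restarting anything again.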
Related pages
- MCP Server Internals: sessions, restarts, and the service factory table
- Architecture: the Claude → Tunnel → FastMCP path
- Hosts: the Leif host and the load-bearing hostnames
- cloudflare: the cf_* tools for the DNS/tunnel side
- pve: bringing the guest back up if the host is down