Solving the MetalLB BGP Ghost: Why My Router Reboot Breaks the Peer

When running MetalLB on Kubernetes with a Home Lab setup (like OpenWrt), I expected a robust network. However, I faced a frustrating issue: the BGP session dies after a router reboot and refuses to reconnect unless the MetalLB Speaker Pod is deleted.

After deep-diving into packet flows, nftables, and FRR (the engine inside MetalLB) logs, I found the culprit.

The Symptoms

Stuck in Active: MetalLB shows BGP state = Active, meaning it’s trying to connect but failing.
BFD is Up: Surprisingly, BFD (UDP) is working fine, but BGP (TCP) is not.
The “No Path” Error: MetalLB logs show No path to specified Neighbor.

The Root Causes

1. The Firewall & Tailscale Interference

Modern OpenWrt uses fw4 (nftables). I use Tailscale and Mihomo (Clash), and they frequently refresh nat and mangle tables. During a reboot, if the BGP TCP handshake (Port 179) gets caught in a connection tracking (conntrack) race condition or gets tagged by a proxy rule, the connection hangs in a “zombie” state.

2. The Kernel vs. FRR Route Conflict

MetalLB’s FRR engine is strict. It looks at the Linux Kernel routing table. If it sees the route to the router as a Kernel route instead of a Connected route, it might refuse to start the BGP session for “security” reasons, thinking the neighbor isn’t actually direct.

The Solution

To fix this, I needed to bypass the strict “Directly Connected” check in MetalLB and protect the BGP traffic from the firewall.

1. OpenWrt: The “Untracked” Protection

I created a high-priority raw table to ensure BGP and BFD traffic are never touched by the proxy or conntrack.

table inet custom_bgp {
    chain raw_pre {
        type filter hook prerouting priority raw; policy accept;
        tcp dport 179 notrack
        tcp sport 179 notrack
        udp dport 3784 notrack  # BFD
    }
    chain filter_input {
        type filter hook input priority filter; policy accept;
        tcp dport 179 accept
        udp dport 3784 accept
    }
}

2. MetalLB: The `ebgpMultiHop` Hack

By setting ebgpMultiHop: true, we tell MetalLB: “Stop checking if the neighbor is directly connected. Just send the packets.” This bypasses the FRR internal route validation that usually causes the No path to Neighbor error.

Final Working Configuration

Router: Bird 3.x Config

On the router, we keep it simple. We let the router be Passive so it doesn’t get confused by multiple connection attempts during the reboot phase.

protocol bfd {
    interface "br-lan";
}

protocol bgp waukeen {
    local 192.168.1.1 as 65001;
    neighbor 192.168.1.2 as 65009;
    
    bfd on;
    graceful restart on;
    passive on; # Wait for the Speaker to initiate
    ipv4 {
        import all;
        export none;
    };
}

Kubernetes: MetalLB CRD

The magic happens here with ebgpMultiHop: true.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: openwrt
  namespace: metallb-system
spec:
  myASN: 65009
  peerASN: 65001
  peerAddress: 192.168.1.1
  holdTime: 15s
  keepaliveTime: 5s
  bfdProfile: bfdprofile
  ebgpMultiHop: true # The "Secret Sauce"
---
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: bfdprofile
  namespace: metallb-system
spec:
  receiveInterval: 380
  transmitInterval: 270

Conclusion

If your MetalLB BGP sessions are brittle, don’t just delete pods. Check your Directly Connected route status and protect your BGP port with notrack. Setting ebgpMultiHop was the simplest way to make the connection resilient against kernel routing table inconsistencies in my setup.

Tags: kubernetes