24 October 2023

Intermittent connectivity issues on podman? Check your rp_filter

I've been chasing a hairy issue these past few days: connections from a container to external hosts would frequently time out. tcpdump on the host showed that during these events, TCP packets were being retransmitted. It only affected a container with a live app, and I was unable to replicate it on an empty container or on the host itself.

Troubleshooting

For thorough container network debugging, nsenter is a godsend, as it lets you run host commands within the namespace of a container:

## Get network namespace of the container
# podman inspect container -f '{{ .NetworkSettings.SandboxKey }}'
/run/netns/netns-7e957d91-f30e-8da3-77e7-28d41d1dbc56

## Open a shell within this namespace
# nsenter --net=/run/netns/netns-7e957d91-f30e-8da3-77e7-28d41d1dbc56 bash

The following commands are all executed within this namespace.

The container I was having issues with is attached to two podman networks, so its routing table looks like this:

# ip route show
default via 10.89.3.1 dev eth0 proto static metric 100
default via 10.89.2.1 dev eth1 proto static metric 100
10.89.2.0/24 dev eth1 proto kernel scope link src 10.89.2.3
10.89.3.0/24 dev eth0 proto kernel scope link src 10.89.3.3
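With two default routes at the same metric, you can ask the kernel which one it would actually pick for a given destination. This is a diagnostic sketch (run inside the same namespace), not something from the original debugging session:

## Ask the routing table which path it would choose for 1.1.1.1;
## the output includes the selected gateway, device, and source address
# ip route get 1.1.1.1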

By default, every podman network gets its own default gateway, which struck me as strange and which I suspected might cause issues. Eventually, I was able to replicate the problem, intermittently, using ping:

# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=5.47 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=57 time=5.47 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=57 time=5.44 ms
64 bytes from 1.1.1.1: icmp_seq=7 ttl=57 time=5.45 ms

Notice the icmp_seq field: it's losing every other packet. I tried running tcpdump within this namespace, and half the ICMP requests were completely missing, as if they had never been sent in the first place. How is this even possible?
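For reference, a capture along these lines is enough to see the missing requests (the exact flags are my choice: -n skips DNS lookups, -i any watches every interface in the namespace):

## Capture ICMP on all interfaces within the container's namespace
# tcpdump -ni any icmp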

The culprit

This issue is caused by the reverse path filtering subsystem of the Linux network stack, which most distributions configure in strict mode (rp_filter=1) by default. From what I understand, rp_filter checks whether a reply to an incoming packet's source address would be routed back out the same interface the packet arrived on, to weed out packets with spoofed IPs. In strict mode, packets that fail this check are silently dropped. Silently as in "not even tcpdump will log them", which makes troubleshooting very difficult.
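You can check what your machine is doing. Per the kernel's ip-sysctl documentation, the effective setting for an interface is the highest value among the "all" key and the per-interface key, so both are worth reading:

## Effective rp_filter is max(conf.all, conf.<interface>)
# sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.eth0.rp_filter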

I figure what's going on is this: since both networks have a default route with the same metric, outgoing packets are routed through one network or the other, and the replies come back on the other interface, carrying the first interface's IP. By default, Linux doesn't like that, so it just drops them without any ceremony.
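One way to confirm this theory (a step I'm suggesting in hindsight, not one from the original session) is to ask the kernel to log the packets it drops as "martians"; rp_filter failures show up in the kernel ring buffer with that label:

## Log reverse-path drops to the kernel log, then look for them
# sysctl -w net.ipv4.conf.all.log_martians=1
# dmesg | grep -i martian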

The solution

Simply set net.ipv4.conf.default.rp_filter=2 to enable loose, rather than strict, reverse path filtering, and restart the container and/or the machine. The restart matters because the "default" key only applies to interfaces created after the change, so the container's interfaces need to be recreated to pick it up.
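To make the setting survive reboots, a sysctl.d drop-in works; the filename here is my choice, and I'm also setting the "all" key so that interfaces which already exist get loose filtering too (the effective value is the max of the two keys):

## Persist loose reverse path filtering
# cat /etc/sysctl.d/99-rp-filter.conf
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2

## Reload all sysctl configuration
# sysctl --system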

This is a setting that from now on I'll enable by default on any production server I run. I still don't know why this isn't the default, or why podman doesn't document it in bright red letters. Hope this saves you a day of headaches.
