[j-nsp] ISIS flaps caused by MX em0 MTU

Fri Aug 30 05:06:10 EDT 2024

I discovered something a few days ago and would like to know whether
others have seen it and how you mitigated it.
The em0/em3 (RE to PFE) interface on Juniper MX uses a hardcoded MTU of
1500. All of the traffic on that interface is encapsulated with the
Juniper TTP protocol.
As a result, any control packet that already fills a 1500-byte MTU is
fragmented once the encapsulation is added (SSH, for example), and there
are also some internal predefined telemetry sensors that send large
packets, which are fragmented as well.
I ran into this after some weird ISIS flaps between a Juniper MX BNG and
Cisco IOS-XR devices, which do hello padding by default, but it can affect
other protocols too.
We saw that the packets arrived at the PFE but not at the RE, and we tried
to find where they were dropped.
After some time, we found that the issue is related to the hello padding,
but it took us a long time to find that the cause is the fragmentation on
the PFE-to-RE path.
Now we are trying to find out whether a storm of fragments is causing
these drops.
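
To make the fragmentation arithmetic concrete, here is a minimal Python
sketch. It assumes the encapsulated packet travels inside an outer IPv4
packet with a 20-byte header on an MTU-1500 link. The TTP header size used
below (16 bytes) is a placeholder I picked for illustration, not a
documented value, and the function name is hypothetical.

```python
import math

# Placeholder value for illustration only, NOT the documented TTP header size.
ASSUMED_TTP_OVERHEAD = 16


def fragment_count(inner_len, encap_overhead, mtu=1500, outer_ip_header=20):
    """Return how many IPv4 fragments are needed on a link with the given MTU.

    inner_len      -- size in bytes of the original control packet
    encap_overhead -- bytes added by the tunnel header (assumed value)
    """
    # The original packet plus the tunnel header is the outer IP payload.
    payload = inner_len + encap_overhead
    if payload + outer_ip_header <= mtu:
        return 1
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - outer_ip_header) // 8 * 8
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    for size in (1400, 1480, 1500):
        print(size, fragment_count(size, ASSUMED_TTP_OVERHEAD))
```

Under these assumptions a padded 1500-byte hello needs two fragments, and
it would do so even with zero tunnel overhead, because the outer IP header
alone pushes it past 1500 bytes.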

The device has a very strict lo0 filter, but it is a BNG, so it still
accepts some control packets from subscribers.

Can you run the following command and check whether you see fragment drops?

nitzan@MX> show system statistics ip | match frag | match drop
         684484 fragments dropped (dup or out of space)
         128375 fragment sessions dropped (queue overflow)
         21040072 fragments dropped after timeout
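
These counters are cumulative, so a single reading does not show whether
drops are happening right now. A minimal Python sketch, assuming the output
format shown above, parses two snapshots and reports the increase between
them. The function names are hypothetical, and the second snapshot in the
example is made-up data used only to exercise the code.

```python
import re

SAMPLE = """\
         684484 fragments dropped (dup or out of space)
         128375 fragment sessions dropped (queue overflow)
         21040072 fragments dropped after timeout
"""


def parse_fragment_drops(text):
    """Map each fragment-drop counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(fragment.*dropped.*)$", line)
        if m:
            counters[m.group(2).strip()] = int(m.group(1))
    return counters


def drop_deltas(before, after):
    """Return how much each counter grew between two parsed snapshots."""
    return {name: value - before.get(name, 0) for name, value in after.items()}


if __name__ == "__main__":
    first = parse_fragment_drops(SAMPLE)
    # Made-up second snapshot: pretend 100 more fragments timed out.
    second = dict(first)
    second["fragments dropped after timeout"] += 100
    for name, delta in drop_deltas(first, second).items():
        print(f"{delta:>10}  {name}")
```

Taking the snapshots a fixed interval apart gives a rough drop rate that
can be compared against the times of the ISIS flaps.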

If anybody wants to capture only the fragments, you can use the following
pcap filter on that interface:
monitor traffic interface em0 no-resolve count 100 matching "((ip[6:2] > 0) and (not ip[6] = 64))"
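
For anyone unsure what that filter matches: bytes 6-7 of the IPv4 header
hold the flags (reserved, DF, MF) and the 13-bit fragment offset. The
filter keeps packets where that field is non-zero and excludes those whose
byte 6 is exactly 0x40 (DF set, nothing else in that byte). Below is a
minimal Python sketch of the same logic, next to a stricter check (MF set
or non-zero offset) that I added for comparison. The function names and
the header builder are hypothetical helpers for illustration.

```python
import struct


def matches_post_filter(ip_header: bytes) -> bool:
    """Mirror of the pcap expression (ip[6:2] > 0) and (not ip[6] = 64)."""
    flags_and_offset = struct.unpack("!H", ip_header[6:8])[0]
    return flags_and_offset > 0 and ip_header[6] != 64


def is_fragment(ip_header: bytes) -> bool:
    """Stricter check: More Fragments bit set or non-zero fragment offset."""
    flags_and_offset = struct.unpack("!H", ip_header[6:8])[0]
    return (flags_and_offset & 0x3FFF) != 0


def header_with(flags_and_offset: int) -> bytes:
    """Build a bare 20-byte IPv4 header with only bytes 0 and 6-7 filled in."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length 5 words
    struct.pack_into("!H", header, 6, flags_and_offset)
    return bytes(header)


if __name__ == "__main__":
    cases = {
        "no flags, not fragmented": 0x0000,
        "DF only": 0x4000,
        "first fragment (MF set)": 0x2000,
        "last fragment (offset 185)": 0x00B9,
    }
    for label, value in cases.items():
        hdr = header_with(value)
        print(label, matches_post_filter(hdr), is_fragment(hdr))
```

For these four common cases both checks agree. The stricter form
corresponds to the pcap expression "ip[6:2] & 0x3fff != 0".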

Has anybody faced a similar issue?
How did you solve it?
Does anybody know why Juniper has not increased the MTU on that interface
over the years? (I believe it is some historical setting that no one wants
to change.)

Nitzan