[c-nsp] Help me research how commonly routers mangle packets

Saku Ytti saku at ytti.fi
Thu Sep 1 05:06:19 EDT 2016


Hey,

In two different networks, one Juniper and one Cisco, I've seen a
router silently mangle packets in transit, calculate a correct Ethernet
FCS over the broken packet and forward it.

In an MPLS network this means you'll only occasionally learn about the
problem, when the egress PE router notices an IP checksum error. That
only happens when the corruption lands in the 20B IPv4 header. When
the corruption lands anywhere else, or the packet is IPv6, you won't
know about it at all.
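
To illustrate why only header corruption is visible, here is a minimal
Python sketch of the IPv4 header checksum (one's-complement sum over
the 20B header). The helper names, addresses and payload are made up
for the example; it is not taken from any router implementation.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_packet(payload: bytes) -> bytes:
    """20-byte IPv4 header (no options) plus payload; addresses are examples."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload), 0, 0, 64, 6, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]),
    )
    csum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:] + payload

def header_is_valid(packet: bytes) -> bool:
    # A header carrying a correct checksum sums to zero after complement.
    return ipv4_checksum(packet[:20]) == 0

def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)

packet = build_packet(b"example payload bytes")
print(header_is_valid(packet))                 # intact packet
print(header_is_valid(flip_bit(packet, 8)))    # bit flip in the header (TTL byte)
print(header_is_valid(flip_bit(packet, 25)))   # bit flip in the payload
```

Only the header flip fails the check. The payload flip passes, which
is the case the egress PE never notices.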

If this is a common failure mode, then vendors can likely address it
by calculating an internal FCS over the unchanging parts of the data at
the ingress PHY and reverifying it at the egress PHY, giving us good
confidence that we'll catch broken HW.
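
A rough sketch of that idea in Python, purely as an illustration: tag
the packet at ingress with a CRC over everything except the fields a
router legitimately rewrites, then recheck at egress. Which fields
count as "unchanging" is my assumption here (TTL and header checksum
are masked; real hardware may also rewrite DSCP, labels and so on),
and zlib.crc32 merely stands in for whatever internal FCS a vendor
would choose.

```python
import zlib

TTL_OFFSET = 8     # IPv4 TTL byte
CSUM_OFFSET = 10   # IPv4 header checksum, 2 bytes

def invariant_crc(ip_packet: bytes) -> int:
    """CRC32 over the packet with the mutable fields zeroed out."""
    masked = bytearray(ip_packet)
    masked[TTL_OFFSET] = 0
    masked[CSUM_OFFSET:CSUM_OFFSET + 2] = b"\x00\x00"
    return zlib.crc32(bytes(masked))

def forward(ip_packet: bytes, corrupt_at=None) -> bool:
    """Toy forwarding path; returns True if the egress check passes."""
    tag = invariant_crc(ip_packet)                   # computed at ingress
    work = bytearray(ip_packet)
    work[TTL_OFFSET] = (work[TTL_OFFSET] - 1) & 0xFF  # legitimate rewrite
    if corrupt_at is not None:
        work[corrupt_at] ^= 0x01                     # simulated HW bit flip
    return invariant_crc(bytes(work)) == tag         # verified at egress

# Example packet: 20B header (checksum left as zero, it is masked anyway)
# followed by 8 bytes of payload.
pkt = bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 64, 17, 0, 0,
             192, 0, 2, 1, 198, 51, 100, 1]) + b"payload!"

print(forward(pkt))                  # clean transit
print(forward(pkt, corrupt_at=24))   # payload corrupted inside the box
```

Note that corruption of the masked fields themselves would still slip
through in this sketch.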

This problem does not exist on-link or in L2; in both cases we're
protected by an FCS over the whole frame, which is a very strong
statistical guarantee that we're not breaking packets. The problem
only affects L3, and MPLS particularly badly, as it makes finding the
culprit very hard.

If you are running an MPLS-labeled IPv4 network, I'd like you to check
whether you're getting IP checksum errors from your core side:

JunOS Trio:
JunOS> show interfaces et-0/1/0 extensive |match "(index|incompletes)"
  Interface index: 183, SNMP ifIndex: 662, Generation: 186
    Errors: 646537, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 646537,
JunOS> start shell pfe network fpc0
RMPC0(JunOS vty)# show jnh ifd 183 stream
                 checksum: 0000000003450169 pkts, 0000002655295885 bytes



ASR9k:
IOSXR#show controllers np counters all | i "(Node|NP[0-9]|CHECKSUM)"
                Node: 0/0/CPU0:
Show global stats counters for NP0, revision v2
Show global stats counters for NP1, revision v2
Show global stats counters for NP2, revision v2
Show global stats counters for NP3, revision v2
Show global stats counters for NP4, revision v2
Show global stats counters for NP5, revision v2
Show global stats counters for NP6, revision v2
 142  PARSE_DROP_IPV4_CHECKSUM_ERROR                         24168           0
Show global stats counters for NP7, revision v2
 142  PARSE_DROP_IPV4_CHECKSUM_ERROR                            34           0
                Node: 0/1/CPU0:
Show global stats counters for NP0, revision v2
Show global stats counters for NP1, revision v2
Show global stats counters for NP2, revision v2
Show global stats counters for NP3, revision v2
Show global stats counters for NP4, revision v2
Show global stats counters for NP5, revision v2
                Node: 0/3/CPU0:
Show global stats counters for NP0, revision v3
Show global stats counters for NP1, revision v3
Show global stats counters for NP2, revision v3
Show global stats counters for NP3, revision v3
IOSXR#show controllers np ports np6 location 0/0/CPU0

                Node: 0/0/CPU0:
----------------------------------------------------------------

NP Bridge Fia                       Ports
-- ------ --- ---------------------------------------------------
6  --     3   TenGigE0/0/0/18 - TenGigE0/0/0/20
IOSXR#

- you won't know which of those ports is the culprit, but hopefully
there are few enough options to tell whether the errors are coming
from an MPLS-labeled interface.
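
If you have many linecards, a small script can pick out the NPs with
a non-zero counter from pasted output. This is only a sketch: it
assumes the output looks like the ASR9k sample above (it may differ
between releases), and the function name and SAMPLE text are mine.

```python
import re

# Trimmed copy of the ASR9k output shown above.
SAMPLE = """\
                Node: 0/0/CPU0:
Show global stats counters for NP5, revision v2
Show global stats counters for NP6, revision v2
 142  PARSE_DROP_IPV4_CHECKSUM_ERROR                         24168           0
Show global stats counters for NP7, revision v2
 142  PARSE_DROP_IPV4_CHECKSUM_ERROR                            34           0
                Node: 0/1/CPU0:
Show global stats counters for NP0, revision v2
"""

NODE_RE = re.compile(r"Node:\s*([^\s:]+)")
NP_RE = re.compile(r"counters for (NP\d+)")
CSUM_RE = re.compile(r"PARSE_DROP_IPV4_CHECKSUM_ERROR\s+(\d+)")

def checksum_errors(text: str) -> dict:
    """Map (node, NP) to its IPv4 checksum error count."""
    node = None
    np_name = None
    hits = {}
    for line in text.splitlines():
        node_match = NODE_RE.search(line)
        np_match = NP_RE.search(line)
        csum_match = CSUM_RE.search(line)
        if node_match:
            node, np_name = node_match.group(1), None
        elif np_match:
            np_name = np_match.group(1)
        elif csum_match:
            hits[(node, np_name)] = int(csum_match.group(1))
    return hits

for (node, np_name), count in checksum_errors(SAMPLE).items():
    print(f"{node} {np_name}: {count} IPv4 checksum errors")
```

Each reported NP can then be mapped to its ports with "show
controllers np ports", as in the output above.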





If you want, you can further capture the broken packets:
https://gist.github.com/ytti/2323b019152eca6e05718bccd855566e
https://gist.github.com/ytti/436fe3b602a963acf21e

Blog I wrote when I originally saw this in Juniper network:
http://blog.ip.fi/2014/02/junos-l3-incompletes-what-and-why.html

A security implication this may have:
http://dinaburg.org/bitsquatting.html


If you do see the problem, I'd be very happy to talk to you and help
you with the issue.

Thanks!
-- 
  ++ytti

