[c-nsp] Help me research how commonly routers mangle packets
Saku Ytti
saku at ytti.fi
Thu Sep 1 05:06:19 EDT 2016
Hey,
In two different networks, one Juniper and one Cisco, I've seen a router
silently mangle packets in transit, calculate a correct Ethernet FCS over
the broken packet and forward it.
In an MPLS network this means you'll only occasionally know about the
problem, when the egress PE router notices an IP checksum error. You
will only notice it when the corruption happens to land in the 20B IP
header. When the corruption lands anywhere else, or the packet is IPv6
(which has no header checksum), you won't know about it at all.
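To illustrate why only header corruption is visible, here is a minimal Python sketch of the IPv4 header checksum (the RFC 1071 one's-complement sum). The addresses, protocol number and payload are made up for the example, and build_packet and flip_bit are hypothetical helpers, not anything a router runs.

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header, taken 16 bits at a time."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_is_valid(packet: bytes) -> bool:
    """A header that carries a correct checksum sums to zero."""
    ihl = (packet[0] & 0x0F) * 4
    return ipv4_header_checksum(packet[:ihl]) == 0

def build_packet(payload: bytes) -> bytes:
    """Hypothetical helper: 20B IPv4 header (example addresses) plus payload."""
    header = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload), 0, 0, 64, 17, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]),
    ))
    struct.pack_into("!H", header, 10, ipv4_header_checksum(bytes(header)))
    return bytes(header) + payload

def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Hypothetical helper: simulate a single-bit corruption."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)

packet = build_packet(b"payload that no IPv4 checksum covers")
header_hit = flip_bit(packet, 8)     # TTL byte, inside the 20B header
payload_hit = flip_bit(packet, 30)   # first bytes after the header

print(header_is_valid(packet))       # True
print(header_is_valid(header_hit))   # False
print(header_is_valid(payload_hit))  # True
```

The last line is the point: from the router's view the packet with a damaged payload is indistinguishable from a good one.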
If this is a common failure mode, then vendors can likely address the
problem by calculating an internal FCS over the unchanging parts of the
data on the ingress PHY and re-verifying it on the egress PHY, giving us
good confidence that we'll catch broken HW.
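A rough sketch of that idea in Python, not a description of any vendor's hardware. It assumes the only fields a router legitimately rewrites are the IPv4 TTL and header checksum; a real implementation would also have to exclude things like MPLS labels or rewritten DSCP. MUTABLE_OFFSETS, internal_fcs and forward are hypothetical names, and CRC32 merely stands in for whatever check the hardware would use.

```python
import zlib

# Assumption: byte offsets of IPv4 header fields rewritten in transit
# (TTL at 8, header checksum at 10-11). Everything else should not change.
MUTABLE_OFFSETS = (8, 10, 11)

def internal_fcs(packet: bytes) -> int:
    """CRC32 over the packet with the mutable fields zeroed out."""
    masked = bytearray(packet)
    for offset in MUTABLE_OFFSETS:
        masked[offset] = 0
    return zlib.crc32(bytes(masked))

def forward(packet: bytes, corrupt_at=None) -> bytes:
    """Hypothetical forwarding path: decrement TTL, optionally flip one bit.

    The header checksum update is omitted; that field is masked anyway.
    """
    out = bytearray(packet)
    out[8] -= 1
    if corrupt_at is not None:
        out[corrupt_at] ^= 0x01
    return bytes(out)

# 20B IPv4 header (example addresses, TTL 64) followed by 8B of payload.
packet = bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 64, 17, 0, 0,
                192, 0, 2, 1, 198, 51, 100, 2]) + b"payload!"

ingress_fcs = internal_fcs(packet)       # computed at the ingress PHY
clean = forward(packet)                  # healthy forwarding path
broken = forward(packet, corrupt_at=24)  # bit flipped inside the payload

print(internal_fcs(clean) == ingress_fcs)   # True
print(internal_fcs(broken) == ingress_fcs)  # False
```

The legitimate TTL change does not disturb the check, while the payload corruption that the IP header checksum cannot see is caught at egress.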
This problem does not exist on-link or in L2; in both of these cases
we're protected by an FCS over the whole data, which is a very strong
statistical guarantee that we're not breaking packets. The problem
only impacts L3, and MPLS particularly badly, as it makes finding the
culprit very hard.
If you are running an MPLS-labeled IPv4 network, I'd like you to check
whether you're getting IP checksum errors from your core side:
JunOS Trio:
JunOS> show interfaces et-0/1/0 extensive |match "(index|incompletes)"
Interface index: 183, SNMP ifIndex: 662, Generation: 186
Errors: 646537, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 646537,
JunOS> start shell pfe network fpc0
RMPC0(JunOS vty)# show jnh ifd 183 stream
checksum: 0000000003450169 pkts, 0000002655295885 bytes
ASR9k:
IOSXR#show controllers np counters all | i "(Node|NP[0-9]|CHECKSUM)"
Node: 0/0/CPU0:
Show global stats counters for NP0, revision v2
Show global stats counters for NP1, revision v2
Show global stats counters for NP2, revision v2
Show global stats counters for NP3, revision v2
Show global stats counters for NP4, revision v2
Show global stats counters for NP5, revision v2
Show global stats counters for NP6, revision v2
142 PARSE_DROP_IPV4_CHECKSUM_ERROR 24168 0
Show global stats counters for NP7, revision v2
142 PARSE_DROP_IPV4_CHECKSUM_ERROR 34 0
Node: 0/1/CPU0:
Show global stats counters for NP0, revision v2
Show global stats counters for NP1, revision v2
Show global stats counters for NP2, revision v2
Show global stats counters for NP3, revision v2
Show global stats counters for NP4, revision v2
Show global stats counters for NP5, revision v2
Node: 0/3/CPU0:
Show global stats counters for NP0, revision v3
Show global stats counters for NP1, revision v3
Show global stats counters for NP2, revision v3
Show global stats counters for NP3, revision v3
IOSXR#show controllers np ports np6 location 0/0/CPU0
Node: 0/0/CPU0:
----------------------------------------------------------------
NP Bridge Fia Ports
-- ------ --- ---------------------------------------------------
6 -- 3 TenGigE0/0/0/18 - TenGigE0/0/0/20
IOSXR#
- you won't know which of those ports is the culprit, but hopefully
there are few enough options to decide whether the errors are coming
from an MPLS-labeled interface.
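If you have many boxes to check, the counter output above can be summarised with a small script. This is only a sketch: it assumes the output looks like the sample in this mail and that the third column of the counter line is the cumulative count. checksum_drops and SAMPLE are names made up for the example.

```python
import re

COUNTER = "PARSE_DROP_IPV4_CHECKSUM_ERROR"

def checksum_drops(output: str) -> dict:
    """Map (node, NP) to the checksum-error count found in the CLI output."""
    drops = {}
    node = None
    np_name = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Node:":
            node = fields[1].rstrip(":")
            continue
        if len(fields) >= 3 and fields[1] == COUNTER:
            # Assumption: third column is the cumulative counter value.
            drops[(node, np_name)] = int(fields[2])
            continue
        match = re.search(r"for (NP\d+)", line)
        if match:
            np_name = match.group(1)
    return drops

# Trimmed copy of the ASR9k output shown above.
SAMPLE = """\
Node: 0/0/CPU0:
Show global stats counters for NP5, revision v2
Show global stats counters for NP6, revision v2
142 PARSE_DROP_IPV4_CHECKSUM_ERROR 24168 0
Show global stats counters for NP7, revision v2
142 PARSE_DROP_IPV4_CHECKSUM_ERROR 34 0
Node: 0/1/CPU0:
Show global stats counters for NP0, revision v2
"""

for (node, np_name), count in checksum_drops(SAMPLE).items():
    print(node, np_name, count)
```

Each NP that shows up can then be mapped to its ports with the "show controllers np ports" command shown above.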
If you want, you can further capture the broken packets:
https://gist.github.com/ytti/2323b019152eca6e05718bccd855566e
https://gist.github.com/ytti/436fe3b602a963acf21e
Blog post I wrote when I originally saw this in a Juniper network:
http://blog.ip.fi/2014/02/junos-l3-incompletes-what-and-why.html
A security implication this may have:
http://dinaburg.org/bitsquatting.html
If you do see the problem, I'd be very happy to talk to you and help
you with the issue.
Thanks!
--
++ytti