[c-nsp] transport path-mtu-discovery - ME3600....too unpredictable to use?

CiscoNSP List CiscoNSP_list at hotmail.com
Tue Feb 23 21:42:42 EST 2016


Hi Everyone,


Quick synopsis of our network: multiple POPs, all connected via various 3rd-party carriers who all use differing MTUs, which can also "change" unexpectedly (unavoidable, unfortunately!). Hence we have a few options: disable transport path-mtu-discovery and run with the small 536-byte default MSS; set a larger value and hope the inter-POP link MTU doesn't drop below it; or use a "dynamic" approach, a la transport path-mtu-discovery.


We faced an unusual issue last night. Two ME3600s, connected to each other and to an ASR1006 (POPD), peer with 2 RRs (ASR1Ks). Both "had" transport path-mtu-discovery enabled and had happily peered with the 2 RRs for ~50 weeks. Last night, one of our engineers attempted to peer with MS/360 on "ME01", which caused the peering sessions from this ME to the RRs to flap. I assume transport path-mtu-discovery was then triggered to "re-calc" the optimum MTU to the RRs. (This is one piece of info I'm not sure on: when does transport path-mtu-discovery actually calculate the MTU, and what are the triggers for it to re-calc?)


Anyway, the value it ended up with was too large, and BGP sessions to the 2 RRs would establish for 3 minutes, fail, then re-establish ~9 sec later. Disabling transport path-mtu-discovery "fixed" this.


The thing that concerns/confuses me about transport path-mtu-discovery (and makes me wonder whether it is simply too unreliable on the MEs to use in our network) is that on the 2 MEs, both with the same path to the 2 RRs, it came up with 2 completely different MTU sizes.


PE01 (when it failed) - 1954 bytes

PE02 (which it is apparently still using) - 2936 bytes


Now, ping tests from both these MEs show that 2936 bytes is absolutely not achievable (where it got this number, I don't know), but BGP is still up and running, and has been for 50 weeks, so it can't be using this MTU size?


The max I can get through from both PEs is 1552 bytes (output below from both MEs), so I'm guessing that if PE02 has any flap/issue, we will be hit with an issue similar to the one that occurred last night on PE01.


PE01-EQ-SY3-L1H500160-R1803-RU38#ping xxx.xxx.xxx.213 size 1552 df-bit
Type escape sequence to abort.
Sending 5, 1552-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/18/20 ms
PE01-EQ-SY3-L1H500160-R1803-RU38#ping xxx.xxx.xxx.213 size 1553 df-bit
Type escape sequence to abort.
Sending 5, 1553-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)


PE02-EQ-SY3-L1H500160-R1803-RU37#ping xxx.xxx.xxx.213 size 1552 df-bit
Type escape sequence to abort.
Sending 5, 1552-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/19/20 ms
PE02-EQ-SY3-L1H500160-R1803-RU37#ping xxx.xxx.xxx.213 size 1553 df-bit
Type escape sequence to abort.
Sending 5, 1553-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
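
For what it's worth, the manual "try 1552, then 1553" search above can be written down as a small binary search, and the result turned into the largest TCP segment that would fit. This is only a rough sketch under some assumptions: the probe function is a simulated stand-in for a DF-bit ping (it is not a real IOS or library call), the ping size is assumed to count the whole IP datagram, headers are assumed to be plain IPv4 and TCP with no options, and the 1954 and 2936 figures are treated as TCP MSS values (if they are really MTUs, they exceed 1552 either way).

```python
from typing import Callable

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] for which probe(size) succeeds.

    Assumes that if a size passes, every smaller size passes too.
    """
    if not probe(low):
        raise ValueError("probe failed even at the lower bound")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def max_mss_for_path_mtu(path_mtu: int) -> int:
    """Largest TCP payload per segment that fits in one unfragmented packet."""
    return path_mtu - IPV4_HEADER - TCP_HEADER


def simulated_probe(size: int) -> bool:
    # Hypothetical stand-in for "ping <peer> size <size> df-bit".
    # 1552 is the limit measured in the ping output above.
    return size <= 1552


# The 576 and 9216 search bounds are arbitrary illustrative limits.
path_mtu = largest_passing_size(simulated_probe, 576, 9216)
mss_limit = max_mss_for_path_mtu(path_mtu)

for name, reported in (("PE01", 1954), ("PE02", 2936)):
    verdict = "fits" if reported <= mss_limit else "too large"
    print(f"{name}: reported {reported}, limit {mss_limit}: {verdict}")
```

Under those assumptions neither reported value fits the measured path. That would also be consistent with a session staying up while it only carries small messages (a BGP KEEPALIVE is 19 bytes) and only failing once full-sized segments have to be sent, although I have not confirmed that this is what happened here.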



Any insight/recommendations are highly appreciated. As it stands now, I don't think we have any other choice than to completely remove transport path-mtu-discovery and run with the small 536-byte default. Not ideal, but I'm at a loss as to how transport path-mtu-discovery actually "works out" the MTU it decides on. From my limited experience with it, lol, it appears to pick a number at random (I know this can't be the case).
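
As a reference point for the "how does it work out the MTU" question, here is a toy model of classic path MTU discovery as described in RFC 1191: the sender starts from its own interface MTU, sends with DF set, and only lowers its estimate when a router returns an ICMP "fragmentation needed" message carrying the next-hop MTU. This is a sketch of the generic mechanism, not a claim about what the ME3600 actually implements. The function name, the link MTU lists and the ICMP flags are all hypothetical; only the 1552 figure comes from the ping tests above.

```python
from typing import List, Tuple


def classic_pmtud(link_mtus: List[int], icmp_delivered: List[bool]) -> Tuple[int, bool]:
    """Simulate RFC 1191 style discovery along one path.

    link_mtus[i] is the MTU of hop i (index 0 is the sender's own interface).
    icmp_delivered[i] says whether a "fragmentation needed" message generated
    at hop i actually makes it back to the sender.

    Returns (estimate, works): the MTU the sender ends up using, and whether
    full-sized packets at that estimate get through.
    """
    estimate = link_mtus[0]  # start from the local interface MTU
    while True:
        for hop, mtu in enumerate(link_mtus):
            if estimate > mtu:
                if not icmp_delivered[hop]:
                    # Dropped silently: the sender never learns, and keeps
                    # using an estimate the path cannot carry.
                    return estimate, False
                estimate = mtu  # ICMP reports the next-hop MTU; retry
                break
        else:
            return estimate, True  # no hop objected


# Hypothetical path: 9000-byte local interface, a 1552-byte carrier link,
# then a 4470-byte link.
print(classic_pmtud([9000, 1552, 4470], [True, True, True]))
print(classic_pmtud([9000, 1552, 4470], [True, False, True]))
```

In the second call the ICMP message from the 1552-byte hop is lost, so the model is left with an estimate far above what the path carries. Whether something like that explains the 1954 and 2936 values here is only a guess on my part.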



Cheers



