[c-nsp] MTU and PMTUD
Marcin Kurek
md.kurek at gmail.com
Thu Dec 8 04:25:20 EST 2022
Hi Saku,
> To handle NIC received packets you can do two things
>
> a) CPU can get interrupt, and handle the interrupt
> b) Interrupts can be disabled, and CPU can poll to see if there are
> packets to process
>
> Mechanism a) is the norm and mechanism b) is modernish. It improves PPS
> performance under heavy load, at the cost of increased jitter and
> latency, because it takes a variable amount of time to pick up a packet.
> In software-based routers like the VXR, if you had precise enough
> (thanks Creanord!) measurements of network performance, you could
> observe jitter during rancid (thanks Heas!) collections, because
> 'show run' and 'write' raise interrupts, which stop packet forwarding.
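To make the a) versus b) trade-off concrete, here is a toy model (my own sketch, not how the VXR or any real NIC driver works). Time is divided into ticks; in interrupt mode every packet enters the CPU once and is handled immediately, while in polling mode the CPU checks the receive ring once per tick and drains at most `budget` packets, loosely like the per-poll budget in Linux NAPI. The names `interrupt_mode`, `polling_mode`, and `budget`, and the burst used below, are hypothetical.

```python
from collections import deque


def interrupt_mode(arrivals_per_tick):
    """Mechanism a): every packet raises its own interrupt and is handled
    in the tick it arrives. Returns (cpu_entries, max_wait_ticks)."""
    interrupts = sum(arrivals_per_tick)  # one interrupt per packet
    return interrupts, 0  # no packet waits to be picked up


def polling_mode(arrivals_per_tick, budget):
    """Mechanism b): interrupts disabled; the CPU polls the RX ring once per
    tick and drains at most `budget` packets. Returns (polls, max_wait_ticks)."""
    ring = deque()  # holds the arrival tick of each queued packet
    polls = 0
    max_wait = 0
    tick = 0
    while tick < len(arrivals_per_tick) or ring:
        if tick < len(arrivals_per_tick):
            ring.extend([tick] * arrivals_per_tick[tick])
        polls += 1  # the CPU checks the ring whether or not it is empty
        for _ in range(min(budget, len(ring))):
            arrived = ring.popleft()
            max_wait = max(max_wait, tick - arrived)
        tick += 1
    return polls, max_wait


if __name__ == "__main__":
    burst = [10, 0, 0, 0]  # hypothetical: 10 packets arrive in one tick
    print(interrupt_mode(burst))          # (10, 0)
    print(polling_mode(burst, budget=4))  # (4, 2)
```

For this burst the CPU is entered 4 times instead of 10, but some packets wait two ticks before pickup, which is the jitter and latency cost described above.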
Interesting, but why would 'sh run' or 'write' raise an interrupt?
Isn't that just a code path in the CLI handler?
I'm not sure I'm reading it right: on the one hand, interrupts are
disabled, but on the other hand, some CLI commands actually raise them?
> So fewer packets per second, and therefore fewer interrupts, might be one
> contributing factor. I don't know what the per-packet processing
> overhead is, but intuitively I don't expect much improvement with
> large-MTU BGP packets. At any rate, going above 4k would mean newish
> features you don't have. But I don't have high confidence in being right.
Would you mind elaborating on why going above 4k would mean "newish
features", and what they are?
> MSS is 'negotiated' to the smallest. Much like BGP timers are
> 'negotiated' to the smallest (so your customer controls your BGP
> timers, not you). Does this help to explain what you saw?
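The "negotiated to the smallest" rule can be sketched in a few lines. This is a simplified model, not any vendor's code: for MSS, each end sends segments no larger than the smaller of its own limit and the MSS option the peer advertised in its SYN (RFC 1122's effective send MSS, ignoring IP/TCP option overhead); for BGP, RFC 4271 has each speaker use the smaller of its configured Hold Time and the one received in the peer's OPEN, where 0 disables keepalives and 1 or 2 seconds must be rejected. The function names are hypothetical.

```python
def effective_mss(local_mss: int, peer_mss: int) -> int:
    """Largest TCP payload this end will send: the smaller of its own
    limit and the MSS option the peer advertised in its SYN."""
    return min(local_mss, peer_mss)


def negotiated_hold_time(local_hold: int, peer_hold: int) -> int:
    """BGP hold time per RFC 4271: the smaller of the locally configured
    value and the value received in the peer's OPEN message."""
    for value in (local_hold, peer_hold):
        # Hold Time is a 16-bit field; 1 and 2 seconds are not allowed.
        if value in (1, 2) or not 0 <= value <= 0xFFFF:
            raise ValueError(f"invalid BGP hold time: {value}")
    return min(local_hold, peer_hold)


if __name__ == "__main__":
    print(effective_mss(8936, 1240))     # 1240
    print(negotiated_hold_time(180, 9))  # 9
    print(negotiated_hold_time(180, 0))  # 0 (keepalives disabled)
```

Plugging in the values from the first CSR1kv session (advertising 8936, peer MSS 1240) gives 1240, so the min rule itself is consistent with the logs; what differs between the two sessions is the MSS the XR side advertises.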
Right, MSS should be 'negotiated' to the smallest. But what I'm referring
to is a situation where the negotiated MSS value depends on which side
initiates the BGP session.
Scenario:
CSR1kv (12.0.0.13) ----------(iBGP)----------------- ASR9006 (12.0.0.7)
If I clear the BGP session on the CSR1kv, the resulting MSS is 1240.
Logs from CSR1kv:
*Dec 8 11:17:15.453: TCB7FFB9A6D64C0 bound to 12.0.0.13.20794
*Dec 8 11:17:15.453: Reserved port 20794 in Transport Port Agent for TCP IP type 1
*Dec 8 11:17:15.453: TCP: pmtu enabled,mss is now set to 8936
*Dec 8 11:17:15.453: TCP: sending SYN, seq 1638888268, ack 0
*Dec 8 11:17:15.453: TCP0: Connection to 12.0.0.7:179, advertising MSS 8936
*Dec 8 11:17:15.453: TCP0: state was CLOSED -> SYNSENT [20794 -> 12.0.0.7(179)]
*Dec 8 11:17:15.456: TCP0: state was SYNSENT -> ESTAB [20794 -> 12.0.0.7(179)]
*Dec 8 11:17:15.456: TCP: tcb 7FFB9A6D64C0 connection to 12.0.0.7:179, peer MSS 1240, MSS is 1240
*Dec 8 11:17:15.456: TCB7FFB9A6D64C0 connected to 12.0.0.7.179
*Dec 8 11:17:15.456: TCB7FFB9A6D64C0 setting property TCP_NO_DELAY (0) 7FFBF2CCED7C
*Dec 8 11:17:15.456: TCB7FFB9A6D64C0 setting property TCP_RTRANSTMO (36) 7FFBF2CCED7C
*Dec 8 11:17:18.081: TCP0: RETRANS timeout timer expired
*Dec 8 11:17:18.081: 12.0.0.13:20794 <---> 12.0.0.7:179 congestion window changes
*Dec 8 11:17:18.081: cwnd from 1240 to 1240, ssthresh from 65535 to 2480
*Dec 8 11:17:18.081: TCP0: timeout #1 - timeout is 5250 ms, seq 1638888269
*Dec 8 11:17:18.081: TCP: (20794) -> 12.0.0.7(179)
*Dec 8 11:17:18.084: %BGP_SESSION-5-ADJCHANGE: neighbor 12.0.0.7 IPv4 Unicast topology base removed from session Capability changed
*Dec 8 11:17:18.084: %BGP-5-ADJCHANGE: neighbor 12.0.0.7 Up
CSR1000v#show bgp vpnv4 unicast all neighbors 12.0.0.7 | i segment
Maximum output segment queue size: 50
Datagrams (max data segment is 1240 bytes):
So here the CSR1kv initiates the connection to the XR box, advertising
MSS 8936 (as expected).
However, the peer MSS is 1240, which I did not expect, considering the XR
config:
neighbor 12.0.0.13
 remote-as 12
 tcp mss 8936
 update-source Loopback0
 address-family vpnv4 unicast
 !
 address-family ipv4 rt-filter
 !
 address-family l2vpn evpn
 !
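As an aside on the first session's logs: the "cwnd from 1240 to 1240, ssthresh from 65535 to 2480" line after the retransmission timeout matches the standard RFC 5681 response to an RTO, where ssthresh is set to max(FlightSize / 2, 2 * SMSS) and cwnd to one full-sized segment. The sketch below assumes IOS XE follows that rule with SMSS = 1240 (RFC 5681 only requires "no more than" these values); the `flight_size` of 100 bytes is a hypothetical stand-in for the small amount of unacknowledged data at that moment.

```python
from typing import Tuple


def on_retransmission_timeout(smss: int, flight_size: int) -> Tuple[int, int]:
    """RFC 5681 response to a retransmission timeout; returns (cwnd, ssthresh)."""
    ssthresh = max(flight_size // 2, 2 * smss)  # RFC 5681, equation (4)
    cwnd = smss  # loss window: one full-sized segment
    return cwnd, ssthresh


if __name__ == "__main__":
    # SMSS from the log; flight_size is a hypothetical small value
    print(on_retransmission_timeout(1240, flight_size=100))  # (1240, 2480)
```

With any flight size below 4 * 1240 bytes, the 2 * SMSS floor wins, which is why ssthresh lands on exactly 2480 here.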
If I clear the BGP session on the XR, the resulting MSS is 8936.
Again, logs from CSR1kv:
*Dec 8 11:20:22.918: TCB7FFB9A6CF7A8 created
*Dec 8 11:20:22.918: TCP0: state was LISTEN -> SYNRCVD [179 -> 12.0.0.7(38087)]
*Dec 8 11:20:22.918: TCP: tcb 7FFB9A6CF7A8 connection to 12.0.0.7:38087, peer MSS 8936, MSS is 516
*Dec 8 11:20:22.918: TCP: pmtu enabled,mss is now set to 8936
*Dec 8 11:20:22.918: TCP: sending SYN, seq 1440899137, ack 3532544296
*Dec 8 11:20:22.918: TCP0: Connection to 12.0.0.7:38087, advertising MSS 8936
*Dec 8 11:20:22.921: TCP0: state was SYNRCVD -> ESTAB [179 -> 12.0.0.7(38087)]
*Dec 8 11:20:22.921: TCB7FFBF25C07E0 accepting 7FFB9A6CF7A8 from 12.0.0.7.38087
*Dec 8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_VRFTABLEID (20) 7FFB9A0B9820
*Dec 8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_PMTU (45) 7FFBF2CCED00
*Dec 8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_NO_DELAY (0) 7FFBF2CCED60
*Dec 8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_ACK_RATE (37) 7FFBF2CCED5C
*Dec 8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_RTRANSTMO (36) 7FFBF2CCED60
*Dec 8 11:20:22.922: %BGP_SESSION-5-ADJCHANGE: neighbor 12.0.0.7 IPv4 Unicast topology base removed from session Capability changed
*Dec 8 11:20:22.926: %BGP-5-ADJCHANGE: neighbor 12.0.0.7 Up
CSR1000v#show bgp vpnv4 unicast all neighbors 12.0.0.7 | i segment
Maximum output segment queue size: 50
Datagrams (max data segment is 8936 bytes):
Here it's the other way around: XR is the client and XE is the server, and
XR apparently advertised MSS 8936.
But maybe I don't understand how the 'tcp mss' command is supposed to
work :)
Kind regards,
Marcin