[c-nsp] MTU and PMTUD

Marcin Kurek md.kurek at gmail.com
Thu Dec 8 04:25:20 EST 2022


Hi Saku,

> To handle NIC received packets you can do two things
>
> a) CPU can get interrupt, and handle the interrupt
> b) Interrupts can be disabled, and CPU can poll to see if there are
> packets to process
>
> Mechanism a) is the norm and mechanism b) is modernish. It improves PPS
> performance under heavy load, at the cost of increased jitter and
> latency, because it takes a variable amount of time to pick up a packet.
> In software-based routers like the VXR, if you had precise enough
> (thanks Creanord!) measurements of network performance, you could
> observe jitter during rancid (thanks Heas!) collections, because
> 'show run' and 'write' raise interrupts, which stop packet forwarding.
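A toy model of the trade-off described above, under simplifying assumptions that are not how any real router scheduler works: interrupt handling takes a fixed latency unless the CPU is busy with other work (a hypothetical stand-in for something like a CLI-triggered task), while a poller only picks packets up at fixed poll boundaries. The function names and time units are illustrative only.

```python
from typing import List, Optional, Tuple


def interrupt_delay(arrival: int, latency: int,
                    busy: Optional[Tuple[int, int]] = None) -> int:
    """Wait time until a packet is handled in an interrupt-driven model.

    The CPU services the RX interrupt a fixed `latency` after arrival,
    unless it is busy during [start, end) with other work (hypothetical),
    in which case handling is deferred until `end`.
    """
    start = arrival
    if busy is not None and busy[0] <= arrival < busy[1]:
        start = busy[1]
    return start - arrival + latency


def polling_delay(arrival: int, interval: int) -> int:
    """Wait time until the next poll, with polls at multiples of `interval`."""
    return (-arrival) % interval


arrivals: List[int] = [0, 3, 7, 12, 18]  # arrival times, e.g. microseconds
print([interrupt_delay(t, 2) for t in arrivals])                # [2, 2, 2, 2, 2]
print([interrupt_delay(t, 2, busy=(5, 15)) for t in arrivals])  # [2, 2, 10, 5, 2]
print([polling_delay(t, 10) for t in arrivals])                 # [0, 7, 3, 8, 2]
```

The constant interrupt delay turns into jitter only while the CPU is busy elsewhere, whereas polling adds a variable wait of anywhere from 0 to one poll interval for every packet.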

Interesting, but why would 'sh run' or 'write' raise an interrupt?
Isn't that just a branch in the code that handles the CLI?
I'm not sure I'm reading this right: on the one hand, interrupts are
disabled, but on the other hand, some CLI commands actually raise them?

> So fewer packets per second, and therefore fewer interrupts, might be
> one contributing factor. I don't know what the overhead cost of
> processing packets is, but intuitively I don't expect much improvement
> with large-MTU BGP packets. And at any rate, going above 4k would mean
> newish features you don't have. But I don't have high confidence in
> being right.

Would you mind elaborating on why going above 4k would mean "newish
features", and what those features are?

> MSS is 'negotiated' to the smallest. Much like BGP timers are
> 'negotiated' to the smallest (so your customer controls your BGP
> timers, not you). Does this help to explain what you saw?
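A minimal sketch of what 'negotiated to the smallest' means for MSS: each side advertises in its SYN the largest segment it is willing to receive, and the sender caps that by its own limit, so the segment size actually used is the smaller of the two. The function name is hypothetical; the 536-byte fallback is the IPv4 default send MSS that TCP assumes when no MSS option is received.

```python
from typing import Optional

# Default IPv4 send MSS assumed when the peer's SYN carries no MSS option.
DEFAULT_IPV4_SEND_MSS = 536


def effective_send_mss(local_mss: int, peer_mss: Optional[int]) -> int:
    """Largest TCP payload this side will put into a single segment.

    `local_mss` is this side's own limit; `peer_mss` is the value the peer
    advertised (None if it sent no MSS option). The smaller one wins.
    """
    if peer_mss is None:
        peer_mss = DEFAULT_IPV4_SEND_MSS
    return min(local_mss, peer_mss)


print(effective_send_mss(8936, 1240))  # 1240
print(effective_send_mss(8936, 8936))  # 8936
print(effective_send_mss(8936, None))  # 536
```

Note that this is a per-direction calculation: each end computes its own send MSS from what the other end advertised.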

Right, MSS should be 'negotiated' to the smallest. But what I'm referring
to is a situation where the negotiated MSS value depends on which side
initiates the BGP session.

Scenario:

CSR1kv (12.0.0.13) ----------(iBGP)----------------- ASR9006 (12.0.0.7)

If I clear the BGP session on the CSR1kv, the resulting MSS is 1240.

Logs from CSR1kv:

*Dec  8 11:17:15.453: TCB7FFB9A6D64C0 bound to 12.0.0.13.20794
*Dec  8 11:17:15.453: Reserved port 20794 in Transport Port Agent for TCP IP type 1
*Dec  8 11:17:15.453: TCP: pmtu enabled,mss is now set to 8936
*Dec  8 11:17:15.453: TCP: sending SYN, seq 1638888268, ack 0
*Dec  8 11:17:15.453: TCP0: Connection to 12.0.0.7:179, advertising MSS 8936
*Dec  8 11:17:15.453: TCP0: state was CLOSED -> SYNSENT [20794 -> 12.0.0.7(179)]
*Dec  8 11:17:15.456: TCP0: state was SYNSENT -> ESTAB [20794 -> 12.0.0.7(179)]
*Dec  8 11:17:15.456: TCP: tcb 7FFB9A6D64C0 connection to 12.0.0.7:179, peer MSS 1240, MSS is 1240
*Dec  8 11:17:15.456: TCB7FFB9A6D64C0 connected to 12.0.0.7.179
*Dec  8 11:17:15.456: TCB7FFB9A6D64C0 setting property TCP_NO_DELAY (0) 7FFBF2CCED7C
*Dec  8 11:17:15.456: TCB7FFB9A6D64C0 setting property TCP_RTRANSTMO (36) 7FFBF2CCED7C
*Dec  8 11:17:18.081: TCP0: RETRANS timeout timer expired
*Dec  8 11:17:18.081: 12.0.0.13:20794 <---> 12.0.0.7:179   congestion window changes
*Dec  8 11:17:18.081: cwnd from 1240 to 1240, ssthresh from 65535 to 2480
*Dec  8 11:17:18.081: TCP0: timeout #1 - timeout is 5250 ms, seq 1638888269
*Dec  8 11:17:18.081: TCP: (20794) -> 12.0.0.7(179)
*Dec  8 11:17:18.084: %BGP_SESSION-5-ADJCHANGE: neighbor 12.0.0.7 IPv4 Unicast topology base removed from session  Capability changed
*Dec  8 11:17:18.084: %BGP-5-ADJCHANGE: neighbor 12.0.0.7 Up
CSR1000v#show bgp vpnv4 unicast all neighbors 12.0.0.7 | i segment
Maximum output segment queue size: 50
Datagrams (max data segment is 1240 bytes):

So here the CSR1kv initiates the connection to the XR box, advertising
MSS 8936 (as expected).
However, the peer MSS is 1240, which is unexpected given the XR
config:

neighbor 12.0.0.13
  remote-as 12
  tcp mss 8936
  update-source Loopback0
  address-family vpnv4 unicast
  !
  address-family ipv4 rt-filter
  !
  address-family l2vpn evpn
  !
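For reference, the MSS values in these logs can be mapped back to IP packet sizes. A minimal sketch, assuming IPv4 with a 20-byte IP header and a 20-byte TCP header and no options (real segments may also carry TCP options, which reduce the payload the sender actually puts in each packet):

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def mss_for_mtu(ip_mtu: int) -> int:
    """TCP MSS that exactly fills one IPv4 packet of size ip_mtu."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


def mtu_for_mss(mss: int) -> int:
    """IPv4 packet size produced by a full-size segment of `mss` bytes."""
    return mss + IPV4_HEADER + TCP_HEADER


print(mtu_for_mss(8936))  # 8976
print(mtu_for_mss(1240))  # 1280
print(mss_for_mtu(9000))  # 8960
```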

If I clear the BGP session on XR, the resulting MSS is 8936.

Again, logs from CSR1kv:

*Dec  8 11:20:22.918: TCB7FFB9A6CF7A8 created
*Dec  8 11:20:22.918: TCP0: state was LISTEN -> SYNRCVD [179 -> 12.0.0.7(38087)]
*Dec  8 11:20:22.918: TCP: tcb 7FFB9A6CF7A8 connection to 12.0.0.7:38087, peer MSS 8936, MSS is 516
*Dec  8 11:20:22.918: TCP: pmtu enabled,mss is now set to 8936
*Dec  8 11:20:22.918: TCP: sending SYN, seq 1440899137, ack 3532544296
*Dec  8 11:20:22.918: TCP0: Connection to 12.0.0.7:38087, advertising MSS 8936
*Dec  8 11:20:22.921: TCP0: state was SYNRCVD -> ESTAB [179 -> 12.0.0.7(38087)]
*Dec  8 11:20:22.921: TCB7FFBF25C07E0 accepting 7FFB9A6CF7A8 from 12.0.0.7.38087
*Dec  8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_VRFTABLEID (20) 7FFB9A0B9820
*Dec  8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_PMTU (45) 7FFBF2CCED00
*Dec  8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_NO_DELAY (0) 7FFBF2CCED60
*Dec  8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_ACK_RATE (37) 7FFBF2CCED5C
*Dec  8 11:20:22.921: TCB7FFB9A6CF7A8 setting property TCP_RTRANSTMO (36) 7FFBF2CCED60
*Dec  8 11:20:22.922: %BGP_SESSION-5-ADJCHANGE: neighbor 12.0.0.7 IPv4 Unicast topology base removed from session  Capability changed
*Dec  8 11:20:22.926: %BGP-5-ADJCHANGE: neighbor 12.0.0.7 Up

CSR1000v#show bgp vpnv4 unicast all neighbors 12.0.0.7 | i segment
Maximum output segment queue size: 50
Datagrams (max data segment is 8936 bytes):

Here it's the other way around: XR is the client, XE is the server, and
XR apparently advertised MSS 8936.

But maybe I don't understand how the 'tcp mss' command is supposed to
work :)


Kind regards,
Marcin

