[c-nsp] BFD on ME3600/ME3800/7600s

James Bensley jwbensley at gmail.com
Fri May 27 09:28:00 EDT 2016


On 27 May 2016 at 07:29, Adam Vitkovsky <Adam.Vitkovsky at gamma.co.uk> wrote:
>> James Bensley
>> Sent: Thursday, May 26, 2016 4:37 PM
>>
>> Running BFD in no echo mode (which is what I usually do) we see the
>> average interval is a lovely 47ms (but I've never had 100% clarification of
>> why):
>> Rx Count: 3314443, Rx Interval (ms) min/max/avg: 1/72/47 last: 36 ms ago
>> Tx Count: 3310865, Tx Interval (ms) min/max/avg: 1/72/47 last: 40 ms ago
>>
>>
> Isn’t it because in no echo mode it does just half of the work it has to in echo mode?
>
> In no echo each participant has to just fire hellos at a given rate and reset hold-down timer upon receiving neighbour's hello.
>
> In echo mode each participant has to fire hellos at a given rate and reset timers upon receiving its own  hello.
> And in addition to that it has to receive loop back and transmit neighbour's hellos at the rate the neighbour is sending them.
>
> And according to your little test it looks like the looping back is a lot more process intensive than just resetting holdup timer.

In no echo mode the local node sends control packets to the remote
node at the specified interval of 50ms, and the remote node checks
each incoming packet and marks the link as "still up".

In echo mode the local node sends echo packets at the specified
interval of 50ms, the remote node loops them back in hardware (because
they are UDP packets they shouldn't be punted to the CPU) and the
local node receives and checks its own echo packets.

So in no echo mode the local node originates packets and checks the
packets received from the remote node. In echo mode the local node
originates and checks its own packets only. Either way it's basically
the same amount of work, and in both cases I would expect that setting
the timers to 50ms means there must be some level of hardware offload
here?
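
One thing that may be relevant to an average below the configured
interval: RFC 5880 (section 6.8.7) requires control packet transmission
to be jittered, with each interval reduced by a random 0 to 25% (10 to
25% when the detect multiplier is 1), so the average comes out roughly
12.5% below the negotiated value. The sketch below just simulates that
rule. The uniform distribution, the function names and the sample count
are assumptions for illustration, and it says nothing about what any
particular platform actually does.

```python
import random

def jittered_interval_ms(negotiated_ms, detect_mult=3, rng=random):
    """One control-packet Tx interval with RFC 5880 6.8.7 jitter applied."""
    # The interval is reduced by a random 0-25%; with a detect multiplier
    # of 1 the result must be between 75% and 90% of the negotiated value.
    # A uniform draw is an assumption, the RFC only says "random".
    low = 0.75
    high = 0.90 if detect_mult == 1 else 1.00
    return negotiated_ms * rng.uniform(low, high)

def mean_interval_ms(negotiated_ms, samples=100_000, seed=1):
    """Average of many jittered intervals (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    total = sum(jittered_interval_ms(negotiated_ms, rng=rng)
                for _ in range(samples))
    return total / samples

# Intervals used in this thread: 50ms and 500ms.
for negotiated in (50, 500):
    print(negotiated, round(mean_interval_ms(negotiated), 1))
```

Analytically the mean is 87.5% of the negotiated interval, i.e. about
43.75ms for 50ms and 437.5ms for 500ms.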


On 27 May 2016 at 08:41, Saku Ytti <saku at ytti.fi> wrote:
> On 27 May 2016 at 09:29, Adam Vitkovsky <Adam.Vitkovsky at gamma.co.uk> wrote:
> The design goal was opposite. Echo mode was supposed to be the
> performant and easy to implement in HW. Results look fishy, I've seen
> others report similar in NXOS.

This is what is confusing me (as above).

> I wouldn't bother with echo mode anyhow, much poorer interop than with
> control mode. And many platforms manage to control mode in HW just
> fine.

You say you wouldn't bother with echo mode, however if I disable echo
mode on the ASR9001 and ASR920, two devices that "support BFD in
hardware", the ASR920 isn't managing to keep to the 50ms timers and the
ASR9001 is just scraping by, as you can see in the output below from
the ASR9001:

 Intervals between echo packets:
   Tx: Number of intervals=100, min=52 ms, max=54 ms, avg=52 ms
       Last packet transmitted 86 s ago
   Rx: Number of intervals=99, min=50 ms, max=5310 ms, avg=105 ms
       Last packet received 86 s ago
 Latency of echo packets (time between tx and rx):
   Number of packets: 100, min=1 ms, max=5 ms, avg=2940 us

It seems that for these platforms that do "support BFD in hardware",
disabling echo mode switches back to CPU based forwarding of control
packets at the configured interval (50ms), based on the outputs
directly above (look how sloppy the timers have become; before they
were pretty much exactly 50ms). However for platforms that don't
support BFD in hardware, disabling echo mode "seems" to give the
performance one can only get with hardware accelerated forwarding (as
per the outputs in my original email).
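
To put a number on "sloppy", a rough sketch like the one below could
parse the Tx/Rx interval lines from the ASR9001 output above and flag
any direction whose max or avg strays past the configured interval. The
10% tolerance and the function names are made up for illustration, and
the regex only covers the exact line format pasted above.

```python
import re

# Matches lines such as:
#   Tx: Number of intervals=100, min=52 ms, max=54 ms, avg=52 ms
STAT_RE = re.compile(
    r"(?P<dir>Tx|Rx): Number of intervals=(?P<n>\d+), "
    r"min=(?P<min>\d+) ms, max=(?P<max>\d+) ms, avg=(?P<avg>\d+) ms"
)

SAMPLE = """\
   Tx: Number of intervals=100, min=52 ms, max=54 ms, avg=52 ms
   Rx: Number of intervals=99, min=50 ms, max=5310 ms, avg=105 ms
"""

def interval_stats(output):
    """Return {'Tx': {...}, 'Rx': {...}} with integer n/min/max/avg in ms."""
    stats = {}
    for m in STAT_RE.finditer(output):
        stats[m["dir"]] = {k: int(m[k]) for k in ("n", "min", "max", "avg")}
    return stats

def is_sloppy(stat, configured_ms, tolerance=0.10):
    """True if max or avg exceeds the configured interval by > tolerance.

    The 10% default is an arbitrary threshold, not a vendor figure.
    """
    limit = configured_ms * (1 + tolerance)
    return stat["max"] > limit or stat["avg"] > limit

for direction, stat in interval_stats(SAMPLE).items():
    print(direction, stat, "sloppy" if is_sloppy(stat, 50) else "ok")
```

With the numbers pasted above, Tx stays within tolerance while Rx (max
5310ms, avg 105ms) does not.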

Further example...

7600 <> ME3600X with echo mode on (default in IOS) both configured
with 500ms interval and 3x multiplier:
7600 (WS-X6704-10GE):
Rx Count: 219, Rx Interval (ms) min/max/avg: 1/1000/873 last: 548 ms ago
Tx Count: 218, Tx Interval (ms) min/max/avg: 756/1000/877 last: 504 ms ago
ME3600X:
Rx Count: 1305954, Rx Interval (ms) min/max/avg: 1/1008/883 last: 680 ms ago
Tx Count: 1302978, Tx Interval (ms) min/max/avg: 1/1016/881 last: 932 ms ago

7600 <> ME3600X with no echo mode configured (still with 500ms and 3x
multiplier):
7600 (WS-X6704-10GE):
Rx Count: 50, Rx Interval (ms) min/max/avg: 1/512/408 last: 224 ms ago
Tx Count: 49, Tx Interval (ms) min/max/avg: 1/500/416 last: 144 ms ago
ME3600X:
Rx Count: 67, Rx Interval (ms) min/max/avg: 1/500/431 last: 176 ms ago
Tx Count: 68, Tx Interval (ms) min/max/avg: 1/512/423 last: 336 ms ago
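
For comparing the two sets of counters above, a small sketch like this
could parse the IOS-style lines and relate the average to the configured
500ms interval and the max to the detection time. Treating detection
time as interval x multiplier (500ms x 3 = 1500ms) is a simplification
that only holds because both ends are configured the same; the function
names are made up for illustration.

```python
import re

# Matches lines such as:
#   Rx Count: 219, Rx Interval (ms) min/max/avg: 1/1000/873 last: 548 ms ago
LINE_RE = re.compile(
    r"(?P<dir>Rx|Tx) Count: (?P<count>\d+), "
    r"(?:Rx|Tx) Interval \(ms\) min/max/avg: "
    r"(?P<min>\d+)/(?P<max>\d+)/(?P<avg>\d+)"
)

def parse_line(line):
    """Return the counters from one Rx/Tx line as ints, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    stats = {k: int(m[k]) for k in ("count", "min", "max", "avg")}
    stats["dir"] = m["dir"]
    return stats

def summarise(line, configured_ms, multiplier):
    """Relate the avg to the configured interval and the max to detect time."""
    stats = parse_line(line)
    detect_ms = configured_ms * multiplier  # simplified detection time
    stats["avg_ratio"] = round(stats["avg"] / configured_ms, 3)
    stats["max_within_detect_time"] = stats["max"] < detect_ms
    return stats

echo_on = "Rx Count: 219, Rx Interval (ms) min/max/avg: 1/1000/873 last: 548 ms ago"
echo_off = "Rx Count: 50, Rx Interval (ms) min/max/avg: 1/512/408 last: 224 ms ago"
print(summarise(echo_on, 500, 3))
print(summarise(echo_off, 500, 3))
```

The echo-on averages (873-883ms) sit close to 87.5% of the 1000ms max,
and the echo-off averages (408-431ms) are in the region of 87.5% of
500ms. That would be consistent with these counters tracking control
packets that carry the 0-25% jitter RFC 5880 requires, paced at about
1000ms while echo is active, but that is a guess and not something the
outputs confirm.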

For some reason these platforms behave in the exact opposite manner to
platforms that support hardware offload for BFD (echo mode). We've
been rolling out ME3600s/ME3800s/7600s with echo mode disabled and
ASR920s and ASR9000s with echo mode enabled, so overall we get the
50ms failover effect as desired, but it would be nice to know what's
going on (how are we getting these surprisingly good results?) without
having to spend an age with TAC.

Cheers,
James.

