[j-nsp] Soft removal of traffic from AE?

adamv0025 at netconsultings.com
Fri Oct 28 19:54:57 EDT 2016


> Of Saku Ytti
> Sent: Friday, October 28, 2016 9:20 PM
> 
> On 28 October 2016 at 15:06, Eugeniu Patrascu <eugen at imacandi.net>
> wrote:
> > If you use LACP on the link, to mitigate the packets loss, set it to
> > fast and then just yank the cable from the switch. The traffic will be
> > rehashed on the remaining links and at most you'll lose around 1
> > second worth of traffic.
> 
> LACP actually has provision for hitless addition and removal. However, even if
> that is so, hash bucket will redistribute traffic, which may cause reordering,
> which TCP stack will interpret as packet loss.
>
Wowowow, let me stop you right there, folks.

Saku is right in saying that LACP should have provisions for hitless
addition and removal of links from a bundle (I'm not quite sure about
removal, but I'll play along).
Unfortunately, in my experience that's not how it works in practice.

Let's talk about removal first. 
This process is governed by link failure detection, which can be done in
several ways. The fastest is physical-layer (LAN/WAN/OTN PHY) alarms, which
kick in well before micro-BFD or CFM on each bundle member link notices a
failure. If you have a switch in between, or a device that won't pass
PHY-layer alarms, then you have to rely on BFD/CFM, which can react within
50 ms if configured to do so. Only if none of these is available are the
LACPDUs the sole way to find out whether a link should be withdrawn from
hashing, and even tuned to the fast rate they only react after 3 seconds
(unacceptable in modern networks).
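
To put rough numbers on those detection times, here is a back-of-the-envelope
sketch. It assumes the standard LACP timers (fast: 1 s PDUs with a 3 s
timeout; slow: 30 s PDUs with a 90 s timeout), an illustrative micro-BFD
setting of 15 ms x 3, and a hypothetical 10GE member running at line rate.
It ignores hardware reprogramming time entirely.

```python
# Rough failure-detection arithmetic for one LAG member link.
# Assumptions: LACP timers are the IEEE 802.3ad defaults (fast = 1 s PDUs,
# 3 s timeout; slow = 30 s PDUs, 90 s timeout). The BFD interval and
# multiplier are illustrative values, not recommendations.


def detection_time_s(interval_s: float, multiplier: int) -> float:
    """Peer is declared down after `multiplier` consecutive missed intervals."""
    return interval_s * multiplier


def bytes_lost(detect_s: float, member_rate_bps: float) -> float:
    """Traffic blackholed on one member while the failure goes unnoticed."""
    return detect_s * member_rate_bps / 8


methods = {
    "LACP slow": detection_time_s(30.0, 3),
    "LACP fast": detection_time_s(1.0, 3),
    "micro-BFD 15 ms x 3": detection_time_s(0.015, 3),
}

if __name__ == "__main__":
    rate = 10e9  # hypothetical 10GE member at line rate
    for name, t in methods.items():
        print(f"{name:22s} detect ~{t * 1000:8.0f} ms, "
              f"~{bytes_lost(t, rate) / 1e6:10.1f} MB lost")
```

The point is the ratio: fast LACP is roughly two orders of magnitude slower
than a tight micro-BFD session, and PHY alarms are faster still.
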
So yes, if you shut down/disable a direct patch between two devices then, on
a good day (depending on how fast the hardware can be reprogrammed), both
sides will react promptly and packet loss will be minimal.
This assumes "hold-time down 0", of course.
So, as you can see, this process is entirely independent of LACP (except in
the corner case where you rely on LACPDUs).
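
Once a member has been withdrawn, the rehashing Saku mentions still happens.
The sketch below compares two hypothetical ways of spreading flows across
members: plain "hash modulo number of members", and a fixed hash-bucket table
where only the dead member's buckets are reassigned. Neither is claimed to be
what any Juniper platform actually does; the flows, member names (m0..m3) and
the CRC-32 hash are made up for illustration. Any flow that moves while it
has packets in flight is a reordering candidate.

```python
import zlib
from typing import Dict, List, Tuple

# Hypothetical 5-tuple: src, dst, proto, sport, dport.
Flow = Tuple[str, str, int, int, int]


def flow_hash(flow: Flow) -> int:
    """Stand-in hash; real platforms use their own hash functions and fields."""
    return zlib.crc32("|".join(map(str, flow)).encode())


def modulo_map(flows: List[Flow], members: List[str]) -> Dict[Flow, str]:
    """Member = hash mod number of active members."""
    return {f: members[flow_hash(f) % len(members)] for f in flows}


def bucket_table(members: List[str], n_buckets: int = 64) -> List[str]:
    """Fixed-size bucket table, buckets dealt round-robin to members."""
    return [members[b % len(members)] for b in range(n_buckets)]


def bucket_map(flows: List[Flow], table: List[str]) -> Dict[Flow, str]:
    """Member = owner of the bucket the flow hashes into."""
    return {f: table[flow_hash(f) % len(table)] for f in flows}


def withdraw(table: List[str], dead: str, survivors: List[str]) -> List[str]:
    """Reassign only the buckets that pointed at the dead member."""
    out, i = list(table), 0
    for b, member in enumerate(out):
        if member == dead:
            out[b] = survivors[i % len(survivors)]
            i += 1
    return out


def needless_moves(before: Dict[Flow, str], after: Dict[Flow, str], dead: str) -> int:
    """Flows that were NOT on the dead member but moved anyway."""
    return sum(1 for f, m in before.items() if m != dead and after[f] != m)


members = ["m0", "m1", "m2", "m3"]
survivors = ["m0", "m1", "m2"]  # m3 taken out for maintenance
flows = [(f"10.0.0.{i % 250}", "10.0.1.1", 6, 1024 + i, 443) for i in range(1000)]

mod_moves = needless_moves(modulo_map(flows, members),
                           modulo_map(flows, survivors), "m3")
table = bucket_table(members)
bkt_moves = needless_moves(bucket_map(flows, table),
                           bucket_map(flows, withdraw(table, "m3", survivors)), "m3")

if __name__ == "__main__":
    print("modulo: flows moved needlessly:", mod_moves)
    print("bucket table: flows moved needlessly:", bkt_moves)
```

With plain modulo you would expect roughly two thirds of the flows on the
surviving links to move as well; with the bucket table only the dead
member's flows move. Even in the best case, those flows get a new path.
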

Adding links to a bundle is a whole other story, because, well... bundles
are tricky.
One would ask: how on earth could a bundle add a link back into hashing
before LACP is ready? Maybe it's a problem with the vendor's implementation
of IEEE 802.3ad, or with how long each side takes, after it "receives the
final I'm ready", to start accepting packets on that link (yes, even with
"accept-data"). I don't know.
The bottom line is that where you would expect zero packet loss, you'll
actually get hundreds to thousands of milliseconds' worth of packet loss,
depending on configuration and LACP state-machine state (so it's somewhat
unpredictable).
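
For reference, here is a simplified sketch of what the standard expects on
the way back in: the LACP actor/partner state octet, and the mux-machine
condition for a member to start distributing (partner in sync and
collecting). The loss-window function is a hypothetical model of the gap
described above, where the partner already advertises Collecting but its
forwarding hardware only accepts frames some time later. The 400 ms lag is
made up.

```python
from typing import List

# LACP actor/partner state octet bits (IEEE 802.3ad / 802.1AX), bit 0 first.
STATE_BITS = [
    (0x01, "Activity"),
    (0x02, "Timeout"),
    (0x04, "Aggregation"),
    (0x08, "Synchronization"),
    (0x10, "Collecting"),
    (0x20, "Distributing"),
    (0x40, "Defaulted"),
    (0x80, "Expired"),
]
SYNC, COLLECTING = 0x08, 0x10


def decode_state(state: int) -> List[str]:
    """Names of the flags set in a state octet."""
    return [name for bit, name in STATE_BITS if state & bit]


def may_distribute(partner_state: int) -> bool:
    """Simplified mux-machine rule: only start sending on a member once the
    partner reports both Synchronization and Collecting."""
    return (partner_state & SYNC) != 0 and (partner_state & COLLECTING) != 0


def loss_window_s(t_distributing: float, t_partner_hw_ready: float) -> float:
    """Hypothetical model: frames sent between our move to Distributing and
    the partner's hardware actually accepting them are lost."""
    return max(0.0, t_partner_hw_ready - t_distributing)


if __name__ == "__main__":
    partner = 0x3D  # Activity, Aggregation, Sync, Collecting, Distributing
    print(decode_state(partner), may_distribute(partner))
    print(may_distribute(0x0D))  # in sync but not yet collecting
    gap = loss_window_s(0.0, 0.4)  # made-up 400 ms hardware lag
    print(f"~{gap * 10e9 / 8 / 1e6:.0f} MB lost on a 10GE member")
```

The protocol gate is correct on paper; the loss comes from whatever happens
between the bit being advertised and the hardware being ready, which the
protocol cannot see.
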

So, if possible, just steer traffic away from the bundle before doing any
maintenance on it.

adam



netconsultings.com
::carrier-class solutions for the telecommunications industry::

 


