[cisco-voip] CMR and the SD-WAN

Anthony Holloway avholloway+cisco-voip at gmail.com
Mon Apr 30 11:27:43 EDT 2018


Yes, what James said, thank you for sharing this info.  I think I would
have given up at "counting f**king packet sequence numbers."
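
For anyone who hits this later and does not want to count by hand, here is a rough sketch of how that check could be scripted. It assumes you have already exported the per-packet sequence numbers from the capture, in arrival order, as a plain list of integers, and that they are 16-bit RTP-style counters that wrap at 65536. Neither detail comes from Ryan's write-up, and the names `analyze` and `SeqReport` are made up for illustration.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16  # assumption: 16-bit, RTP-style sequence numbers


@dataclass
class SeqReport:
    in_order: int = 0      # packets that advanced the highest sequence seen
    out_of_order: int = 0  # packets that arrived after a later one
    missing: int = 0       # gaps never filled by the end of the capture
    duplicates: int = 0    # sequence numbers seen more than once


def seq_delta(new: int, ref: int) -> int:
    """Signed distance from ref to new, allowing for 16-bit wraparound."""
    d = (new - ref) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def analyze(seqs) -> SeqReport:
    """Classify sequence numbers given in the order they were captured."""
    report = SeqReport()
    it = iter(seqs)
    try:
        highest = next(it)
    except StopIteration:
        return report  # empty capture
    report.in_order = 1
    seen = {highest}
    pending = set()  # numbers skipped over and not yet seen
    for s in it:
        if s in seen:
            report.duplicates += 1
            continue
        seen.add(s)
        d = seq_delta(s, highest)
        if d > 0:
            # Everything between the old highest and s was skipped.
            for k in range(1, d):
                pending.add((highest + k) % SEQ_MOD)
            highest = s
            report.in_order += 1
        else:
            # A late arrival: it fills a gap if one was recorded for it.
            report.out_of_order += 1
            pending.discard(s)
    report.missing = len(pending)
    return report
```

For example, `analyze([1, 2, 3, 5, 6, 4, 7, 9])` reports one out-of-order packet (4 shows up after 6) and one missing packet (8 never shows up). The `seen` set is never pruned, so on a capture long enough for the counter to wrap all the way around it would miscount duplicates; treat it as a starting point only.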

On Mon, Apr 30, 2018 at 10:13 AM James Buchanan <james.buchanan2 at gmail.com>
wrote:

> Painful as this was, hats off to you for writing this up and sharing. Much
> appreciated!
>
> On Mon, Apr 30, 2018 at 3:36 PM, Ryan Huff <ryanhuff at outlook.com> wrote:
>
>> So here is a *neat* little situation I ran into recently that is worth
>> sharing and reading; if this saves a life, it was worth the crap I had to go
>> through …..
>>
>>
>>
>> == The Scenario ==
>>
>>
>>
>>    - Expressway C/E 8.10.3 cluster over WAN (2 Control Peers, 2 Edge
>>    Peers)
>>    - Customer deployed and managed SD-WAN solution in front of the Edge
>>    cluster to the Internet (with two separate transport carriers). I think it
>>    was Palos, but we’ll call it a whitebox’ed solution for our purposes
>>    - Using MRA and B2B Expressway configs
>>    - UAT for MRA and B2B is accepted and works great
>>
>>
>>
>> == The Problem ==
>>
>>
>>
>> The customer applies the zone/search rule config in Expressway for CMR
>> and notices that, at random points during a presentation session in the
>> CMR, the BFCP server (AKA the WebEx meeting) closes the BFCP presentation
>> to the endpoint coming from the customer’s Expressway, while all other
>> BFCP clients keep receiving it. That’s right, it *appears* that WebEx
>> *kicked* only the BFCP participant coming from the customer’s Edge; the
>> BFCP server did not close the whole session (all other participants
>> remain)! Although the time into the presentation was random’ish, it would
>> always happen to the endpoint at some point, generally around the
>> 2 minute’ish mark.
>>
>>
>>
>> == The diagnosis ==
>>
>>
>>
>> Although random, a consistent’ish length would seem to suggest a timer /
>> re-invite of some flavor, but that turned out to be wrong. Sparing you
>> all the gory tales of escalation and vendor bus underskirt sliding, the
>> issue was, in fact, the SD-WAN solution itself.
>>
>>
>>
>> == The Explanation & The Fix ==
>>
>>
>>
>> What was happening is that every 120 seconds or so, the BFCP server
>> (WebEx meeting) would send a UDP BFCP packet to all the BFCP presentation
>> subscribers. According to the customer, the SD-WAN solution was
>> *identifying* these packets (gotta love layer 7 capable firewalls 😊) and
>> queueing them onto a physically different link than the one the stream
>> was on, thus creating *physical asymmetry and delay*. I specifically
>> requested that all inspection capabilities be turned off for the traffic,
>> but I guess that isn’t the same as “identifying the traffic” …. Lol. With
>> TCP, this would likely be tolerated to a degree: it would be treated as
>> packet loss, delay and/or jitter and the sender would simply retransmit
>> ….. but we are dealing with *UDP* here, no bueno.
>>
>>
>>
>> To resolve, the customer had to identify and classify the traffic and
>> force an active/failover transmission through the SD-WAN solution for that
>> traffic, rather than a “load balance” transmission behavior.
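
A toy model of why that fix works, in case it helps anyone picture it. This is only a sketch: the carrier names, the 20 ms and 75 ms one-way latencies, the 20 ms send interval and the "every 5th packet gets reclassified" rule are all invented for illustration and are not taken from the customer's SD-WAN.

```python
# Hypothetical one-way latencies for the two transport carriers.
LATENCY_MS = {"carrier_a": 20, "carrier_b": 75}


def arrival_order(n_packets, interval_ms, latency_ms, pick_link):
    """Return sequence numbers in the order the receiver would see them."""
    arrivals = []
    for seq in range(n_packets):
        link = pick_link(seq)
        sent_at = seq * interval_ms
        arrivals.append((sent_at + latency_ms[link], seq))
    arrivals.sort()  # earliest arrival first
    return [seq for _, seq in arrivals]


def steer_by_class(seq):
    # "Load balance" behavior: every 5th packet is identified as a
    # different class and queued onto the other carrier.
    return "carrier_b" if seq % 5 == 4 else "carrier_a"


def pinned(seq):
    # Active/failover behavior: the whole stream stays on one carrier.
    return "carrier_a"


split = arrival_order(10, 20, LATENCY_MS, steer_by_class)
pinned_order = arrival_order(10, 20, LATENCY_MS, pinned)
print("split across links:", split)
print("pinned to one link:", pinned_order)
```

With these made-up numbers the split run delivers packet 4 after packets 5 and 6, while the pinned run stays in order. The receiver has no way to tell a late UDP packet from a lost one until it finally arrives, if it does.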
>>
>>
>>
>> == Sleuthing & The Closing ==
>>
>>
>>
>> In hindsight, it seems simple and makes perfect sense, right? However, when
>> your only visibility into the network is the Expressway servers themselves,
>> it can be *very* challenging to discover because at that point in the
>> topology, everything looks like it is coming from and going to the VIP on
>> the firewall pair. So how do you catch something like this when you can’t
>> see everything? *PCAPs*. *Literally counting f**king packet sequence
>> numbers for 6 hours and identifying a consistent pattern of packets coming
>> out of order and being “lost”.*
>>
>>
>>
>> -Ryan-
>>
>>
>>
>>
>>
>> _______________________________________________
>> cisco-voip mailing list
>> cisco-voip at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-voip
>>
>>