[cisco-voip] CMR and the SD-WAN

James Buchanan james.buchanan2 at gmail.com
Mon Apr 30 11:12:28 EDT 2018


Painful as this was, hats off to you for writing this up and sharing. Much
appreciated!

On Mon, Apr 30, 2018 at 3:36 PM, Ryan Huff <ryanhuff at outlook.com> wrote:

> So here is a *neat* little situation I ran into recently that is worth
> sharing and reading; if this saves a life, it was worth the crap I had to
> go through …
>
> == The Scenario ==
>
>    - Expressway C/E 8.10.3 cluster over WAN (2 Control Peers, 2 Edge
>    Peers)
>    - SD-WAN solution, deployed and managed by the customer, in front of
>    the Edge cluster to the Internet (with two separate transport carriers).
>    I think it was Palos, but we’ll call it a whitebox’ed solution for our
>    purposes.
>    - Using MRA and B2B Expressway configs
>    - UAT for MRA and B2B is accepted and works great
>
> == The Problem ==
>
> The customer applies the zone/search rule config in Expressway for CMR and
> notices that, randomly during a presentation session in the CMR, the BFCP
> server (AKA the WebEx meeting) closes the BFCP presentation to the
> endpoint coming from the customer’s Expressway, while all other BFCP
> clients keep receiving it. That’s right, it *appears* that WebEx *kicked*
> the BFCP participant coming from the customer’s Edge, yet the BFCP server
> had not closed the session (all other participants remain)! Although how
> far into the presentation it happened varied, it always happened to the
> endpoint at some point, generally around the 2 minute’ish mark.
>
> == The diagnosis ==
>
> Although the timing varied, a consistent’ish interval would seem to suggest
> a timer / re-INVITE of some flavor; as ultimately uncovered, that would be
> wrong. Sparing you all the gory tales of escalation and vendor bus
> underskirt sliding, the issue was in fact the SD-WAN solution itself.
>
> == The Explanation & The Fix ==
>
> What was happening is that every 120 seconds or so, the BFCP server (WebEx
> meeting) would send a UDP BFCP packet to all the BFCP presentation
> subscribers. According to the customer, their SD-WAN solution was
> *identifying* these packets (gotta love layer 7 capable firewalls 😊) and
> queueing them onto a physically different link than the one the stream
> was on, thus creating *physical asymmetry and added delay*. I specifically
> requested that all inspection capabilities be turned off for the traffic,
> but I guess that isn’t the same as “identifying the traffic” …. Lol. In a
> TCP stream, this would likely be tolerated to a degree as packet loss,
> delay and/or jitter, and TCP would simply reorder or retransmit … but we
> are dealing with *UDP* here, no bueno.
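A toy model of the mechanism described above may help: one stream whose packets normally ride one link, with a single periodically sent packet classified and steered onto a second, slower link. The link names, the 20 ms packet spacing, and the 20 ms vs 70 ms one-way delays are hypothetical values chosen for illustration, not measurements from this deployment, and real SD-WAN steering is far more involved. The sketch only shows why sending part of a flow over a path with different latency makes it arrive out of order.

```python
"""Toy model: part of one stream steered onto a second, slower link.

All numbers here are hypothetical illustration values, not measurements.
"""
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical one-way delays for the two transport links.
LINK_DELAY_MS = {"A": 20.0, "B": 70.0}


@dataclass
class Packet:
    seq: int        # sender-side sequence number
    send_ms: float  # time the packet left the sender
    link: str       # which link the SD-WAN put it on


def send_stream(count: int, interval_ms: float, steered: set[int]) -> list[Packet]:
    """Send `count` evenly spaced packets; seqs in `steered` go out on link B."""
    return [
        Packet(seq, seq * interval_ms, "B" if seq in steered else "A")
        for seq in range(count)
    ]


def arrival_order(packets: list[Packet]) -> list[int]:
    """Return sequence numbers in the order the receiver sees them."""
    by_arrival = sorted(packets, key=lambda p: p.send_ms + LINK_DELAY_MS[p.link])
    return [p.seq for p in by_arrival]


if __name__ == "__main__":
    # Packet 3 plays the role of the periodically "identified" packet.
    print(arrival_order(send_stream(count=8, interval_ms=20.0, steered={3})))
    print(arrival_order(send_stream(count=8, interval_ms=20.0, steered=set())))
```

With packet 3 steered onto link B, the receiver sees `[0, 1, 2, 4, 5, 3, 6, 7]`; with nothing steered, the order is preserved. UDP itself does nothing to put packet 3 back in place.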
>
> To resolve this, the customer had to identify and classify the traffic and
> force an active/failover transmission through the SD-WAN solution for that
> traffic, rather than a “load balance” transmission behavior.
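The difference between the two steering behaviors can be sketched as two path-selection functions. This is a simplified, hypothetical model: `carrier1`/`carrier2` stand in for the two transport carriers, per-packet round-robin stands in for “load balance” (real SD-WAN products usually balance per flow or per application, and their policy syntax is vendor-specific), and `active_failover` pins the whole classified flow to one link, moving it only when the primary is marked down.

```python
from __future__ import annotations

from itertools import cycle


def load_balance(links: list[str], n_packets: int) -> list[str]:
    """Hypothetical per-packet round-robin: consecutive packets alternate links."""
    rr = cycle(links)
    return [next(rr) for _ in range(n_packets)]


def active_failover(primary: str, backup: str,
                    link_up: dict[str, bool], n_packets: int) -> list[str]:
    """Keep the whole classified flow on one link: primary while up, else backup."""
    chosen = primary if link_up.get(primary, False) else backup
    return [chosen] * n_packets


if __name__ == "__main__":
    both_up = {"carrier1": True, "carrier2": True}
    primary_down = {"carrier1": False, "carrier2": True}
    print(load_balance(["carrier1", "carrier2"], 4))
    print(active_failover("carrier1", "carrier2", both_up, 4))
    print(active_failover("carrier1", "carrier2", primary_down, 4))
```

Under `active_failover` every packet of the pinned flow shares one path’s latency, so the SD-WAN no longer introduces the cross-link asymmetry that was reordering the UDP traffic; the backup link is only used if the primary fails.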
>
> == Sleuthing & The Closing ==
>
> In hindsight, it seems simple and makes perfect sense, right? However, when
> your only visibility into the network is the Expressway servers themselves,
> it can be *very* challenging to discover, because at that point in the
> topology everything looks like it is coming from and going to the VIP on
> the firewall pair. So how do you catch something like this when you can’t
> see everything? *PCAPs*. *Literally counting f**king packet sequence
> numbers for 6 hours and identifying a consistent pattern of packets coming
> out of order and being “lost”.*
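The packet-counting step can be partly automated. The sketch below assumes the sequence numbers of one stream have already been exported from the capture in arrival order, and that they form a 16-bit wrapping counter as RTP sequence numbers do; the post does not say which protocol field was counted, so treat `analyze` and `seq_delta` as hypothetical helpers. It reports forward gaps, late (out-of-order or duplicate) arrivals, and sequence numbers that never showed up in the capture.

```python
from __future__ import annotations

SEQ_MOD = 1 << 16  # assumption: 16-bit wrapping counter, as with RTP


def seq_delta(a: int, b: int) -> int:
    """Signed distance from a to b on a wrapping 16-bit counter."""
    d = (b - a) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def analyze(seqs: list[int]) -> dict[str, list]:
    """Report forward gaps, late arrivals, and seqs never seen in the capture."""
    if not seqs:
        return {"gaps": [], "late": [], "lost": []}
    seen = set(seqs)
    gaps: list[tuple[int, int]] = []  # (highest seq so far, seq that jumped ahead)
    late: list[int] = []              # at or behind the highest seq seen so far
    highest = seqs[0]
    for s in seqs[1:]:
        d = seq_delta(highest, s)
        if d > 0:
            if d > 1:
                gaps.append((highest, s))  # one or more seqs skipped here
            highest = s
        else:
            late.append(s)  # out of order (d < 0) or duplicate (d == 0)
    # Seqs inside a gap that never arrived at all, even late.
    lost = [
        (lo + k) % SEQ_MOD
        for lo, hi in gaps
        for k in range(1, seq_delta(lo, hi))
        if (lo + k) % SEQ_MOD not in seen
    ]
    return {"gaps": gaps, "late": late, "lost": lost}


if __name__ == "__main__":
    print(analyze([65533, 65534, 1, 0, 2, 3, 5]))
```

For the arrival order `[65533, 65534, 1, 0, 2, 3, 5]` it reports `0` as late and `65535` and `4` as lost. A packet that arrives too late to be used is effectively “lost” to the application even though it appears in the capture, which is why both lists are worth watching.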
>
> -Ryan-
>
> _______________________________________________
> cisco-voip mailing list
> cisco-voip at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-voip
>
>

