[cisco-voip] MRA DR / Resilience

Pawlowski, Adam ajp26 at buffalo.edu
Wed Jan 20 11:22:07 EST 2021


For SIP, yes, but I can’t tell whether it also works with UDS. I have a question in to find out, but the answer may depend on future releases.

I’ve been going over what happened, and SSO introduces a new layer. A very helpful gentleman from TAC spent a bunch of time with me going over how Jabber and Expressway sort of handle this.

- Expressway, at least in X12.6 where I’m at, has no understanding that a UCM node is down as far as UDS is concerned.
- Requests can be forwarded to a downed UCM, which returns an HTTP error status code to Jabber.
- HTTP transaction failures cause re-authentication to be triggered.
- Jabber may also keep trying to talk to a UCM that isn’t there, sometimes repeatedly for some reason, instead of choosing a new one.

So there are a number of conditions that can trigger a cycle where Jabber wants to verify its token’s validity, tries to talk to nothing a few times, and then, after about 3 retries, gives up and punts the user out.

It’s not clear from looking at the Jabber log (and going cross-eyed in the process) whether Jabber is aware that it can try a different UCM. The log shows the URL being put on a block list, but then Jabber just uses it again anyway.
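To make the block-list observation concrete, here is a minimal Python sketch under a simplified, hypothetical model; it is not Jabber’s actual logic. The names `ClientModel`, `verify_session`, `ucm-sub1`, and `ucm-sub2` are invented for illustration, and the attempt limit of 3 only approximates TAC’s “about 3 retries”. It contrasts a client that records a failed node on its block list but keeps using it (what the log appears to show) with one that honors the block list and moves on to the next UCM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Approximates TAC's "about 3 retries" before the user is punted (assumption).
MAX_ATTEMPTS = 3


@dataclass
class ClientModel:
    """Hypothetical model of a client choosing a UCM node for a token check."""
    nodes: list[str]             # UCM nodes in preference order (hypothetical)
    honor_block_list: bool       # False mimics what the Jabber log appears to show
    block_list: set[str] = field(default_factory=set)

    def pick_node(self) -> Optional[str]:
        # First node in preference order that is not blocked; when the block
        # list is ignored, this is always the first node.
        for node in self.nodes:
            if not self.honor_block_list or node not in self.block_list:
                return node
        return None


def verify_session(client: ClientModel, is_up: Callable[[str], bool]) -> bool:
    """Return True if the session survives, False if the user is signed out."""
    for _ in range(MAX_ATTEMPTS):
        node = client.pick_node()
        if node is None:
            return False  # every node is blocked
        if is_up(node):
            return True
        client.block_list.add(node)  # failed node goes on the block list
    return False


def is_up(node: str) -> bool:
    return node != "ucm-sub1"  # simulate one downed subscriber


nodes = ["ucm-sub1", "ucm-sub2"]
print(verify_session(ClientModel(nodes, honor_block_list=False), is_up))  # False
print(verify_session(ClientModel(nodes, honor_block_list=True), is_up))   # True
```

With the block list ignored, all three attempts land on the downed subscriber and the session is lost even though `ucm-sub2` is healthy; honoring the block list succeeds on the second attempt.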

Still waiting to learn more about what happened.

Best,

Adam


From: ROZA, Ariel <Ariel.ROZA at LA.LOGICALIS.COM>
Sent: Monday, January 18, 2021 1:59 PM
To: ROZA, Ariel <Ariel.ROZA at LA.LOGICALIS.COM>; NateCCIE <nateccie at gmail.com>; Pawlowski, Adam <ajp26 at buffalo.edu>
Cc: cisco-voip at puck.nether.net
Subject: RE: [cisco-voip] MRA DR / Resilience

I just reread the release notes, and they include the case where CUCM is down.

From: cisco-voip <cisco-voip-bounces at puck.nether.net> On Behalf Of ROZA, Ariel
Sent: Monday, January 18, 2021 15:53
To: NateCCIE <nateccie at gmail.com>; Pawlowski, Adam <ajp26 at buffalo.edu>
CC: cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

But does this include the scenario where one of the CUCMs is down? I don’t see it explicitly in the notes…

From: cisco-voip <cisco-voip-bounces at puck.nether.net> On Behalf Of NateCCIE
Sent: Wednesday, January 13, 2021 10:56
To: Pawlowski, Adam <ajp26 at buffalo.edu>
CC: cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

SIP Registration Failover for Cisco Jabber - MRA Deployments

https://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/expressway/release_note/Cisco-Expressway-Release-Note-X12-7.pdf#page16

This is new in X12.7.

On Jan 13, 2021, at 6:10 AM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

Hey all,

I’m playing with this scenario now and trying to figure out which parts of the solution work, and which do not, in a DR “site failover” kind of scenario with regard to MRA.

I understand the documentation states there’s no failover for voice and video, but I think that failover is different from the one I’m describing here.

I know I can take Expressway C and Expressway E nodes out of the cluster at will, and things will heal over time once the Jabber clients catch up.

I can take a Unity Connection guest down, and it should work, though the Jetty service certainly has load limits. I don’t think I’m hitting those here.

I can take an IM&P node down and, with the exception of pChat services (the DB was not deployed HA, and the merge job just seems to fail, but that’s another investigation), clients will eventually fail over and recover.

Today, we have half the C cluster, half the E cluster, and one of two CUC nodes down. All IM&P nodes are up. One UCM subscriber is down, and things have been going poorly. Jabber customers keep getting randomly punted from the client with “Your session has expired”. The Jabber log suggests the token has expired, but doesn’t provide enough debugging detail to know why. It’s possible that the Expressway E is fronting this message, since I understand it sits between Jabber and the rest of the infrastructure for OAuth, and Jabber does not talk to the UCM/CUC directly.

When we did not have SSO, the worst we had to do was make sure the Jabber client’s device pool had an active UCM as the primary in the CMGroup, since clients wouldn’t register properly without that, but those UCMs are up.

Does anyone know what might be going on here?

My best guess is that the Expressway isn’t intelligent enough to mark a UCM (or a CUC server, for that matter) out of service when it is unreachable, and it is trying to refresh a customer’s token against a server that isn’t up. When this times out, instead of trying another server, it tells Jabber the refresh token has expired. If this is the case, there’s no cluster resilience with Jabber: if any nodes are down, things are going to be intermittent.
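The hypothesis above can be sketched as a toy Python model. Everything in it is an assumption for illustration: `TokenExpired`, `refresh_naive`, `refresh_with_failover`, and the node names are hypothetical and do not reflect how Expressway is actually implemented. The first function forwards a refresh to one configured node and reports any failure as an expired token; the second marks an unreachable node out of service and tries the next one.

```python
from typing import Callable


class TokenExpired(Exception):
    """What the client ends up seeing: "Your session has expired"."""


def refresh_naive(nodes: list[str], is_up: Callable[[str], bool]) -> str:
    # Hypothesized behavior: forward to the first configured node and turn
    # any failure into "token expired" instead of trying another node.
    node = nodes[0]
    if not is_up(node):
        raise TokenExpired(f"refresh failed against {node}")
    return node


def refresh_with_failover(nodes: list[str], is_up: Callable[[str], bool],
                          out_of_service: set[str]) -> str:
    # Desired behavior: skip nodes already marked out of service, mark any
    # unreachable node out of service, and fail only if no node answers.
    for node in nodes:
        if node in out_of_service:
            continue
        if is_up(node):
            return node
        out_of_service.add(node)
    raise TokenExpired("no UCM node could service the refresh")


def is_up(node: str) -> bool:
    return node != "ucm-sub1"  # simulate one downed subscriber


nodes = ["ucm-sub1", "ucm-sub2"]
try:
    refresh_naive(nodes, is_up)
except TokenExpired as exc:
    print("naive:", exc)  # naive: refresh failed against ucm-sub1

marked: set[str] = set()
print("failover:", refresh_with_failover(nodes, is_up, marked), marked)
```

In the naive model, one downed subscriber is enough to sign users out whenever a refresh happens to land on it, which would match the intermittent symptoms described above.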

Why does Jabber sometimes pop up a dialog asking for a new session, and other times just kick the customer out of the client, requiring a new sign-in? I see a bug that suggests enabling the LegacyOAuthSignout parameter, but it doesn’t explain what effect that will have on the client.

Basically, this is just a test, but I am trying to learn from it and would appreciate any thoughts or experiences. If it is the Expressway cluster, then there’s no way around this as far as I can tell. Marking a UCM inactive with xAPI doesn’t work; it just gets pushed back to active.

Any comments appreciated.

Best,

Adam Pawlowski
SUNYAB NCS
