[cisco-voip] CUCM split brain question

Ovidiu Popa ovi.popa at gmail.com
Wed Apr 6 18:29:41 EDT 2011


Woohoo ... Got my answer. And not only that, but I learned many other 
things in the process.

Too bad for the book, I had already found a sexy title to propose to you 
"Inside the Seadragon" :). "Seadragon" is the codename for the first 
Linux-based CUCM if I'm not mistaken.

On a more serious note, I believe that I speak for a lot of people who 
are just hungry for details as our knowledge and experience would 
benefit greatly from this information.

I very much appreciate Jeff Lindborg's example on this, and his site 
http://www.ciscounitytools.com provides invaluable information for a lot 
of people.

If you ever start a new blog/site or add a post on docwiki.cisco.com 
<http://docwiki.cisco.com/wiki/Main_Page>, give us a heads up.

And lastly, the most important thing: keep acting green; eventually 
everybody will understand.

A sincere thank you,
Ovidiu

On 06/Apr/11 11:57 PM, Wes Sisk wrote:
> For call processing unknown = unregistered.
>
> Remember those SDL links?  They send signals back and forth called 
> "DMPropagateRegister" and UnRegister. Instead of querying RIS, CM just 
> checks its internal memory for local or remote registrations.  If ccm 
> does not know about registration state (either locally or remotely) 
> then the call follows CFUR.
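
The rule Wes describes can be sketched as a toy model. This is only an illustration under stated assumptions, not how ccm is actually implemented: the class, method names, and the device name SEP001 are hypothetical, and only the "DMPropagateRegister" signal name comes from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class CallProcessingNode:
    """Toy model of one node's in-memory registration view (hypothetical)."""
    name: str
    local: set = field(default_factory=set)     # devices registered to this node
    remote: dict = field(default_factory=dict)  # device -> peer that reported it

    def on_propagate_register(self, peer: str, device: str) -> None:
        # Stand-in for receiving a DMPropagateRegister-style signal over SDL.
        self.remote[device] = peer

    def on_propagate_unregister(self, peer: str, device: str) -> None:
        if self.remote.get(device) == peer:
            del self.remote[device]

    def on_sdl_link_down(self, peer: str) -> None:
        # Purge everything learned from the peer whose SDL link dropped.
        self.remote = {d: p for d, p in self.remote.items() if p != peer}

    def route(self, device: str) -> str:
        # No known registration, local or remote, is treated as unregistered.
        if device in self.local or device in self.remote:
            return "ring"
        return "CFUR"


hq = CallProcessingNode("CM1")
hq.on_propagate_register("CM2", "SEP001")
print(hq.route("SEP001"))   # registration known via the remote node
hq.on_sdl_link_down("CM2")
print(hq.route("SEP001"))   # no known registration after the purge
```

In this model a device that never registered to CM1 and a device whose remote entry was purged look identical to call routing, which is the point of "unknown = unregistered".
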
>
> <strictly personal comment>
> Thanks for the props.  I'm just one of a large pool of people who 
> generate the need, concepts, code, product, knowledge, and use cases.  
> I've been involved with a few book projects but I'm much more 
> committed to wikis, web publishing, and mailing lists.  Trees are much 
> more beautiful on a trail than stacked on a shelf.  Technical 
> information changes so quickly that it's obsolete before the ink dries.
> </strictly personal comment>
>
> Regards,
> Wes
>
> On 4/6/2011 5:13 PM, Ovidiu Popa wrote:
>>
>> And the gifts keep on coming... the registered/unknown puzzle 
>> unraveled :)
>>
>> Now for the final question:
>>
>> How does Call Forward Unregistered handle Unknown states? In my 
>> case, if the branch phone has never registered to CM1 since the last CM 
>> service restart, its state will be unknown. Will an incoming call to 
>> the phone DN follow the CFUR?
>>
>> Sorry to pester you with questions and thank you again.
>>
>> PS: If you ever decide to write a book on CUCM please sign me up on 
>> the pre-order list.
>> PPS : somehow I don't think I am the only one.
>>
>> Best regards,
>> Ovidiu
>>
>>
>>
>> On 06/Apr/11 10:49 PM, Wes Sisk wrote:
>>> If phone is local to CM1 and registers to CM1 then CM1 sees the 
>>> phone as registered.  Because the SDL link is down CM2 will see the 
>>> phone as unregistered.
>>>
>>> If phone is local to CM2 and registers to CM2 then CM2 sees the 
>>> phone as registered.  Because the SDL link is down CM1 will see the 
>>> phone as unregistered.
>>>
>>> The appearance of "unregistered" vs "unknown" in the UI is a bit of 
>>> a red herring.  Registration status is captured in a shared memory 
>>> segment by the ccm process. Processes such as AXL and RIS read that 
>>> shared memory segment.
>>>
>>> When AXL or web service goes to read that shared memory segment it 
>>> reads all nodes in the cluster.  With connectivity to CM2 being down 
>>> AXL or RIS on CM1 will only be able to query CM1 shared memory. If 
>>> the phone in question has never registered to CM1 then status will 
>>> be "unknown".
>>>
>>> Similarly if the ccm process on CM2 restarts then the status will be 
>>> "unknown" in the CM2 shared memory segment until the first time the 
>>> phone registers with CM2.
>>>
>>> So, "unknown" vs "unregistered" has a very subtle, possibly even 
>>> nuanced, difference in meaning.  "unknown" means the shared memory 
>>> segments currently available to query have not been updated with a 
>>> known status for that device since the last ccm process restart.  
>>> "unregistered" means the phone last transitioned to "unregistered" 
>>> signaling status with one of the nodes that is currently accessible 
>>> from the AXL or RIS instance you are querying.
>>>
>>> In ASCII:
>>> browser->Tomcat->RIS1->shared_memory1->CM1
>>>               |->RIS2->shared_memory2->CM2
>>>               |->RIS3->shared_memory3->CM3
>>>
>>> All contingent on IP connectivity from Tomcat to the various RIS 
>>> processes across the cluster.
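
The lookup described above can be sketched as follows. This is a minimal illustration, assuming each node's shared memory segment is a plain mapping from device name to its last known status; the function name, the data layout, and the rule that "registered" wins over "unregistered" when nodes disagree are assumptions made for the example, not documented AXL or RIS behaviour.

```python
def reported_status(device, segments, reachable):
    """Return the status a query would show for `device`.

    segments:  node name -> {device name: "registered" | "unregistered"}
    reachable: node names the querying side can currently contact
    """
    # Only segments on reachable nodes can be read at all.
    statuses = [segments[node].get(device) for node in reachable if node in segments]
    if "registered" in statuses:
        return "registered"
    if "unregistered" in statuses:
        return "unregistered"
    # No reachable segment has recorded a state since its last ccm restart.
    return "unknown"


segments = {
    "CM1": {},                        # phone never registered to CM1
    "CM2": {"SEP001": "registered"},  # phone is registered at the branch
}
print(reported_status("SEP001", segments, ["CM1", "CM2"]))  # WAN up
print(reported_status("SEP001", segments, ["CM1"]))         # WAN down, queried from HQ
```

With the WAN down, the query from HQ can only read CM1's segment, so a phone that never registered there comes back as "unknown" rather than "unregistered".
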
>>>
>>> Regards,
>>> Wes
>>>
>>> On 4/6/2011 4:01 PM, Ovidiu Popa wrote:
>>>> Hello Wes
>>>>
>>>> Excellent information. I have always had difficulty understanding the 
>>>> insides of the CUCM box. Thank you.
>>>>
>>>> One question remains:
>>>> Imagine a WAN failure for 30 minutes. CM1 and CM2 are working 
>>>> without the SDL layer. CM2 (branch) sees the branch phone as 
>>>> Registered, CM1 (hq) sees the phone as unregistered or unknown?
>>>>
>>>> Thanks.
>>>> Ovidiu
>>>>
>>>>
>>>> On 06/Apr/11 9:22 PM, Wes Sisk wrote:
>>>>> For example's sake:
>>>>> CM1: headquarters
>>>>> CM2: branch
>>>>>
>>>>> When IP connectivity exists between CM1 and CM2 SDL TCP sessions 
>>>>> are established between the 2 ccm processes. Through SDL each 
>>>>> server tells every other server about registered devices.  There 
>>>>> is opportunity for duplicate registration and some propagation 
>>>>> time so there is a window of convergence involved.  Once things 
>>>>> are in sync each CM node tells every other node about all 
>>>>> significant local state changes.  Make sense?  This usually helps 
>>>>> folks understand why QoS is so critical on SDL links.  SDL links 
>>>>> are the vehicle for synchronization for 2 real time processes.  
>>>>> This isn't quite as sensitive as parallel graphics processing but 
>>>>> it's not far off.
>>>>>
>>>>> When SDL link goes down each node forgets about all entities it 
>>>>> learned from the remote node.  It literally purges them.  Devices 
>>>>> have to register to their local node (even the best network admins 
>>>>> miss some especially when it comes to virtual devices like hunt 
>>>>> pilots, route lists, and software media resources).  Again there 
>>>>> is opportunity for some duplicate registration.  SDL links detect 
>>>>> outage on the order of 10 seconds or less.  SCCP devices do 
>>>>> keepalives on the order of 30 seconds with allowance for 1-2 
>>>>> missed keepalives.  10 seconds vs 60-90 seconds creates a window of 
>>>>> overlap.  If a duplicate registration is detected then CM resets 
>>>>> both device processes.  This extends downtime but there really is 
>>>>> no way of knowing which is the "right" registration.  This is 
>>>>> another window of convergence.
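
The timing arithmetic behind that window can be worked through with the figures quoted above. This is a rough sketch: the function name is hypothetical, and it assumes a device gives up on its node after (missed keepalives allowed + 1) intervals, which is one reading that reproduces the 60-90 second range in the text.

```python
def overlap_window(sdl_detect_s, keepalive_s, missed_allowed):
    """Seconds between the nodes noticing the SDL outage and a device
    noticing it has lost its node (assumed detection model, see above)."""
    device_detect_s = keepalive_s * (missed_allowed + 1)
    return device_detect_s - sdl_detect_s


# Figures from the thread: SDL detects in about 10 s, SCCP keepalives
# run about every 30 s, with 1-2 missed keepalives tolerated.
for missed in (1, 2):
    print(missed, overlap_window(10, 30, missed))
```

Under these assumptions the overlap is 50 seconds with one missed keepalive allowed and 80 seconds with two, which is the interval in which duplicate registrations can appear.
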
>>>>>
>>>>> So, after convergence the device appears unregistered on the 
>>>>> remote node.  For an interesting dig into this scenario take a look at
>>>>> CSCsc62081    CCM SDL Out of Service / In Service causes 
>>>>> Unexpected Unity Failover.
>>>>>
>>>>> and similarly related to realtime synchronization of state machines:
>>>>> CSCsc62073    Locations Out of Bandwidth causes unexpected Unity 
>>>>> Failover
>>>>>
>>>>> It was the same customer who originated both of these.  This 
>>>>> customer had truly the worst luck with timing that I have ever seen.
>>>>>
>>>>> Regards,
>>>>> Wes
>>>>>
>>>>>
>>>>> On 4/6/2011 11:51 AM, Ovidiu Popa wrote:
>>>>>> Hello everyone
>>>>>>
>>>>>> Here's an unusual scenario that kind of puzzles me.
>>>>>>
>>>>>> Here are the details:
>>>>>> - 2 CUCM with Clustering over WAN (HQ and Branch)
>>>>>> - Centralized PSTN Access at HQ (DID numbers routed to HQ)
>>>>>> - 1 phone with the Branch CUCM as primary and the HQ CUCM as 
>>>>>> secondary
>>>>>>
>>>>>> Disaster strikes, the WAN link goes down and we have a split 
>>>>>> brain condition.
>>>>>>
>>>>>> What is the state of the phone on the HQ CUCM? Will the phone be 
>>>>>> Unregistered or state Unknown?
>>>>>>
>>>>>> And the most important question is will the HQ CUCM follow the 
>>>>>> CFUR if the state is Unknown?
>>>>>>
>>>>>> Thanks for your input.
>>>>>>
>>>>>> Regards,
>>>>>> Ovidiu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> cisco-voip mailing list
>>>>>> cisco-voip at puck.nether.net
>>>>>> https://puck.nether.net/mailman/listinfo/cisco-voip
>>>>
>>
