[cisco-voip] CUCM split brain question
Ovidiu Popa
ovi.popa at gmail.com
Wed Apr 6 18:29:41 EDT 2011
Woohoo ... Got my answer. And not only that, but I learned many other
things in the process.
Too bad for the book, I had already found a sexy title to propose to you
"Inside the Seadragon" :). "Seadragon" is the codename for the first
Linux-based CUCM if I'm not mistaken.
On a more serious note, I believe that I speak for a lot of people who
are just hungry for details as our knowledge and experience would
benefit greatly from this information.
I greatly appreciate Jeff Lindborg's example on this; his site
http://www.ciscounitytools.com provides invaluable information for a
lot of people.
If you ever start a new blog/site or add a post on docwiki.cisco.com,
give us a heads up.
And lastly, the most important thing: keep acting green; eventually
everybody will understand.
A sincere thank you,
Ovidiu
On 06/Apr/11 11:57 PM, Wes Sisk wrote:
> For call processing unknown = unregistered.
>
> Remember those SDL links? They send signals back and forth called
> "DMPropagateRegister" and UnRegister. Instead of querying RIS, CM just
> checks its internal memory for local or remote registrations. If ccm
> does not know about the registration state (either locally or
> remotely), the call follows CFUR.
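
A minimal sketch of the decision described above, in Python. The names `RegistrationTable` and `should_follow_cfur` are hypothetical and are not CUCM internals; the only behaviour taken from the thread is that a device with no known registration, local or remote, is treated as unregistered for call processing.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationTable:
    """Hypothetical stand-in for the ccm process's in-memory registration view."""
    local: set = field(default_factory=set)    # devices registered to this node
    remote: set = field(default_factory=set)   # devices learned over SDL from other nodes

    def is_registered(self, device: str) -> bool:
        # A device the node has never heard about is simply absent from both sets.
        return device in self.local or device in self.remote


def should_follow_cfur(table: RegistrationTable, device: str) -> bool:
    """For call processing, "unknown" is handled the same way as "unregistered"."""
    return not table.is_registered(device)


hq = RegistrationTable(local={"SEP001122334455"})
print(should_follow_cfur(hq, "SEP001122334455"))  # device registered locally
print(should_follow_cfur(hq, "SEPAABBCCDDEEFF"))  # device never seen by this node
```

Because there is no separate "unknown" state in this model, a phone that never registered since the last restart and a phone that dropped its registration both lead to the same forwarding decision.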
>
> <strictly personal comment>
> Thanks for the props. I'm just one of a large pool of people who
> generate the need, concepts, code, product, knowledge, and use cases.
> I've been involved with a few book projects but I'm much more
> committed to wikis, web publishing, and mailing lists. Trees are much
> more beautiful on a trail than stacked on a shelf. Technical
> information changes so quickly that it's obsolete before the ink dries.
> </strictly personal comment>
>
> Regards,
> Wes
>
> On 4/6/2011 5:13 PM, Ovidiu Popa wrote:
>>
>> And the gifts keep on coming... the registered/unknown puzzle
>> unraveled :)
>>
>> Now for the final question:
>>
>> How does the Call Forward Unregistered handle Unknown states? In my
>> case if the branch phone never registered to CM1 since the last CM
>> service restart, its state will be unknown. Will an incoming call to
>> the phone DN follow the CFUR?
>>
>> Sorry to pester you with questions and thank you again.
>>
>> PS: If you ever decide to write a book on CUCM please sign me up on
>> the pre-order list.
>> PPS : somehow I don't think I am the only one.
>>
>> Best regards,
>> Ovidiu
>>
>>
>>
>> On 06/Apr/11 10:49 PM, Wes Sisk wrote:
>>> If phone is local to CM1 and registers to CM1 then CM1 sees the
>>> phone as registered. Because the SDL link is down CM2 will see the
>>> phone as unregistered.
>>>
>>> If phone is local to CM2 and registers to CM2 then CM2 sees the
>>> phone as registered. Because the SDL link is down CM1 will see the
>>> phone as unregistered.
>>>
>>> The appearance of "unregistered" vs "unknown" in the UI is a bit of
>>> a red herring. Registration status is captured in a shared memory
>>> segment by the ccm process. Processes such as AXL and RIS read that
>>> shared memory segment.
>>>
>>> When AXL or web service goes to read that shared memory segment it
>>> reads all nodes in the cluster. With connectivity to CM2 being down
>>> AXL or RIS on CM1 will only be able to query CM1 shared memory. If
>>> the phone in question has never registered to CM1 then status will
>>> be "unknown".
>>>
>>> Similarly if the ccm process on CM2 restarts then the status will be
>>> "unknown" in the CM2 shared memory segment until the first time the
>>> phone registers with CM2.
>>>
>>> So, "unknown" vs "unregistered" carries a very subtle, perhaps even
>>> nuanced, difference in meaning. "unknown" means the shared memory
>>> segments currently available to query have not been updated with a
>>> known status for that device since the last ccm process restart.
>>> "unregistered" means the phone last transitioned to "unregistered"
>>> signaling status with one of the nodes that is currently accessible
>>> from the AXL or RIS instance you are querying.
>>>
>>> In ASCII:
>>> browser->Tomcat->RIS1->shared_memory1->CM1
>>>                |->RIS2->shared_memory2->CM2
>>>                |->RIS3->shared_memory3->CM3
>>>
>>> All contingent on IP connectivity from Tomcat to the various RIS
>>> processes across the cluster.
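
The status lookup described above can be modelled roughly as follows. This is a sketch under stated assumptions, not how RIS or AXL is actually implemented: `displayed_status` and the `Segment` dictionaries are hypothetical, and the rule that a "registered" entry takes precedence over an "unregistered" one is an assumption made for illustration.

```python
from typing import Dict, Iterable

# Per-node view: device name -> last status recorded since that node's ccm restart.
Segment = Dict[str, str]


def displayed_status(segments: Dict[str, Segment],
                     reachable: Iterable[str],
                     device: str) -> str:
    """Combine the statuses recorded by the reachable nodes only."""
    seen = [segments[node][device]
            for node in reachable
            if device in segments.get(node, {})]
    if "registered" in seen:
        return "registered"      # assumption: a live registration wins
    if "unregistered" in seen:
        return "unregistered"    # some reachable node saw the device drop
    return "unknown"             # no reachable node has any record


segments = {
    "CM1": {},                                  # phone never registered to CM1
    "CM2": {"SEP001122334455": "registered"},   # branch phone lives here
}
print(displayed_status(segments, ["CM1", "CM2"], "SEP001122334455"))
print(displayed_status(segments, ["CM1"], "SEP001122334455"))
```

With both nodes reachable the phone is reported from CM2's segment; with only CM1 reachable there is no record to read, which is the "unknown" case.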
>>>
>>> Regards,
>>> Wes
>>>
>>> On 4/6/2011 4:01 PM, Ovidiu Popa wrote:
>>>> Hello Wes
>>>>
>>>> Excellent information. I have always had difficulty understanding
>>>> the insides of the CUCM box. Thank you.
>>>>
>>>> One question remains:
>>>> Imagine a WAN failure for 30 minutes. CM1 and CM2 are working
>>>> without the SDL layer. CM2 (branch) sees the branch phone as
>>>> Registered. Does CM1 (HQ) see the phone as unregistered or unknown?
>>>>
>>>> Thanks.
>>>> Ovidiu
>>>>
>>>>
>>>> On 06/Apr/11 9:22 PM, Wes Sisk wrote:
>>>>> For example's sake:
>>>>> CM1: headquarters
>>>>> CM2: branch
>>>>>
>>>>> When IP connectivity exists between CM1 and CM2 SDL TCP sessions
>>>>> are established between the 2 ccm processes. Through SDL each
>>>>> server tells every other server about registered devices. There
>>>>> is opportunity for duplicate registration and some propagation
>>>>> time so there is a window of convergence involved. Once things
>>>>> are in sync each CM node tells every other node about all
>>>>> significant local state changes. Make sense? This usually helps
>>>>> folks understand why QoS is so critical on SDL links. SDL links
>>>>> are the vehicle for synchronization for 2 real time processes.
>>>>> This isn't quite as sensitive as parallel graphics processing but
>>>>> it's not far off.
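
A toy model of the propagation described above, in Python. The `Node` class and its methods are hypothetical; real SDL signalling is far richer, and the initial synchronisation that happens when a link first comes up is not modelled here.

```python
class Node:
    """Hypothetical model of one call-processing node and its SDL peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local = set()    # devices registered directly to this node
        self.remote = {}      # device -> name of the node that reported it
        self.peers = []       # nodes reachable over an SDL link

    def link(self, other: "Node") -> None:
        # SDL links are modelled as bidirectional.
        self.peers.append(other)
        other.peers.append(self)

    def register(self, device: str) -> None:
        self.local.add(device)
        for peer in self.peers:            # tell every other node
            peer.remote[device] = self.name

    def unregister(self, device: str) -> None:
        self.local.discard(device)
        for peer in self.peers:            # withdraw the registration everywhere
            peer.remote.pop(device, None)


cm1, cm2 = Node("CM1"), Node("CM2")
cm1.link(cm2)
cm2.register("SEP001122334455")
print(cm1.remote)
```

Every state change costs one message per peer, which gives a feel for why delay or loss on the SDL path directly delays convergence.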
>>>>>
>>>>> When the SDL link goes down each node forgets about all entities
>>>>> it learned from the remote node. It literally purges them. Devices
>>>>> have to register to their local node (even the best network admins
>>>>> miss some, especially when it comes to virtual devices like hunt
>>>>> pilots, route lists, and software media resources). Again there
>>>>> is opportunity for some duplicate registration. SDL links detect
>>>>> an outage on the order of 10 seconds or less. SCCP devices send
>>>>> keepalives on the order of every 30 seconds with allowance for 1-2
>>>>> missed keepalives. 10 seconds vs 60-90 seconds creates a window of
>>>>> overlap. If a duplicate registration is detected then CM resets
>>>>> both device processes. This extends downtime but there really is
>>>>> no way of knowing which is the "right" registration. This is
>>>>> another window of convergence.
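
The timing mismatch above can be worked through with a few lines of Python. The constants are the approximate figures quoted in this thread; the formula in `device_failover_s` is an assumed simplification chosen because it reproduces the quoted 60-90 second range, not a documented SCCP timer calculation.

```python
SDL_DETECT_S = 10          # upper bound for SDL outage detection quoted in the thread
SCCP_KEEPALIVE_S = 30      # approximate SCCP keepalive interval quoted in the thread


def device_failover_s(missed_allowed: int) -> int:
    """Seconds until a device gives up on its node (assumed simple model)."""
    return SCCP_KEEPALIVE_S * (missed_allowed + 1)


def overlap_window_s(missed_allowed: int) -> int:
    """Gap between SDL outage detection and the device giving up on its node."""
    return device_failover_s(missed_allowed) - SDL_DETECT_S


for missed in (1, 2):
    print(missed, device_failover_s(missed), overlap_window_s(missed))
```

Under these assumptions the nodes have already purged remote state for roughly 50 to 80 seconds before a device reacts, which is the window in which duplicate registrations can appear.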
>>>>>
>>>>> So, after convergence the device appears unregistered on the
>>>>> remote node. For an interesting dig into this scenario take a look at
>>>>> CSCsc62081 CCM SDL Out of Service / In Service causes
>>>>> Unexpected Unity Failover.
>>>>>
>>>>> and similarly related to realtime synchronization of state machines:
>>>>> CSCsc62073 Locations Out of Bandwidth causes unexpected Unity
>>>>> Failover
>>>>>
>>>>> It was the same customer who originated both of these. This
>>>>> customer had truly the worst luck with timing that I have ever seen.
>>>>>
>>>>> Regards,
>>>>> Wes
>>>>>
>>>>>
>>>>> On 4/6/2011 11:51 AM, Ovidiu Popa wrote:
>>>>>> Hello everyone
>>>>>>
>>>>>> Here's an unusual scenario that kind of puzzles me.
>>>>>>
>>>>>> Here are the details:
>>>>>> - 2 CUCM with Clustering over WAN (HQ and Branch)
>>>>>> - Centralized PSTN Access at HQ (DID numbers routed to HQ)
>>>>>> - 1 phone with the Branch CUCM as primary and the HQ CUCM as
>>>>>> secondary
>>>>>>
>>>>>> Disaster strikes, the WAN link goes down and we have a split
>>>>>> brain condition.
>>>>>>
>>>>>> What is the state of the phone on the HQ CUCM? Will the phone
>>>>>> be Unregistered or in state Unknown?
>>>>>>
>>>>>> And the most important question is will the HQ CUCM follow the
>>>>>> CFUR if the state is Unknown?
>>>>>>
>>>>>> Thanks for your input.
>>>>>>
>>>>>> Regards,
>>>>>> Ovidiu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> cisco-voip mailing list
>>>>>> cisco-voip at puck.nether.net
>>>>>> https://puck.nether.net/mailman/listinfo/cisco-voip
>>>>
>>