[cisco-voip] Database Layer Change Notification
Wes Sisk
wsisk at cisco.com
Mon Apr 27 16:55:04 EDT 2009
Hi Robert,
You can get details on each of the mentioned defects in the Cisco Bug
Toolkit on Cisco.com. CSCsf31622 specifically applies to 'change
notification load'. What does that mean? Enterprise-wide extension
mobility login/logout and enterprise-wide MWI refresh are the two
biggest change notification generators that come to mind. My point here
was not load but rather backup. Once change notification (CN) begins to
back up, performance degrades rapidly. If aupair gets stopped for
some reason so that the queue backs up, then aupair will have a
very difficult time catching up once it is restarted.
CM5.x and 6.x offer perfmon counters to monitor for CN backups. I am
not aware of a good way to monitor for them in CM4.x.
One thing we have used successfully in the past is a set of
'TraceCollection' scripts authored by Jim Cardon at Cisco. These
periodically pull traces from defined CM nodes and parse the traces for
specified strings. You could use this to look for the query aupair
issues when it is performing normally ("Select TOP 499 * from
DBLCNQueueHead AS H JOIN DBLCNQueueOld AS O ON O.ref=H.ref JOIN
DBLCNQueueNew AS N ON N.ref=H.ref order by H.seq"). If the string is
not found, then alert.
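To illustrate the idea only, here is a rough Python sketch of that kind of check. It is not the TraceCollection scripts themselves; it assumes the traces have already been copied locally as plain text, and the function names and the marker fragment are hypothetical choices made for this example:

```python
import re

# Distinctive fragment of the query aupair logs while polling the queue tables.
# A fragment is used (an assumption) so minor formatting differences still match.
AUPAIR_MARKER = "from DBLCNQueueHead AS H JOIN DBLCNQueueOld"


def normalize(text):
    """Collapse whitespace runs so a query wrapped across trace lines still matches."""
    return re.sub(r"\s+", " ", text)


def aupair_seen(trace_text, marker=AUPAIR_MARKER):
    """Return True if the marker occurs in the trace text, ignoring case and wraps."""
    return normalize(marker).lower() in normalize(trace_text).lower()


def check_traces(paths, alert=print):
    """Alert when none of the given trace files contains the aupair query."""
    for path in paths:
        with open(path, errors="replace") as handle:
            if aupair_seen(handle.read()):
                return True
    alert("ALERT: aupair query not found in recent traces")
    return False
```

Run something like check_traces() on each freshly pulled batch of traces; swap the alert argument for whatever paging or email hook you already have.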
CM5.x and 6.x have this trace collection and trace monitoring built in
by default; they just need to be configured.
/Wes
On Monday, April 27, 2009 4:14:21 PM, Robert Singleton
<rsingleton at morsco.com> wrote:
> Wes Sisk wrote:
>
>> Just one question - what CM version?
>
> Sorry, I skip that all the time... 4.1(3)sr4d
>
>> Point 1:
>> CSCsf31622 change notification performance degrades rapidly under
>> load
>
> Load where? Generally, my CallManagers are not especially heavily
> loaded, CPU-wise. That was not always the case. Once upon a time, just
> leaving traces running was enough to affect call handling, but we
> upgraded hardware a couple or three years ago and it works much better
> now.
>
>> Point 2:
>> CSCse41788 Change Notification Fails - DBLCNQueue Counts Rise -
>> DBL Ptr Corruption
>
> Assuming traces aren't rolled out, I will look for this...
>
>> Point 3:
>> Aupair grabs change notifications from those tables and sends them
>> out to processes running on nodes in the cluster via TCP. We have
>> seen those TCP sessions get aborted and otherwise hung causing aupair
>> to hang:
>> CSCsa64684 Change Notify stops working due to bug in TcpLib
>>
>> Are your servers separated by a WAN, firewalls, or any TCP inspection
>> device that may interrupt TCP sessions? Are any processes in your
>> cluster flapping or crashing so they might not consume their change
>> notifications?
>
> None that I am aware of. Both servers are in the same colo. I cannot
> say at this writing whether or not they are on the same switch, but
> it's fairly likely that they may be in adjacent racks and thus on
> different 35XX switches. If so, these switches are probably daisy
> chained with 100Mb copper connections. I will investigate.
>
>> Point 4:
>
>> When you find the tables backing up grab these outputs for your
>> CCM03xx database using SQL query analyzer:
>> sp_who2
>> sp_lock
>>
>> In the sp_who2 output you can see if any process is 'waiting' on a
>> lock. You can also see who has locks.
>> sp_lock shows specifically who has what locks.
>
> Interesting... Since the issue will have to be ongoing for these to
> apply, do you have any specific recommendation for detecting the
> backup before the world comes crashing down?
>
>
>> CSCsl21023 Change notification broken after CM
>> deactivated,DBLCNQueue* tbls filling
>
> This definitely seems to fit my secondary symptom, DBCN being broken
> after a reboot to attempt to blindly fix the other issue.
>
> Thanks for the info and reply!
>
> Robert
> _______________________________________________
> cisco-voip mailing list
> cisco-voip at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-voip