[cisco-voip] Database Layer Change Notification

Mon Apr 27 13:15:34 EDT 2009

Hello, all!

I've just recovered from the second (known) occurrence of a problem 
wherein a table in CallManager's database, DBLCNQueueHead, seems to fill 
up and never empty, eventually bringing database changes to a grinding halt.

Both times, there has been an otherwise inexplicable call handling issue 
that eventually led to a reboot of the cluster as a 
last-ditch-finger-crossing-wood-knocking attempt to make it go again. 
Both times, the original complaint was not resolved and the reboot 
apparently caused a new error to appear whenever any database change was 
attempted.

The first time, Call Forwarding was stuck in whatever state a given DN 
was set to. If a DN was forwarded, the act of removing forwarding 
appeared to work, but calls to the DN were still forwarded. Likewise, if 
one forwarded a DN, it would appear to take the command, but the DN 
would continue to ring locally. Eventually, we tried the reboot (what I 
unaffectionately call "The Windows Fix") and when I started getting the 
errors afterward, I opened a TAC SR. I was passed around until I got an 
engineer who was very comfortable with the database and found that a few 
tables that were apparently related to database change notification were 
jam-packed with hundreds of thousands of records.

Last Friday, I had two locations for which incoming calls did not work 
correctly. Some telephones at each site appeared to be stuck loading a 
template, though they appeared to be registered in CallManager. Some 
switch and routing troubleshooting appeared to point to a UDP problem, 
but it was eventually discovered that certain telephones in the 
locations did work, though they were phones *without* the incoming DN on 
them.

We handle incoming calls at most locations by sending calls to shared 
DNs on most, if not all, telephones at the locations. Since phones 
without incoming lines were operating normally, we started by picking 
one phone, wiping it out and reconfiguring it one line at a time. We 
found that once we added the lead number of the hunt group, that phone 
began choking on loading a template. So, we deleted all traces of the 
DNs associated with incoming calls at that particular location, but 
when we began adding them back, adding that lead DN would again bring 
down the affected phones.

At that point, we decided that rebooting the cluster would probably be a 
good idea. When the system was back up, however, I began getting 
errors whenever I tried to make any database changes.

I then reviewed TAC history to find when we'd had similar issues and 
found where an engineer had determined that we had 200K+ entries in the 
DBLCNQueueHead table in the CCM0301 database. I looked and I had over 
456K rows.

I followed the same procedure, which was basically to truncate the three 
tables associated with change notification. Truncating 456K rows 
took almost 9 hours. Once that was done, not only could I now make 
database changes, but the original symptoms went away.

Now when I check properties on those tables, they have either one row or 
no rows, depending on which table.

I apologize for the exceptionally long introduction, but the real 
question is: What do these tables do? What makes them "stick" and fill 
up? How many rows is critical: at what point will things break because 
this table isn't clearing out?

The three tables are:

DBLCNQueueHead
DBLCNQueueNew
DBLCNQueueOld
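
For anyone who wants to keep an eye on these tables rather than find 
out the hard way, here is a minimal sketch of a row-count check. It 
assumes you already have a Python DB-API 2.0 connection to the CCM0301 
database (how you obtain one is up to you and is not shown). The 
function names and the threshold argument are hypothetical; I do not 
know what a "critical" row count is, so the caller has to pick one.

```python
# Table names are taken from the list above; everything else is illustrative.
CHANGE_NOTIFY_TABLES = ("DBLCNQueueHead", "DBLCNQueueNew", "DBLCNQueueOld")


def count_rows(conn, tables=CHANGE_NOTIFY_TABLES):
    """Return {table_name: row_count} using any DB-API 2.0 connection."""
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            # Names come from the fixed tuple above, never from user input,
            # so building the statement with an f-string is acceptable here.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def over_threshold(counts, threshold):
    """Return only the tables whose row count exceeds the chosen threshold."""
    return {name: n for name, n in counts.items() if n > threshold}
```

Calling over_threshold(count_rows(conn), some_limit) on a schedule would 
at least flag a queue that has stopped draining before it reaches 
hundreds of thousands of rows.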

I have viewed the contents of DBLCNQueueHead while making various 
database changes and the one row never changes. Color me confused.

Thanks!!!

Robert