[cisco-voip] DeviceInitTimerPop

Fri Aug 15 13:30:40 EDT 2008

The device init timer should be the time it takes CCM to read in
device info from the database during initialization. The delays
could be normal, i.e., you just have so many devices that it takes
longer than 6 minutes to read them all from the db, or they could be
caused by something else taking CPU time away from CCM.

-Ryan

On Aug 15, 2008, at 12:13 PM, Keith Klevenski wrote:

I was fortunate enough to be greeted by the following errors in the  
event log on both servers last night after simply rebooting our 4.1.3  
cluster:

Event Type:      Error
Event Source:    Cisco CallManager
Event Category:  None
Event ID:        3
Date:            8/14/2008
Time:            10:57:31 PM
User:            N/A
Computer:        HOUCM01
Description:
Error: DeviceInitTimerPop - Device Initialization timer has expired.
   Timer Value in seconds: 360
   App ID: Cisco CallManager
   Cluster ID: US-HOU-CL01
   Node ID: x.x.x.x

Explanation: The Device Initialization process has taken longer than  
expected to initialize.  The system configuration is larger and/or  
more complex than expected, or an error has been encountered.

Recommended Action: Try increasing the "Device Initialization Timer"  
value in the CallManager Service Parameters administration page in  
50% multiples of the suggested default value.

*Note: This parameter is cluster-wide.  Setting this parameter will  
effect each call processing server within the cluster on the next  
service restart.

If the system does not successfully initialize by increasing this  
parameter parameter to the maximum value, contact Cisco TAC support  
immediately..
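The "50% multiples" wording in the recommended action is a little ambiguous. A minimal sketch of one reading, assuming it means stepping up from the 360-second default in increments of half the default (the function name and the optional max_s cap are hypothetical; the alarm text does not state what the maximum is):

```python
def timer_candidates(default_s=360, step_fraction=0.5, count=4, max_s=None):
    """List timer values stepping up from the default in 50% increments.

    Assumption: "50% multiples" means default + k * (50% of default).
    max_s is a hypothetical cap; the real maximum is not given in the alarm.
    """
    step = int(default_s * step_fraction)
    values = [default_s + k * step for k in range(1, count + 1)]
    if max_s is not None:
        values = [v for v in values if v <= max_s]
    return values


print(timer_candidates())  # [540, 720, 900, 1080]
```

Under that reading, the first step up from 360 is 540 seconds, which is close to the 550 that ended up being used here.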

And this gem of an error:

Event Type:      Error
Event Source:    Cisco CallManager
Event Category:  None
Event ID:        3
Date:            8/14/2008
Time:            10:57:31 PM
User:            N/A
Computer:        HOUCM01
Description:
Error: CallManagerFailure - Indicates some failure in the Cisco
CallManager system.
   Host name of hosting node.: HOUCM01
   IP address of hosting node.: x.x.x.x
   Reason code.: 5
   Additional Text [Optional]: MMManInit::initializing_devices
   App ID: Cisco CallManager
   Cluster ID: US-HOU-CL01
   Node ID: x.x.x.x

Explanation: This alarm indicates that some failure occurred in the  
Cisco CallManager system.

Recommended Action: Monitor for other alarms and restart Cisco  
CallManager service, if necessary..
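Both alarms carry the same timestamp, and the Additional Text field here names device initialization, so it can help to pull the fields out of the pasted event text and compare them side by side. A rough sketch, assuming the description uses the "key: value" layout shown above (parse_event is a hypothetical helper, not a Cisco tool, and EVENT_TEXT is an excerpt of the entry quoted in this message):

```python
import re

# Excerpt of the CallManagerFailure description quoted above.
EVENT_TEXT = """\
Error: CallManagerFailure - Indicates some failure in the Cisco
CallManager system.
   Host name of hosting node.: HOUCM01
   Reason code.: 5
   Additional Text [Optional]: MMManInit::initializing_devices
   App ID: Cisco CallManager
   Cluster ID: US-HOU-CL01
"""


def parse_event(text):
    """Return (alarm_name, fields) from an event description."""
    alarm = None
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"Error:\s*(\w+)\s*-", line)
        if match:
            alarm = match.group(1)
            continue
        # Split on the first ": " so values containing "::" stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.rstrip(".")] = value.strip()
    return alarm, fields


alarm, fields = parse_event(EVENT_TEXT)
print(alarm, fields["Reason code"], fields["Additional Text [Optional]"])
```

Lines without a "key: value" pair, such as the wrapped second half of the Error line, are simply skipped.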

The CallManager service kept ‘terminating unexpectedly’ every 10
minutes or so on both servers, so not a single device in the entire
cluster would register to either the pub or the sub. I ended up
calling TAC since I had never seen these two errors before. I tried
increasing the Device Initialization Timer, but it did not help.
Almost 2 hours into it I realized that my earlier update to this
service parameter had never taken effect, and I remembered TAC seeing
the service parameter page with a bunch of errors on it. So I thought
I had updated it, but it had errored out without my noticing. I then
upped it from 360 to 550, restarted the CallManager and DBL services
on both servers, and 5 minutes later all the phones registered and
peace was restored.

TAC has no answer for what may have happened, other than that a
Dr. Watson log was generated right when I rebooted the pub, which
they are analyzing. So ultimately my question is: why would the
‘device subsystem’ suddenly not initialize within the default 360
seconds? What exactly is the device subsystem, other than what the
name implies? I can only assume changing this value to 550 is what
restored order in the cluster. There are fewer than 1800 phones in
the database and just under 800 gateways, and probably a quarter of
them are old phones and gateways that were never deleted. This is
the HP equivalent of the MCS 7835, bought last year, so the hardware
is current. I wouldn’t think there are so many devices to initialize
that this timer had to be upped. Or are there? Has anyone ever had
to change this timer?
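For a sense of scale, the device counts above can be turned into a rough per-device time budget. This is a back-of-the-envelope sketch only: it assumes the whole timer is spent reading devices one after another, which is a simplification and not a documented description of how CallManager initializes, and it uses the upper bounds of 1800 phones and 800 gateways (per_device_budget is a hypothetical helper):

```python
def per_device_budget(timer_s, phones, gateways):
    """Seconds available per device if the timer is split evenly.

    Assumption: devices are read serially and nothing else uses the time.
    """
    return timer_s / (phones + gateways)


# Upper bounds from the message: fewer than 1800 phones, under 800 gateways.
for timer in (360, 550):
    ms = per_device_budget(timer, 1800, 800) * 1000
    print(f"{timer} s timer: about {ms:.0f} ms per device")
```

With these figures the default works out to roughly 138 ms per device and the raised value to roughly 212 ms. Since the real counts are lower than the bounds used, the actual budget per device is somewhat larger.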

I’m at a loss and have to explain why the entire US cluster, including
nine 6608 PRIs, was down hard for almost 3 hours after a
run-of-the-mill quarterly preventive-maintenance reboot.

Thanks for any input!


Keith Klevenski

Manager, Telephony Engineering

1880 S. Dairy Ashford, Suite 300  |  Houston, TX  77077-4760

Tel: 281.674.0702  |  Mobile: 713.677.3925  |  Fax: 281.674.0101

keith.klevenski at rig.net
www.rig.net

_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip
