[Outages-discussion] VoIP - complete outage at DASH Carrier Services

Chris Stone cstone at axint.net
Wed Dec 15 16:01:28 EST 2010


DASH has sent out (by email) the following regarding their outage yesterday:

Date of Incident:                Tuesday 14 December, 2010
Time Incident Began:        3:00 PM MST Denver POP, 3:00 MST Atlanta POP
Time Incident Resolved:    4:40 MST Denver POP,  4:50 MST Atlanta POP
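
For anyone tallying downtime, the per-POP outage duration can be worked out
from the times above. This is only a rough sketch: the report omits AM/PM on
three of the four times, so treating them all as PM on 14 December is an
assumption, and the names WINDOWS and outage_minutes are made up for
illustration.

```python
from datetime import datetime

# Times taken from the incident report, converted to 24-hour clock.
# ASSUMPTION: every time is PM MST on the incident date; the report
# only marks the Denver start time as PM.
INCIDENT_DATE = "2010-12-14"
WINDOWS = {
    "Denver": ("15:00", "16:40"),
    "Atlanta": ("15:00", "16:50"),
}


def outage_minutes(start, end, date=INCIDENT_DATE):
    """Return the whole minutes between two HH:MM times on the same date."""
    fmt = "%Y-%m-%d %H:%M"
    begin = datetime.strptime(f"{date} {start}", fmt)
    finish = datetime.strptime(f"{date} {end}", fmt)
    return int((finish - begin).total_seconds() // 60)


# Duration of the outage at each POP, in minutes.
durations = {pop: outage_minutes(*window) for pop, window in WINDOWS.items()}

for pop, minutes in durations.items():
    print(f"{pop}: {minutes} minutes")
```

Under that assumption, this comes to 100 minutes for Denver and 110 minutes
for Atlanta.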

Reason for Outage

dash experienced a corruption of a configuration file on our Acme Packet SBC
clusters in the Denver and Atlanta POPs.  The clusters do not share a common
configuration, but they are configured similarly.  dash is working with Acme
Packet to identify the cause of the corruption.

Services Affected

Inbound and outbound call routing.

Resolution

dash removed the corrupted entity and rebuilt that same portion of the
configuration in each cluster. No other changes were made to the configuration.


Root Cause

The corrupt configuration database caused routing requests to fail to
complete correctly and, over a short time, caused process failure on the Acme
Packet SBC cluster. Specifically, the process failure caused the public VRRP
interfaces of the border controller to drop.

dash is working with Acme Packet to identify the root cause and implement
corrective action as necessary. The root cause will be communicated once it
is identified.

Corrective Action

Until the root cause is identified and long-term corrective action is
implemented, dash monitoring will continue to send critical alerts if the
situation recurs. To resolve the issue, the corrupt configuration file
would be removed and rebuilt. Time to remove the corrupt file and rebuild is
approximately one minute for each SBC cluster.

