[Outages-discussion] VoIP - complete outage at DASH Carrier Services

Warren Kumari warren at kumari.net
Wed Dec 15 18:42:24 EST 2010


On Dec 15, 2010, at 4:01 PM, Chris Stone wrote:

> DASH has posted (email) the following with regard to their outage
> yesterday:

Excellent.

We all moan and complain when folk suffer outages and don't post any  
sort of postmortem (or some sort of hand-wavy "This is not the outage  
you are looking for" nonsense) -- I'm glad to see that dash has done  
the right thing here...

W

>
> Date of Incident:        Tuesday 14 December, 2010
> Time Incident Began:     3:00 PM MST Denver POP, 3:00 MST Atlanta POP
> Time Incident Resolved:  4:40 MST Denver POP, 4:50 MST Atlanta POP
>
> Reason for Outage
>
> dash experienced a corruption of a configuration file on our Acme Packet
> SBC clusters in the Denver and Atlanta POPs.  The clusters do not share a
> common configuration, but they are configured similarly.  dash is working
> with Acme Packet to identify the cause of the corruption.
>
> Services Affected
>
> Inbound and outbound call routing.
>
> Resolution
>
> dash removed the corrupted entity and rebuilt that same portion of the
> configuration in each cluster. No other changes were made to the
> configuration.
>
>
> Root Cause
>
> The corrupt configuration database caused routing requests to not complete
> correctly and, over a short time, caused process failure on the Acme Packet
> SBC cluster. Specifically, the process failure caused the public VRRP
> interfaces of the border controller to drop.
>
> dash is working with Acme Packet to identify root cause and implement
> corrective action as necessary. The root cause will be communicated at such
> time as it is identified.
>
> Corrective Action
>
> Until root cause is identified and long term corrective action is
> implemented, dash monitoring will continue to send critical alerts if the
> situation is repeated. To resolve the issue the corrupt configuration file
> would be removed and rebuilt. Time to remove the corrupt file and rebuild is
> approximately one minute for each SBC cluster.
> _______________________________________________
> Outages-discussion mailing list
> Outages-discussion at outages.org
> https://puck.nether.net/mailman/listinfo/outages-discussion


