[Outages-discussion] CenturyLink Outages this morning

Jay Ashworth jra at baylink.com
Fri Dec 28 00:11:36 EST 2018


I got to tell you, reading back through that ticket, it certainly feels like the big SS7 storm from about 10 years ago. The question is: is it a bug or an attack?

Or a test.

On December 27, 2018 10:48:20 PM EST, frnkblk at iname.com wrote:
>I wonder why they want to remove the secondary communication channel –
>is it because they believe it’s helping facilitate the broadcast
>traffic? Or is it to simplify the application of filters? Or simplify
>troubleshooting of the issue?
>
> 
>
>I don’t know what Infinera platforms are involved with this issue, but
>Figure 3-24 of this (very old) document shows a Control Path
>B/secondary control path:
>
>https://fccid.io/ANATEL/01163-09-05381/Manual-02/072CE5D9-762C-49CE-B9F0-FEC86BF9077B/PDF
>
> 
>
>Frank 
>
> 
>
>From: Erik Wooding <erik.wooding at wework.com> 
>Sent: Thursday, December 27, 2018 9:06 PM
>To: Biddle, Josh <JBiddle at ntst.com>
>Cc: frnkblk at iname.com; outages-discussion at outages.org
>Subject: Re: [Outages-discussion] CenturyLink Outages this morning
>
> 
>
>They're definitely lost. Last troubleshoot was a card issue in Denver
>which didn't change anything. 
>
> 
>
>These are the updates from the ticket we have with them if anyone isn't
>getting these.  
>
> 
>
>------------------------------------------------------------------------------------------------------------------------
>
> 
>
>2018-12-28 02:02:14 GMT - Once the card was removed in Denver, CO it
>was confirmed that there was no significant improvement. Additional
>packet captures, and logs will be pulled from the device with the card
>removed to further isolate the root cause. The Equipment vendor
>continues to work with CenturyLink Field Operations at multiple sites
>to remove the secondary communication channel tunnel across the network
>until full visibility can be restored. The equipment vendor has
>identified a number of additional nodes that visibility has been
>restored to, and their engineers are currently working to apply the
>necessary filter to each of the reachable nodes. 
>
> 
>
>2018-12-28 01:03:11 GMT - Following the review of the logs and packet
>captures, the Equipment Vendor's Tier IV Support team has identified a
>suspected card issue in Denver, CO. Field Operations has arrived on
>site and are working in cooperation with the Equipment Vendor to remove
>the card. 
>
> 
>
>2018-12-28 00:01:53 GMT - The Equipment Vendor is currently reviewing
>the logs and packet captures from devices that have been completed,
>while logs and packet captures continue to be pulled from additional
>devices. The necessary teams continue to remove a secondary
>communication channel tunnel across the network until visibility can be
>restored. All technical teams continue to diligently work to review the
>information obtained in an effort to isolate the root cause. 
>
> 
>
>2018-12-27 22:58:43 GMT - Multiple teams continue work to pull
>additional logs and packet captures on devices that have had visibility
>restored, which will be scrutinized during root cause analysis. The
>Tier IV Equipment Vendor Technical Support team in conjunction with
>Field Operations are working to remove a secondary communication
>channel tunnel across the network until visibility can be restored. The
>Equipment Vendor Support team has dispatched their Field Operations
>team to the site in Chicago, IL and has been obtaining data directly
>from the equipment. 
>
> 
>
>2018-12-27 21:36:58 GMT - It has been advised that visibility has been
>restored to both the Chicago, IL and Atlanta, GA sites. Engineering and
>Tier IV Equipment Vendor Technical Support are currently working to
>obtain additional logs from devices across multiple sites including
>Chicago and Atlanta to further isolate the root cause. 
>
> 
>
>2018-12-27 20:19:36 GMT - Tier IV Equipment Vendor Technical Support
>continues to work with CenturyLink Field Operations and Engineering to
>restore visibility and apply the filter to devices in Atlanta, GA and
>Chicago, IL. While those efforts are ongoing additional logs have been
>pulled from the devices in Kansas City, MO and New Orleans, LA
>following the restoral of visibility and the necessary filter
>application to obtain additional pertinent information now that the
>device is remotely accessible. 
>
> 
>
>2018-12-27 19:30:01 GMT - Spoke with Sean and advised we were seeing the
>system restored. He verified and advised that his other circuit is
>showing restored at the same time as this circuit. Related to the
>nationwide outage; adding to parent ticket. Sean requests that we leave
>this ticket open until they are hands-off. When you are available,
>could you please send us your traceroutes? Thank you.
>
> 
>
>2018-12-27 19:17:01 GMT - Efforts to regain visibility to sites in
>Atlanta, GA and Chicago, IL remain ongoing. Once visibility has been
>restored the filter will be applied to limit communication traffic
>between sites which was causing CPU spikes that in turn prevented the
>devices from functioning properly. 
>
> 
>
>2018-12-27 18:13:33 GMT - The equipment vendor's Tier IV Technical
>Support team continued to investigate the equipment logs to further
>assist with isolation. Field Operations were dispatched to multiple
>sites to investigate equipment in Kansas City, MO, New Orleans, LA, as
>well as Atlanta, GA. It has been advised that a controller card has
>been stabilized in New Orleans, LA restoring visibility to the
>equipment to allow additional investigations to continue. A filter was
>then applied to equipment in Kansas City, MO that further alleviated
>the impact observed. Investigations remain ongoing to further restore
>network services. 
>
> 
>
>2018-12-27 15:46:56 GMT - Following the isolation of a node in San
>Antonio, TX that alleviated some of the impact the necessary teams have
>shifted troubleshooting focus to additional nodes experiencing issues.
>A node in Atlanta, GA as well as a site in New Orleans, LA are
>currently being investigated. 
>
> 
>
>2018-12-27 14:44:42 GMT - Field Operations dispatched to various
>locations to troubleshoot cooperatively with the equipment vendor.
>During cooperative troubleshooting, a device in San Antonio, TX was
>seeming to broadcast traffic consuming capacity and impacting other
>nodes in the network. The node was isolated from the network, which
>appears to have alleviated some impact; however, troubleshooting
>efforts continue to restore all impacted services. We understand how
>important these services are to our clients and the issue has been
>escalated to the highest levels within CenturyLink Service Assurance
>Leadership. 
>
> 
>
>2018-12-27 13:50:05 GMT - On December 27, 2018 at 02:40 GMT,
>CenturyLink identified a service impact in various locations. The NOC
>engaged to begin investigations. The NOC engaged Tier IV Vendor Support
>to assist in troubleshooting and fault isolation efforts. The NOC
>engaged Field Operations to cooperatively troubleshoot. Field
>Operations arrived on site in Denver, CO. Field Operations dispatched
>additional technicians to Kansas City, MO and Omaha, NE. The NOC is
>continuing to cooperatively troubleshoot with Tier IV Vendor Support
>for a site in San Antonio, TX. 
>
> 
>
>2018-12-27 13:14:12 GMT - Field Operations has arrived on site at the
>Denver location. 
>
> 
>
>2018-12-27 12:49:55 GMT - The NOC has engaged Tier IV Vendor Support to
>assist in troubleshooting efforts. 
>
> 
>
>2018-12-27 12:31:34 GMT - Field Operations has dispatched a second
>technician to another site to assist with troubleshooting and fault
>isolation efforts. An ETA of 13:30 GMT has been provided. 
>
> 
>
>2018-12-27 12:21:46 GMT - The NOC has engaged Field Operations to
>assist in troubleshooting and fault isolation efforts. An ETA of 13:15
>GMT has been provided.
>
> 
>
>2018-12-27 11:57:40 GMT - On December 27, 2018 at 02:40 GMT,
>CenturyLink identified a service impact in New Orleans, LA. The NOC is
>engaged and investigating in order to isolate the cause. Please be
>advised that updates for this event will be relayed at a minimum of
>hourly unless otherwise noted. The information conveyed hereafter is
>associated to live troubleshooting effort and as the discovery process
>evolves through to service resolution, ticket closure, or post incident
>review, details may evolve.
>
> 
>
>On Thu, Dec 27, 2018 at 6:59 PM Biddle, Josh <JBiddle at ntst.com> wrote:
>
>Thanks.  This is one for the books I guess.  I just got in to work a
>few hours ago so I am playing catch up.  It sounds to me like they
>still don’t know what happened.  In that thread I see multiple reports
>of circuits going into loopback.  I’m guessing this is CL’s way of
>trying to segregate parts of their core infrastructure in an attempt to
>kill whatever is broadcasting and reconverge their network.  It sounds
>like they still have not figured out the root cause. 
>
> 
>
>Anyone have any thoughts or updates?
>
> 
>
> 
>
>From: frnkblk at iname.com <frnkblk at iname.com>
>Sent: Thursday, December 27, 2018 9:10 PM
>To: Biddle, Josh <JBiddle at ntst.com>
>Subject: RE: [Outages-discussion] CenturyLink Outages this morning
>
> 
>
>This Reddit post is probably the best/closest to what you’re asking for:
>https://old.reddit.com/r/networking/comments/a9z6tb/centurylink_outage_west_coast/ecnsbct/
>
>
> 
>
>Frank 
>
> 
>
>From: Outages-discussion <outages-discussion-bounces at outages.org>
>On Behalf Of Biddle, Josh
>Sent: Thursday, December 27, 2018 6:34 PM
>To: outages-discussion at outages.org
>Subject: Re: [Outages-discussion] CenturyLink Outages this morning
>
> 
>
>Anyone have root cause of Century Link outage and ETTR?
>
> 
>
> 
>
>From: Outages-discussion <outages-discussion-bounces at outages.org>
>On Behalf Of Frank Bulk
>Sent: Thursday, December 27, 2018 6:42 PM
>To: outages-discussion at outages.org
>Subject: Re: [Outages-discussion] CenturyLink Outages this morning
>
> 
>
>My IPv6 access to www.qwest.com and www.centurylink.com has been stable
>since 3:29 pm U.S. Central.
>
> 
>
>downdetector.com has flatlined for a while, but is still not close to
>zero, and based on what I see in the reddit thread, the issue is not
>resolved.
>
> 
>
>Frank 
>
> 
>
>From: Outages-discussion <outages-discussion-bounces at outages.org>
>On Behalf Of Frank Bulk
>Sent: Thursday, December 27, 2018 9:02 AM
>To: outages-discussion at outages.org
>Subject: Re: [Outages-discussion] CenturyLink Outages this morning
>
> 
>
>Thanks for sharing.  IPv6 access to www.qwest.com and
>www.centurylink.com has been intermittent since 5:00 am U.S. Central.
>
> 
>
>Frank
>
> 
>
>From: Outages <outages-bounces at outages.org> On Behalf Of Erik Sundberg
>via Outages
>Sent: Thursday, December 27, 2018 8:45 AM
>To: outages at outages.org
>Subject: [outages] CenturyLink Outages this morning
>
> 
>
>We are seeing a lot of centurylink circuits and services down this
>morning.
>
> 
>
>Centurylink Control Center portal comes up, but auth is failing; unable
>to log in
>
> 
>
>Waves down or bouncing 
>
>Denver - Seattle (Bouncing)
>
>Denver - Chicago (Bouncing)
>
>New York - Los Angeles (Down)
>
>Chicago - Atlanta (Down)
>
>
>ENNI Down in Denver (Down)
>
> 
>
>Still trying to open tickets with them; their portal is not working and
>we are stuck on hold right now.
>
> 
>
> 
>
>  _____  
>
>
>CONFIDENTIALITY NOTICE: This e-mail transmission, and any documents,
>files or previous e-mail messages attached to it may contain
>confidential information that is legally privileged. If you are not the
>intended recipient, or a person responsible for delivering it to the
>intended recipient, you are hereby notified that any disclosure,
>copying, distribution or use of any of the information contained in or
>attached to this transmission is STRICTLY PROHIBITED. If you have
>received this transmission in error please notify the sender
>immediately by replying to this e-mail. You must destroy the original
>transmission and its attachments without reading or saving in any
>manner. Thank you.
>
>This email and its attachments may contain privileged and confidential
>information and/or protected health information (PHI) intended solely
>for the use of Netsmart Technologies and the recipient(s) named above.
>If you are not the recipient, or the employee or agent responsible for
>delivering this message to the intended recipient, you are hereby
>notified that any review, dissemination, distribution, printing or
>copying of this email message and/or any attachments is strictly
>prohibited. If you have received this transmission in error, please
>email compliance at NTST.com immediately and
>permanently delete this email and any attachments. 
>
>_______________________________________________
>Outages-discussion mailing list
>Outages-discussion at outages.org
>https://puck.nether.net/mailman/listinfo/outages-discussion
>
>
>
>
> 
>
>-- 
>
>	
>
>WeWork | Erik Wooding 
>Manager of Network Engineering, US & Canada West, Latin America 
>O: 917-810-9345 
> <http://www.wework.com> wework.com 
>
>Create Your Life's Work 
>
>Get rewarded for good ideas and good people! 
>Apply for funding at  <http://creatorawards.wework.com/>
>creatorawards.wework.com or 
>help grow the community at  <http://refer.wework.com> refer.wework.com
>
> 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.