[outages] IBM Cloud / NetworkLayer Outage

Andy Jabbour andy at gate15.global
Wed Jun 10 08:47:37 EDT 2020


FWIW: https [:] //www [.] computing.co.uk/news/4016300/ibm-cloud-suffers-global-outage-bringing-customer-websites

Includes Twitter shots and links.

IBM Cloud suffered a widespread outage on Tuesday that brought down multiple services hosted on the platform, as well as IBM's own cloud status page.
The problem likely started around 2:30pm Pacific Time (PT) and escalated at around 5:30pm PT. Quickly, it started to impact services such as Cloud Object Storage, App Connect, Kubernetes Service, Continuous Delivery, Identity and Access Management (IAM), VPN for VPC, and Watson AI cloud services.
IBM's own status page also failed to load during the outage and returned an internal server error message.
DownDetector.com, a website that monitors website outages, showed that the outage was impacting IBM Cloud services in different parts of the US, South America, Japan and Australia.
IBM's cloud-based streaming and file transfer services, which come under the Aspera brand, were also mostly out.
IBM Aspera, an IBM unit that operates under hybrid cloud business, said that following the first outage, it was alerted to a service disruption affecting all regions.
Aspera further revealed that AoC Managed Storage was suffering "major outage" in Dallas, Amsterdam, Frankfurt, Melbourne, and Toronto.
"Our engineers are currently investigating the incident and we will provide updates when more information is available," an Aspera advisory stated.
IBM Cloud Transfer Clusters in Frankfurt, Washington DC, Hong Kong, Chennai, London, Montreal, Mexico, Milan, Oslo, Seoul, San Jose, Sao Paulo, Tokyo, Paris, and Singapore were also listed as suffering major outages.
It also emerged that IBM Cloud Monitoring service Sysdig had a three-hour maintenance window earlier in the day at its London facilities. IBM had informed customers that Sysdig maintenance work would be completed between 3pm and 6pm, and that it would cause only a 30-minute disruption to Cloud Monitoring service.
IBM Cloud eventually posted an update on Twitter, stating that it was aware of the issues and that the issue was being investigated.
At present, the issue appears to have been fully resolved. IBM also said in a tweet at around 7:00pm PT that all IBM Cloud services are now working normally.


- andy


From: Outages <outages-bounces at outages.org> on behalf of Greg Dickinson via Outages <outages at outages.org>
Reply-To: Greg Dickinson <Greg.Dickinson at bryantbank.com>
Date: Wednesday, June 10, 2020 at 8:44 AM
To: "'outages at outages.org'" <outages at outages.org>
Subject: Re: [outages] IBM Cloud / NetworkLayer Outage

Found this on Reddit as an initial RFO, so keep in mind the source (although r/sysadmin is usually pretty good):

“A 3rd party network provider was advertising routes which resulted in our WorldWide traffic becoming severely impeded. This led to IBM Cloud clients being unable to log-in to their accounts, greatly limited internet/DC connectivity and other significant network route related impacts. Network Specialists have made adjustments to route policies to restore network access, and alleviate the impacts. The overall incident lasted from 5:55pm - 9:30pm ET. We will be providing a fully detailed Customer Incident Report/Root Cause Analysis as soon as possible”

Greg Dickinson
Network Engineer

234 Goodwin Crest Drive, Suite 500
Birmingham, AL  35209
Office: 205.917.2407  |  Mobile: 205.234.6427
Fax: 205.945.0515
greg.dickinson at bryantbank.com  |  BryantBank.com

From: Outages <outages-bounces at outages.org> On Behalf Of Jason Kuehl via Outages
Sent: Tuesday, June 9, 2020 8:38 PM
To: James Brown <jbrown at easypost.com>
Cc: outages at outages.org
Subject: Re: [outages] IBM Cloud / NetworkLayer Outage



I can't wait for this post mortem.

On Tue, Jun 9, 2020 at 9:21 PM James Brown via Outages <outages at outages.org> wrote:
As of a few minutes ago, all US SoftLayer (IBM Cloud) locations look normal to me.

On Tue, Jun 9, 2020 at 5:27 PM James Brown <jbrown at easypost.com> wrote:
Just wanted to share updates:

I got through to someone at IBM who confirmed the level of chaos over there. It looks like they've started deploying some mitigations because I'm seeing packet loss drop off but latency is now awful for all US locations. It looks like IBM has failed over all of their peering to go through their European facilities because every return packet from every POP is coming from lon03.networklayer.com or ams02.networklayer.com (via Telia).

On Tue, Jun 9, 2020 at 4:12 PM James Brown <jbrown at easypost.com> wrote:
It looks like IBM Cloud / SoftLayer / NetworkLayer is having a global outage since about 15:04 Pacific time affecting routing to AWS, Level 3 / CenturyLink, and a bunch of other providers. Has anyone been able to get more details?

--
James Brown
Network Engineer


_______________________________________________
Outages mailing list
Outages at outages.org
https://puck.nether.net/mailman/listinfo/outages


--
Sincerely,

Jason W Kuehl
Cell 920-419-8983
jason.w.kuehl at gmail.com

