[outages] Baicells Outage?

Thu Jan 4 11:37:37 EST 2018

Yes, Azure sent an email yesterday noting that due to the critical nature
of the patch they were doing rolling restarts of all instances in Azure
that were still affected by the issue. There wasn't a whole lot of notice.

Note that people using Scale Sets or other Azure redundancy options were
not affected due to Azure obeying the failover/failback rules (they'd
restart one, wait for it to be available, then restart the other).
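
As a rough illustration of that restart-one, wait, restart-the-other behavior, here is a minimal Python sketch. It is not Azure's implementation: `rolling_restart`, `restart`, and `is_healthy` are hypothetical names made up for this example, and the timeout values are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    instances: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
) -> List[str]:
    """Restart instances one at a time, waiting for each to come back.

    Hypothetical helper: `restart` and `is_healthy` are caller-supplied
    callbacks, not real Azure API calls.
    """
    done: List[str] = []
    for name in instances:
        restart(name)
        deadline = time.monotonic() + timeout_s
        # Do not touch the next instance until this one is available again.
        while not is_healthy(name):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{name} did not recover; stopping rollout")
            time.sleep(poll_s)
        done.append(name)
    return done
```

The point of the sketch is that a pair of redundant instances is never down at the same time, which is why a single, non-redundant VM would have seen an outage while a redundant pair would not.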

On Thu, Jan 4, 2018 at 11:34 AM, Zak Rupas via Outages <outages at outages.org>
wrote:

> Good Morning Outages-
>
>
> Is anyone feeling this issue currently? Any truth to the story?
>
>
> From Baicells facebook group:
>
> So this morning I feel a bit like the parent who woke up having discovered
> his child injured someone in a car crash. I'm not directly responsible, but
> I've some culpability -- or a lot of it -- due to the choices I've made as
> a parent.
>
> Yes, it is true the discovery of the Intel x86 security flaw set the tech
> industry ablaze last night. Among the reactions, Microsoft abruptly shut down
> its Azure cloud servers with no advance notice. Yes, that specific action
> was beyond our control. But, Amazon is dancing, while people who rely
> upon Azure -- all of us -- are justifiably frustrated and angry.
>
> On this occasion, redundancy across Azure servers AND across various LTE
> functions, like HSS and MME, did not help us.
>
> Yes, we are going to investigate adding redundancy across multiple cloud
> providers such as Amazon....but let's be blunt...who's to say that step
> will still be enough? What needs completion without excuse or qualification
> are the other options we've discussed ad nauseam, the local EPC and Halo B.
>
> Apologies from us aren't gonna cut it, nor will excuses, so I'm not going
> to waste your time offering them. I'm already this AM discussing with our
> executive team pushing as top priority the other EPC options. Here's what I
> will tell you.
>
> This past Tuesday a new hire started at Baicells North America. Ronald Mao
> is ex-Huawei and ex-Motorola and has lived in the US since 1987. His entire
> career has been centered on product line management. He is our new PLM for,
> shall we say, major things. I've already asked Jesse to brief Ronald on the
> ongoing cloud issues, as well as where we are on the local EPC and
> Halo B. I will ask Ronald to send me an update each week, and I'll pass
> this on to the group via Facebook AND an email UNTIL THIS IS DONE.
>
> One P.S. note, thank you Cameron and Rick. While it's small consolation to
> our customers, Rick and Cam have been up all night working with the team
> overseas, reporting to you, and in general trying to manage what's been
> frankly out of their hands.
>
> Jesse Raasch
> <https://www.facebook.com/jesse.raasch?fref=gs&dti=1588455311448839&hc_location=group>
>  Cameron Kilton
> <https://www.facebook.com/ciaworks?fref=gs&dti=1588455311448839&hc_location=group>
>  Rick Harnish
> <https://www.facebook.com/rick.harnish.10?fref=gs&dti=1588455311448839&hc_location=group>
>  Savannah Lancaster
> <https://www.facebook.com/savannah.lancaster?fref=gs&dti=1588455311448839&hc_location=group> Ronald
> Mao Minchul Ho
> <https://www.facebook.com/minchul.ho?fref=gs&dti=1588455311448839&hc_location=group>
>  Sonny May
> <https://www.facebook.com/sonny.may.94?fref=gs&dti=1588455311448839&hc_location=group>
>  Nitisha Potti
> <https://www.facebook.com/nitisha.potti?fref=gs&dti=1588455311448839&hc_location=group>
>  Boun Senekham
> <https://www.facebook.com/bsenekham?fref=gs&dti=1588455311448839&hc_location=group>
>
> Update: I spoke with Ronald this morning (he is in CA). He has his
> marching orders. I'll post updates from him until we close on the local EPC
> and Halo B.
>
> So Cameron has been trying to post this, but it's getting rejected:
>
> For those still having CPE attach issues, please instruct your customers
> to keep the CPE powered off for at least 5 minutes.
>
>
>
>
> Micah Deshotel That didn't work for me. The down ones stayed down. Power
>
> System Alert: OMC is currently reporting offline. We are investigating as
> of 10:12PM EST. UPDATE: 11:07pm EST Azure is rebooting servers to apply a
> major patch. Most of our instances are back online. OMC should be restored
> shortly.
>
> UPDATE 2: 1:20am EST. MME VMs also fell victim to the critical issue with
> Azure. VMs have since been restored. More information about the issue can be
> found here: https://www.geekwire.com/…/cloud-vendors-secretly-scramble…/
> <https://l.facebook.com/l.php?u=https%3A%2F%2Fwww.geekwire.com%2F2018%2Fcloud-vendors-secretly-scramble-patch-critical-flaw-intel-chips-performance-hits-expected%2F&h=ATNFZQ-do0svtoa_e5hnQZl0gvZbgu6awtfb_Gk4bHVCd87C26PuC1WMVDtgDyT8qK3fFfPSQ7b34-F2tgFtUfQHStT7qfLlstCNSV7V6lexFW_hL3D1t6kkGbeOa7beRxl6DrJuMP_oeinlJRpZ3wFWRWUumOt1zsrGTV5Hv9Hzc_Br_dwBI54WDVkxt3_FW3zyGrAVA8u2dM_lnJetjxJVrD8N7PCXs3a6glQv0d_nWDmsRvN3MmiTIZV3gBHTITs7lk2JgmLJyKs_p0yfVvwJ3TkLPouV_wTQ>
>
> We will continue to review our cloud infrastructure and investigate cross
> platform redundancy.
>
>
>
>
> Cycling the UEs hasn't worked yet either......
>
>
>
> Thanks
> Zak Rupas
> Forethought.net
>
> _______________________________________________
> Outages mailing list
> Outages at outages.org
> https://puck.nether.net/mailman/listinfo/outages
>
>