[outages] Wikipedia

Fri Aug 3 15:45:41 EDT 2012

Disregard previous email. It was not meant to go to this distribution list.
Ken

-----Original Message-----
From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of Waugh, Kenneth
Sent: Friday, August 03, 2012 3:30 PM
To: George Herbert; Robert Brockway
Cc: outages at outages.org
Subject: Re: [outages] Wikipedia

Hi George,
Okay, we resolved the iperf download issue; we will be ready to go shortly.
Ken

Kenneth Waugh | NOC Engineer | Sentry Managed Services | CCNP
Office 301-623-1926, 8am to 5pm US ET, or kwaugh at presidio.com
Presidio Networked Solutions | www.presidio.com

-----Original Message-----
From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of George Herbert
Sent: Friday, August 03, 2012 2:58 PM
To: Robert Brockway
Cc: outages at outages.org
Subject: Re: [outages] Wikipedia

On Fri, Aug 3, 2012 at 12:51 AM, Robert Brockway <robert at timetraveller.org> wrote:
> On Thu, 2 Aug 2012, George Herbert wrote:
>
>> I reported it on their internal/external tech list. I saw the
>> outage for about 5-8 min, and it has been working again for the
>> last 5 or so.
>
>
> Several times over the last few years I've seen WP outages that
> turned out to be bad config pushed into production and then quickly
> reverted.  A few were patches to the MediaWiki software, for example.
>
> I guess they don't have a preprod/UAT environment :)  While I can
> understand them not being able to simulate the scale, a small UAT
> environment to sanity-check config wouldn't go astray.
>
> I hear Wikipedia has a monitoring system.  It involves alerts issued
> by millions of people around the world :)
>
> Cheers,
>
> Rob

I know some of the ops folks and have talked about ops stability on and off with the deputy director and VP of technology of the Wikimedia Foundation.  I haven't professionally consulted per se, but have some info about the ops.

They do have a preprod environment, but it has limitations, and the systems management process is not perfect.  They have been focused over the last couple of years on stability and disaster recovery, but given their user growth and budget envelope, it's hard to make huge leaps ahead on stability while growing.

Frankly, most of the large commercial environments I have seen were run worse, all things considered...

--
-george william herbert
george.herbert at gmail.com
_______________________________________________
Outages mailing list
Outages at outages.org
https://puck.nether.net/mailman/listinfo/outages