[VoiceOps] Plivo Offline, Domain Expired; Out of Business?

Alex Balashov abalashov at evaristesys.com
Sun Apr 23 19:56:46 EDT 2017


Yes, it is good to know when things are down. Where I see people get into trouble with self-monitoring is in making SLA claims to their vendors, or other financial or contractual claims that require ironclad attribution of blame.

Any piece of information like this is just a tool. It has epistemic limits. It may be valuable, but it is definitely not priceless.

-- Alex

> On Apr 23, 2017, at 6:42 PM, Peter Beckman <beckman at angryox.com> wrote:
> 
> Still priceless. If our application servers cannot access critical
> resources from a carrier, regardless of the cause (network, application
> outage on their end, domain went un-registered), I now know it isn't
> working.
> 
> That still is priceless. Monitoring should tell you what, not why. Blackbox
> monitoring (measuring what customers actually experience) is wildly more
> valuable than whitebox monitoring (a DB query took more than 10 seconds,
> once).
> 
> Your metrics system should help you answer why, once you know that
> something is wrong.
> 
> Beckman
> 
> Good reading:
> https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit
> 
> From a prior Google SRE
> 
>> On Sun, 23 Apr 2017, Alex Balashov wrote:
>> 
>> With regard to the so-called priceless information, I would be careful.
>> Monitoring vendors oneself is good, but it can lead to the
>> misapprehension that the blame for an outage lies with the vendor when
>> it in fact does not, and the outage is instead caused by intermediate
>> routing or whatever.
>> 
>> In my experience, if used with conceit, such monitoring can cause as many
>> problems as it solves. It certainly isn't priceless. That depends on the
>> specific situation.
>> 
>>> On April 23, 2017 4:27:00 PM EDT, Alexander Lopez <alex.lopez at opsys.com> wrote:
>>> The information you gather by monitoring your vendors is priceless. If
>>> you are able to determine within a short period of time that your
>>> vendor is the problem, you have just saved precious time in identifying
>>> the problem. You can then reroute around the issue, instead of spending
>>> time looking at your platform.
>>> 
>>> On Apr 23, 2017 3:10 PM, Peter Beckman <beckman at angryox.com> wrote:
>>> Yep, we monitor our vendors. In some cases, better than they monitor
>>> themselves. It's frustrating that they don't/can't/won't, but wanting
>>> them to change doesn't help our customer experience. So we do it for
>>> them proactively.
>>> 
>>> Too many outages on our vendor services have been caused by their
>>> ineptitude, such as this Plivo outage. We decided as a result of this
>>> Plivo outage that, while it was merely annoying to our Operations team
>>> and didn't result in any customer-facing issues, we could have seen this
>>> coming and maybe averted this disaster for everyone by adding two config
>>> lines in our monitoring platform and a process to ensure notification
>>> to, followup with and closure of the issue with Plivo.
>>> 
>>> The cost to our Operations team is small, the automated monitoring
>>> costs nothing, but the impact of knowing that our vendors are having (or
>>> will have) issues before they tell us or know themselves, AND before our
>>> customers complain, improves our customer experience and operational
>>> excellence despite our vendors' failings.
>>> 
>>> It also gives us a chance to write defensive code to handle the
>>> situations where the vendor is not meeting their contractually obligated
>>> level of service.
>>> 
>>> Beckman
>>> 
>>> 
>>>> On Sun, 23 Apr 2017, Keln Taylor wrote:
>>>> 
>>>> Just to clarify, you are saying that you monitor the domain and SSL
>>>> cert of your vendors so you can notify them?
>>>> That's cool.
>>>> 
>>>> 
>>>> Sincerely,
>>>> Keln Taylor
>>>> 870-204-2121
>>>> kelntaylor at gmail.com
>>>> 
>>>> On Sun, Apr 23, 2017 at 12:31 PM, Peter Beckman <beckman at angryox.com> wrote:
>>>> 
>>>>> We should all strive to NOT do that. We integrated a once-a-day check
>>>>> into our Monitoring platform that starts warning Operations 30 days
>>>>> before the domain expires, and actually pages people starting at 9am on
>>>>> weekdays 7 days before if it hasn't been renewed. We had to tweak it for
>>>>> how our registrar publishes that information, and we automated renewals
>>>>> so it rarely goes off, but when it does we can get in front of it.
>>>>> 
>>>>> We have the same thing in place for our public and internal SSL/TLS
>>>>> Certificates.
>>>>> 
>>>>> If you are running a business on the web and don't automate monitoring
>>>>> of critical infrastructure, you get outages like this. Heck, we started
>>>>> monitoring the domain and SSL certs of our critical-path dependent
>>>>> services/vendors after another outage many years ago, when an SSL cert
>>>>> expired.
>>>>> 
>>>>> Plivo wasn't in our mix, as they aren't critical-path, but they are now,
>>>>> and they are still in alarm. Operations now will be automatically
>>>>> notified when we can actually see Plivo again.
>>>>> 
>>>>> Beckman
>>>>> 
>>>>> On Sun, 23 Apr 2017, Gavin Henry wrote:
>>>>> 
>>>>>> On 23 April 2017 at 17:31, Alex Balashov <abalashov at evaristesys.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> I've done it. Just got distracted, got to fiddling around and thinkin'
>>>>>>> bout things, and before I know it, the domain's expired and I'm no
>>>>>>> longer on the register of respectable party guests...
>>>>>>> 
>>>>>> 
>>>>>> Embarrassing. I'm sure I'll do it at some point. :-)
>>>>>> 
>>>>>> 
>>>>> ---------------------------------------------------------------------------
>>>>> Peter Beckman                                                  Internet Guy
>>>>> beckman at angryox.com                                 http://www.angryox.com/
>>>>> ---------------------------------------------------------------------------
>>>>> _______________________________________________
>>>>> VoiceOps mailing list
>>>>> VoiceOps at voiceops.org
>>>>> https://puck.nether.net/mailman/listinfo/voiceops
>>>>> 
>>>> 
>>> 
>>> ---------------------------------------------------------------------------
>>> Peter Beckman                                                  Internet Guy
>>> beckman at angryox.com                                 http://www.angryox.com/
>>> ---------------------------------------------------------------------------
>> 
>> 
>> -- Alex
>> 
>> --
>> Principal, Evariste Systems LLC (www.evaristesys.com)
>> 
>> Sent from my Google Nexus.
>> 
> 
> ---------------------------------------------------------------------------
> Peter Beckman                                                  Internet Guy
> beckman at angryox.com                                 http://www.angryox.com/
> ---------------------------------------------------------------------------
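
The expiry thresholds described earlier in the thread (warn Operations starting 30 days before a domain expires, page people from 9am on weekdays once it is within 7 days) could be sketched roughly as below. This is only an illustration, not the poster's actual monitoring config: the function name, the constant names, and the "ok"/"warn"/"page" labels are hypothetical, the sketch assumes no end-of-day cutoff for pages, and it treats an already expired date the same as one inside the 7-day window. How the expiry date is obtained from a registrar or certificate is left out, since the thread notes that this needed registrar-specific tweaking.

```python
from datetime import datetime, timedelta

# Thresholds taken from the thread; the names are hypothetical.
WARN_DAYS = 30  # start warning Operations 30 days before expiry
PAGE_DAYS = 7   # start paging people 7 days before expiry
PAGE_HOUR = 9   # pages begin at 9am, weekdays only


def expiry_alert_level(expires_at: datetime, now: datetime) -> str:
    """Return 'ok', 'warn', or 'page' for a domain or certificate expiry date."""
    remaining = expires_at - now
    if remaining > timedelta(days=WARN_DAYS):
        return "ok"
    if remaining <= timedelta(days=PAGE_DAYS):
        # Assumption: page only Monday to Friday (weekday() 0..4), from 9am onward.
        is_weekday = now.weekday() < 5
        if is_weekday and now.hour >= PAGE_HOUR:
            return "page"
    # Inside the 30-day window but not (yet) pageable: warn only.
    return "warn"


if __name__ == "__main__":
    # Hypothetical expiry date, evaluated once a day as in the thread.
    expires = datetime(2017, 6, 1, 12, 0)
    print(expiry_alert_level(expires, datetime(2017, 5, 10, 10, 0)))
```

The same function would serve for certificate expiry dates, given that the thread applies the same policy to SSL/TLS certificates. Note that a check like this reports only that an expiry is near, not whose fault a resulting outage would be.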


