[Outages-discussion] [outages] Ping to Google 8.8.8.8

Lukas Tribus lukas at ltri.eu
Wed Feb 9 17:06:20 EST 2022


On Wed, 9 Feb 2022 at 20:10, Jeff Shultz <jeffshultz at sctcweb.com> wrote:
>
> This is sort of silly, and thank you Grant for pointing that out. Each ISP ought to have, within their network/ASN/segment of network that does not involve traversing the internet, at least one reliably pingable box, whether it be a gateway router or an odd jobs server sitting on your backbone.
>
> Not until you can demonstrate that the customer's connection to your network is up and running do you have any business thinking about pinging some box out on the internet somewhere. And I'm going to guess that if your network's connection to the wider internet goes down, you're going to know about it very soon, and you won't have to try and ping 8.8.8.8 to demonstrate it.
>
> And after you prove that your customer can ping your internal box by IP, then you can have them try it by DNS name. Then, and only then, do you need to try and test connectivity to the wider internet. I personally like news sites like CNN or Fox News, since they change all the time and are unlikely to be cached by the customer's web browser. They're also likely to be CDN'd to somewhere nearby.
>
> This is reminding me of my early days in tech support where if a customer couldn't access their #emailprovider for some reason, "the internet was down."

I don't think there is a big concern about humans pinging 8.8.8.8.
That's a customer support issue, along with speedtest from Wifi and
crappy speedtest servers.

The problem is equipment pinging 8.8.8.8 and making routing decisions
based on its results. I don't think ICMP is the right tool for this
job. It would be in an ideal world where there *is* a universal ping
target, but that's not the case in an IoT/multi-WAN-FW context:
nobody is setting up a globally reliable anycasted ICMP response
service, because there is no money to be made with that. If I were an
SD-WAN/multi-WAN-FW operator, operating at global scale with a dozen
ISPs, I would certainly:

- NOT trust the ISP's suggested ping destination, as that would set me
up for failure when the AS is isolated;
- but not only that, an ISP would likely bring the ping target so
close to the users that it would defeat the purpose: PoP outage? metro
outage? regional outage? Health checks don't do any good if they don't
detect actual connectivity loss to internet destinations;
- NOT want to handle 12 different recommendations and configure the
health checks manually across the entire planet.

Of course an actual SD-WAN solution would have more intelligence based
on interface metrics/traffic. Maybe. Or maybe that would make it less
reliable. I'm sure YMMV.


So what should equipment manufacturers (especially of firewall,
multi-WAN, SD-WAN kinds of solutions) implement as health checks for
WAN links, so that they are universally usable and not dependent on a
single ISP?

Or should every SD-WAN and Firewall vendor provide their own anycasted
ping responders? Yeah, that doesn't sound like a good idea. I was
about to praise Ubiquiti for setting up their own infrastructure, but
...

$ host ping.ubnt.com
ping.ubnt.com is an alias for ping2.ui.com.
ping2.ui.com has address 8.8.8.8
ping2.ui.com has address 1.1.1.1
$

If health checks require DNS lookups, then DNS-intercepting captive
portals pointing to the RFC1918 address of the router will break them.
Not nice if ping.ubnt.com resolves to 192.168.1.1 when your WAN link
is down, and your fancy multi-WAN FW doesn't fail over.
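
To make that failure mode concrete, here is a minimal sketch in Python
of a sanity check on resolved addresses. The function names are
hypothetical and not taken from any vendor's firmware, and it assumes
that "not globally routable" is a good enough definition of a bogus
answer. It relies on the system resolver, which is exactly the one a
captive portal would intercept.

```python
import ipaddress
import socket


def is_usable_answer(addr: str) -> bool:
    """True only if addr parses as a globally routable unicast IP."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    # is_global is False for RFC1918, loopback, link-local, CGNAT, etc.
    return ip.is_global and not ip.is_multicast


def answers_look_intercepted(addrs) -> bool:
    """Treat an empty answer or any non-routable address as suspicious."""
    addrs = list(addrs)
    return not addrs or any(not is_usable_answer(a) for a in addrs)


def resolve_and_check(hostname: str) -> bool:
    """Resolve hostname and return True if every answer looks public."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:  # covers socket.gaierror (resolution failure)
        return False
    return not answers_look_intercepted({info[4][0] for info in infos})
```

With this, an answer set of ["192.168.1.1"] is flagged as intercepted,
while ["8.8.8.8", "1.1.1.1"] passes. It says nothing about whether the
target is actually reachable, only that the DNS answer is plausible.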


Captive portals and similar things can really mess with health checks.

- should we use DNS queries and reject the response if it has A/AAAA
records from RFC1918/bogus networks? I don't think periodic DNS
requests against public resolvers like Google or Cloudflare are an
issue at that scale
- HTTP requests against certain endpoints that are made for it
(captive portal detection), like:
 - http://google.com/generate_204
 - http://connectivitycheck.gstatic.com/generate_204
 - http://www.apple.com/library/test/success.html
 - http://www.msftconnecttest.com/connecttest.txt

But HTTP will also be intercepted by the captive portal, so the
result needs to be checked. HTTPS doesn't work, because certificate
validation requires correct date/time, which could be a bootstrap
issue.
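
As a rough illustration of "the result needs to be checked", here is
a minimal Python sketch against one of the generate_204 endpoints
listed above. It assumes the endpoint answers with status 204 and an
empty body when the path is clean; the function names and the three
result labels are made up for this example. Redirects are deliberately
not followed, since a redirect to a login page is the typical captive
portal behaviour.

```python
import http.client
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow; urllib then raises HTTPError for 3xx


def classify_probe(status: int, body: bytes) -> str:
    """Only an exact 204 with an empty body counts as real connectivity."""
    if status == 204 and not body:
        return "online"
    return "intercepted"


def probe(url="http://connectivitycheck.gstatic.com/generate_204",
          timeout=5.0) -> str:
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return classify_probe(resp.status, resp.read(1024))
    except urllib.error.HTTPError:
        # Any redirect or HTTP error is not the expected 204.
        return "intercepted"
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # No route, DNS failure, timeout, or a broken response.
        return "offline"
```

A portal that answers 200 with a login page, or 302 to its own
address, ends up as "intercepted" rather than "online". Whether a
multi-WAN box should treat "intercepted" the same as "offline" is a
policy decision this sketch does not answer.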

I don't think healthchecking WAN links in a reliable way across
multiple ISPs is trivial at all.


Lukas

