[outages] IPv6 tunnels in FRA1 on HE.net down?

Constantine A. Murenin mureninc at gmail.com
Tue May 14 18:50:40 EDT 2013


For what it is worth, further details about the issue have surfaced.
I found a friend who also has a tunnel on tserv1.fra1.he.net., and he
has been running smokeping to various IPv4 and IPv6 resources for
quite a while.

According to several of his smokeping reports, this very outage
lasted from 14T18:00/05 until 14T18:45/50; but we've also noticed
that there was another, 6-hour (yes, 6-hour) outage a day earlier,
~13T12 to ~13T18 (which corresponds to Monday early to late morning
Pacific Time).

I've contacted he.net again this time around, and they said that
they're trying to hunt down some obscure kernel bug that is causing
these issues.

The tunnelbroker.net is a free service, but to have a 6-hour outage,
spanning a quarter of a whole day, is absolutely ridiculous.  I'm
stunned that IPv6 connectivity of tserv1.fra1.he.net. is, apparently,
still not monitored, even though it's known to be having these issues.
 ???

Alternatively, it is, of course, possible that some engineer has been
troubleshooting the root cause of this issue for those whole 5 or 6
hours on Sunday/Monday night; but I find that somewhat hard to
believe; more likely, it got busted, and no one responsible knew about
it being busted for most of the time that it was.

Even more troubling is that they don't even publish any reports about
these extended outages.

For tserv1.fra1.he.net. end users:  if you can `ping6 ordns.he.net`
(it runs on the tserv itself; try $(host `dig +short -6 @ordns.he.net
whoami.akamai.net`)), but cannot `ping6 ns4.he.net`, then it most
likely means that the IPv6 connectivity of tserv1.fra1.he.net. is down
again, and you must open a ticket with HE.net ASAP.  Perhaps someone
should set up a smokeping with automatic emails to support at he.net?
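
For anyone who wants to script that check, here is a minimal sketch of
the rule above.  It is only an illustration, not anything HE.net
provides:  the function names (classify_tserv, probe, check_tserv) and
the PING6 override variable are made up for this example, and it
assumes a `ping6` that accepts `-c count`.

```shell
#!/bin/sh
# Hypothetical helper: classify the tunnel state from two probe results.
#   $1 = exit status of probing ordns.he.net (runs on the tserv itself)
#   $2 = exit status of probing ns4.he.net   (beyond the tserv)
classify_tserv() {
    if [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
        # The tserv answers, but nothing past it does.
        echo "tserv-ipv6-down"
    elif [ "$1" -eq 0 ]; then
        echo "ok"
    else
        # Cannot even reach the tserv: tunnel or local problem.
        echo "tunnel-or-local-problem"
    fi
}

# Probe one host over IPv6.  PING6 is an assumed override for systems
# where the command is spelled differently (e.g. PING6="ping -6").
probe() {
    ${PING6:-ping6} -c 3 "$1" >/dev/null 2>&1
}

# Run both probes and print the classification.
check_tserv() {
    probe ordns.he.net; near=$?
    probe ns4.he.net;   far=$?
    classify_tserv "$near" "$far"
}
```

Calling check_tserv from cron and sending mail whenever it prints
"tserv-ipv6-down" would be one crude way to get the automatic
notification; the three probes per host are an arbitrary choice.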

C.

On 14 May 2013 09:42, Constantine A. Murenin <mureninc at gmail.com> wrote:
> This just happened again:  all IPv6 tunnels on tserv1.fra1.he.net.
> were inaccessible; from within a tunnel, cannot access any IPv6
> resource, other than ordns.he.net, which runs on the tunnel server
> itself.
>
> Why does FRA1 lose IPv6 connectivity so often?
>
> Update: Seems like it has been resolved as I've been writing this
> email, but this would seem to happen a few times too many.
>
> C.
>
> On 23 April 2013 19:28, Constantine A. Murenin <mureninc at gmail.com> wrote:
>> On 23 April 2013 18:28, Constantine A. Murenin <mureninc at gmail.com> wrote:
>>> As of a couple of minutes ago, my IPv6 tunnel seems to have no
>>> connectivity, weirdly other than being able to access ordns.he.net
>>> (2001:470:20::2) just fine, but not ns{2,3,4,5}.he.net, or any other
>>> IPv6 host.
>>
>> Update: reported to he.net support@ at 18:34, including a follow-up phone call;
>> everything's back online, as of at least 18:51 PT.
>> Pretty fast resolution, for a free service. :-)
>>
>> According to he.net, tserv wasn't responding on its IPv6 address, and has
>> since been rebooted.
>>
>> Which adds up as per my mtr from a Linode:
>>
>> # mtr --report{,-wide,-cycles=60} --order "SRL BGAWV" -6 XXXXXXXXX ; date
>>   2. 10gigabitethernet2-3.core1.fmt1.he.net     60    60  0.0%    0.4   2.5   8.5  85.5  15.2
>>   3. 10gigabitethernet1-2.core1.sjc2.he.net     60    60  0.0%    0.8   2.5   5.3  51.6   8.5
>>   4. 10gigabitethernet3-3.core1.den1.he.net     60    60  0.0%   27.8  31.7  32.3  70.0   7.3
>>   5. 10gigabitethernet5-5.core1.mci3.he.net     60    60  0.0%   39.7  43.6  44.3 114.3  10.1
>>   6. 10gigabitethernet5-2.core1.chi1.he.net     60    60  0.0%   52.0  56.2  57.4 177.2  17.0
>>   7. 100gigabitethernet7-2.core1.nyc4.he.net    60    59  1.7%   69.1  72.2  72.5 119.8   7.1
>>   8. 10gigabitethernet1-2.core1.lon1.he.net     60    60  0.0%  137.8 141.8 142.2 229.2  12.0
>>   9. 10gigabitethernet4-2.core1.fra1.he.net     60    60  0.0%  149.4 154.0 154.4 242.8  12.7
>>  10. ???                                        60     0 100.0    0.0   0.0   0.0   0.0   0.0
>> Tue Apr 23 18:14:29 PDT 2013
>>
>> ...
>>
>>   2. 10gigabitethernet2-3.core1.fmt1.he.net     60    60  0.0%    0.4   2.8   9.3  78.3  16.8
>>   3. 10gigabitethernet1-2.core1.sjc2.he.net     60    60  0.0%    0.8   2.3   3.9  13.9   3.8
>>   4. 10gigabitethernet3-3.core1.den1.he.net     60    60  0.0%   27.8  30.3  30.5  39.2   3.5
>>   5. 10gigabitethernet5-5.core1.mci3.he.net     60    60  0.0%   39.7  43.4  43.6  52.8   4.4
>>   6. 10gigabitethernet5-2.core1.chi1.he.net     60    60  0.0%   52.0  54.1  54.2  62.2   3.2
>>   7. 100gigabitethernet7-2.core1.nyc4.he.net    60    60  0.0%   69.1  71.5  71.6  82.1   3.5
>>   8. 10gigabitethernet1-2.core1.lon1.he.net     60    60  0.0%  137.8 140.3 140.4 149.1   3.5
>>   9. 10gigabitethernet4-2.core1.fra1.he.net     60    60  0.0%  149.4 152.3 152.3 162.5   3.7
>>  10. tserv1.fra1.he.net                         60    60  0.0%  154.1 155.5 155.5 163.4   2.2
>>  11. IPv6.XXXXXX                                60    59  1.7%  155.2 156.0 156.1 162.9   1.2
>> Tue Apr 23 18:55:34 PDT 2013
>>
>>
>>
>> And I guess ordns.he.net (2001:470:20::2) really runs on tserv
>> (and hence wasn't affected during the outage).
>>
>> Cns# echo {ordns,ns{2,3,4,5}}.he.net | xargs -n1 traceroute6 -l; date
>> traceroute6 to ordns.he.net (2001:470:20::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>  1  ordns.he.net (2001:470:20::2)  6.39 ms  6.495 ms  6.213 ms
>> traceroute6 to ns2.he.net (2001:470:200::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>  1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  9.353 ms  8.783 ms  9.328 ms
>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  13.12 ms  15.134 ms  6.252 ms
>>  3  10gigabitethernet5-3.core1.lon1.he.net (2001:470:0:1d2::1)  18.113 ms  23.554 ms  20.342 ms
>>  4  ns2.he.net (2001:470:200::2)  20.449 ms  22.873 ms  20.517 ms
>> traceroute6 to ns3.he.net (2001:470:300::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>  1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.743 ms  9.2 ms
>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  5.934 ms  10.749 ms  6.197 ms
>>  3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  18.567 ms  17.728 ms  13.282 ms
>>  4  ns3.he.net (2001:470:300::2)  13.462 ms  13.412 ms  13.525 ms
>> traceroute6 to ns4.he.net (2001:470:400::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>  1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.838 ms  8.684 ms  8.682 ms
>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  6.233 ms  13.208 ms  5.97 ms
>>  3  ns4.he.net (2001:470:400::2)  6.145 ms  6.38 ms  6.384 ms
>> traceroute6 to ns5.he.net (2001:470:500::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>  1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.655 ms  8.848 ms
>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  12.607 ms  6.527 ms  11.929 ms
>>  3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  13.289 ms  13.968 ms  16.754 ms
>>  4  ns5.he.net (2001:470:500::2)  14.111 ms  13.575 ms  13.385 ms
>> Tue Apr 23 18:59:48 PDT 2013
>>
>>
>> However, it's unclear why IPv6 connectivity of tserv1.fra1.he.net
>> doesn't seem to be monitored otherwise. :/
>>
>> Cheers,
>> Constantine.


