[outages] IPv6 tunnels in FRA1 on HE.net down?

Constantine A. Murenin mureninc at gmail.com
Wed Jun 12 18:11:12 EDT 2013


he.net fra1 tserv1 has been down for about 10 minutes (since ~14:57 PT).
This time it seems like the whole thing is down: I cannot even ping
ordns.he.net over IPv6. The connection of a friend of mine, who runs
smokeping, is also down, i.e. this is definitely widespread.

Not sure what exactly is down, but it seems to be BGP-related, perhaps:

 1  2600:3c01::8678:acff:fe0d:79c1 (2600:3c01::8678:acff:fe0d:79c1)  0.699 ms  0.828 ms  0.970 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (2001:470:1:3b8::1)  5.936 ms  5.877 ms  5.859 ms
 3  10gigabitethernet5-4.core1.pao1.he.net (2001:470:0:263::2)  6.743 ms  6.877 ms  6.978 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
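
To pin down where a trace like the one above stops, the last hop that
answered can be pulled out mechanically. A minimal sketch, assuming
traceroute6-style output on stdin; last_responding_hop is a hypothetical
helper name, not part of any tool:

```shell
# Hypothetical helper: print the number and name of the last hop that replied.
# Reads traceroute/traceroute6 output on stdin. A hop line starts with an
# integer; the hop counts as responding if any field after the hop number is
# not "*". Wrapped continuation lines (e.g. "ms  5.877 ms") are ignored
# because they do not start with a bare integer.
last_responding_hop() {
    awk '$1 ~ /^[0-9]+$/ {
             for (i = 2; i <= NF; i++)
                 if ($i != "*") { hop = $1; host = $i; break }
         }
         END { if (hop != "") print hop, host }'
}
```

Fed the trace above, this prints hop 3
(10gigabitethernet5-4.core1.pao1.he.net), i.e. nothing answers past pao1.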

C.

On 14 May 2013 15:50, Constantine A. Murenin <mureninc at gmail.com> wrote:
> For what it is worth, further details about the issue have surfaced.
> I found a friend who also has a tunnel on tserv1.fra1.he.net., and he
> has been running smokeping to various IPv4 and IPv6 resources for
> quite a while.
>
> According to several of his smokeping reports, it can be concluded
> that this very outage occurred between 14T18:00/05 and 14T18:45/50; but
> we've also noticed that there was another, 6-hour (yes, 6-hour) outage
> a day earlier, ~13T12 to ~13T18 (which corresponds to Monday early to
> late morning Pacific Time).
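
The window arithmetic above can be checked quickly. A sketch only:
minutes_between is a hypothetical helper that takes two same-day HH:MM
stamps, and the bounds assume the smokeping ranges mean a start somewhere
in 18:00 to 18:05 and an end somewhere in 18:45 to 18:50:

```shell
# Hypothetical helper: minutes between two same-day HH:MM timestamps.
# awk converts "08" or "09" to decimal numbers, which avoids the octal
# pitfall of shell arithmetic.
minutes_between() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, s, ":"); split($2, e, ":")
        print (e[1] * 60 + e[2]) - (s[1] * 60 + s[2])
    }'
}

# Shortest and longest reading of the 14th's outage, then the 13th's:
#   minutes_between 18:05 18:45
#   minutes_between 18:00 18:50
#   minutes_between 12:00 18:00
```

Under that reading, the outage on the 14th lasted 40 to 50 minutes, and
the one on the 13th about 360 minutes, which is 360/1440 = 1/4 of a day.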
>
> I've contacted he.net again this time around, and they said that
> they're trying to hunt some obscure kernel bug that is causing these
> issues.
>
> The tunnelbroker.net is a free service, but to have a 6-hour outage,
> clearly spanning 1/4 of a whole day, is absolutely ridiculous.  I'm
> stunned that IPv6 connectivity of tserv1.fra1.he.net. is, apparently,
> still not monitored, even though it's known to be having these issues.
>  ???
>
> Alternatively, it is, of course, possible that some engineer has been
> troubleshooting the root cause of this issue for those whole 5 or 6
> hours on Sunday/Monday night; but I find that somewhat hard to
> believe; more likely it got busted, and no one responsible knew about it
> being busted for most of the time that it was.
>
> Even more troubling is that they don't even publish any reports about
> these extended outages.
>
> For tserv1.fra1.he.net. end users:  if you can `ping6 ordns.he.net`
> (it runs on tserv itself, try $(host `dig +short -6 @ordns.he.net
> whoami.akamai.net`)), but cannot `ping6 ns4.he.net`, then it most
> likely means that the IPv6 connectivity of tserv1.fra1.he.net is down
> again, and you must open a ticket with HE.net ASAP.  Perhaps someone
> should set up a smokeping with automatic emails to support at he.net?
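
The rule of thumb above is easy to script. A minimal sketch, not a
finished monitor: probe, diagnose_tunnel and the PING6 variable are
hypothetical names, and the third outcome (ordns.he.net itself being
unreachable) is my own addition for the case where the tunnel or tserv is
down outright:

```shell
# PING6 is an assumption; on newer Linux systems "ping -6" may replace ping6.
# It is left unquoted below on purpose so that "ping -6" splits into words.
PING6=${PING6:-ping6}

# Succeeds if the host answers at least one of three echo requests.
probe() {
    $PING6 -c 3 "$1" >/dev/null 2>&1
}

# Prints one of: ok, tserv-upstream-down, tunnel-down.
diagnose_tunnel() {
    if ! probe ordns.he.net; then
        echo "tunnel-down"           # cannot even reach the tunnel server
    elif ! probe ns4.he.net; then
        echo "tserv-upstream-down"   # tserv answers, but nothing beyond it
    else
        echo "ok"
    fi
}
```

Run from within the tunnel, a "tserv-upstream-down" result matches the
symptom described above and is the cue to open a ticket.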
>
> C.
>
> On 14 May 2013 09:42, Constantine A. Murenin <mureninc at gmail.com> wrote:
>> This just happened again:  all IPv6 tunnels on tserv1.fra1.he.net.
>> were inaccessible; from within a tunnel, cannot access any IPv6
>> resource, other than ordns.he.net, which runs on the tunnel server
>> itself.
>>
>> Why does FRA1 lose IPv6 connectivity so often?
>>
>> Update: Seems like it has been resolved as I've been writing this
>> email, but this seems to be happening a few times too many.
>>
>> C.
>>
>> On 23 April 2013 19:28, Constantine A. Murenin <mureninc at gmail.com> wrote:
>>> On 23 April 2013 18:28, Constantine A. Murenin <mureninc at gmail.com> wrote:
>>>> As of a couple of minutes ago, my IPv6 tunnel seems to have no
>>>> connectivity, weirdly other than being able to access ordns.he.net
>>>> (2001:470:20::2) just fine, but not ns{2,3,4,5}.he.net, or any other
>>>> IPv6 host.
>>>
>>> Update: reported to he.net support@ at 18:34, including a follow-up phone call;
>>> everything's back online, as of at least 18:51 PT.
>>> Pretty fast resolution, for a free service. :-)
>>>
>>> According to he.net, tserv wasn't responding on its IPv6 address, and has
>>> since been rebooted.
>>>
>>> Which is consistent with my mtr from a Linode:
>>>
>>> # mtr --report{,-wide,-cycles=60} --order "SRL BGAWV" -6 XXXXXXXXX ; date
>>>   2. 10gigabitethernet2-3.core1.fmt1.he.net     60    60  0.0%    0.4   2.5   8.5  85.5  15.2
>>>   3. 10gigabitethernet1-2.core1.sjc2.he.net     60    60  0.0%    0.8   2.5   5.3  51.6   8.5
>>>   4. 10gigabitethernet3-3.core1.den1.he.net     60    60  0.0%   27.8  31.7  32.3  70.0   7.3
>>>   5. 10gigabitethernet5-5.core1.mci3.he.net     60    60  0.0%   39.7  43.6  44.3 114.3  10.1
>>>   6. 10gigabitethernet5-2.core1.chi1.he.net     60    60  0.0%   52.0  56.2  57.4 177.2  17.0
>>>   7. 100gigabitethernet7-2.core1.nyc4.he.net    60    59  1.7%   69.1  72.2  72.5 119.8   7.1
>>>   8. 10gigabitethernet1-2.core1.lon1.he.net     60    60  0.0%  137.8 141.8 142.2 229.2  12.0
>>>   9. 10gigabitethernet4-2.core1.fra1.he.net     60    60  0.0%  149.4 154.0 154.4 242.8  12.7
>>>  10. ???                                        60     0 100.0    0.0   0.0   0.0   0.0   0.0
>>> Tue Apr 23 18:14:29 PDT 2013
>>>
>>> ...
>>>
>>>   2. 10gigabitethernet2-3.core1.fmt1.he.net     60    60  0.0%    0.4   2.8   9.3  78.3  16.8
>>>   3. 10gigabitethernet1-2.core1.sjc2.he.net     60    60  0.0%    0.8   2.3   3.9  13.9   3.8
>>>   4. 10gigabitethernet3-3.core1.den1.he.net     60    60  0.0%   27.8  30.3  30.5  39.2   3.5
>>>   5. 10gigabitethernet5-5.core1.mci3.he.net     60    60  0.0%   39.7  43.4  43.6  52.8   4.4
>>>   6. 10gigabitethernet5-2.core1.chi1.he.net     60    60  0.0%   52.0  54.1  54.2  62.2   3.2
>>>   7. 100gigabitethernet7-2.core1.nyc4.he.net    60    60  0.0%   69.1  71.5  71.6  82.1   3.5
>>>   8. 10gigabitethernet1-2.core1.lon1.he.net     60    60  0.0%  137.8 140.3 140.4 149.1   3.5
>>>   9. 10gigabitethernet4-2.core1.fra1.he.net     60    60  0.0%  149.4 152.3 152.3 162.5   3.7
>>>  10. tserv1.fra1.he.net                         60    60  0.0%  154.1 155.5 155.5 163.4   2.2
>>>  11. IPv6.XXXXXX                                60    59  1.7%  155.2 156.0 156.1 162.9   1.2
>>> Tue Apr 23 18:55:34 PDT 2013
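
Spotting the dead hop in reports like the two above can be automated. A
sketch under the assumption that the report was produced with
--order "SRL BGAWV" as in the command shown, so that the columns after
the host name start with sent, received and loss; dead_hops is a
hypothetical helper name:

```shell
# Hypothetical helper: list hops that got no replies in an mtr report made
# with --order "SRL BGAWV" (fields: hop, host, Snt, Rcv, Loss%, ...).
# Leading ">" quote markers and spaces, as found in quoted email, are
# stripped first; lines that do not start with "N." are ignored.
dead_hops() {
    sed 's/^[> ]*//' |
        awk '$1 ~ /^[0-9]+\.$/ && $3 > 0 && $4 == 0 { print $1, $2 }'
}
```

On the first report above it flags hop 10 (???), the position that
tserv1.fra1.he.net occupies in the second report; on the second report
it prints nothing.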
>>>
>>>
>>>
>>> And I guess ordns.he.net (2001:470:20::2) really runs on tserv
>>> (and hence wasn't affected during the outage).
>>>
>>> Cns# echo {ordns,ns{2,3,4,5}}.he.net | xargs -n1 traceroute6 -l; date
>>> traceroute6 to ordns.he.net (2001:470:20::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>>  1  ordns.he.net (2001:470:20::2)  6.39 ms  6.495 ms  6.213 ms
>>> traceroute6 to ns2.he.net (2001:470:200::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>>  1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  9.353 ms  8.783 ms  9.328 ms
>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  13.12 ms  15.134 ms  6.252 ms
>>>  3  10gigabitethernet5-3.core1.lon1.he.net (2001:470:0:1d2::1)  18.113 ms  23.554 ms  20.342 ms
>>>  4  ns2.he.net (2001:470:200::2)  20.449 ms  22.873 ms  20.517 ms
>>> traceroute6 to ns3.he.net (2001:470:300::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>>  1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.743 ms  9.2 ms
>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  5.934 ms  10.749 ms  6.197 ms
>>>  3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  18.567 ms  17.728 ms  13.282 ms
>>>  4  ns3.he.net (2001:470:300::2)  13.462 ms  13.412 ms  13.525 ms
>>> traceroute6 to ns4.he.net (2001:470:400::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>>  1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.838 ms  8.684 ms  8.682 ms
>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  6.233 ms  13.208 ms  5.97 ms
>>>  3  ns4.he.net (2001:470:400::2)  6.145 ms  6.38 ms  6.384 ms
>>> traceroute6 to ns5.he.net (2001:470:500::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
>>>  1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.655 ms  8.848 ms
>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  12.607 ms  6.527 ms  11.929 ms
>>>  3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  13.289 ms  13.968 ms  16.754 ms
>>>  4  ns5.he.net (2001:470:500::2)  14.111 ms  13.575 ms  13.385 ms
>>> Tue Apr 23 18:59:48 PDT 2013
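
The {ordns,ns{2,3,4,5}}.he.net brace expansion in the command above is a
feature of shells such as bash, zsh and csh, and is not available in
plain POSIX sh. A portable sketch that builds the same host list
(he_hosts is a hypothetical name):

```shell
# Hypothetical helper: print the five he.net hosts probed above, one per line.
he_hosts() {
    for name in ordns ns2 ns3 ns4 ns5; do
        echo "$name.he.net"
    done
}

# Usage, mirroring the original command (not run here):
#   he_hosts | xargs -n1 traceroute6 -l; date
```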
>>>
>>>
>>> However, it's unclear why IPv6 connectivity of tserv1.fra1.he.net
>>> doesn't seem to be monitored otherwise. :/
>>>
>>> Cheers,
>>> Constantine.


