[outages] IPv6 tunnels in FRA1 on HE.net down?

Wed Jun 12 18:30:09 EDT 2013

Indeed, traceroute to my fra1 tunnel is definitely getting mislaid within
he.net.

Time to ask for a refund?  oh wait... ;)

Tracing route to tmcc.me [2001:470:1f0a:3ee::2]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  6.9.1.2.2.d.e.f.f.f.f.f.6.7.a.0.3.f.7.9.d.8.4.6.0.b.8.0.1.0.0.2.ip6.arpa [2001:8b0:648d:97f3:a76:ffff:fed2:2196]
  2    65 ms    65 ms    66 ms  c.gormless.thn.aa.net.uk [2001:8b0:0:53::53]
  3    56 ms    58 ms    61 ms  2001:7f8:4::50e8:1
  4    70 ms    74 ms    74 ms  40gigabitethernet1-1.core1.lon1.he.net [2001:7f8:4::1b1b:1]
  5   145 ms   141 ms   132 ms  10gigabitethernet10-4.core1.nyc4.he.net [2001:470:0:128::1]
  6   146 ms   150 ms   154 ms  100gigabitethernet7-2.core1.chi1.he.net [2001:470:0:298::1]
  7   211 ms   216 ms   212 ms  10gigabitethernet11-4.core1.pao1.he.net [2001:470:0:283::1]
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.

On 12 June 2013 23:19, Constantine A. Murenin <mureninc at gmail.com> wrote:

> I smell a big outage:
>
> % date; traceroute tserv1.fra1.he.net
> Wed Jun 12 15:17:04 PDT 2013
> traceroute to tserv1.fra1.he.net (216.66.80.30), 30 hops max, 60 byte packets
>  1  192.168.105.3 (192.168.105.3)  0.673 ms  0.773 ms  0.923 ms
>  2  10gigabitethernet7-6.core3.fmt2.he.net (65.49.10.217)  1.795 ms  1.811 ms  1.795 ms
>  3  10gigabitethernet12-1.core1.lax1.he.net (184.105.213.26)  19.773 ms  10gigabitethernet10-1.core1.sjc2.he.net (184.105.222.14)  0.786 ms  0.756 ms
>  4  10gigabitethernet10-8.core1.nyc4.he.net (72.52.92.225)  71.714 ms  76.569 ms  10gigabitethernet14-2.core1.nyc4.he.net (184.105.213.198)  71.690 ms
>  5  * * *
>  6  * * *
>  7  * * *
>  8  * * *
>  9  * * *
> 10  * * *
> 11  * * *
> 12  * * *
> 13  * * *
> 14  * * *
> 15  * *^C
>
> However, a "reverse" traceroute works fine:
>
> % date; traceroute ns1.he.net
> Wed Jun 12 15:18:47 PDT 2013
> traceroute to ns1.he.net (216.218.130.2), 64 hops max, 40 byte packets
>  1  static.33.203.4.46.clients.your-server.de (46.4.203.33)  0.682 ms  0.531 ms  0.481 ms
>  2  hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97)  0.245 ms  hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65)  0.242 ms  hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33)  0.241 ms
>  3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150)  5.951 ms  hos-bb1.juniper1.ffm.hetzner.de (213.239.240.224)  4.803 ms  4.786 ms
>  4  30gigabitethernet4-3.core1.fra1.he.net (80.81.192.172)  6.354 ms  6.81 ms  5.451 ms
>  5  10gigabitethernet10-2.core1.par2.he.net (72.52.92.26)  22.803 ms  24.531 ms  24.787 ms
>  6  10gigabitethernet15-1.core1.ash1.he.net (184.105.213.93)  101.504 ms  99.563 ms  99.958 ms
>  7  10gigabitethernet11-1.core1.pao1.he.net (184.105.213.177)  163.687 ms  171.711 ms  175.34 ms
>  8  10gigabitethernet1-2.core1.fmt1.he.net (184.105.213.65)  163.940 ms  171.362 ms  167.67 ms
>  9  ns1.he.net (216.218.130.2)  163.52 ms  165.265 ms  164.143 ms
>
> C.
>
> On 12 June 2013 15:11, Constantine A. Murenin <mureninc at gmail.com> wrote:
> > he.net fra1 tserv1 has been down for about 10 minutes (since ~14:57 PT).
> > This time it seems like the whole thing is down: I cannot even ping
> > ordns.he.net over IPv6. The connection of a friend of mine who runs
> > smokeping is also down, i.e. this is definitely widespread.
> >
> > Not sure what exactly is down, but it seems to be BGP-related, perhaps:
> >
> >  1  2600:3c01::8678:acff:fe0d:79c1 (2600:3c01::8678:acff:fe0d:79c1)  0.699 ms  0.828 ms  0.970 ms
> >  2  10gigabitethernet7-6.core3.fmt2.he.net (2001:470:1:3b8::1)  5.936 ms  5.877 ms  5.859 ms
> >  3  10gigabitethernet5-4.core1.pao1.he.net (2001:470:0:263::2)  6.743 ms  6.877 ms  6.978 ms
> >  4  * * *
> >  5  * * *
> >  6  * * *
> >  7  * * *
> >  8  * * *
> >  9  * * *
> > 10  * * *
> > 11  * * *
> > 12  * * *
> > 13  * * *
> > 14  * * *
> > 15  * * *
> > 16  * * *
> > 17  * * *
> > 18  * * *
> >
> > C.
> >
> > On 14 May 2013 15:50, Constantine A. Murenin <mureninc at gmail.com> wrote:
> >> For what it is worth, further details about the issue have surfaced.
> >> I found a friend who also has a tunnel on tserv1.fra1.he.net., and he
> >> has been running smokeping to various IPv4 and IPv6 resources for
> >> quite a while.
> >>
> >> According to several of his smokeping reports, it can be concluded
> >> that this very outage lasted from 14T18:00/05 to 14T18:45/50; but
> >> we've also noticed that there was another, 6-hour (yes, 6-hour) outage
> >> a day earlier, ~13T12 to ~13T18 (which corresponds to Monday early to
> >> late morning Pacific Time).
> >>
> >> I've contacted he.net again this time around, and they said that
> >> they're trying to hunt some obscure kernel bug that is causing these
> >> issues.
> >>
> >> tunnelbroker.net is a free service, but a 6-hour outage, spanning a
> >> quarter of a whole day, is absolutely ridiculous.  I'm stunned that
> >> the IPv6 connectivity of tserv1.fra1.he.net. is, apparently, still
> >> not monitored, even though it's known to be having these issues.
> >>  ???
> >>
> >> Alternatively, it is, of course, possible that some engineer has been
> >> troubleshooting the root cause of this issue for the whole 5 or 6
> >> hours on Sunday/Monday night; but I find that somewhat hard to
> >> believe; more likely it got busted, and no one responsible knew about
> >> it being busted for most of the time that it was.
> >>
> >> Even more troubling is that they don't even publish any reports about
> >> these extended outages.
> >>
> >> For tserv1.fra1.he.net. end users:  if you can `ping6 ordns.he.net`
> >> (it runs on tserv itself; try $(host `dig +short -6 @ordns.he.net
> >> whoami.akamai.net`)), but cannot `ping6 ns4.he.net`, then it most
> >> likely means that tserv1.fra1.he.net IPv6 connectivity is down again,
> >> and you must open a ticket with HE.net ASAP.  Perhaps someone should
> >> set up a smokeping with automatic emails to support at he.net?
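
The rule of thumb above can be sketched as a small POSIX shell helper. This is only a sketch under assumptions: the function names (probe, classify, check_tunnel) and the verdict strings are hypothetical, `ping6 -c 3` is assumed to be available on the machine behind the tunnel, and only the two host names come from the message above.

```shell
#!/bin/sh
# Sketch only: host names are from the thread; all function names and
# verdict strings are hypothetical.

# probe HOST: succeeds if HOST answers at least one of three ICMPv6 echoes.
probe() {
    ping6 -c 3 "$1" >/dev/null 2>&1
}

# classify ORDNS_STATUS NS4_STATUS: map two exit statuses (0 = reachable)
# to a verdict, following the rule of thumb from the thread.
classify() {
    if [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
        echo "tserv-upstream-down"    # tunnel server answers, nothing beyond it
    elif [ "$1" -eq 0 ]; then
        echo "ok"                     # both hosts reachable
    else
        echo "tunnel-or-local-down"   # cannot even reach the tunnel server
    fi
}

# check_tunnel: run both probes and print the verdict.
check_tunnel() {
    probe ordns.he.net; ordns=$?
    probe ns4.he.net;   ns4=$?
    classify "$ordns" "$ns4"
}
```

Running check_tunnel periodically (from cron, say) and sending yourself a mail whenever it prints tserv-upstream-down would be one rough way to approximate the smokeping idea; the third verdict is my own addition for the case where even ordns.he.net does not answer.
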
> >>
> >> C.
> >>
> >> On 14 May 2013 09:42, Constantine A. Murenin <mureninc at gmail.com> wrote:
> >>> This just happened again:  all IPv6 tunnels on tserv1.fra1.he.net.
> >>> were inaccessible; from within a tunnel, cannot access any IPv6
> >>> resource, other than ordns.he.net, which runs on the tunnel server
> >>> itself.
> >>>
> >>> Why does FRA1 lose IPv6 connectivity so often?
> >>>
> >>> Update: Seems like it has been resolved as I've been writing this
> >>> email, but this seems to happen a few times too many.
> >>>
> >>> C.
> >>>
> >>> On 23 April 2013 19:28, Constantine A. Murenin <mureninc at gmail.com> wrote:
> >>>> On 23 April 2013 18:28, Constantine A. Murenin <mureninc at gmail.com> wrote:
> >>>>> As of a couple of minutes ago, my IPv6 tunnel seems to have no
> >>>>> connectivity, except that, weirdly, it can access ordns.he.net
> >>>>> (2001:470:20::2) just fine, but not ns{2,3,4,5}.he.net, or any other
> >>>>> IPv6 host.
> >>>>
> >>>> Update: reported to he.net support@ at 18:34, including a follow-up
> >>>> phone call; everything's back online, as of at least 18:51 PT.
> >>>> Pretty fast resolution, for a free service. :-)
> >>>>
> >>>> According to he.net, tserv wasn't responding on its IPv6 address, and
> >>>> has since been rebooted.
> >>>>
> >>>> Which matches my mtr runs from a Linode:
> >>>>
> >>>> # mtr --report{,-wide,-cycles=60} --order "SRL BGAWV" -6 XXXXXXXXX ; date
> >>>>   2. 10gigabitethernet2-3.core1.fmt1.he.net     60    60  0.0%    0.4    2.5    8.5   85.5   15.2
> >>>>   3. 10gigabitethernet1-2.core1.sjc2.he.net     60    60  0.0%    0.8    2.5    5.3   51.6    8.5
> >>>>   4. 10gigabitethernet3-3.core1.den1.he.net     60    60  0.0%   27.8   31.7   32.3   70.0    7.3
> >>>>   5. 10gigabitethernet5-5.core1.mci3.he.net     60    60  0.0%   39.7   43.6   44.3  114.3   10.1
> >>>>   6. 10gigabitethernet5-2.core1.chi1.he.net     60    60  0.0%   52.0   56.2   57.4  177.2   17.0
> >>>>   7. 100gigabitethernet7-2.core1.nyc4.he.net    60    59  1.7%   69.1   72.2   72.5  119.8    7.1
> >>>>   8. 10gigabitethernet1-2.core1.lon1.he.net     60    60  0.0%  137.8  141.8  142.2  229.2   12.0
> >>>>   9. 10gigabitethernet4-2.core1.fra1.he.net     60    60  0.0%  149.4  154.0  154.4  242.8   12.7
> >>>>  10. ???                                        60     0 100.0    0.0    0.0    0.0    0.0    0.0
> >>>> Tue Apr 23 18:14:29 PDT 2013
> >>>>
> >>>> ...
> >>>>
> >>>>   2. 10gigabitethernet2-3.core1.fmt1.he.net     60    60  0.0%    0.4    2.8    9.3   78.3   16.8
> >>>>   3. 10gigabitethernet1-2.core1.sjc2.he.net     60    60  0.0%    0.8    2.3    3.9   13.9    3.8
> >>>>   4. 10gigabitethernet3-3.core1.den1.he.net     60    60  0.0%   27.8   30.3   30.5   39.2    3.5
> >>>>   5. 10gigabitethernet5-5.core1.mci3.he.net     60    60  0.0%   39.7   43.4   43.6   52.8    4.4
> >>>>   6. 10gigabitethernet5-2.core1.chi1.he.net     60    60  0.0%   52.0   54.1   54.2   62.2    3.2
> >>>>   7. 100gigabitethernet7-2.core1.nyc4.he.net    60    60  0.0%   69.1   71.5   71.6   82.1    3.5
> >>>>   8. 10gigabitethernet1-2.core1.lon1.he.net     60    60  0.0%  137.8  140.3  140.4  149.1    3.5
> >>>>   9. 10gigabitethernet4-2.core1.fra1.he.net     60    60  0.0%  149.4  152.3  152.3  162.5    3.7
> >>>>  10. tserv1.fra1.he.net                         60    60  0.0%  154.1  155.5  155.5  163.4    2.2
> >>>>  11. IPv6.XXXXXX                                60    59  1.7%  155.2  156.0  156.1  162.9    1.2
> >>>> Tue Apr 23 18:55:34 PDT 2013
> >>>>
> >>>>
> >>>>
> >>>> And I guess ordns.he.net (2001:470:20::2) really runs on tserv
> >>>> (and hence wasn't affected during the outage).
> >>>>
> >>>> Cns# echo {ordns,ns{2,3,4,5}}.he.net | xargs -n1 traceroute6 -l; date
> >>>> traceroute6 to ordns.he.net (2001:470:20::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
> >>>>  1  ordns.he.net (2001:470:20::2)  6.39 ms  6.495 ms  6.213 ms
> >>>> traceroute6 to ns2.he.net (2001:470:200::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
> >>>>  1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  9.353 ms  8.783 ms  9.328 ms
> >>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  13.12 ms  15.134 ms  6.252 ms
> >>>>  3  10gigabitethernet5-3.core1.lon1.he.net (2001:470:0:1d2::1)  18.113 ms  23.554 ms  20.342 ms
> >>>>  4  ns2.he.net (2001:470:200::2)  20.449 ms  22.873 ms  20.517 ms
> >>>> traceroute6 to ns3.he.net (2001:470:300::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
> >>>>  1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.743 ms  9.2 ms
> >>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  5.934 ms  10.749 ms  6.197 ms
> >>>>  3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  18.567 ms  17.728 ms  13.282 ms
> >>>>  4  ns3.he.net (2001:470:300::2)  13.462 ms  13.412 ms  13.525 ms
> >>>> traceroute6 to ns4.he.net (2001:470:400::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
> >>>>  1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.838 ms  8.684 ms  8.682 ms
> >>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  6.233 ms  13.208 ms  5.97 ms
> >>>>  3  ns4.he.net (2001:470:400::2)  6.145 ms  6.38 ms  6.384 ms
> >>>> traceroute6 to ns5.he.net (2001:470:500::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
> >>>>  1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.655 ms  8.848 ms
> >>>>  2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  12.607 ms  6.527 ms  11.929 ms
> >>>>  3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  13.289 ms  13.968 ms  16.754 ms
> >>>>  4  ns5.he.net (2001:470:500::2)  14.111 ms  13.575 ms  13.385 ms
> >>>> Tue Apr 23 18:59:48 PDT 2013
> >>>>
> >>>>
> >>>> However, it's unclear why IPv6 connectivity of tserv1.fra1.he.net
> >>>> doesn't seem to be monitored otherwise. :/
> >>>>
> >>>> Cheers,
> >>>> Constantine.
>
>
>
> --
> V. V. Putin on perfection, 24 December 2000: If a person is satisfied
> with everything, he is a complete idiot. A healthy person of sound
> memory cannot always be satisfied with everything.
>
> _______________________________________________
> Outages mailing list
> Outages at outages.org
> https://puck.nether.net/mailman/listinfo/outages
>