[j-nsp] Parallel BGP sessions for v6 prefixes over v4 and v6

Andrey Kostin ankost at podolsk.ru
Mon Jul 8 11:33:48 EDT 2024


Hi juniper-nsp readers,

Recently we encountered an issue where L3-incompletes counters started
incrementing on internal backbone links. It began after we added new PE
routers, core routers and route-reflectors.
After a fairly long investigation with TAC involved, the problem was
identified: v6 traffic was sent over RSVP tunnels without the explicit-null
label and was arriving at the egress PE with the IPv4 Ethertype in the MAC
header.

The missing explicit-null label turned out to be caused by running
inet6 unicast (over IPv6) and inet6 labeled-unicast explicit-null (over
IPv4) BGP sessions in parallel.
The route-reflector receives the same prefix from the originating PE over
both the v4 and the v6 BGP session and installs both paths in the inet6.0
table.
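To make the RR state concrete, here is a small Python model of the two
paths it keeps for the prefix. It is only a sketch: the BgpPath type and
the documentation addresses (192.0.2.130, 2001:db8::1:130, standing in for
the redacted X.X.X.130 and 2607:X:X::1:130) are assumptions for
illustration. The point it shows is that the path from the v4 session
carries label 2 and an IPv4-mapped next hop, while the path from the v6
session carries no label and a native v6 next hop.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address
from typing import List, Optional

@dataclass(frozen=True)
class BgpPath:
    prefix: str
    next_hop: IPv6Address
    label: Optional[int]  # None: learned via plain inet6 unicast, no label
    session: str          # transport of the BGP session: "ipv4" or "ipv6"

# Hypothetical RIB on the RR. Documentation addresses stand in for the
# redacted X.X.X.130 (v4 session) and 2607:X:X::1:130 (v6 session).
RIB: List[BgpPath] = [
    BgpPath("2a03:2880:f10e::/48", IPv6Address("::ffff:192.0.2.130"), 2, "ipv4"),
    BgpPath("2a03:2880:f10e::/48", IPv6Address("2001:db8::1:130"), None, "ipv6"),
]

def paths_for(prefix: str) -> List[BgpPath]:
    """All paths stored for a prefix (one per session it was learned from)."""
    return [p for p in RIB if p.prefix == prefix]

for p in paths_for("2a03:2880:f10e::/48"):
    # ipv4_mapped returns the embedded IPv4 address for ::ffff:a.b.c.d, else None
    print(p.session, p.label, p.next_hop.ipv4_mapped)
# ipv4 2 192.0.2.130
# ipv6 None None
```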

akostin@rr02> show route 2a03:2880:f10e::/48 receive-protocol bgp X.X.X.130 detail
<<< Received over the v4 BGP session with family inet6 labeled-unicast
explicit-null, so it has Label 2

inet6.0: 195655 destinations, 1173973 routes (195655 active, 6 holddown, 0 hidden)

* 2a03:2880:f10e::/48 (2 entries, 0 announced)
      Accepted Multipath
      Route Label: 2
      Nexthop: ::ffff:X.X.X.130
      MED: 95
      Localpref: 106
      AS path: 32934 I
      Communities: Y:30000 Y:30127
      Addpath Path ID: 1
      Accepted MultipathContrib MultipathDup
      Route Label: 2
      Nexthop: ::ffff:X.X.X.140
      MED: 95
      Localpref: 106
      AS path: 32934 I  (Originator)
      Cluster list:  X.X.2.4
      Originator ID: X.X.X.140
      Communities: Y:30000 Y:30127
      Addpath Path ID: 2

akostin@rr02> show route 2a03:2880:f10e::/48 receive-protocol bgp 2607:X:X::1:130 detail
<<<< Received over the v6 BGP session, so it has a v6 next hop

inet6.0: 195656 destinations, 1173985 routes (195657 active, 6 holddown, 0 hidden)

   2a03:2880:f10e::/48 (1 entry, 0 announced)
      Accepted
      Nexthop: 2607:X:X::1:130
      MED: 95
      Localpref: 106
      AS path: 32934 I
      Communities: Y:30000 Y:30127

So far so good, but when the route-reflector advertises the prefix to an
RR client, it picks one or more best paths if add-path is configured.
In this case the RR chooses a path with an IPv4-mapped next hop and sends
it over the IPv6 BGP session, obviously without the explicit-null label.
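The effect of reflecting that path can be sketched in Python. This is a
simplified model under stated assumptions, not Junos internals: the
reflect() and is_unusable() helpers are hypothetical, the RR is assumed to
pass the next hop through unchanged (normal route-reflector behaviour),
and a label is assumed to survive only on a session that negotiated inet6
labeled-unicast.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address
from typing import Optional

@dataclass(frozen=True)
class Path:
    next_hop: IPv6Address
    label: Optional[int]

@dataclass(frozen=True)
class Update:
    family: str  # family negotiated on the outgoing session
    next_hop: IPv6Address
    label: Optional[int]

def reflect(path: Path, family: str) -> Update:
    """The RR passes the next hop through unchanged; a label survives only
    on a session that negotiated inet6 labeled-unicast."""
    label = path.label if family == "inet6 labeled-unicast" else None
    return Update(family, path.next_hop, label)

def is_unusable(update: Update) -> bool:
    """IPv4-mapped next hop with no label: the client resolves it over an IPv4
    LSP and would push only the transport label."""
    return update.next_hop.ipv4_mapped is not None and update.label is None

v4_learned = Path(IPv6Address("::ffff:192.0.2.130"), label=2)  # stands in for X.X.X.130
print(is_unusable(reflect(v4_learned, "inet6 labeled-unicast")))  # False
print(is_unusable(reflect(v4_learned, "inet6 unicast")))          # True
```

The same path is fine on the v4 session and broken on the v6 session; only
the outgoing session differs.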

akostin@rr02> show route 2a03:2880:f10e::/48 advertising-protocol bgp X.X.X.237 detail
<<<< Correctly advertised over the v4 BGP session with the IPv4-mapped
next hop and the explicit-null label

inet6.0: 195756 destinations, 1174580 routes (195756 active, 6 holddown, 0 hidden)

* 2a03:2880:f10e::/48 (6 entries, 0 announced)
  BGP group internal-rr-v4 type Internal
      Route Label: 2
      Nexthop: ::ffff:X.X.X.130
      MED: 95
      Localpref: 106
      AS path: [Y] 32934 I
      Communities: Y:30000 Y:30127
      Cluster ID: X.X.X.155
      Originator ID: X.X.X.130
      Addpath Path ID: 1
  BGP group internal-rr-v4 type Internal
      Route Label: 2
      Nexthop: ::ffff:X.X.X.140
      MED: 95
      Localpref: 106
      AS path: [Y] 32934 I
      Communities: Y:30000 Y:30127
      Cluster ID: X.X.X.155
      Originator ID: X.X.X.140
      Addpath Path ID: 2

akostin@rr02> show route 2a03:2880:f10e::/48 advertising-protocol bgp 2607:X:X::1:237 detail
<<<< The path received over the v4 BGP session is advertised over the v6
session. Importantly, this path has an IPv4-mapped next hop but no
explicit-null label.

inet6.0: 195760 destinations, 1174603 routes (195760 active, 7 holddown, 0 hidden)

* 2a03:2880:f10e::/48 (6 entries, 0 announced)
  BGP group internal-rr-v6 type Internal
      Nexthop: ::ffff:X.X.X.130
      MED: 95
      Localpref: 106
      AS path: [Y] 32934 I
      Communities: Y:30000 Y:30127
      Cluster ID: X.X.X.155
      Originator ID: X.X.X.130

On the receiving router all paths are installed because of BGP
multipath. If that last path is used, v6 packets are sent without the
explicit-null label, arrive at the egress PE with the wrong Ethertype and
are dropped as L3-incompletes.

akostin@re0.agg02> show route 2a03:2880:f10e::/48 table inet6.0

+ = Active Route, - = Last Active, * = Both

2a03:2880:f10e::/48*[BGP/170] 2d 21:46:57, MED 95, localpref 106, from X.X.X.154
                       AS path: 32934 I, validation-state: unverified
                        to X.X.X.14 via ae0.0, label-switched-path BE-agg02-to-bdr01-1
                     >  to X.X.X.14 via ae0.0, label-switched-path BE-agg02-to-bdr01-2
                     [BGP/170] 2d 21:54:26, MED 95, localpref 106, from X.X.X.155
                       AS path: 32934 I, validation-state: unverified
                        to X.X.X.14 via ae0.0, Push 2, Push 129063(top)
                     >  to X.X.X.14 via ae0.0, Push 2, Push 129001(top)
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from X.X.X.154
                       AS path: 32934 I, validation-state: unverified
                        to X.X.X.14 via ae0.0, Push 2, Push 129314(top)
                     >  to X.X.X.14 via ae0.0, Push 2, Push 128995(top)
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from X.X.X.155
                       AS path: 32934 I, validation-state: unverified
                        to X.X.X.14 via ae0.0, Push 2, Push 129314(top)
                     >  to X.X.X.14 via ae0.0, Push 2, Push 128995(top)
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from 2607:X:X::1:154
                       AS path: 32934 I, validation-state: unverified
                        to X.X.X.14 via ae0.0, Push 129314
                     >  to X.X.X.14 via ae0.0, Push 128995
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from 2607:X:X::1:155
                       AS path: 32934 I, validation-state: unverified
                        to X.X.X.14 via ae0.0, Push 129314
                     >  to X.X.X.14 via ae0.0, Push 128995

The first four paths are correct, but the last two are missing Label 2
because they were received over the v6 BGP sessions without explicit-null.
If an incorrect path is used, the IPv4-mapped next hop is resolved over an
MPLS tunnel, but packets are sent with only the transport label (129314 or
128995 in this case), which is removed at the penultimate hop. Because
label 2 is missing, packets arrive at the egress PE with the wrong
Ethertype and are dropped as L3-incompletes.
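Spotting the bad paths in such output can be automated with a small
Python parser. It is a hypothetical helper, not a Juniper tool: it assumes
the wrapped lines have been rejoined, treats the redacted X.X.X addresses
as opaque strings, and flags a path only when its next hops push labels
but none of them is label 2. Paths shown only with label-switched-path
(no Push) are left unjudged.

```python
import re
from typing import List, Tuple

IPV6_EXPLICIT_NULL = 2

# Rejoined excerpt of the agg02 output above (AS path lines omitted).
OUTPUT = """\
2a03:2880:f10e::/48*[BGP/170] 2d 21:46:57, MED 95, localpref 106, from X.X.X.154
                        to X.X.X.14 via ae0.0, label-switched-path BE-agg02-to-bdr01-1
                     >  to X.X.X.14 via ae0.0, label-switched-path BE-agg02-to-bdr01-2
                     [BGP/170] 2d 21:54:26, MED 95, localpref 106, from X.X.X.155
                        to X.X.X.14 via ae0.0, Push 2, Push 129063(top)
                     >  to X.X.X.14 via ae0.0, Push 2, Push 129001(top)
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from X.X.X.154
                        to X.X.X.14 via ae0.0, Push 2, Push 129314(top)
                     >  to X.X.X.14 via ae0.0, Push 2, Push 128995(top)
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from X.X.X.155
                        to X.X.X.14 via ae0.0, Push 2, Push 129314(top)
                     >  to X.X.X.14 via ae0.0, Push 2, Push 128995(top)
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from 2607:X:X::1:154
                        to X.X.X.14 via ae0.0, Push 129314
                     >  to X.X.X.14 via ae0.0, Push 128995
                     [BGP/170] 2d 21:47:17, MED 95, localpref 106, from 2607:X:X::1:155
                        to X.X.X.14 via ae0.0, Push 129314
                     >  to X.X.X.14 via ae0.0, Push 128995
"""

PATH_START = re.compile(r"\[BGP/\d+\].*\bfrom (\S+)")
PUSH = re.compile(r"Push (\d+)")

def paths_missing_label_2(text: str) -> List[str]:
    """Return the 'from' of each path whose next hops push labels, but never label 2."""
    paths: List[Tuple[str, List[List[int]]]] = []
    for line in text.splitlines():
        start = PATH_START.search(line)
        if start:
            paths.append((start.group(1), []))
        elif paths and " via " in line:
            # one list of pushed labels per next-hop line (empty if none)
            paths[-1][1].append([int(n) for n in PUSH.findall(line)])
    flagged = []
    for source, hops in paths:
        pushed = [labels for labels in hops if labels]
        if pushed and all(IPV6_EXPLICIT_NULL not in labels for labels in pushed):
            flagged.append(source)
    return flagged

print(paths_missing_label_2(OUTPUT))  # ['2607:X:X::1:154', '2607:X:X::1:155']
```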

The problem here is that the route-reflector selects a path with an
IPv4-mapped next hop and advertises it over the IPv6 session. Has anybody
already encountered this problem and found a way to make an RR advertise
paths with the correct next hop?
I know that having two sessions for IPv6 adds complexity and one of them
could be removed, but I'm interested in finding an elegant solution to
this issue.
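To pin down what "correct next hop" means here, the desired behaviour can
be written as a per-session eligibility rule for add-path candidates. This
is only a Python model: may_reflect() is hypothetical, it is not a Junos
policy, and the documentation addresses stand in for the redacted ones.
Whether the RR can be made to apply such a rule is exactly the open
question.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address
from typing import List, Optional

@dataclass(frozen=True)
class Path:
    name: str
    next_hop: IPv6Address
    label: Optional[int]

def may_reflect(path: Path, session_family: str) -> bool:
    """Hypothetical eligibility rule for add-path candidates on one session."""
    if session_family == "inet6 labeled-unicast":
        # the v4 session: only paths that still carry their label
        return path.label is not None
    # plain inet6 unicast over IPv6: never hand out an IPv4-mapped next hop,
    # because the label it depends on cannot be sent on this session
    return path.next_hop.ipv4_mapped is None

# rr02's three paths for 2a03:2880:f10e::/48 (documentation addresses stand in
# for the redacted ones).
PATHS: List[Path] = [
    Path("from X.X.X.130", IPv6Address("::ffff:192.0.2.130"), 2),
    Path("from X.X.X.140", IPv6Address("::ffff:192.0.2.140"), 2),
    Path("from 2607:X:X::1:130", IPv6Address("2001:db8::1:130"), None),
]

for family in ("inet6 labeled-unicast", "inet6 unicast"):
    print(family, [p.name for p in PATHS if may_reflect(p, family)])
# inet6 labeled-unicast ['from X.X.X.130', 'from X.X.X.140']
# inet6 unicast ['from 2607:X:X::1:130']
```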

Kind regards,
Andrey

