[j-nsp] Parallel BGP sessions for v6 prefixes over v4 and v6
Andrey Kostin
ankost at podolsk.ru
Mon Jul 8 11:33:48 EDT 2024
Hi juniper-nsp readers,
Recently we ran into an issue where L3-incompletes counters started
incrementing on internal backbone links. It began after we added new PE
routers, core routers and route-reflectors.
After a fairly long investigation with TAC involved, the problem was
identified: v6 traffic was being sent over RSVP tunnels without the
explicit-null label and was arriving at the egress PE with the v4
Ethertype in the MAC header.
The missing explicit-null label turned out to be caused by running both
inet6 unicast (over ipv6) and inet6 labeled-unicast explicit-null (over
ipv4) BGP sessions in parallel.
The route-reflector receives the same prefix from the originating PE
over the v4 and v6 BGP sessions and installs both paths in the inet6.0
table.
akostin at rr02> show route 2a03:2880:f10e::/48 receive-protocol bgp
X.X.X.130 detail <<< Received over v4 BGP session with family inet6
labeled-unicast explicit-null and has Label 2 accordingly
inet6.0: 195655 destinations, 1173973 routes (195655 active, 6 holddown,
0 hidden)
* 2a03:2880:f10e::/48 (2 entries, 0 announced)
Accepted Multipath
Route Label: 2
Nexthop: ::ffff:X.X.X.130
MED: 95
Localpref: 106
AS path: 32934 I
Communities: Y:30000 Y:30127
Addpath Path ID: 1
Accepted MultipathContrib MultipathDup
Route Label: 2
Nexthop: ::ffff:X.X.X.140
MED: 95
Localpref: 106
AS path: 32934 I (Originator)
Cluster list: X.X.2.4
Originator ID: X.X.X.140
Communities: Y:30000 Y:30127
Addpath Path ID: 2
akostin at rr02> show route 2a03:2880:f10e::/48 receive-protocol bgp
2607:X:X::1:130 detail <<<< Received over v6 BGP session and has v6
nexthop
inet6.0: 195656 destinations, 1173985 routes (195657 active, 6 holddown,
0 hidden)
2a03:2880:f10e::/48 (1 entry, 0 announced)
Accepted
Nexthop: 2607:X:X::1:130
MED: 95
Localpref: 106
AS path: 32934 I
Communities: Y:30000 Y:30127
So far so good, but when the route-reflector advertises the prefix to an
rr-client, it picks the best path, or several best paths if add-path is
configured. In this case the RR chooses the path with the mapped IPv4
address and sends it over the ipv6 BGP session, obviously without the
explicit-null label.
akostin at rr02> show route 2a03:2880:f10e::/48 advertising-protocol bgp
X.X.X.237 detail <<<< Correctly advertised over v4 BGP session
with mapped v4 nexthop and explicit-null label
inet6.0: 195756 destinations, 1174580 routes (195756 active, 6 holddown,
0 hidden)
* 2a03:2880:f10e::/48 (6 entries, 0 announced)
BGP group internal-rr-v4 type Internal
Route Label: 2
Nexthop: ::ffff:X.X.X.130
MED: 95
Localpref: 106
AS path: [Y] 32934 I
Communities: Y:30000 Y:30127
Cluster ID: X.X.X.155
Originator ID: X.X.X.130
Addpath Path ID: 1
BGP group internal-rr-v4 type Internal
Route Label: 2
Nexthop: ::ffff:X.X.X.140
MED: 95
Localpref: 106
AS path: [Y] 32934 I
Communities: Y:30000 Y:30127
Cluster ID: X.X.X.155
Originator ID: X.X.X.140
Addpath Path ID: 2
akostin at rr02> show route 2a03:2880:f10e::/48 advertising-protocol bgp
2607:X:X::1:237 detail <<<< The path received over the v4 BGP session
is advertised over the v6 session. Importantly, this path has a mapped
IPv4 nexthop but no explicit-null label.
inet6.0: 195760 destinations, 1174603 routes (195760 active, 7 holddown,
0 hidden)
* 2a03:2880:f10e::/48 (6 entries, 0 announced)
BGP group internal-rr-v6 type Internal
Nexthop: ::ffff:X.X.X.130
MED: 95
Localpref: 106
AS path: [Y] 32934 I
Communities: Y:30000 Y:30127
Cluster ID: X.X.X.155
Originator ID: X.X.X.130
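The two advertisements above can be summarised in a small Python sketch.
It is only an illustration under stated assumptions, not how rpd works
internally: the Path class and the advertise() and is_suspect() helpers
are hypothetical names, and 192.0.2.130 and 2001:db8::1:130 are
documentation addresses standing in for the anonymised ones. It models a
plain inet6 unicast session as having no place to carry a label, so the
label is lost while the mapped nexthop is kept.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address
from typing import Optional

IPV6_EXPLICIT_NULL = 2  # reserved MPLS label value for IPv6 explicit null


@dataclass(frozen=True)
class Path:
    """Hypothetical, minimal view of a BGP path in inet6.0."""
    nexthop: str
    label: Optional[int]  # label learned via labeled-unicast, None if unlabeled


def advertise(path: Path, session_is_labeled: bool) -> Path:
    """Assumed model: an unlabeled session keeps the nexthop but drops the label."""
    if session_is_labeled:
        return path
    return Path(path.nexthop, None)


def is_suspect(path: Path) -> bool:
    """True for an IPv4-mapped nexthop that arrives without any label."""
    mapped = IPv6Address(path.nexthop).ipv4_mapped is not None
    return mapped and path.label is None


received = Path("::ffff:192.0.2.130", IPV6_EXPLICIT_NULL)
over_v4 = advertise(received, session_is_labeled=True)
over_v6 = advertise(received, session_is_labeled=False)
print(is_suspect(over_v4), is_suspect(over_v6))
```

A check like is_suspect() could be applied to parsed route output on an
rr-client to spot the affected paths; a path with a native v6 nexthop
and no label is not flagged.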
On the receiving router all paths are installed because of BGP
multipath. If the last path is used, v6 packets are sent without the
explicit-null label, arrive at the egress PE with the wrong ethertype
and are dropped as L3-incompletes.
akostin at re0.agg02> show route 2a03:2880:f10e::/48 table inet6.0
+ = Active Route, - = Last Active, * = Both
2a03:2880:f10e::/48*[BGP/170] 2d 21:46:57, MED 95, localpref 106, from
X.X.X.154
AS path: 32934 I, validation-state: unverified
to X.X.X.14 via ae0.0, label-switched-path
BE-agg02-to-bdr01-1
> to X.X.X.14 via ae0.0, label-switched-path
BE-agg02-to-bdr01-2
[BGP/170] 2d 21:54:26, MED 95, localpref 106, from
X.X.X.155
AS path: 32934 I, validation-state: unverified
to X.X.X.14 via ae0.0, Push 2, Push 129063(top)
> to X.X.X.14 via ae0.0, Push 2, Push 129001(top)
[BGP/170] 2d 21:47:17, MED 95, localpref 106, from
X.X.X.154
AS path: 32934 I, validation-state: unverified
to X.X.X.14 via ae0.0, Push 2, Push 129314(top)
> to X.X.X.14 via ae0.0, Push 2, Push 128995(top)
[BGP/170] 2d 21:47:17, MED 95, localpref 106, from
X.X.X.155
AS path: 32934 I, validation-state: unverified
to X.X.X.14 via ae0.0, Push 2, Push 129314(top)
> to X.X.X.14 via ae0.0, Push 2, Push 128995(top)
[BGP/170] 2d 21:47:17, MED 95, localpref 106, from
2607:X:X::1:154
AS path: 32934 I, validation-state: unverified
to X.X.X.14 via ae0.0, Push 129314
> to X.X.X.14 via ae0.0, Push 128995
[BGP/170] 2d 21:47:17, MED 95, localpref 106, from
2607:X:X::1:155
AS path: 32934 I, validation-state: unverified
to X.X.X.14 via ae0.0, Push 129314
> to X.X.X.14 via ae0.0, Push 128995
The first four paths are correct, but the last two are missing label 2
because they were received over the v6 BGP session without
explicit-null. If an incorrect path is used, the mapped ipv4 nexthop is
resolved over the MPLS tunnel, but packets are sent with only the
transport label (129314 or 128995 in this case), which is removed at the
penultimate hop. Because label 2 is missing, packets arrive at the
egress PE with the wrong ethertype and are dropped as L3-incompletes.
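The label-stack behaviour described above can be walked through in a
short Python sketch. This is a simplified model, not a description of
the actual forwarding plane: penultimate_hop() and egress_accepts() are
hypothetical helpers, and the model assumes that a penultimate hop which
pops the last label hands the packet on as IPv4, which matches the v4
Ethertype seen on the egress PE in this case.

```python
from typing import List, Tuple

IPV6_EXPLICIT_NULL = 2  # reserved MPLS label value for IPv6 explicit null


def penultimate_hop(stack: List[int]) -> Tuple[List[int], str]:
    """Pop the top (transport) label; stack[0] is the top of the stack."""
    remaining = stack[1:]
    if remaining:
        return remaining, "mpls"  # still labeled, MPLS ethertype
    # Assumption: with no label left, the payload is treated as IPv4.
    return remaining, "ipv4"


def egress_accepts(stack: List[int], payload: str) -> bool:
    """Does the egress PE see a frame type consistent with the payload?"""
    remaining, ethertype = penultimate_hop(stack)
    if ethertype == "mpls":
        # Label 2 tells the egress PE that the payload is IPv6.
        return remaining == [IPV6_EXPLICIT_NULL] and payload == "ipv6"
    return ethertype == payload


good = egress_accepts([128995, IPV6_EXPLICIT_NULL], "ipv6")  # Push 2, Push 128995(top)
bad = egress_accepts([128995], "ipv6")                       # Push 128995 only
print(good, bad)
```

With label 2 under the transport label the v6 packet is still MPLS
encapsulated on the last hop; with the transport label alone, the frame
type and the payload no longer agree.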
The problem here is that the route-reflector selects a path with an ipv4
mapped nexthop and advertises it over the ipv6 session. I'm wondering,
has anybody already encountered this problem and found a way to make an
RR advertise paths with the correct nexthop?
I know that having two sessions for ipv6 adds complexity and that one of
them can be removed, but I'm interested in finding an elegant solution
for this issue.
Kind regards,
Andrey