[c-nsp] 6500 SXI9 broken MPLS L3VPN with per-prefix label allocation
Bernhard Schmidt
berni at birkenwald.de
Tue Mar 12 13:45:24 EDT 2013
Hello everyone,
I have a rather weird problem that I cannot wrap my head around. I think
it's an annoying bug, but I'm not sure.
We are currently experimenting with MPLS in our network. The first use
will be L3VPN, to get rid of some multi-step PBR that is needed when our
clients with RFC1918 addresses want to go to the internet and are
redirected through the NAT cluster, which is not at the same location as
the transit.
For this we have the following "test" setup in the live network:
                                Router R1
                                    |
Client --- PE 1 ----- P....P ----- PE 2 ---- NAT-Cluster
         NX 6.1(2)  NX 6.1(2)    VSS1440
                    IOS SXJ*     IOS SXI9
The VRF only carries a default route pointing towards the NAT-Cluster,
via a global SVI and thus a global next-hop.
---
vrf definition SECOMAT
 rd 129.187.0.9:9000
 !
 address-family ipv4
  route-target export 12816:9000
  route-target import 12816:9000
 exit-address-family
!
!
router bgp 12816
 !
 address-family ipv4 vrf SECOMAT
  redistribute static
  no synchronization
  network 0.0.0.0
 exit-address-family
!
ip route vrf SECOMAT 0.0.0.0 0.0.0.0 Vlan1644 138.246.99.33
---
vss1-2wr#sh ip route vrf SECOMAT
Routing Table: SECOMAT
[...]
Gateway of last resort is 138.246.99.33 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 138.246.99.33, Vlan1644
On PE 1, traffic from RFC1918 sources to destinations outside our own
address space (!our) is supposed to be PBRed into the VRF. At the moment
this is a very simple route-map:
route-map PRIVATE_TO_SECOMAT permit 10
 set vrf SECOMAT
As far as I can tell this works quite well: a trace from the client
follows the normal path to PE 2 (not to the internet transit, which was
the whole point), but then it gets ugly:
traceroute to 83.170.0.1 (83.170.0.1), 30 hops max, 60 byte packets
 1  10.155.0.254 (10.155.0.254)  0.320 ms  0.446 ms  0.587 ms
 2  * * *   <--- this is an NX-OS device which does not answer
 3  vl-3004.csr1-0gz.lrz.de (129.187.0.142)  1.189 ms  1.279 ms  1.318 ms
 4  * * *   <--- this is an NX-OS device which does not answer
 5  * * *   <--- this is the egress PE
 6  vl-3016.csr1-2wr.lrz.de (129.187.0.253)  0.790 ms  0.934 ms  0.937 ms
Hop 6 is the upstream router of PE 2, so at this point the traffic is in
the global routing table.
I was pretty sure this is a configuration error, but now I don't think
it is. Observe:
Ingress PE:
0.0.0.0/0, ubest/mbest: 1/0
    *via 129.187.0.9%default, [200/0], 00:16:54, bgp-12816, internal, tag 12816 (mpls-vpn)
         MPLS[0]: Label=875 E=0 TTL=0 S=0 (VPN)
         client-specific data: 4f59d
         recursive next hop: 129.187.0.9/32%default
         extended route information: BGP origin AS 12816 BGP peer AS
Egress PE:
vss1-2wr#sh mpls forwarding-table labels 875
Local   Outgoing   Prefix         Bytes Label   Outgoing    Next Hop
Label   Label      or Tunnel Id   Switched      interface
875     No Label   0.0.0.0/0[V]   845000        Vl1644      138.246.99.33
So we have a per-prefix label with the right egress interface and
next-hop.
vss1-2wr#sh mls cef mpls labels 875
Codes: + - Push label, - - Pop Label * - Swap Label, E - exp1
Index  Local        Label      Out i/f
       Label        Op
8009   875 (EOS)    (-)        recirc
Okay, I think this is the problem. If label 875 (which thanks to PHP is
the only label) is popped, the packet is untagged. Recirculation means
lookup in the global routing table, so it gets sent out to the upstream
router.
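To check more than one label for this symptom, something like the following could be used on captured CLI output. It is only a rough Python sketch: the function name `find_recirc_labels` is made up for illustration, and it assumes the row layout of `sh mls cef mpls labels` is exactly as shown above (numeric index, numeric local label, out i/f as the last field), which may differ between releases.

```python
# Hypothetical helper: flag local labels whose hardware entry ends in "recirc".
# Assumes data rows look like the one quoted in this post:
#   8009 875 (EOS) (-) recirc

SAMPLE_OUTPUT = """\
Codes: + - Push label, - - Pop Label * - Swap Label, E - exp1
Index Local Label Out i/f
Label Op
8009 875 (EOS) (-) recirc
"""


def find_recirc_labels(output: str) -> list:
    """Return the local labels whose out i/f column reads 'recirc'."""
    labels = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric index followed by a numeric label;
        # header and continuation lines do not and are skipped.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            if fields[-1] == "recirc":
                labels.append(int(fields[1]))
    return labels


if __name__ == "__main__":
    print(find_recirc_labels(SAMPLE_OUTPUT))
```

A row whose last field is an interface or MAC address instead of `recirc` is simply not reported.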
In every other L3VPN setup I have used, the out i/f is set correctly.
Incidentally, when I point the route at another interface, it works as
well:
vss1-2wr#sh mls cef mpls labels 875
Codes: + - Push label, - - Pop Label * - Swap Label, E - exp1
Index  Local        Label      Out i/f
       Label        Op
8009   875 (EOS)    (-)        Vl60 , 0050.568f.0167
As far as I can tell, Vl60 is not so different from Vlan1644: both are in
the GRT, and in both cases the next-hop is directly connected, present in
the ARP table and pingable. I have tried several next-hops in Vlan1644
and all of them lead to recirc.
The one special thing about Vlan1644 is that one next-hop (.43) has a
static ARP entry towards a multicast MAC, and that multicast MAC is sent
to a fixed set of ports (CLUSTERIP netfilter extension, similar to
Microsoft NLB). However, I tried normal unicast next-hops as well (e.g.
.33 as above).
I have found a workaround, which is the hidden and undocumented
mpls label mode vrf SECOMAT protocol bgp-vpnv4 per-vrf
which leads to
0.0.0.0/0, ubest/mbest: 1/0
    *via 129.187.0.9%default, [200/0], 00:00:15, bgp-12816, internal, tag 12816 (mpls-vpn)
         MPLS[0]: Label=1253 E=0 TTL=0 S=0 (VPN)
         client-specific data: 4f59d
         recursive next hop: 129.187.0.9/32%default
         extended route information: BGP origin AS 12816 BGP peer AS 12816
vss1-2wr#sh mpls forwarding-table labels 1253
Local   Outgoing    Prefix         Bytes Label   Outgoing    Next Hop
Label   Label       or Tunnel Id   Switched      interface
1253    Pop Label   IPv4 VRF[V]    590186        aggregate/SECOMAT
and everything works as planned.
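The difference between the two label modes is visible in the last column of `sh mpls forwarding-table`: the per-prefix entry points straight at an interface, while the per-vrf entry points at `aggregate/SECOMAT`. A small Python sketch for telling the two apart in captured output follows. The function name `label_kind` is hypothetical, and the parsing assumes rows shaped like the two quoted in this post.

```python
# Hypothetical helper: classify a forwarding-table row as "aggregate"
# (outgoing interface is aggregate/<vrf>) or "per-prefix" (anything else).
# Row layout is assumed from the output quoted in this post.
from typing import Optional, Tuple

SAMPLE_ROWS = [
    "875 No Label 0.0.0.0/0[V] 845000 Vl1644",
    "1253 Pop Label IPv4 VRF[V] 590186 aggregate/SECOMAT",
]


def label_kind(row: str) -> Optional[Tuple[int, str]]:
    """Return (local_label, kind), or None for header/continuation lines."""
    fields = row.split()
    # Data rows begin with the numeric local label.
    if not fields or not fields[0].isdigit():
        return None
    is_aggregate = any(f.startswith("aggregate/") for f in fields)
    return int(fields[0]), "aggregate" if is_aggregate else "per-prefix"


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(label_kind(row))
```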
Has anyone ever observed something like this?
Bernhard