[j-nsp] Weird traceroute across MPLS core using labeled-unicast IBGP

Mon Mar 8 14:56:55 EST 2004

In-line please

> -----Original Message-----
> From: juniper-nsp-bounces at puck.nether.net 
> [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of 
> Daniel Roesen
> Sent: Monday, March 08, 2004 11:25 AM
> To: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] Weird traceroute across MPLS core using 
> labeled-unicast IBGP
> 
> 
> On Mon, Mar 08, 2004 at 10:37:34AM -0800, harry wrote:
> > I will investigate a PR on the monitor traffic. Does seem broke. I 
> > will let you know.
> 
> Thanks.
> 
> > 1. If the traffic is sourced from the CE, then monitor traffic does 
> > not function on PE/P routers (as you know.) On traffic from/to the CE 
> > there is no MPLS/VPN encapsulation, again as I am sure that you know.
> 
> ACK.
> 
> > 2. If you ping/traceroute from the local PE, say PE-1, to the remote 
> > VRF interface or CE, then the path of the ICMP traffic is from the 
> > local PE to remote PE via mpls forwarding. When the remote PE/CE 
> > responds to the ping or traceroute the returning traffic, even though 
> > addressed to the VRF interface on the local PE, is *not* initially 
> > processed by the local PE. Instead, the local PE pops the label and 
> > forwards the traffic to the local CE, which then routes the traffic 
> > back to the local PE where the response traffic is detected.
> 
> Uhm, when I do a ping/trace from PE-1 (local) to PE-2's CE 
> facing interface IP, the return traffic (ICMP echo reply 
> [ping] or Port Unreachable [traceroute]) does _not_ come 
> MPLS-encapsulated but by native forwarding. I don't see how 
> any CE is involved at all. And given that the ping/traceroute 
> source is PE-1's loopback IP, I don't see any reason that 
> PE-1 forwards the response to a CE-1, given that it's 
> terminating the traffic itself - and the traffic not carrying 
> any label anyway.

This strikes me as odd. The remote PE-CE VRF IP should not be in the remote
PE's main routing table, or in the inet.0 table of P routers, so the only
way a reply can come back to the originating (local) PE is via MPLS
forwarding, AFAIK (assuming that GRE is not used in lieu of MPLS across the
backbone).

> 
> > Note that the traffic coming back from the remote PE/CE is handled as 
> > transit traffic so tcpdump at the local PE core-facing interface will 
> > not display the response traffic.
> 
> Why transit traffic? In case of ping/traceroute originating 
> on PE-1, it shouldn't be transit traffic. In case of 
> originating on CE-1, it's clear. VPN label gets popped by the 
> PFE and packet forwarded to CE-1. RE doesn't see the packet at all.

Our forwarding plane is set up so that upon receipt of a VRF-labeled packet
we (by default) pop the label, which identifies the egress VRF interface,
and then forward the native IP packet out that interface to the CE, hence
the transit designation. This default behavior is a function of the IP II
being used to index the VRF label to the VRF interface, and therefore not
being able to also do an L3 match/forward. You can alter this behavior with
a tunnel PIC (VT interface), or with vrf-table-label. Both result in a
Layer 3 IP II lookup *after* the VRF label is popped.
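
To make the difference concrete, here is a toy Python model of the two
egress behaviors described above. It is only a sketch: the function names,
the table layout and the 10.0.0.0/30 VRF route are made up for illustration
(the label and interface are borrowed from the example further down), and
none of it corresponds to actual Junos data structures.

```python
import ipaddress

# Hypothetical egress-PE state, loosely modelled on the example in this thread.
LABEL_TO_INTERFACE = {100192: "so-0/2/0.0"}   # default mode: VPN label -> VRF interface
LOCAL_ADDRESSES = {"10.0.0.1"}                # the PE's own VRF interface address
VRF_ROUTES = {"10.0.0.0/30": "so-0/2/0.0"}    # assumed VRF table: prefix -> interface


def egress_default(label, dst_ip):
    """Default mode: pop the VPN label and forward out the interface the
    label maps to. dst_ip is deliberately ignored (no IP lookup happens)."""
    return ("forward", LABEL_TO_INTERFACE[label])


def egress_with_ip_lookup(label, dst_ip):
    """VT interface / vrf-table-label mode: pop the label, then look at the
    IP destination before deciding what to do with the packet."""
    if label not in LABEL_TO_INTERFACE:
        return ("discard", None)
    if dst_ip in LOCAL_ADDRESSES:
        # Traffic for the PE itself is handed to the routing engine.
        return ("local", "routing-engine")
    addr = ipaddress.ip_address(dst_ip)
    # First matching prefix wins here; a real lookup would be longest-match.
    for prefix, interface in VRF_ROUTES.items():
        if addr in ipaddress.ip_network(prefix):
            return ("forward", interface)
    return ("discard", None)


print(egress_default(100192, "10.0.0.1"))
print(egress_with_ip_lookup(100192, "10.0.0.1"))
```

In the default mode a packet addressed to the PE's own VRF interface is
still pushed out toward the CE; only with the extra IP lookup is it
recognized as local.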

> 
> > 3. My testing shows that ping/traceroute traffic originated by the 
> > local PE can be monitored on the core-facing interface.
> 
> Indeed. I'm not sure why I didn't see it in the first place, 
> but now I'm clearly observing it.
> 
> > Note the absence of response traffic in the captures, despite the test 
> > succeeding.
> > 
> > My Set up is:
> > 
> > 
> > CE	PE	  p	    PE	CE
> > hk-----sj-----de-------mo-----am
> > 	     ^
> > 	so-0/1/1			192.168.24.1
> 
> And you are pinging from PE sj, or from CE hk?

Ping/trace originating at PE SJ.
> 
> My setup here:
> 
> PE     P      P      PE
> A1-----B------C------D
> 
>  A1  lo0.0 192.168.0.1
>  | ge-0/0/0.100
>  |
>  |
>  | ge-0/0/0.100
>  B   lo0.0 192.168.0.2
>  | ge-1/0/0
>  |
>  |
>  | ge-0/0/0
>  C   lo0.0 192.168.0.4
>  | ge-0/1/0
>  |
>  |
>  | ge-0/0/0
>  D   lo0.0 192.168.0.5
>  | so-0/2/0 10.0.0.1/30  with sonet-options loopback local
> 
> I'm now pinging from A1 lo0.0 to 10.0.0.1, and monitoring 
> traffic on ge-0/0/0.100 on A1:

I think I see why you get a native reply. By default the local PE sources
traffic from the VRF interfaces bound to the routing instance. You are
sourcing from your PE's lo0, which should be routable by PE and P routers.
Also, the loopback emulates an attached CE nicely. The remote PE pops the
label and sends the packet to the loopback, which results in the traffic
coming back to the remote PE, just as it would if the CE was attached and
you pinged the remote PE's VRF IP address. Note that the TTL is one higher
than it should be, due to lack of CE forwarding.

What confounds me is that the route to the local PE's loopback address for
native forwarding should be in inet.0, not the VRF. Even if you had a
default route in the remote PE's VRF pointing to the local PE's VRF, I would
expect to see MPLS forwarding. Thinking out loud, it seems that when the
incoming exception traffic is passed to the remote PE's RE, the VPN context
is somehow lost. This results in the remote PE consulting the inet.0 table
when attempting to reply.

Either this, or you do not have the so-0/2/0 interface at the remote PE
bound to a VRF.

> 
> ping:
> =====
> 19:44:22.140159 Out VID [0: 100] MPLS (label 100064, exp 7, 
> ttl 255) (label 100192, exp 0, [S], ttl 255), IP, length: 92 
> 19:44:22.159164  In IP 10.0.0.1 > 192.168.0.1: icmp 64: echo reply
> 
> ICMP echo request goes out with double push, ICMP echo reply 
> from D is received via normal IP forwarding.
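
For anyone post-processing captures like this one, below is a small Python
sketch that pulls the label stack out of a monitor traffic line. The regular
expression is fitted only to the output format shown in this thread, and the
function name and the dictionary keys are my own choices, so treat it as an
assumption rather than a general parser.

```python
import re

# Matches "(label N, exp N, ttl N)" with an optional "[S]," bottom-of-stack flag.
LABEL_RE = re.compile(r"\(label (\d+), exp (\d+),( \[S\],)? ttl (\d+)\)")


def parse_label_stack(line):
    """Return the MPLS label stack entries in a capture line, outermost first."""
    return [
        {
            "label": int(m.group(1)),
            "exp": int(m.group(2)),
            "bottom": m.group(3) is not None,
            "ttl": int(m.group(4)),
        }
        for m in LABEL_RE.finditer(line)
    ]


line = ("19:44:22.140159 Out VID [0: 100] MPLS (label 100064, exp 7, "
        "ttl 255) (label 100192, exp 0, [S], ttl 255), IP, length: 92")
stack = parse_label_stack(line)
for entry in stack:
    print(entry)
```

Two entries come back for the echo request above: the outer transport label
and the inner VPN label carrying the bottom-of-stack flag.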
> 
> traceroute:
> ===========
> (only first probe per hop shown)
> 19:47:08.461990 Out VID [0: 100] MPLS (label 100064, exp 7, 
> ttl 255) (label 100192, exp 0, [S], ttl 1), IP, length: 48 
> 19:47:08.462772  In IP 10.0.0.1 > 192.168.0.1: icmp 148: time 
> exceeded in-transit
> 19:47:08.468193 Out VID [0: 100] MPLS (label 100064, exp 7, 
> ttl 255) (label 100192, exp 0, [S], ttl 2), IP, length: 48 
> 19:47:08.468853  In IP 10.0.0.1 > 192.168.0.1: icmp 36: 
> 10.0.0.1 udp port 33438 unreachable
> 
> Why do I get two responses? One TTL exceeded, and then the 
> final port unreachable - both sourced by 10.0.0.1? I would 
> have expected only the latter, not the former.
> 
> Theory:
> D receives the packet from C, with only the VPN label 100192 
> attached to it (C does penultimate hop popping of the outer 
> label). Instead of popping the VPN label and looking at the 
> destination IP of the packet (and thus detecting that the 
> traffic is for the local so-0/2/0 interface IP), D pops the 
> VPN label and blindly forwards the packet to the interface 
> associated with the VPN label 100192:
> 
> D> show route label 100192
>  
> mpls.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
> + = Active Route, - = Last Active, * = Both
>  
> 100192             *[VPN/170] 16:29:39
>                     > via so-0/2/0.0, Pop
> 
> This results in the packet looping back as native, untagged 
> IP packet, and D then noticing that it's destined to itself 
> (so-0/2/0.0), and thus replying with the ICMP port unreachable.
> 
> If there would be an actual CE attached to so-0/2/0 with 
> 10.0.0.2 as IP address, and the traceroute would go to 
> 10.0.0.2, I wouldn't see this TTL exceeded response at all. I 
> guess the traceroute from A1 lo0.0 to 10.0.0.2 would then 
> look something like:
> 
> traceroute to 10.0.0.2 (10.0.0.2), 30 hops max, 40 byte packets
>  1  10.0.0.2 (10.0.0.2)  0.798 ms  0.775 ms  0.750 ms
> 
> 
> Is this theory accurate?

I think so. Note that the remote CE will have trouble routing back the
response that is sourced from local PE's lo0.
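
The theory quoted above can be sketched in a few lines of Python. This is a
rough model, not a statement of how the PFE works: it assumes the egress PE
decrements the TTL when it pops the VPN label and forwards the packet as
transit, which fits the two responses in the capture but is not confirmed.
The function name and the response strings are made up for illustration.

```python
def probe_looped_pe(ttl, dst_ip, pe_ip="10.0.0.1"):
    """Model one traceroute probe hitting an egress PE whose VRF interface
    is in local loopback (no CE attached)."""
    # Step 1: the labeled packet is treated as transit, so the TTL is
    # decremented when the VPN label is popped (assumption).
    ttl -= 1
    if ttl <= 0:
        return "time exceeded from " + pe_ip
    # Step 2: the packet leaves the looped interface and comes straight
    # back as native IP; the PE now sees it is addressed to itself.
    if dst_ip == pe_ip:
        return "port unreachable from " + pe_ip
    return "no response (nothing attached to answer)"


responses = [probe_looped_pe(ttl, "10.0.0.1") for ttl in (1, 2)]
for ttl, response in zip((1, 2), responses):
    print(ttl, response)
```

Under that assumption the first probe expires at the pop and the second one
survives the loop, which matches the time exceeded followed by port
unreachable, both from 10.0.0.1.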

> 
> I'll attach some real CE to D in the next few days and check facts.
> 
> 
> Thanks for your time!

NP. Glad to assist.

> 
> 
> Best regards,
> Daniel
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net 
> http://puck.nether.net/mailman/listinfo/juniper-nsp
>