[j-nsp] Weird traceroute across MPLS core using labeled-unicast IBGP

Mon Mar 8 14:25:23 EST 2004

On Mon, Mar 08, 2004 at 10:37:34AM -0800, harry wrote:
> I will investigate a PR on the monitor traffic. Does seem broke. I
> will let you know.

Thanks.

> 1. If the traffic is sourced from the CE, then monitor traffic does not
> function on PE/P routers (as you know.) On traffic from/to the CE there is
> no MPLS/VPN encapsulation, again as I am sure that you know.

ACK.

> 2. If you ping/traceroute from the local PE, say PE-1, to the remote VRF
> interface or CE, then the path of the ICMP traffic is from the local PE to
> remote PE via mpls forwarding. When the remote PE/CE responds to the ping or
> traceroute the returning traffic, even though addressed to the VRF interface
> on the local PE, is *not* initially processed by the local PE. Instead, the
> local PE pops the label and forwards the traffic to the local CE, which then
> routes the traffic back to the local PE where the response traffic is
> detected.

Hmm. When I ping or traceroute from PE-1 (local) to PE-2's CE-facing
interface IP, the return traffic (ICMP echo reply [ping] or port
unreachable [traceroute]) does _not_ come back MPLS-encapsulated; it
arrives via native IP forwarding. I don't see how any CE is involved
at all. And given that the ping/traceroute source is PE-1's loopback
IP, I don't see any reason for PE-1 to forward the response to CE-1:
it is terminating the traffic itself, and the traffic carries no
label anyway.

> Note that the traffic coming back from the remote PE/CE is handled
> as transit traffic so tcpdump at the local PE core-facing interface will not
> display the response traffic.

Why transit traffic? If the ping/traceroute originates on PE-1, the
response shouldn't be transit traffic. If it originates on CE-1, it's
clear: the VPN label gets popped by the PFE and the packet is
forwarded to CE-1. The RE doesn't see the packet at all.

> 3. My testing shows that ping/traceroute traffic originated by the
> local PE can be monitored on the core-facing interface.

Indeed. I'm not sure why I didn't see it in the first place, but now
I'm clearly observing it.

> Note the absence of response traffic in the captures, despite the
> test succeeding.
> 
> My Set up is:
> 
> 
> CE	PE	  p	    PE	CE
> hk-----sj-----de-------mo-----am
> 	     ^
> 	so-0/1/1			192.168.24.1

And you are pinging from PE sj, or from CE hk?

My setup here:

PE     P      P      PE
A1-----B------C------D

 A1  lo0.0 192.168.0.1
 | ge-0/0/0.100
 |
 |
 | ge-0/0/0.100
 B   lo0.0 192.168.0.2
 | ge-1/0/0
 |
 |
 | ge-0/0/0
 C   lo0.0 192.168.0.4
 | ge-0/1/0
 |
 |
 | ge-0/0/0
 D   lo0.0 192.168.0.5
 | so-0/2/0 10.0.0.1/30  with sonet-options loopback local

I'm now pinging from A1 lo0.0 to 10.0.0.1, and monitoring
traffic on ge-0/0/0.100 on A1:

ping:
=====
19:44:22.140159 Out VID [0: 100] MPLS (label 100064, exp 7, ttl 255) (label 100192, exp 0, [S], ttl 255), IP, length: 92
19:44:22.159164  In IP 10.0.0.1 > 192.168.0.1: icmp 64: echo reply

The ICMP echo request goes out with a two-label push; the ICMP echo
reply from D is received via normal IP forwarding.
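
To compare label stacks across several capture lines without reading
them by eye, something like the following could be used. It is only a
sketch: the regular expression is written against the text format of
the MPLS line shown above and may not match other tcpdump versions,
and the names parse_label_stack and ENTRY_RE are made up for this
example.

```python
import re

# Matches one label stack entry as printed in the capture above, e.g.
# "(label 100192, exp 0, [S], ttl 255)". "[S]" marks the bottom of stack.
ENTRY_RE = re.compile(r"\(label (\d+), exp (\d+), (\[S\], )?ttl (\d+)\)")


def parse_label_stack(line):
    """Return the label stack entries found in one capture line, outermost first."""
    return [
        {"label": int(label), "exp": int(exp), "bottom": bool(s), "ttl": int(ttl)}
        for label, exp, s, ttl in ENTRY_RE.findall(line)
    ]


# The echo request line from the capture, joined back into a single line.
line = (
    "19:44:22.140159 Out VID [0: 100] MPLS (label 100064, exp 7, ttl 255) "
    "(label 100192, exp 0, [S], ttl 255), IP, length: 92"
)
stack = parse_label_stack(line)
for entry in stack:
    print(entry)
```

The first entry is the outer (transport) label and the second, flagged
as bottom of stack, is the VPN label.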

traceroute:
===========
(only first probe per hop shown)
19:47:08.461990 Out VID [0: 100] MPLS (label 100064, exp 7, ttl 255) (label 100192, exp 0, [S], ttl 1), IP, length: 48
19:47:08.462772  In IP 10.0.0.1 > 192.168.0.1: icmp 148: time exceeded in-transit
19:47:08.468193 Out VID [0: 100] MPLS (label 100064, exp 7, ttl 255) (label 100192, exp 0, [S], ttl 2), IP, length: 48
19:47:08.468853  In IP 10.0.0.1 > 192.168.0.1: icmp 36: 10.0.0.1 udp port 33438 unreachable

Why do I get two responses? One TTL exceeded, and then the final
port unreachable - both sourced by 10.0.0.1? I would have expected
only the latter, not the former.
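
For reference when reading the two probes: the only field that differs
between them is the TTL in the bottom (VPN) label entry. A small
sketch of how such an entry is packed, assuming the standard 32-bit
layout from RFC 3032 (20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL).
The helper names encode_entry and decode_entry are invented for this
example.

```python
def encode_entry(label, exp, bottom, ttl):
    """Pack one label stack entry: 20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    return (label << 12) | (exp << 9) | (int(bool(bottom)) << 8) | ttl


def decode_entry(word):
    """Unpack a 32-bit label stack entry into its four fields."""
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


# Bottom-of-stack entries of the first and second traceroute probes.
first_probe = encode_entry(100192, 0, True, 1)
second_probe = encode_entry(100192, 0, True, 2)
print(hex(first_probe), hex(second_probe))  # 0x18760101 0x18760102
print(decode_entry(first_probe))
```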

Theory:
D receives the packet from C, with only the VPN label 100192 attached
to it (C does penultimate hop popping of the outer label). Instead
of popping the VPN label and looking at the destination IP of the
packet (and thus detecting that the traffic is for the local so-0/2/0
interface IP), D pops the VPN label and blindly forwards the packet
to the interface associated with the VPN label 100192:

D> show route label 100192

mpls.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100192             *[VPN/170] 16:29:39
                    > via so-0/2/0.0, Pop

This results in the packet looping back as a native, unlabeled IP
packet (so-0/2/0 is in local loopback). D then notices that it is
destined to itself (so-0/2/0.0) and replies with the ICMP port
unreachable.
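
To make the two behaviours being contrasted explicit, here is a toy
model. It only illustrates the theory above and is not confirmed JUNOS
behaviour: MPLS_TABLE is a hypothetical stand-in for D's mpls.0 built
from the single route shown, LOCAL_ADDRESSES and both function names
are invented for this sketch, and TTL handling is deliberately left
out.

```python
# Hypothetical stand-in for D's mpls.0, holding only the route shown above.
MPLS_TABLE = {100192: {"action": "pop", "interface": "so-0/2/0.0"}}

# Addresses owned by D in this lab setup.
LOCAL_ADDRESSES = {"10.0.0.1": "so-0/2/0.0", "192.168.0.5": "lo0.0"}


def forward_by_label(label, dst_ip):
    """The theory: pop the VPN label and transmit on the interface bound
    to it. dst_ip is accepted but intentionally never consulted."""
    route = MPLS_TABLE[label]
    return ("transmit", route["interface"])


def forward_with_ip_lookup(label, dst_ip):
    """What I expected instead: pop the label, then look at the IP
    destination and deliver locally if D owns that address."""
    if dst_ip in LOCAL_ADDRESSES:
        return ("deliver locally", LOCAL_ADDRESSES[dst_ip])
    return forward_by_label(label, dst_ip)


print(forward_by_label(100192, "10.0.0.1"))
print(forward_with_ip_lookup(100192, "10.0.0.1"))
print(forward_with_ip_lookup(100192, "10.0.0.2"))
```

In the first case the packet for 10.0.0.1 leaves on so-0/2/0.0 and only
comes back because the interface is looped; in the second it would
never leave D. For a CE address such as 10.0.0.2 both models transmit
on so-0/2/0.0.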

If there were an actual CE attached to so-0/2/0 with 10.0.0.2 as its
IP address, and the traceroute went to 10.0.0.2, I wouldn't see this
TTL exceeded response at all. I guess the traceroute from A1 lo0.0 to
10.0.0.2 would then look something like:

traceroute to 10.0.0.2 (10.0.0.2), 30 hops max, 40 byte packets
 1  10.0.0.2 (10.0.0.2)  0.798 ms  0.775 ms  0.750 ms

Is this theory accurate?

I'll attach a real CE to D in the next few days and check the facts.

Thanks for your time!

Best regards,
Daniel