[c-nsp] mpls ip -> lost packets

Tue Jun 22 08:37:24 EDT 2010

> You have "mpls ip" on both ends, right? Not that it should disrupt
> traffic like this though.

Doesn't work otherwise. :)

> > Shortly thereafter, TCP connections from clients off device B to vendor
> > off device A running via VLAN 300 started dropping packets. Not
> > consistently, but in bursts every few seconds.
> Does the switch say anything about drops? E.g. "show queueing interface
> GiX/Y" and "show interface GiX/Y | incl drop" on the physical interface
> carrying VLAN 300.

Nada. Switch reports clean. 

> Is there any system in the drops? Only large packets? Or grouped
> drops?

It's packets that are being NATted by dev B. Anything else is fine, labeled or unlabeled. (The NAT is in the global domain, edge handoff to a vendor, so they are all egress points from the mpls mesh.) 

From doing a bunch of SPAN sessions and sniffing, it appears that it is device B (the sup32) that is doing the dropping. It's hard to be entirely sure because it's hard to pull it out of the tcpdumps on the WAN link - all of the traffic flowing out of dev B to dev A is MPLS-encapsulated with tag 275 (the tag is associated with a summarizing /16 static route on dev A pointing out an L3 portchannel into the datacenter core), and tcpdump will filter on tags, but filtering on tags and then on the contents is something I haven't mastered yet. 
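
For anyone else stuck on the tag-then-contents problem, here is a rough sketch of doing the match offline rather than in the capture filter. It assumes each label stack entry has the standard 32-bit layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL), that the payload under the stack is IPv4, and that you already have the bytes following the Ethernet header of an MPLS unicast frame (ethertype 0x8847). The function names are made up for illustration and how you get the frames out of the pcap is left open.

```python
import ipaddress
import struct


def parse_mpls_stack(payload: bytes):
    """Split an MPLS payload into its label stack and the bytes beneath it.

    Returns (entries, rest), where entries is a list of
    (label, traffic_class, bottom_of_stack, ttl) tuples, outermost first.
    """
    entries = []
    offset = 0
    while True:
        if offset + 4 > len(payload):
            raise ValueError("truncated MPLS label stack")
        (word,) = struct.unpack_from("!I", payload, offset)
        label = word >> 12             # top 20 bits
        tc = (word >> 9) & 0x7         # 3-bit traffic class (formerly EXP)
        bottom = bool((word >> 8) & 1) # bottom-of-stack flag
        ttl = word & 0xFF              # low 8 bits
        entries.append((label, tc, bottom, ttl))
        offset += 4
        if bottom:
            break
    return entries, payload[offset:]


def inner_ipv4_addrs(ip_bytes: bytes):
    """Return (src, dst) of the inner packet, or None if it is not IPv4."""
    if len(ip_bytes) < 20 or ip_bytes[0] >> 4 != 4:
        return None
    src = ipaddress.IPv4Address(ip_bytes[12:16])
    dst = ipaddress.IPv4Address(ip_bytes[16:20])
    return src, dst


def matches(payload: bytes, label: int, host: str) -> bool:
    """True if the outermost label equals `label` and the inner IPv4
    source or destination address equals `host`."""
    entries, rest = parse_mpls_stack(payload)
    if entries[0][0] != label:
        return False
    addrs = inner_ipv4_addrs(rest)
    return addrs is not None and ipaddress.IPv4Address(host) in addrs
```

Calling `matches(payload, 275, "192.0.2.10")` on each frame would keep only the tag 275 packets to or from that host (the address is a placeholder). On the capture side, as far as I can tell from the pcap-filter man page, the `mpls` primitive shifts the decoding offsets for whatever follows it in the expression, so something like `tcpdump 'mpls 275 and host 192.0.2.10'` may do the same job directly; I have not verified that against this capture.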

One can argue, based on my imperfect reading of Luc De Ghein's book, that the above-mentioned traffic in tag 275 really shouldn't be encapsulated at all - since dev A is the egress LSR, dev B would be the penultimate hop and should be popping the label (indeed it shouldn't be labeled at all, I suppose). 

I think I understand _why_ it's doing it - the summary route isn't a true summary, it's a static /16 on dev A, so it's redistributed into EIGRP, so it gets a label. With a bit of work I can change this structure and advertise a /16 via EIGRP into dev A. 

The same might be argued the other way - dev B is really the egress LSR for the NATted traffic, so dev A shouldn't even bother labeling it. 

Or am I misreading something? 

And why would it matter anyway?

(dev A and dev B are merely two hops of a larger mesh of 6500s, slowly having mpls implemented on them. I'm indifferent about whether the traffic between them is tagged or not; the point of the exercise is to be able to create TE paths and VPNs across a multi-hop fiber mesh. But the basics still have to work, too.)