[c-nsp] Bad routes in MPLS

Tony td_miles at yahoo.com
Tue Nov 19 18:45:28 EST 2013


Hi all,

We've recently been having an issue where routes on PE routers look OK but are not forwarding any traffic. Usually this can be resolved with "clear ip route vrf <vrf_name> <ip_prefix>", which causes the PE to re-learn the route, and everything works again.

This problem appears to be triggered by routing flaps/changes. These can be either within a vrf (i.e. PE-CE, BGP or OSPF) or in our core IGP (PE-PE, OSPF). When it is a PE-CE route change it seems to affect only routes within that vrf, but when it is an IGP topology change it can affect pretty much any vrf.

My initial thought was that something is wrong in whatever underlies the routing (e.g. MPLS, LDP, CEF), so I started looking there, but I can't see anything that differs between when it works and when it doesn't. I've worked my way through most of "show ip bgp vpnv4", "show mpls forwarding-table", "show mpls ip binding", etc.

You can also "work around the problem" by adding a more specific route, which bypasses the dodgy route in question (but doesn't fix it, so when the more specific route is removed the prefix goes back to not working). I've also found that sometimes clearing the route (clear ip route ...) doesn't work, in which case the best fix I've found is to create a static route for exactly the same prefix on the PE, but with a destination of null0, and then remove it again. This also causes the bad route to be flushed and replaced with one that works.
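
For comparing the show output between the working and non-working states, eyeballing two long captures is easy to get wrong. A minimal sketch of an offline comparison follows, assuming the output of the same command has been saved as text once while the prefix forwards and once while it doesn't. The function name diff_captures and the sample strings are hypothetical placeholders, not real router output.

```python
import difflib


def diff_captures(working: str, broken: str, label: str = "show output") -> list[str]:
    """Return unified-diff lines between two saved captures of the same command.

    'working' and 'broken' are the raw text of the capture taken while the
    prefix forwards traffic and while it does not, respectively.
    """
    return list(
        difflib.unified_diff(
            working.splitlines(),
            broken.splitlines(),
            fromfile=f"{label} (working)",
            tofile=f"{label} (broken)",
            lineterm="",  # inputs were split without newlines
            n=0,          # no context lines, only the changed ones
        )
    )


if __name__ == "__main__":
    # Placeholder text only; substitute the saved captures.
    working = "line A\nlabel 100\nline C"
    broken = "line A\nlabel 200\nline C"
    for line in diff_captures(working, broken):
        print(line)
```

Fields that change on their own between captures (counters, timers) will show up as differences too, so some noise is to be expected; an empty result only says the two captures are textually identical.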

This has been happening for a couple of weeks, but was typically only affecting a prefix here or there and so we thought it just a bad quirk, but more recently we have had some occasions where it has caused widespread havoc affecting a good portion of routes on a single PE.

The boxes in question are 7609s with dual sup720-3B and a variety of cards (mostly SIP-400 + SPA-GE & ES20+) running 12.2(33)SRD4. I'm aware the software is a bit out of date and we are looking to schedule a maintenance window for upgrading (at this stage a week away from now). The boxes are not heavily loaded from a CPU, memory or traffic perspective.

Any thoughts on where I might look to try and see an actual error (i.e. something corrupt), or on any other troubleshooting that could be done? I have opened a case with TAC (and as per recent thread, OMFG was that PAINFUL!) and will see where that goes. I imagine their first response might be upgrade/reboot, but I'm hoping not and that they might have some sensible suggestions.


Happy to post output from some of the troubleshooting I've already done, but I thought that would just add a heap of extra information to this post and would just cause people's eyes to glaze over.

Thanks,
Tony.

