[nsp] 6500 multicast issues

Colin Byelong c.byelong at ucl.ac.uk
Tue May 25 11:15:33 EDT 2004


Folks,

I have been trawling through last year's archives. Does anyone know
whether the following was resolved? We are seeing the same thing here:
hardware switching of multicast stops.

thanks

Colin
  >> We've found similar MMLS brokenness with Sup2/MSFC2/PFC2 running
  >> hybrid 6.3(10) / 12.1(13)E3 (we just switched it off to avoid it).
  >
  >     We were using exactly the same CatOS/IOS combination. From Cisco
  >     support the only recommendation was to upgrade the sw. Well, the
  >     sw upgrade did solve a nasty issue with default route not learned
  >     via OSPF after supervisor switchover.
  >
  >     I'm planning to convert the box to IOS which might help. We do
  >     have 6509s running IOS and they don't have the same problem.

We've been having problems with MMLS hardware forwarding of multicast
packets across CatOS-based SUP2 boxes which we finally traced to an
intermittent failure of egress multicast forwarding on certain ports
on the SUP2 boxes that are connected p2p to other multicast-enabled
routers.

Our local SEs have been working with us, and they found an
undocumented IGMP rate limiting command in the SUP2 under CatOS. You
can see it with the CatOS command:
"show igmp ratelimit-info"

Cisco appears to have extended this IGMP rate limiter to also rate
limit relatively low levels of multicast control plane traffic,
including PIM v2 Hellos. I suspect that this is being done as part of
their efforts to protect the control plane from floods of packets
launched in DDoS attacks by cracked hosts.

The default limit in the IGMP ratelimiter is 100 packets seen in 30
seconds. On our core boxes the PIM v2 rate limit level had somehow got
set to 30 packets in 30 seconds, after which the ratelimiter would
block the PIM v2 Hellos on some (all?) ports. (Since the command is
undocumented it's hard to tell how it is supposed to work.)

None of us recall setting the IGMP rate limiter lower than 100 for PIM
Hello packets, so where that came from is a mystery.

We currently have 40 PIM-enabled routers connected as a dual fiber
optic star backbone, with two SUP2 CatOS boxes in the core. Because of
the IGMP rate limit function, we saw random and intermittent multicast
forwarding failures when the rate limiter would trigger.
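
A rough back-of-the-envelope check of why a 30-packet threshold could
trip on a backbone of this size, written as a small Python sketch. It
assumes each PIM router sends Hellos at the common default period of 30
seconds, and that the limiter counts Hellos from all neighbors against
a single box-wide counter. Neither assumption is confirmed for this
undocumented feature, and the function name is made up for
illustration.

```python
def hellos_per_window(routers, hello_period_s=30.0, window_s=30.0):
    """Steady-state number of PIM Hellos seen in one rate-limit window.

    Assumes every router sends one Hello per hello_period_s and that all
    of them are counted against one shared counter (an assumption; the
    real limiter may count per port or per VLAN).
    """
    return routers * window_s / hello_period_s


if __name__ == "__main__":
    seen = hellos_per_window(40)  # 40 PIM-enabled routers, as in our backbone
    for limit in (30, 100, 1000):
        status = "exceeded" if seen > limit else "ok"
        print(f"limit {limit} per 30 s: about {seen:.0f} Hellos -> {status}")
```

Under those assumptions about 40 Hellos arrive per 30 second window,
which is above a threshold of 30 but below the default of 100. Triggered
Hellos sent when neighbors or interfaces come up would add to that.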

The loss of PIM v2 Hellos on the egress ports would cause those ports
to drop out of the list of multicast routers seen in the CatOS
command:
"show multicast router"

Once the port is no longer marked as a multicast router, then IP
multicast packets will not be forwarded on it until the ratelimiter
times out and restores operation.

You can generate a trace of IGMP rate limiter activity with the CatOS
commands:
"set trace monitor enable"
"set trace mcast 5"

Be warned that this can generate a lot of output, and might overload a
busy box.

You can disable the trace by setting it to zero:
"set trace mcast 0"

The "fix" for our problem was to set the PIM Hello rate limiting to
1000 packets in 30 seconds, to avoid any rate limiting of the PIM
control plane packets.

Note that the help display for this command incorrectly
states that it sets the interval, when in fact it sets the packet
count:
++++++++++++++++++++++++++++++++++++++++
Lab-Cat6k-SUP2> (enable) set igmp ratelimit pimv2
        set igmp ratelimit <dvmrp|general-query|mospf1|mospf2|pimv2> <interval>
        (interval = 1..65535 seconds)

Lab-Cat6k-SUP2> (enable) set igmp ratelimit pimv2 1000
PIMV2 Watermark set to allow 1000 messages in 30 seconds

Lab-Cat6k-SUP2> (enable) sh igmp ratelimit-info
IGMP Ratelimiting: No of messages allowed in 30 seconds
-------------------------------------------------------
Igmp General Queries : 100
Dvmrp Probes         : 100
Mospf1 Hellos        : 100
Mospf2 Hellos        : 100
PimV2 Hellos         : 1000
++++++++++++++++++++++++++++++++++++++++
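
Since a threshold can apparently end up lower than expected without
anyone setting it, it may be worth checking the values periodically. The
following is a minimal Python sketch that parses captured output in the
format shown above and flags any threshold below a chosen minimum. It
assumes the output layout matches this transcript exactly; how the text
is collected from the switch is left out, and the function names are
hypothetical.

```python
import re

# Sample text copied from the transcript above.
SAMPLE = """\
IGMP Ratelimiting: No of messages allowed in 30 seconds
-------------------------------------------------------
Igmp General Queries : 100
Dvmrp Probes         : 100
Mospf1 Hellos        : 100
Mospf2 Hellos        : 100
PimV2 Hellos         : 1000
"""

# Matches lines of the form "<name> : <integer>" and nothing else,
# so the header and the dashed separator are skipped.
LINE_RE = re.compile(r"^\s*(?P<name>[A-Za-z0-9 ]+?)\s*:\s*(?P<count>\d+)\s*$")


def parse_ratelimits(text):
    """Return {message type: allowed messages per window} from the output."""
    limits = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            limits[match.group("name")] = int(match.group("count"))
    return limits


def below_minimum(limits, minimum):
    """Return the entries whose threshold is lower than minimum."""
    return {name: count for name, count in limits.items() if count < minimum}


if __name__ == "__main__":
    limits = parse_ratelimits(SAMPLE)
    for name, count in below_minimum(limits, 100).items():
        print(f"WARNING: {name} limited to {count} messages per window")
```

With the sample above nothing is flagged at a minimum of 100. A box
showing the 30-message PimV2 value we found on our core would be
flagged.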

Since doing this we haven't seen any multicast forwarding failures in
our tests.

BTW, the IGMP rate limiting feature does not appear to exist under
CatIOS, so converting switches to CatIOS would avoid this problem.

I am now sending Cisco an updated version of my standard rant about
their need to:

1. provide a config switch to expose all default settings in CatOS and
CatIOS, something like "show defaults all".

2. extensively document any and all features that can cause control
plane or data plane packet forwarding to stop. I'm tired of finding
that the network is broken and that Cisco has caused it with a feature
that they have not described or documented in any detail.

Chief among these that we have been able to discover (the hard way)
has been "errdisable" which we have come to hate. But now "igmp
ratelimiting" has joined the set. One wonders how many others there
are...

3. provide customers with a switch to shut off _all_ unnecessary
features by default, (i.e. those not required for vanilla
IPv4/Ethernet packet forwarding) which would allow expert users to
enable only those features they want (understanding those features
would require the first two points to be in effect).

-Charles

Charles E. Spurgeon / UTnet
UT Austin ITS / Networking
c.spurgeon at its.utexas.edu / 512.475.9265



-----------------------------------------------------------------------

Colin Byelong                             Email: C.Byelong at ucl.ac.uk
Network Group
Information Systems Division
University College London
Gower Street                              Phone: 020 7679-2572
London WC1E 6BT
------------------------------------------------------------------------ 


