[f-nsp] Multicast is being switched by LP CPU on MLXe?

Wilbur Smith wsmith at brocade.com
Tue Dec 20 15:31:25 EST 2016


Multicast is always loads of fun to debug! Not really, though.

So it looks like your upstream querier (router) for VLAN 450 is hanging off of e9/5, while the multicast source is on e11/2 and the subscriber to the multicast group is on e7/1. Does this sound correct?

If you're seeing high LP CPU usage and suspect multicast traffic is being processed by the LP CPU rather than in hardware, that normally tells me the hardware tables are being repeatedly reprogrammed or there is some table churn happening somewhere. It's normal to see a burst of high LP CPU at the instant the initial hardware programming takes place, but this should last only a second or so before the FPGA takes over.
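
To make the "initial burst versus ongoing churn" distinction concrete, here is a minimal Python sketch. It assumes you have already collected, by whatever means, a list of timestamps at which the hardware entry for one (S,G) was (re)programmed. The input format, the 1-second settle window (taken from the estimate in the paragraph above), and the `tolerated_late_events` knob are all hypothetical; nothing here is an MLXe command or API.

```python
from dataclasses import dataclass

# Hypothetical settle window: the initial programming burst is expected to
# last "1 second or so" before the FPGA takes over.
SETTLE_WINDOW_S = 1.0


@dataclass
class ChurnVerdict:
    initial_events: int  # (re)programming events inside the settle window
    late_events: int     # events after the settle window has passed
    churning: bool       # True if late events exceed what we tolerate


def classify_programming(timestamps: list[float],
                         settle_window: float = SETTLE_WINDOW_S,
                         tolerated_late_events: int = 0) -> ChurnVerdict:
    """Separate the expected initial burst from ongoing reprogramming.

    `timestamps` are seconds at which one (S,G) hardware entry was written.
    Legitimate joins/leaves also reprogram the entry, so raise
    `tolerated_late_events` if receivers come and go during the sample.
    """
    if not timestamps:
        return ChurnVerdict(0, 0, False)
    ordered = sorted(timestamps)
    first = ordered[0]
    initial = sum(1 for t in ordered if t - first <= settle_window)
    late = len(ordered) - initial
    return ChurnVerdict(initial, late, late > tolerated_late_events)


# A single burst settles; repeated rewrites every few seconds look like churn.
print(classify_programming([0.0, 0.2, 0.4]))
# -> ChurnVerdict(initial_events=3, late_events=0, churning=False)
print(classify_programming([0.0, 0.3, 5.0, 9.0, 13.0]))
# -> ChurnVerdict(initial_events=2, late_events=3, churning=True)
```

The threshold is deliberately crude: a healthy entry shows one burst and then goes quiet, so a steady trickle of late rewrites is the pattern worth chasing.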

I've dealt a lot with this in the past and here are some things that I've seen cause this:

1) A hardware defect on the LP. This is very rare, and when it does happen on routers running supported code, you would normally also see sys-mon alerts in the logs.

2) A duplicate multicast group address or source IP used for two different multicast streams. An example would be two encoding devices both using the same group, or even the same IP address on their interfaces. Even if they are in separate VLANs, this causes issues because it triggers the MLXe to constantly update the FID entry on the LP. I've seen this a few times in the past.

3) Duplicate IP or MAC addresses on two subscribers. This is more common than you think with Linux host-based receivers or semi-custom devices that are manually programmed. Sometimes a sysadmin copies the ifcfg file between two hosts and forgets to delete the MAC from the file. I've also seen hardware-based devices where the IP address and/or MAC is flashed onto the device and an older config is reused, causing a duplicate.

4) Something going on in our code. If you have a LAG between the two routers, try disabling all but one of its ports if possible. Bounce the remaining port to make sure the table entries are reset, and see if the issue still happens. If it does, contact TAC, because there may be a problem with multicast FID programming on the LAG.
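
Causes 2 and 3 both come down to two devices claiming the same address. If you can export an inventory of encoders and receivers, a quick Python sketch like the one below can flag shared groups, shared source IPs, and shared receiver IP/MAC addresses before you start swapping hardware. The tuple layouts and the sample rows are hypothetical illustrations, not an MLXe export format.

```python
from collections import defaultdict
from typing import Iterable


def shared(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each key to the devices using it, keeping keys used by 2+ devices."""
    users: dict[str, set[str]] = defaultdict(set)
    for key, device in pairs:
        users[key].add(device)
    return {key: sorted(devs) for key, devs in users.items() if len(devs) > 1}


def audit_sources(encoders: list[tuple[str, str, str]]) -> dict:
    """Cause 2: encoders given as hypothetical (name, interface_ip, group) rows."""
    return {
        "shared_groups": shared((grp, name) for name, _, grp in encoders),
        "shared_source_ips": shared((ip, name) for name, ip, _ in encoders),
    }


def audit_receivers(receivers: list[tuple[str, str, str]]) -> dict:
    """Cause 3: receivers given as hypothetical (host, ip, mac) rows."""
    return {
        "shared_ips": shared((ip, host) for host, ip, _ in receivers),
        # Normalise MAC notation so 00-11-22-... and 00:11:22:... compare equal.
        "shared_macs": shared((mac.lower().replace("-", ":"), host)
                              for host, _, mac in receivers),
    }


# Illustrative inventory only.
encoders = [("enc-a", "10.0.0.10", "239.1.1.1"),
            ("enc-b", "10.0.0.11", "239.1.1.1"),
            ("enc-c", "10.0.0.12", "239.1.1.2")]
receivers = [("rx-1", "10.0.1.5", "00:11:22:33:44:55"),
             ("rx-2", "10.0.1.6", "00-11-22-33-44-55")]
print(audit_sources(encoders))
print(audit_receivers(receivers))
```

Keys are collected into sets of device names, so one encoder legitimately sending several groups from a single interface IP is not flagged; only an address claimed by two different devices is.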
 
If possible, also try moving both the source and receiver to the same MLXe in a separate VLAN and just set that VLAN to 'multicast active'; make sure there's a VE with an IP address on the local VLAN, though.

With multicast active, the router provides the same IGMP messaging that IGMP snooping or multicast passive needs in order to work, but without needing to run PIM. It can be a good way to confirm whether this is an issue with PIM or a lower-level issue.
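
For reference, the periodic message a querier sends is an IGMP General Query. The Python sketch below builds the 8-byte IGMPv2 Membership Query payload defined in RFC 2236, just to show what the snooping switches in the VLAN are listening for. It does not open a socket or add the IP header (destination 224.0.0.1, TTL 1, Router Alert option), and it makes no claim about how the MLXe generates its queries internally.

```python
import socket
import struct

IGMP_MEMBERSHIP_QUERY = 0x11  # RFC 2236 message type


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def igmpv2_query(group: str = "0.0.0.0", max_resp_tenths: int = 100) -> bytes:
    """Build an IGMPv2 Membership Query payload.

    group "0.0.0.0" makes it a General Query; any other address makes it a
    Group-Specific Query. max_resp_tenths is in units of 1/10 s
    (100 = the 10 s default Query Response Interval).
    """
    fmt = "!BBH4s"  # type, max resp time, checksum, group address
    addr = socket.inet_aton(group)
    unsummed = struct.pack(fmt, IGMP_MEMBERSHIP_QUERY, max_resp_tenths, 0, addr)
    checksum = internet_checksum(unsummed)
    return struct.pack(fmt, IGMP_MEMBERSHIP_QUERY, max_resp_tenths, checksum, addr)


query = igmpv2_query()
print(query.hex())  # -> 1164ee9b00000000
```

Receivers answer these queries with membership reports, and it is those reports that a snooping VLAN uses to build and refresh its outgoing-interface lists; when no querier is present, the reports stop and the snooping entries age out.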

I can tell you that this normally works very well on the MLX. I have some customers pushing 3.2 terabits of multicast on an MLXe-16, all with LPs in the 1-3% CPU range. The trick is to figure out what's causing table churn at the LP level and preventing the hardware table from being used.


Wilbur


________________________________________
From: foundry-nsp <foundry-nsp-bounces at puck.nether.net> on behalf of Alexander Shikoff <minotaur at crete.org.ua>
Sent: Tuesday, December 20, 2016 03:57 AM
To: Eldon Koyle
Cc: foundry-nsp
Subject: Re: [f-nsp] Multicast is being switched by LP CPU on MLXe?

Hi!

On Mon, Dec 19, 2016 at 04:14:28PM -0700, Eldon Koyle wrote:
> For IGMP snooping to work, there must be an L3 device acting as an
> IGMP querier on your L2 domain (typically a router).  This device is
> in charge of keeping track of which IGMP clients have asked for which
> multicast groups, and periodically asking if they still want it.  The
> MLX would not need to be the querier, but there has to be one in that
> VLAN.
>
> If there is no IGMP querier, your only real option would be to flood
> all the multicast (unless you are connecting a group of routers that
> are speaking PIM, then pim snooping might be able to help you).

There IS one querier:

telnet at lsr1-gdr.ki#show ip multicast vlan 450
----------+-----+---------+---------------+-----+-----+------
VLAN       State Mode      Active          Time (*, G)(S, G)
                           Querier         Query Count Count
----------+-----+---------+---------------+-----+-----+------
450        Ena   Passive   192.168.210.1   119   1     1
----------+-----+---------+---------------+-----+-----+------

Router ports: 12/8 (11s)

Flags-  R: Router Port,  V2|V3: IGMP Receiver,  P_G|P_SG: PIM Join

  1    (*, 239.32.4.130) 00:34:48       NumOIF: 1       profile: none
          Outgoing Interfaces:
               e9/5 vlan 450 ( V2) 00:34:48/40s

  1    (91.238.195.1, 239.32.4.130) in e11/2 vlan 450 00:34:48          NumOIF: 1  profile: none
          Outgoing Interfaces:
               TR(e9/5,e7/1) vlan 450 ( V2) 00:34:48/0s
          FID: 0xa0a9     MVID: None
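
If you need to check the same thing across many groups, the per-entry detail in this output can be pulled apart with a small Python sketch. The regular expressions are fitted only to the sample lines shown here, so other NetIron releases may format them differently; treating `TR(...)` as a list of trunk (LAG) member ports is an assumption based on the notation, not something the output states.

```python
import re

# Patterns fitted to the `show ip multicast vlan` sample above (assumption:
# other releases may format these lines differently).
ENTRY_RE = re.compile(
    r"\((?P<src>\*|[\d.]+), (?P<grp>[\d.]+)\)(?: in (?P<inport>e\d+/\d+))?")
OIF_RE = re.compile(
    r"^\s*(?:TR\((?P<trunk>[^)]*)\)|(?P<port>e\d+/\d+)) vlan (?P<vlan>\d+)")
FID_RE = re.compile(r"FID: (?P<fid>0x[0-9a-fA-F]+)")


def parse_mcast_entries(text: str) -> list[dict]:
    """Return one dict per (*,G) or (S,G) entry with its OIFs and FID."""
    entries: list[dict] = []
    current = None
    for line in text.splitlines():
        if m := ENTRY_RE.search(line):
            current = {"source": m["src"], "group": m["grp"],
                       "in_port": m["inport"], "oifs": [], "fid": None}
            entries.append(current)
        elif current is None:
            continue  # skip the VLAN summary table and flag legend
        elif m := OIF_RE.match(line):
            trunk = m["trunk"] is not None
            ports = m["trunk"].split(",") if trunk else [m["port"]]
            current["oifs"].append(
                {"ports": ports, "trunk": trunk, "vlan": int(m["vlan"])})
        elif m := FID_RE.search(line):
            current["fid"] = m["fid"]
    return entries
```

Run against the two entries in this output, it reports the (S,G) arriving on e11/2 with FID 0xa0a9 and a single outgoing `TR(...)` entry covering e9/5 and e7/1, which is worth comparing against the port-by-port reading of the topology in the reply.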


--
MINO-RIPE
_______________________________________________
foundry-nsp mailing list
foundry-nsp at puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

