[f-nsp] High LP CPU After Upgrade 4001a to 54c Multicast

Kennedy, Joseph Joseph.Kennedy at purchase.edu
Tue Nov 5 01:21:32 EST 2013


The problem for us was so severe that both MLX MPs were running at 99% CPU and the LPs were flooding unicast.

After a lot of work testing in a lab environment, looking for an issue in multicast routing that fit the symptoms (lol... no, it wasn't easy), I confirmed that the source of the
problem was in 5.2 and above (5.2.00 to 5.4.00d), in the processing of IGMP reports: Brocade's code updated
mcache entries for every IGMP report, even when a matching mcache OIF entry already existed.

In the problem code, the total updates in a given IGMP query window scale as O(M*N^2),
where M is the number of OIFs and N is the number of group members in a single
group. For example, in an environment with 100 OIFs and 300 group members, this equates to
100 * 300^2 = 9,000,000 updates per IGMP query window. By contrast, in previous code releases the
updates scaled as O(M*N), or, given the same environment values as above, 30,000 updates per
query window.
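To make the difference concrete, here is a back-of-the-envelope sketch (my own illustration, not Brocade's code) of the update counts in the two behaviors, using the numbers above:

```python
# Illustrative model of mcache update cost per IGMP query window.
# Fixed behavior: each (OIF, member) pair is updated once -> O(M*N).
# Problem behavior: every member's report re-walks updates for all
# OIFs and members, even when the OIF entry already exists -> O(M*N^2).

def updates_fixed(oifs: int, members: int) -> int:
    # one mcache OIF update per (OIF, member) pair
    return oifs * members

def updates_broken(oifs: int, members: int) -> int:
    # each of the N reports costs roughly M*N updates
    return oifs * members ** 2

M, N = 100, 300  # 100 OIFs, 300 members in one group
print(updates_fixed(M, N))   # -> 30000
print(updates_broken(M, N))  # -> 9000000
```

With larger groups the gap grows linearly with membership, which is why only environments with many OIFs and many members per group saw the CPU spike.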

Many may not have noticed the issue because they don't have a large number of OIFs or a large number of group members in a single group.
Some may have run into this previously and just filtered the UPnP/SSDP IPv4 group (239.255.255.250) to resolve it. If you are running PIM-SM,
have upgraded to 5.2.00 or above, and afterwards noted periods of abnormally high MP/LP CPU, or if you attempted the upgrade
but had to revert due to high MP CPU usage and unicast flooding (as we were seeing), then this may be
the root of your issue.

After we reported the problem, Brocade provided a fix build and incorporated the fix into 5.4.00e, so the problem "should be" resolved in 5.4.00e.
The problem is not specific to running PIM-SM with VRFs.

Related closed defect information from 5.4.00e:

Defect ID: DEFECT000468056
Technical Severity: Medium
Summary: High MP CPU utilization from IGMP reports after upgrade
Symptom: After upgrading from 4.x to 5.4, high CPU utilization from IGMP reports in VRF
Feature: IPv4-MC PIM-SM Routing
Function: PERFORMANCE
Reported In Release: NI 05.4.00

--JK

We have seen issues when our MLXes receive multicast traffic for which
there have been no IGMP join messages sent (on edge ports).  I'm
assuming that not getting any PIM joins would have the same effect.
There are some applications that do not send IGMP messages if they
expect their traffic to remain in the same L2 domain.  Apparently, if the
MLX doesn't have an entry for the traffic, it punts it to the LP CPU.

To get an idea of which traffic is hitting the CPU, you can connect to
the LP (rconsole <slot_number>, then enable) and run 'debug packet
capture'.  That will show you a few packets as they hit the LP CPU, and
should at least tell you the source IP, interface, and multicast group
for the offending traffic.
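The session might look something like the following (slot number and prompts are illustrative; the commands are as described above):

```
telnet@MLX# rconsole 3
LP-3> enable
LP-3# debug packet capture
```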

HTH,

--
Eldon Koyle
--
BOFH excuse #319:
Your computer hasn't been returning all the bits it gets from the Internet.

On  Jun 03 10:32-0400, Walter Meyer wrote:
> We are seeing high CPU on our LPs after upgrading from 4001a to 54c on two
> MLXs.
>
> We are using PIM-SM and the mcast process is using a large amount of LP
> CPU, but only after the upgrade. We are stable on the same config prior to
> the upgrade. Also, the MLX that is the RP for networks with a large number
> of multicast streams is the one that has a high CPU. The other core doesn't
> have an issue (aside from being unstable because of the other MLX with high
> CPU). We are pretty sure it has something to do with multicast routing we
> just can't figure out why.
>
> We do have a large number of group/OIF entries spanning multiple physical
> ints and ves, but this shouldn't be an issue because of the OIF
> optimization feature on the platform...right? On 4001a and 54c we have a
> shareabilitiy coefficient / optimization of 98%...So it doesn't seem like a
> resource problem...But we can't figure out why the traffic is hitting CPU.
>
> Has anyone seen mcast problems after upgrading or have any troubleshooting
> tips?

