<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="page" title="Page 33">
<div class="section">
<div class="layoutArea">
<div class="column">
<div>
<div style="margin: 0in 0in 0.0001pt;">The problem for us was so severe that both MLX MP’s were running at 99% CPU and the LP’s were flooding unicast.</div>
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
<div style="margin: 0in 0in 0.0001pt;">After a lot of work testing in a lab environment looking for an issue in multicast routing that fit the symptoms(lol...no it wasn’t easy), I confirmed that the source of the<o:p></o:p></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt;">problem was in 5.2 and above (5.2.00 to 5.4.00d) and processing of IGMP reports. Brocade's code updated</div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt;"><o:p></o:p></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt;">mcache entries for every IGMP report even when a matching mcache OIF entry already existed.</div>
</div>
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
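<div>
<div style="margin: 0in 0in 0.0001pt;">To make that concrete, here is a rough, purely illustrative Python sketch of the difference (hypothetical names and structure, not Brocade's actual code): the problem behavior does the full mcache OIF update for every report, while checking for an existing matching OIF entry lets a report be handled as a simple refresh.</div>
<pre style="white-space: pre-wrap;"># Illustrative sketch only -- hypothetical names, not Brocade's implementation.
# An mcache entry for a multicast group tracks its outgoing interfaces (OIFs).

class McacheEntry:
    def __init__(self, group):
        self.group = group
        self.oifs = {}               # oif -> simple freshness state
        self.expensive_updates = 0   # full OIF-entry rewrites, for counting

    def handle_igmp_report(self, oif, buggy=False):
        """Process one IGMP membership report received on interface 'oif'."""
        if buggy or oif not in self.oifs:
            # Problem behavior: do the full mcache OIF update for every
            # report, even when a matching OIF entry already exists.
            self.oifs[oif] = "fresh"
            self.expensive_updates += 1
        else:
            # Guarded behavior: a report matching an existing OIF entry is
            # just a refresh, not another full update.
            self.oifs[oif] = "fresh"

entry = McacheEntry("239.255.255.250")
for _ in range(300):                     # 300 reports in one query window
    entry.handle_igmp_report("ve 10", buggy=True)
print(entry.expensive_updates)           # 300 full updates for a single OIF
</pre>
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
</div>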
<div>
<div style="margin: 0in 0in 0.0001pt;">All updates in a given IGMP query window in the problem code could be represented as O(MN^2), where M is the number of OIFs and N is the number of group members in a single group. For example, in an environment with 100 OIFs and 300 group members, this equates to 9,000,000 updates per IGMP query window. This is in contrast to previous code releases, where the updates could be represented by O(MN), or, with the same environment values as above, 30,000 updates per query window.</div>
</div>
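<div>
<div style="margin: 0in 0in 0.0001pt;">As a quick sanity check on those figures, the snippet below (Python, just arithmetic) evaluates both growth models with the example values from above:</div>
<pre style="white-space: pre-wrap;"># Updates per IGMP query window under the two growth models described above.
M = 100   # number of OIFs
N = 300   # group members in a single group

problem_code = M * N ** 2   # 5.2.00 - 5.4.00d behavior, O(M*N^2)
earlier_code = M * N        # pre-5.2 behavior, O(M*N)

print(problem_code)   # 9000000
print(earlier_code)   # 30000
</pre>
</div>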
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
<div style="margin: 0in 0in 0.0001pt;">
<div style="margin: 0in 0in 0.0001pt;">Many may not have noticed the issue because they don’t have a large number of OIF’s or large number of group members in a single group.</div>
<div style="margin: 0in 0in 0.0001pt;">Some may have run into this previously and just filtered the UPnP/SSDP IPv4 group (239.255.255.250) to resolve it. If you are running PIM-SM,</div>
<div style="margin: 0in 0in 0.0001pt;">have upgraded to 5.2.00 or above and afterwards noted periods of abnormally high MP/LP CPU, or you attempted the upgrade</div>
<div style="margin: 0in 0in 0.0001pt;">but had to revert due to high MP CPU usage and unicast flooding (as we were seeing) then this may be</div>
<div style="margin: 0in 0in 0.0001pt;">the root of your issue. </div>
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
<div style="margin: 0in 0in 0.0001pt;">After reporting the problem to Brocade they provided a fix build and incorporated the fix into
<b>5.4.00e</b>. This problem "<i>should be"</i> resolved in <b>5.4.00e</b>.</div>
<div style="margin: 0in 0in 0.0001pt;">The problem is not specific to running PIM-SM with VRF’s.</div>
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
<div style="margin: 0in 0in 0.0001pt;">Related closed defect information from <b>
5.4.00e</b>:</div>
<div style="margin: 0in 0in 0.0001pt;"><br>
</div>
<div style="margin: 0in 0in 0.0001pt;">Defect ID: DEFECT000468056</div>
<div style="margin: 0in 0in 0.0001pt;">Technical Severity: Medium</div>
<div style="margin: 0in 0in 0.0001pt;">Summary: High MP CPU utilization from IGMP reports after upgrade</div>
<div style="margin: 0in 0in 0.0001pt;">Symptom: After upgrading from 4.x to 5.4,high CPU utilization from IGMP reports in VRF</div>
<div style="margin: 0in 0in 0.0001pt;">Feature: IPv4-MC PIM-SM Routing</div>
<div style="margin: 0in 0in 0.0001pt;">
<div style="margin: 0in 0in 0.0001pt;">Function: PERFORMANCE</div>
</div>
<div style="margin: 0in 0in 0.0001pt;">Reported In Release: NI 05.4.00 </div>
<div><br>
</div>
<div>--JK</div>
</div>
<div>
<pre style="white-space: pre-wrap;"></pre>
<blockquote type="cite">
<pre style="white-space: pre-wrap;">We have seen issues when our MLXes receive multicast traffic for which
there have been no IGMP join messages sent (on edge ports). I'm
assuming that not getting any PIM joins would have the same effect.
There are some applications that do not send IGMP messages if they
expect their traffic to remain on the same L2 domain. Apparently if the
MLX doesn't have an entry for it, it punts it to the LP CPU.
To get an idea of which traffic is hitting the CPU, you can connect to
the LP (rconsole &lt;slot_number&gt;, then enable) and run 'debug packet
capture'. That will show you a few packets as they hit the LP CPU, and
should at least tell you the source IP, interface, and multicast group
for the offending traffic.
HTH,
--
Eldon Koyle
--
BOFH excuse #319:
Your computer hasn't been returning all the bits it gets from the Internet.
On Jun 03 10:32-0400, Walter Meyer wrote:
><i> We are seeing high CPU on our LPs after upgrading from 4001a to 54c on two
</i>><i> MLXs.
</i>><i>
</i>><i> We are using PIM-SM and the mcast process is using a large amount of LP
</i>><i> CPU, but only after the upgrade. We are stable on the same config prior to
</i>><i> the upgrade. Also, the MLX that is the RP for networks with a large number
</i>><i> of multicast streams is the one that has a high CPU. The other core doesn't
</i>><i> have an issue (aside from being unstable because of the other MLX with high
</i>><i> CPU). We are pretty sure it has something to do with multicast routing we
</i>><i> just can't figure out why.
</i>><i>
</i>><i> We do have a large number of group/OIF entries spanning multiple physical
</i>><i> ints and ves, but this shouldn't be an issue because of the OIF
</i>><i> optimization feature on the platform...right? On 4001a and 54c we have a
</i>><i> shareability coefficient / optimization of 98%...So it doesn't seem like a
</i>><i> resource problem...But we can't figure out why the traffic is hitting CPU.
</i>><i>
</i>><i> Has anyone seen mcast problems after upgrading or have any troubleshooting
</i>><i> tips?</i></pre>
</blockquote>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>