[f-nsp] NI-MLX-10Gx4 shown as "Invalid Module" maybe due to very old code on lp while chassis is on 5.3

Wilbur Smith wsmith at brocade.com
Tue Jul 30 14:51:53 EDT 2013


Gunther,
Just getting a chance to respond. My personal opinion is that a downgrade won't fix this issue. I've never had a case where I couldn't log into an LP with the rcon command and couldn't see any of the boot process of the line module over the rcon connection. I think this is an actual issue with the hardware module and not just a matter of getting the right code on the box.

You could try downgrading an entire chassis to a previous version, but you may have the same problem if the card is still not recognized. At this point, I would look into an RMA and get a replacement module.

Wilbur

From: Gunther Stammwitz [mailto:gstammw at gmx.net]
Sent: Thursday, July 25, 2013 2:25 PM
To: Wilbur Smith; foundry-nsp at puck.nether.net
Subject: RE: [f-nsp] NI-MLX-10Gx4 shown as "Invalid Module" maybe due to very old code on lp while chassis is on 5.3

Wilbur,

thanks for your effort, I'm sure we will solve this mystery :)

I have now plugged the module into another MLX that runs 5.2 and the symptoms are the same: whatever I do, I cannot see anything on the rconsole.
There has never been a module in slot 4 before and the config is empty for this slot:
#show running-config | inc module
module 1 ni-mlx-20-port-1g-100fx

As you suggested, I tried it from SSH as well as from the serial console of the MP:
#rconsole 4
Remote connection to LP slot 4 established
Press CTRL-X or type 'exit' to disconnect it
(and nothing happens)

When I try "#lp boot system interactive 4", nothing happens on rconsole; only the log says "Module 4 is reset by mgmt (reason: CLI command)".
Powering off and on again doesn't help either, and "lp sync 4" fails:
LP SYNC for Slot4: monitor image sync is timed out.
LP SYNC for Slot4: primary image sync is timed out.
LP SYNC for Slot4: No LP secondary image in MP's flash.

The rconsole shows "Remote connection terminated", and when I reconnect it is still stuck.


#show mod
S4: Invalid Module  CARD_STATE_REBOOT   0        000c.dbf5.9d90

I really hope that you have some sort of magic that will fix this situation? Come on ;-)

Well, and if not, this will be a lengthy downgrade procedure, won't it?
When looking in kp.brocade.com the first firmware is 0300 from 2006. We have 03500 from 2008 or 03800 from 2009.
Does Brocade have a downgrade matrix, or do I need to read through all of these release notes? Maybe there's some sort of rule of thumb?

Thank you once again.

Gunther



From: Wilbur Smith [mailto:wsmith at brocade.com]
Sent: Thursday, July 25, 2013 7:07 PM
To: Gunther Stammwitz; foundry-nsp at puck.nether.net
Subject: RE: [f-nsp] NI-MLX-10Gx4 shown as "Invalid Module" maybe due to very old code on lp while chassis is on 5.3

Gunther,
Sorry, I was pulled into a project, so I'm just getting a chance to reply. So far, it looks like you have done everything correctly. I read through some of the other comments, so I'm also going to recommend you remove the card type you specified manually through the CLI; let's make sure that when you run a "show module" the entry for this module is completely blank. You may need to unseat the module to be able to remove the entry.

Next, can you rcon to that module and tell me exactly what you see? If the module is stuck in a reboot loop, you should see the entire boot process through the console, including the point where the local app code on the LP detects a problem and triggers the reboot. If you can't see any output after performing the rcon to the LP, then we have a larger issue. If this doesn't seem to work over SSH or Telnet, try a direct connection to the serial console port on the active MP.

If you are getting output from rcon, then you can hit "b" right after the LP reboot to break into the module. From here we can delete the Primary and Secondary flash manually, or force a boot to interactive mode from within the LP.

Let me know what output you are seeing and I'll try to help you work through it.

Wilbur

From: Gunther Stammwitz [mailto:gstammw at gmx.net]
Sent: Wednesday, July 24, 2013 3:22 PM
To: Wilbur Smith; foundry-nsp at puck.nether.net
Subject: RE: [f-nsp] NI-MLX-10Gx4 shown as "Invalid Module" maybe due to very old code on lp while chassis is on 5.3

Hello Wilbur,

thank you very much for your reply.

First of all, I have now removed the manual module-type setting for slot 4, unplugged the LP from slot 4 and reinserted it.
Now the MLX says: S4: Invalid Module  CARD_STATE_REBOOT   0        000c.dbe1.2f90

This is where I tried "lp boot system interactive 4", which is confirmed in syslog with "cr1.fra1.mainlab.net Module 4 is reset by mgmt (reason: CLI command)".
Unfortunately the LP stays in "CARD_STATE_REBOOT". The same applies when powering off and on again. Interactive boot mode didn't help :(

My software - including mbridge - is up to date:
#show ver
Boot     : Version 5.3.0T165 Copyright (c) 1996-2009 Brocade Communications Systems, Inc.
Monitor  : Version 5.3.0T165 Copyright (c) 1996-2009 Brocade Communications Systems, Inc.
IronWare : Version 5.3.0eT163 Copyright (c) 1996-2009 Brocade Communications Systems, Inc.
Board ID : 00 MBRIDGE Revision : 37
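
When checking several chassis, the MBRIDGE revision can be pulled out of "show ver" output like the above with a small script. This is only a sketch: the function name `mbridge_revision` is made up for illustration, it assumes the line format shown above, and the revision required for a given release still has to be looked up in the release notes.

```python
import re

def mbridge_revision(show_version_output):
    """Return the MBRIDGE revision as an int, or None if no such line is found.

    Assumes a line of the form 'Board ID : 00 MBRIDGE Revision : 37'.
    """
    m = re.search(r"MBRIDGE Revision\s*:\s*(\d+)", show_version_output)
    return int(m.group(1)) if m else None

sample = "Board ID : 00 MBRIDGE Revision : 37"
print(mbridge_revision(sample))
```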



Do you have any other idea how I can get the module out of the reboot loop and access it?

Your help is appreciated - thanks in advance.
Gunther



From: Wilbur Smith [mailto:wsmith at brocade.com]
Sent: Wednesday, July 24, 2013 10:30 PM
To: Gunther Stammwitz; foundry-nsp at puck.nether.net
Subject: RE: [f-nsp] NI-MLX-10Gx4 shown as "Invalid Module" maybe due to very old code on lp while chassis is on 5.3

Gunther,
Usually when this happens to me, I can still force the LP to boot into "interactive mode" with the command I mentioned and then manually power the LP off and on (power-off lp 4, power-on lp 4). Since the MP's access to the LP uses a separate out-of-band link, the MP should still be able to upgrade the LP if it is in interactive mode.

The MP uses a separate FPGA image to allow it to talk to the chassis backplane; we call this the MBRIDGE image. I would make sure that the MBRIDGE image is the recommended release for your version of code and update if needed. The required version is listed in the release notes for the specific release of code (just search for mbridge). In rare cases, a patch release may need a newer FPGA or MBRIDGE image to fix a problem, so make sure you check the release notes for the exact release (5.3.0 vs. 5.3.0C).

The error you are seeing when you tried to push the fpga code to the LP is triggered because that LP is stuck in reboot. The MP can't communicate with that LP to see if it needs the newer version; we need to stop the reboot loop before we can update the LP. I connected to one of my MLX routers in a lab and double-checked the command you will need. Make sure you are using this command to reboot a module in slot 4:

SSH@EA_CORE-1#lp boot system interactive 4

Let me know if this helps with your issue.

Wilbur

From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of Gunther Stammwitz
Sent: Tuesday, July 23, 2013 11:46 AM
To: foundry-nsp at puck.nether.net
Subject: [f-nsp] NI-MLX-10Gx4 shown as "Invalid Module" maybe due to very old code on lp while chassis is on 5.3

Hello colleagues,

I have more or less a similar problem. I inserted an NI-MLX-10Gx4 into a chassis running 5300e and it didn't work: the module in slot 4 is not even recognized; it shows as an "Invalid Module".

Most probably this is due to very old code on the LP while the chassis runs newer code.

#show mod
        Module                             Status      Ports  Starting MAC
M1 (left): NI-MLX-MR Management Module     Active
M2 (right):
F1: NI-X-SF  Switch Fabric Module         Active
F2: NI-X-SF  Switch Fabric Module         Active
F3:
S4: Invalid Module  CARD_STATE_REBOOT   0        xxx
(S4: Configured as NI-MLX-10Gx4 4-port 10GbE Module)
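
For anyone who collects "show mod" output from several chassis, slots stuck in CARD_STATE_REBOOT can be picked out with a few lines of script. This is only a sketch, assuming the slot-line format shown above; the helper name `stuck_slots` and the sample text are illustrative, not part of any Brocade tooling.

```python
import re

# Matches line-card slot lines such as:
#   S4: Invalid Module  CARD_STATE_REBOOT   0        000c.dbf5.9d90
SLOT_LINE = re.compile(
    r"^S(?P<slot>\d+):\s+(?P<desc>.*?)\s+(?P<state>CARD_STATE_\w+)\b"
)

def stuck_slots(show_module_output, bad_states=("CARD_STATE_REBOOT",)):
    """Return (slot, description, state) for each line card in a bad state."""
    found = []
    for line in show_module_output.splitlines():
        m = SLOT_LINE.match(line.strip())
        if m and m.group("state") in bad_states:
            found.append((int(m.group("slot")), m.group("desc"), m.group("state")))
    return found

sample = """\
M1 (left): NI-MLX-MR Management Module     Active
F1: NI-X-SF  Switch Fabric Module         Active
S4: Invalid Module  CARD_STATE_REBOOT   0        000c.dbf5.9d90
(S4: Configured as NI-MLX-10Gx4 4-port 10GbE Module)
"""
print(stuck_slots(sample))
```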

I can neither access it with "rconsole 4" nor does "boot lp boot system interactive 4" help.
Power-off and power-on didn't help.

Wilbur wrote that MP and LP cannot communicate and that the FPGA needs to be upgraded. I tried to do so, but "copy tftp lp 1.2.3.4 lpfpga05300e.bin fpga-all 4" didn't work.
Copying FPGA images to the applicable slot(s), this may take several minutes... --> No FPGA image to be copied. :(

Using copy tftp lp with individual-fpga-images doesn't work either, the system always claims that the fpga doesn't match.


Any idea how to get an LP with very old code working in a chassis running 5.3 code?

Your help is appreciated.

Kind regards
Gunther



From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of Wilbur Smith
Sent: Friday, July 5, 2013 12:53 AM
To: Jeroen Wunnink | Atrato IP Networks; foundry-nsp at puck.nether.net
Subject: Re: [f-nsp] NI-MLX-10Gx8-D falls to CARD_STATE_INTERACTIVE mode

Hello Folks,
Using auto lp-sync is a good improvement over the previous methods of upgrading a line card, but there are times it may not do the trick. Because the management modules also use an FPGA (the mbridge) to communicate with the backplane, you can run into a situation where the older module's FPGA image cannot talk to the management module. The LP and management modules have their own separate out-of-band network that bypasses the backplane, so auto-lp will attempt to stop the reboot cycle and use this connection to push the correct code to a misbehaving LP. Unfortunately, this may not work in every situation, so you may need to manually upgrade the code.

If you have a linecard constantly rebooting, you should use "boot lp 3 interactive" from Enable mode to force the card to start in interactive mode (this will stop the reboot loop). You can then push the correct software and fpga image to the card. I tend to use the older "copy tftp image" method because it will automatically copy the correct IronWare LP code and you can target a specific LP with it vs. the whole chassis. I also use "copy tftp lp .... Fpga-all lp 3" to automatically sync the correct FPGA code to a specific module.

Alternatively, you can use the new "copy tftp system" method to automatically upgrade all code and fpga images on an LP. The command should only push to a module that needs to be upgraded, but I would make sure you have an outage window if you do this on a live system.

There are some good guides on doing this at the Brocade Communities site, and the software upgrade guide for each version of IronWare has detailed notes on how to use the methods I mentioned and how to fix common problems when things go wrong. Check my.brocade.com for more info.

Wilbur

