[c-nsp] HEADS UP: vlan_mgr crashing in NX-OS 5.2(3)

Bernhard Schmidt berni at birkenwald.de
Tue Dec 13 10:05:39 EST 2011


Bernhard Schmidt <berni at birkenwald.de> wrote:

>> just a quick heads up, maybe someone is hitting that, too. Since
>> upgrading our test Nexus 7000 from 5.2(1) to 5.2(3) this morning we have
>> a failover due to a crashing vlan_mgr process every hour. It turns out
>> "sh vlan" (which is executed by RANCID every hour) reliably kills the
>> box.
>
> Update: after I forced a failover several times by just SSHing to the
> box and executing "show vlan", both sups crashed fully and the box did
> a cold reboot. That took us out for about 15 minutes; details are still
> pending.

For us, 5.2(3) seems to be a real brown-paper-bag release.

Dec 13 12:32:00 %KERN-2-SYSTEM_MSG: node=4 sap=22856 rq=0(0) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=8956, bytes_in_transit=14259024 - kernel
Dec 13 12:32:00 %KERN-2-SYSTEM_MSG: node=4 sap=1 rq=44919(77590214) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel
Dec 13 12:32:00 %KERN-2-SYSTEM_MSG: node=4 sap=28 rq=0(0) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=8956, bytes_in_transit=18915072 - kernel

(about one hour uptime after the full reload)
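
For anyone who wants to compare these messages across boxes, here is a minimal Python sketch that pulls the numeric fields out of each line. It assumes the hard-wrapped lines are rejoined first (so that "rq=449" plus "19(77590214)" reads "rq=44919(77590214)"). The function name parse_kern_msg and the SAMPLE list are made up for illustration, and since the meaning of the two numbers in "name=N(M)" is not stated anywhere in this thread, the code just keeps them as a pair without interpreting them.

```python
import re

# Queue-style fields of the form "name=N(M)", e.g. "rq=44919(77590214)".
# What N and M mean is an open question; they are kept as a raw pair.
QUEUE_RE = re.compile(r"\b(rq|lq|pq|nq|sq)=(\d+)\((\d+)\)")
# Plain "name=value" fields, e.g. "node=4" or "buf_in_transit=8956".
FIELD_RE = re.compile(r"\b(node|sap|buf_in_transit|bytes_in_transit)=(\d+)")

# The three messages from the post, with the line wraps rejoined (assumption).
SAMPLE = [
    "Dec 13 12:32:00 %KERN-2-SYSTEM_MSG: node=4 sap=22856 rq=0(0) lq=0(0) "
    "pq=0(0) nq=0(0) sq=0(0) buf_in_transit=8956, bytes_in_transit=14259024 - kernel",
    "Dec 13 12:32:00 %KERN-2-SYSTEM_MSG: node=4 sap=1 rq=44919(77590214) lq=0(0) "
    "pq=0(0) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel",
    "Dec 13 12:32:00 %KERN-2-SYSTEM_MSG: node=4 sap=28 rq=0(0) lq=0(0) "
    "pq=0(0) nq=0(0) sq=0(0) buf_in_transit=8956, bytes_in_transit=18915072 - kernel",
]


def parse_kern_msg(line):
    """Return a dict of the numeric fields found in one KERN-2-SYSTEM_MSG line."""
    fields = {name: int(value) for name, value in FIELD_RE.findall(line)}
    for name, first, second in QUEUE_RE.findall(line):
        fields[name] = (int(first), int(second))
    return fields


if __name__ == "__main__":
    for record in map(parse_kern_msg, SAMPLE):
        # Print only the fields that are non-zero in at least one sample line.
        print(record["sap"], record["rq"],
              record["buf_in_transit"], record["bytes_in_transit"])
```

Run against a saved log, this makes it easy to spot which sap values carry non-zero counters, without claiming anything about what those counters represent.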

30 seconds later, all iBGP sessions started to flap with "holdtimer
expired error" and are still flapping. Fortunately, the box is out of
production.

Bernhard


