[j-nsp] CFEB restart due to incorrect temperature reading ?

Alexandre Snarskii snar at snar.spb.ru
Wed Jul 14 11:18:16 EDT 2010


Hi!

A few days ago the CFEB in one of our M10i routers restarted with strange
diagnostics: in the same second that the BFD sessions went down, the
following line appeared in the chassisd log:

Jul  2 14:03:08 CFEB 0 temperature is -60 degrees C, which is outside operating range

(I am not sure whether it appeared before or after the BFD failures,
though: these lines were found in different log files. The BFD messages
are timestamped 14:03:08.695, and no messages were logged to the remote
syslog server after 14:01, so it is quite possible that the CFEB failed
first.)
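
Since the lines come from different log files with different timestamp
precision, one way to line them up is to merge them on the timestamp.
Below is a minimal Python sketch of that idea, not a tested tool. The
names parse_line and merge are hypothetical helpers, the year 2010 is
assumed for lines that carry no year (as in the chassisd line), and the
regular expression only covers the two timestamp shapes quoted in this
post. Lines without milliseconds are flagged, because their position
within the second is unknown.

```python
import re
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# "Jul  2 14:03:08 msg" or "Jul  2 14:03:12.663 2010  msg"
STAMP = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<ms>\d{3}))?"
    r"(?:\s+(?P<year>\d{4}))?\s+(?P<msg>.*)$"
)

def parse_line(line, default_year=2010):
    """Return (timestamp, has_ms, message), or None if the line does not match."""
    m = STAMP.match(line)
    if m is None or m.group("mon") not in MONTHS:
        return None
    year = int(m.group("year") or default_year)  # assumption: no year means 2010
    micro = int(m.group("ms")) * 1000 if m.group("ms") else 0
    when = datetime(year, MONTHS[m.group("mon")], int(m.group("day")),
                    int(m.group("h")), int(m.group("m")), int(m.group("s")), micro)
    return when, m.group("ms") is not None, m.group("msg")

def merge(*sources):
    """Merge (name, lines) pairs into one list sorted by timestamp."""
    events = []
    for name, lines in sources:
        for line in lines:
            parsed = parse_line(line)
            if parsed is not None:
                when, has_ms, msg = parsed
                events.append((when, name, has_ms, msg))
    return sorted(events, key=lambda e: e[0])

if __name__ == "__main__":
    merged = merge(
        ("chassisd", ["Jul  2 14:03:08 CFEB 0 temperature is -60 degrees C, "
                      "which is outside operating range"]),
        ("messages", ["Jul  2 14:03:12.663 2010  chassisd[1277]: "
                      "CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: CFEB connection lost"]),
    )
    for when, name, has_ms, msg in merged:
        # "~" marks second-only precision: order within that second is unknown
        mark = " " if has_ms else "~"
        print(f"{mark}{when.isoformat(timespec='milliseconds')} [{name}] {msg}")
```

Note that a line stamped only 14:03:08 sorts as 14:03:08.000, so the
merge alone cannot show whether it really came before a line stamped
14:03:08.695; the "~" flag marks exactly that uncertainty.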

Then there are lots of messages about the CFEB not running, FPC restarts
and so on, ending with a CFEB restart, after which everything finally
stabilised:


Jul  2 14:03:12.663 2010  alarmd[1278]: Alarm set: CFEB color=RED, class=CHASSIS, reason=CFEB not online, the box is not forwarding
Jul  2 14:03:12.672 2010  /kernel: rdp keepalive expired, connection dropped - src 0x00000001:1021 dest 0x00000002:2048
Jul  2 14:03:12.663 2010  chassisd[1277]: CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: CFEB connection lost
Jul  2 14:03:12.669 2010  craftd[1279]:  Major alarm set, CFEB not online, the box is not forwarding

There are no core dumps and no logs giving the reason for the restart,
just a power-on notice in the internal CFEB log:

[0+00:00:00.255 LOG: Info] CSBR: Reset reason (0x4): Power On
[0+00:00:00.255 LOG: Info] On-board NVRAM contains diagnostic information.

and the last log entry in NVRAM is just

CSBR: Reset reason (0x4): Power On

The routing engine was not reset during this failure, and its temperature
readings show a normal datacenter temperature of 22C/71F...

Any ideas on what happened?


