[j-nsp] MX960 transient errors on high capacity AC power supplies
Karl Gerhard
karl_gerh at gmx.at
Tue Sep 4 03:38:33 EDT 2018
Hello,
We bought two Juniper MX960s and are having serious trouble with power supplies that raise alarms and then clear them a few seconds later. The hardware involved:
2x RE-S-X6-64G
3x SCBE-2-MX
MX960 Premium 3 chassis
4x High Capacity AC PEMs
$ show log messages | match alarmd
Aug 30 08:09:02 router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 30 08:09:12 router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 30 08:12:30 router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 30 08:12:35 router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 30 08:14:53 router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 30 08:14:58 router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
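To quantify how long each alarm stays active, the set/cleared pairs in the alarmd lines above can be matched up with a small script. This is only a rough sketch: the function name `alarm_durations` and the regular expression are made up for illustration and assume the exact line format shown in the excerpt. Syslog timestamps carry no year, so the year is passed in as an assumption.

```python
import re
from datetime import datetime

# Assumed line format, taken from the alarmd excerpt above.
ALARM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ alarmd\[\d+\]: "
    r"%DAEMON-4: Alarm (?P<action>set|cleared): .*reason=(?P<reason>.+)$"
)


def alarm_durations(lines, year=2018):
    """Pair each 'Alarm set' with the next 'Alarm cleared' for the same reason.

    Returns a list of (set_time, reason, seconds_active) tuples.
    The year is an assumption because syslog lines do not include it.
    """
    pending = {}  # reason -> time the alarm was set
    result = []
    for line in lines:
        m = ALARM_RE.match(line.strip())
        if not m:
            continue  # not an alarmd set/cleared line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        reason = m["reason"]
        if m["action"] == "set":
            pending[reason] = ts
        elif reason in pending:
            start = pending.pop(reason)
            result.append((start, reason, (ts - start).total_seconds()))
    return result
```

Fed the output of the command above (saved to a file and read line by line), it returns one tuple per alarm flap, which makes it easier to see whether the flaps cluster on particular days.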
$ show log messages
Aug 31 06:12:33 router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug 31 06:13:29 router1 kernel: %KERN-3: PCF8584(RD): target ack timeout
Aug 31 06:13:29 router1 kernel: %KERN-3: PCF8584(RD): (i2c_s1=0x08, group=0x3, device=0x51)
Aug 31 06:13:29 router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 0
Aug 31 06:13:29 router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug 31 06:13:50 router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 31 06:13:50 router1 craftd[12162]: %DAEMON-4: Major alarm set, PEM 1 Not OK
Aug 31 06:13:50 router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 0
Aug 31 06:13:50 router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug 31 06:13:50 router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 1
Aug 31 06:13:50 router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug 31 06:13:50 router1 chassisd[12159]: %DAEMON-4-CHASSISD_PEM_INPUT_BAD: status failure for power supply 1 (status bits: 0x0); check circuit breaker
Aug 31 06:13:55 router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug 31 06:13:55 router1 craftd[12162]: %DAEMON-4: Major alarm cleared, PEM 1 Not OK
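To check whether the PCF8584 I2C errors always point at the same target, the kernel lines above can be tallied by operation, group and device address. Again, this is just a sketch: `i2c_error_targets` and the pattern are hypothetical helpers that assume the exact message format shown in the excerpt.

```python
import re
from collections import Counter

# Assumed format of the kernel I2C lines, taken from the excerpt above.
I2C_RE = re.compile(
    r"PCF8584\((?P<op>RD|WR)\): \(i2c_s1=(?P<s1>0x[0-9a-fA-F]+), "
    r"group=(?P<group>0x[0-9a-fA-F]+), device=(?P<device>0x[0-9a-fA-F]+)\)"
)


def i2c_error_targets(lines):
    """Count I2C error lines per (operation, group, device) combination."""
    counts = Counter()
    for line in lines:
        m = I2C_RE.search(line)
        if m:
            counts[(m["op"], m["group"], m["device"])] += 1
    return counts
```

If every entry shows the same group and device, as in the excerpt above (group=0x3, device=0x51), the errors are at least consistently tied to one I2C target rather than scattered across the bus.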
Oddly enough, the errors show up only every few weeks. The power supplies run for weeks without a hitch, then throw alerts for one to a few days, and then work flawlessly again for a few weeks.
We've checked and swapped everything. It's not the cables, not the connectors, not the power source.
We then started sending power supplies back to our supplier, but the errors keep showing up even with brand-new replacement power supplies.
We found PR1299284, which seems to relate to non-high-capacity power supplies.
Could these errors be related to a software problem that affects the RE-S-X6-64G/SCBE-2-MX in combination with High Capacity AC PEMs?
Has anyone else experienced errors like this?
Regards
Karl