[j-nsp] MX960 transient errors on high capacity AC power supplies

Karl Gerhard karl_gerh at gmx.at
Tue Sep 4 03:38:33 EDT 2018


Hello,

We have bought two Juniper MX960s and are having serious trouble with the power supplies, which trigger alarms and then clear them a few seconds later. Hardware:
2x RE-S-X6-64G
3x SCBE-2-MX
MX960 Premium 3 chassis
4x High Capacity AC PEMs

$ show log messages | match alarmd
Aug  30 08:09:02  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  30 08:09:12  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  30 08:12:30  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  30 08:12:35  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  30 08:14:53  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  30 08:14:58  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
 
 
$ show log messages
Aug  31 06:12:33  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(RD): target ack timeout
Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(RD): (i2c_s1=0x08, group=0x3, device=0x51)
Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 0
Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug  31 06:13:50  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  31 06:13:50  router1 craftd[12162]: %DAEMON-4:  Major alarm set, PEM 1 Not OK
Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 0
Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 1
Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
Aug  31 06:13:50  router1 chassisd[12159]: %DAEMON-4-CHASSISD_PEM_INPUT_BAD: status failure for power supply 1 (status bits: 0x0); check circuit breaker
Aug  31 06:13:55  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
Aug  31 06:13:55  router1 craftd[12162]: %DAEMON-4: Major alarm cleared, PEM 1 Not OK

Oddly enough, the errors show up only every few weeks. The power supplies work for weeks without a hitch, then throw alerts for one to a few days, and then work flawlessly again for a few weeks.
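
To quantify how long each alarm lasts and how the episodes cluster, the alarmd lines could be paired up with a short script. The following is a minimal Python sketch, assuming the log lines keep the format quoted above. The function name alarm_durations and the fixed year are illustrative assumptions (syslog timestamps carry no year); nothing here is a Junos-provided tool.

```python
import re
from datetime import datetime

# Matches alarmd lines in the format quoted above (assumed, not guaranteed).
ALARM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+alarmd\[\d+\]:.*"
    r"Alarm (?P<action>set|cleared):.*reason=(?P<reason>.+)$"
)


def alarm_durations(lines, year=2018):
    """Pair each 'Alarm set' with the next 'Alarm cleared' for the same reason.

    Returns a list of (reason, set_time, seconds_active) tuples.
    """
    open_alarms = {}  # reason -> time the alarm was set
    episodes = []
    for line in lines:
        m = ALARM_RE.match(line.strip())
        if not m:
            continue  # not an alarmd set/clear line
        # Collapse the double space in "Aug  30" before parsing.
        stamp = " ".join(m.group("ts").split())
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        reason = m.group("reason").strip()
        if m.group("action") == "set":
            open_alarms[reason] = ts
        elif reason in open_alarms:
            start = open_alarms.pop(reason)
            episodes.append((reason, start, (ts - start).total_seconds()))
    return episodes


sample = [
    "Aug  30 08:09:02  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK",
    "Aug  30 08:09:12  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK",
    "Aug  30 08:12:30  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK",
    "Aug  30 08:12:35  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK",
]

for reason, start, seconds in alarm_durations(sample):
    print(f"{start:%b %d %H:%M:%S}  {reason}: active for {seconds:.0f}s")
```

Run against a saved copy of the messages log, the start times would show whether the alarms cluster at particular hours or days, which might help when correlating them with the PCF8584 I2C errors.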

We've checked and swapped everything: it's not the cables, the connectors, or the power source.
We then started sending power supplies back to our supplier, but the errors keep showing up even with brand-new replacement power supplies.
We also found PR1299284, which seems to relate to non-high-capacity power supplies.

Could these errors be caused by a software problem that affects the RE-S-X6-64G/SCBE-2-MX in combination with High Capacity AC PEMs?
Has anyone else experienced errors like this?

Regards
Karl


