[j-nsp] MX960 transient errors on high capacity AC power supplies

zh73 zh73 at 163.com
Tue Sep 4 20:22:45 EDT 2018


What type of MPCs are you using, and which Junos release?
It would be better to upgrade to 17.3R3, which includes the fixes for PR1312336, PR1325271, and PR1349179.
Otherwise, open a case with JTAC.
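
If you do open a case, it may help to summarize how long each PEM alarm lasted. The following is only a rough sketch, assuming alarmd lines in the same format as the log excerpt quoted below. The function names and the default year of 2018 are my own assumptions (syslog timestamps carry no year), and none of this is a Juniper tool.

```python
import re
from datetime import datetime

# Matches alarmd lines shaped like the quoted excerpt, with or without a
# leading ">" quote marker. The pattern is an assumption based on that excerpt.
ALARM_RE = re.compile(
    r"^>?\s*(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+alarmd\[\d+\]:.*"
    r"Alarm (?P<state>set|cleared):.*reason=(?P<reason>.+)$"
)


def parse_ts(ts, year=2018):
    """Parse a syslog timestamp; the year is assumed because syslog omits it."""
    normalized = " ".join(ts.split())  # collapse the double space after the month
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def alarm_durations(lines, year=2018):
    """Pair each 'Alarm set' with the next 'Alarm cleared' for the same reason.

    Returns a list of (reason, start_time, duration_in_seconds) tuples.
    """
    open_alarms = {}
    result = []
    for line in lines:
        m = ALARM_RE.match(line.strip())
        if not m:
            continue  # not an alarmd set/cleared line
        when = parse_ts(m.group("ts"), year)
        reason = m.group("reason").strip()
        if m.group("state") == "set":
            open_alarms[reason] = when
        elif reason in open_alarms:
            start = open_alarms.pop(reason)
            result.append((reason, start, (when - start).total_seconds()))
    return result
```

To use it, save the output of "show log messages | match alarmd" to a file (messages.txt is a hypothetical name) and call alarm_durations(open("messages.txt")). Each tuple gives the alarm reason, when it was set, and how many seconds passed before it cleared.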


At 2018-09-04 15:38:33, "Karl Gerhard" <karl_gerh at gmx.at> wrote:
>Hello,
>
>We have bought two Juniper MX960s and are having serious trouble with power supplies triggering alarms and then clearing them a few seconds later. The hardware in each chassis is:
>2x RE-S-X6-64G
>3x SCBE-2-MX
>MX960 Premium 3 chassis
>4x High Capacity AC PEMs
>
>$ show log messages | match alarmd
>Aug  30 08:09:02  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  30 08:09:12  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  30 08:12:30  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  30 08:12:35  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  30 08:14:53  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  30 08:14:58  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
> 
> 
>$ show log messages
>Aug  31 06:12:33  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
>Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(RD): target ack timeout
>Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(RD): (i2c_s1=0x08, group=0x3, device=0x51)
>Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 0
>Aug  31 06:13:29  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
>Aug  31 06:13:50  router1 alarmd[12567]: %DAEMON-4: Alarm set: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  31 06:13:50  router1 craftd[12162]: %DAEMON-4:  Major alarm set, PEM 1 Not OK
>Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 0
>Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
>Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): target ack failure on byte 1
>Aug  31 06:13:50  router1 kernel: %KERN-3: PCF8584(WR): (i2c_s1=0x08, group=0x3, device=0x51)
>Aug  31 06:13:50  router1 chassisd[12159]: %DAEMON-4-CHASSISD_PEM_INPUT_BAD: status failure for power supply 1 (status bits: 0x0); check circuit breaker
>Aug  31 06:13:55  router1 alarmd[12567]: %DAEMON-4: Alarm cleared: Pwr supply color=RED, class=CHASSIS, reason=PEM 1 Not OK
>Aug  31 06:13:55  router1 craftd[12162]: %DAEMON-4: Major alarm cleared, PEM 1 Not OK
>
>Oddly enough, the errors show up only every few weeks. The power supplies work for weeks without a hitch, then throw alerts for one to a few days, then work flawlessly again for a few weeks.
>
>We've checked and swapped everything. It's not the cables, not the connectors, not the power source.
>Then we started sending power supplies back to our supplier. But the errors keep showing up even with brand new, swapped power supplies.
>We've found PR1299284, which seems to be related to non-HC power supplies.
>
>Could those errors be related to a software problem which affects RE-S-X6-64G/SCBE-2-MX in combination with High Capacity AC PEMs?
>Anyone else experienced errors like that?
>
>Regards
>Karl

