[j-nsp] MPC4D-32*GE Major Alarms

Masood Ahmad Shah masoodnt10 at gmail.com
Sun Feb 14 03:08:20 EST 2016


Some of these alarms are transient (though they should still generate a
syslog message/trap), and they raise a chassis alarm when they occur (e.g.
the PFE<>fabric plane link took a hit of CRC errors and then recovered
through fabric healing). Sometimes the chassis does not clear the alarm when
the transient condition clears, and getting rid of it then requires rebooting
the RE (yes, the Routing Engine :)

I think chassisd (a daemon) is not being signaled by the other relevant
processes when the condition clears.
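
To check whether an alarm really was raised and never cleared, a rough Python
sketch like the one below can scan an exported messages log for "set" events
that have no matching "cleared" event. The log line shape matched by ALARM_RE
and the function name find_uncleared_alarms are assumptions made for
illustration, not a documented Junos format; adjust the pattern to whatever
your logs actually contain.

```python
import re

# Assumed syslog shape (illustrative only; verify against your own logs):
#   "... alarmd[1234]: Alarm set: FPC color=RED, class=CHASSIS, reason=FPC 1 Major Errors"
#   "... alarmd[1234]: Alarm cleared: FPC color=RED, class=CHASSIS, reason=FPC 1 Major Errors"
ALARM_RE = re.compile(r"Alarm (set|cleared): .*?reason=(.+)$")


def find_uncleared_alarms(lines):
    """Return alarm reasons that were set but never cleared, in first-seen order."""
    outstanding = {}  # reason -> first log line that set it
    for line in lines:
        text = line.rstrip()
        match = ALARM_RE.search(text)
        if not match:
            continue  # not an alarm event in the assumed format
        action, reason = match.group(1), match.group(2).strip()
        if action == "set":
            outstanding.setdefault(reason, text)
        else:
            # A clear event cancels the matching outstanding alarm, if any.
            outstanding.pop(reason, None)
    return list(outstanding)
```

Feeding it the lines of a saved log file (for example
find_uncleared_alarms(open("messages.txt")), where the file name is
hypothetical) lists the alarms that stayed active. Anything it reports that
no longer matches the real hardware state is a candidate for the stale-alarm
behaviour described above.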

On Sun, Feb 14, 2016 at 6:39 PM, Alex K. <nsp.lists at gmail.com> wrote:

> Hello everyone,
>
> For some time now, one of my customers has been getting "major alarms" from
> the MPC mentioned above on one of their MX960s.
>
> The issue is that nothing more than that message (+alarm) seems to be
> present. Nothing precedes that error, neither in "log messages" nor in
> "chassisd". There seems to be an output rate drop from the time of each
> incident until the MPC gets restarted (by the appropriate network team), and
> then everything gets back to normal.
>
> It's worth mentioning that they have a second MX960 serving the other half
> of their end-users, but configured exactly the same - which never had that
> issue (therefore it's probably not traffic related).
>
> They are running 12.3R6.6. The linecard was already replaced. There
> seem to be no trace options available for monitoring MPCs and their
> internal status, and the Juniper web site lacks potential explanations and
> leads, so I'm addressing the community: any advice for getting to
> the bottom of this will be welcome! Additionally, any experience with
> troubleshooting similar hardware issues would be as helpful as any advice.
>
> Thank you.
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>

