[j-nsp] MPC4D-32*GE Major Alarms

Alex K. nsp.lists at gmail.com
Sun Feb 14 03:40:09 EST 2016


Hello Masood,

This problem probably isn't transient, since the output rate from that
linecard is cut in half (and is restored only after the MPC gets restarted).

I was thinking along the lines you mentioned, but CRC errors at the fabric
should indeed be transient in nature, yet they seem to persist until the
MPC is restarted. And if I recall correctly, no PFE-related errors were
present (though maybe I was trying the wrong show commands; you're welcome
to point out the ones you had in mind).
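
In case it helps, these are the usual suspects I know of for checking the
fabric and PFE side (standard Junos operational commands, as far as I know;
I may well be missing the ones you meant):

    show chassis alarms
    show chassis fabric summary
    show chassis fabric fpcs
    show chassis fabric plane
    show log messages | match "CRC|fabric|MPC"
    show log chassisd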

Thank you.
On 14 Feb 2016 10:08 a.m., "Masood Ahmad Shah" <masoodnt10 at gmail.com> wrote:

> Some of these alarms are transient (though they should generate a syslog
> trap), and they raise a chassis alarm upon occurrence (e.g. a PFE<>fabric
> plane takes a hit of CRC errors and then recovers through fabric healing).
> Sometimes the chassis does not clear the alarm when the transient state
> gets cleared, and that requires a reboot treatment for the RE (yeah, the
> Routing Engine :))
>
> I think chassisd (the chassis daemon) is not getting signaled by the other
> relevant processes when the states get cleared.
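>
> If it really is only chassisd holding a stale alarm, restarting just that
> daemon (rather than rebooting the whole RE) might be worth trying first;
> if I'm not mistaken, the operational command is:
>
>     restart chassis-control
>
> (Do that in a maintenance window though; restarting chassisd on a live MX
> is not entirely risk-free.)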
>
> On Sun, Feb 14, 2016 at 6:39 PM, Alex K. <nsp.lists at gmail.com> wrote:
>
>> Hello everyone,
>>
>> For some time now, one of my customers has been getting "major alarms" from
>> the MPC mentioned above on one of their MX960s.
>>
>> The issue is that nothing more than that message (plus the alarm) seems to
>> be present. Nothing precedes that error, neither in "messages" nor in
>> "chassisd" logs. There is an output rate drop at the time of those
>> incidents that lasts until the MPC gets restarted (by the appropriate
>> network team), and then everything returns to normal.
>>
>> It's worth mentioning that they have a second MX960 serving the other half
>> of their end-users, configured exactly the same - which has never had that
>> issue (therefore it's probably not traffic related).
>>
>> They are running 12.3R6.6. The linecard was already replaced. There seem to
>> be no traceoptions available for monitoring MPCs and their internal status,
>> and the Juniper web site lacks potential explanations and leads, so I'm
>> addressing the community - any advice on getting to the bottom of this will
>> be welcome! Additionally, any experience troubleshooting similar hardware
>> issues would be just as helpful.
>>
>> Thank you.
>> _______________________________________________
>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/juniper-nsp
>>
>
>

