[j-nsp] MPC4D-32*GE Major Alarms

Tue Feb 16 02:59:01 EST 2016

Hello Diogo,

Thank you for answering. Unfortunately, in my humble opinion, Juniper has
no clear procedure for us to follow.

The cumulative effect of all the tests we ran, and of those you and others
courteously pointed out, is basically none. This is, in my opinion, due to
the very fact that Juniper has published no clear procedure for dealing with
these kinds of errors. Yes, there are some procedures out there for dealing
with clearly defined hardware errors, but those are few. Recently, I was
dealing with some hardware-related issues on some Cisco gear, and I can now
clearly see the wealth of documented hardware show commands and the like on
Cisco's side, and the lack of such for Juniper.

I may be wrong, but it seems to me that Juniper would like me to add a case
to their JTAC pile for every issue. Never mind the fact that we all have
replacement stock available and could replace every part ourselves, provided
we have a way to recognize the faulty part.

I have nothing for or against opening a case with JTAC, except that it has
proven to be a relic of the past. Many other vendors recognized long ago
that there is a professional workforce out there, and it works quite well.
In fact, I can hardly remember a case I was forced to open with Cisco
because I wasn't sure which hardware part needed replacement. And that is
even though we have much more Cisco gear.

Anyhow, I'll welcome any additional ideas from everyone.

Thank you.
On 14 Feb 2016 11:04 a.m., "Diogo Montagner" <diogo.montagner at gmail.com>
wrote:

> That should give you some indication of which subsystem is having a problem.
>
> Also, check whether any core dumps were generated for the FPC.
>
> Without additional information, it will be very hard to pinpoint where to
> look.
>
> On Sunday, 14 February 2016, Alex K. <nsp.lists at gmail.com> wrote:
>
>> Hello Diogo,
>>
>> I'm currently not on site, so I'll definitely try it when I get there.
>> For now, I'm working out a plan of action. What should I look for in that
>> command?
>>
>> Thank you.
>> On 14 Feb 2016 10:00, "Diogo Montagner" <diogo.montagner at gmail.com>
>> wrote:
>>
>>> Alex,
>>>
>>> What do you see in the output of show nvram at the FPC shell?
>>>
>>> Do you have a case open with JTAC?
>>>
>>> Thanks
>>>
>>> On Sunday, 14 February 2016, Alex K. <nsp.lists at gmail.com> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> For some time now, one of my customers has been getting "major alarms"
>>>> from the MPC mentioned above on one of their MX960s.
>>>>
>>>> The issue is that nothing more than that message (+alarm) seems to be
>>>> present. Nothing precedes that error, neither in "log messages" nor in
>>>> "chassisd". There seems to be a drop in output rate from the time of
>>>> those incidents until the MPC gets restarted (by the appropriate network
>>>> team), and then everything gets back to normal.
>>>>
>>>> It's worth mentioning that they have a second MX960 serving the other
>>>> half of their end users, configured exactly the same, which has never
>>>> had that issue (therefore it's probably not traffic related).
>>>>
>>>> They are running 12.3R6.6. The linecard was already replaced. There
>>>> seem to be no trace options available for monitoring MPCs and their
>>>> internal status, and the Juniper website lacks potential explanations
>>>> and leads, so I'm turning to the community: any advice for getting to
>>>> the bottom of this will be welcome! Additionally, any experience with
>>>> troubleshooting similar hardware issues would be just as helpful.
>>>>
>>>> Thank you.
>>>> _______________________________________________
>>>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>>>> https://puck.nether.net/mailman/listinfo/juniper-nsp
>>>>
>>>
>>>
>>> --
>>> ./diogo -montagner
>>> JNCIE-SP 0x41A
>>>
>>
>
> --
> ./diogo -montagner
> JNCIE-SP 0x41A
>