[j-nsp] mx960 crashed
Aaron Gould
aaron1 at gvtc.com
Wed Apr 4 15:02:09 EDT 2018
Thanks
JTAC says "DMA errors indicate a faulty SSD" and is processing an RMA for RE1.
agould at mx960> show chassis routing-engine | grep "reason|slot|uptime"
Slot 0:
Uptime 19 days, 23 hours, 37 minutes, 40 seconds
Last reboot reason 0x4000:VJUNOS reboot
Slot 1:
Uptime 5 hours, 52 minutes, 22 seconds
Last reboot reason 0x800:reboot due to exception
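The "Last reboot reason" values above are bit flags. A tiny sketch of decoding them, mapping only the two codes that appear in this output (an illustrative helper, not an official Junos reason table):

```python
# Decode "Last reboot reason" bit flags from "show chassis routing-engine".
# Only the two bits observed in the output above are mapped; this is an
# illustrative sketch, not an authoritative Junos table.
REBOOT_REASONS = {
    0x4000: "VJUNOS reboot",
    0x0800: "reboot due to exception",
}

def decode_reboot_reason(code: int) -> list[str]:
    """Return the names of all reason bits set in `code`."""
    return [name for bit, name in REBOOT_REASONS.items() if code & bit]

print(decode_reboot_reason(0x4000))  # slot 0
print(decode_reboot_reason(0x800))   # slot 1
```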
No /dev/mapper/ entries are seen anywhere on either RE:
RE0
…
/dev/kmem Last changed: Mar 15 14:14:49
/dev/ksyms Last changed: Mar 15 14:15:16
/dev/led/ Last changed: Mar 15 14:14:52
/dev/mch Last changed: Mar 15 14:23:50
/dev/md0 Last changed: Mar 15 14:23:50
/dev/md0.uzip Last changed: Mar 15 14:14:49
/dev/md1 Last changed: Mar 15 14:14:53
/dev/md1.uzip Last changed: Mar 15 14:14:53
…
BU RE1
…
/dev/ksyms Last changed: Apr 04 08:08:38
/dev/led/ Last changed: Apr 04 08:08:16
/dev/mch Last changed: Apr 04 08:09:06
/dev/md0 Last changed: Apr 04 08:09:06
/dev/md0.uzip Last changed: Apr 04 08:08:13
…
From: Graham Brown [mailto:juniper-nsp at grahambrown.info]
Sent: Wednesday, April 4, 2018 12:55 PM
To: Aaron Gould
Cc: juniper-nsp at puck.nether.net
Subject: Re: [j-nsp] mx960 crashed
Hi Aaron,
Looks like you have a core file there; raise a TAC case and they'll be able to determine the root cause for you.
Also see if there is anything readable in /dev/mapper/jvg_P-jlvmrootrw/var/crash/override_reboot_reason that may point you to the root cause.
HTH,
Graham
On Thu, 5 Apr 2018 at 01:13, Aaron Gould <aaron1 at gvtc.com> wrote:
I'm still in the process of turning up this new five-node MX960 100-gig ring. I
went to the backup RE console, went to log in, and saw this happen.
Any idea why this happened, and how do I troubleshoot the cause?
FreeBSD/amd64 (mx960) (ttyu0)
login: root
Password:SysRq : Trigger a crash
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 151d9000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 102
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 1540c000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 202
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 152fc000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 302
dmar: DMAR:[DMA Write] Request device [06:0a.0] fault addr 1ed10d000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 402
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: INTR-REMAP: Request device [[05:00.1] fault index 40
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 702
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
4 logical volume(s) in volume group "jvg_S" now active
4 logical volume(s) in volume group "jvg_P" now active
Override SW Exception reboot reason saved to
/dev/mapper/jvg_P-jlvmrootrw/var/crash/override_reboot_reason
Compressed level 31 vmhost kernel core will be dumped to jvg_P-jlvmrootrw.
Please use crash utility to analyze the core
Copying data : [ 13 %]
.
.
.
Copying data : [100 %]
The dumpfile is saved to vmcore-0-compressed-201804041305.
makedumpfile Completed.
(many more boot messages seen)
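For what it's worth, the DMAR fault lines above all implicate two PCI functions (02:10.1 and 06:0a.0). A quick hypothetical helper to tally faults per device from a console log like the one above (not a Juniper tool, just a sketch):

```python
import re
from collections import Counter

# Tally DMAR fault messages by (device, access type). The regex matches the
# 'dmar: DMAR:[DMA Write] Request device [02:10.1] ...' lines from the
# console log above; this is a quick illustrative helper, not a Juniper tool.
FAULT_RE = re.compile(r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\]")

def tally_dmar_faults(log: str) -> Counter:
    """Count faults per (device, access) pair found in the log text."""
    return Counter((dev, access) for access, dev in FAULT_RE.findall(log))

log = """\
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 151d9000
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 1540c000
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 152fc000
dmar: DMAR:[DMA Write] Request device [06:0a.0] fault addr 1ed10d000
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
"""
print(tally_dmar_faults(log))
```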
-Aaron