[j-nsp] what happens if HDD on routing-engine fails during the router operation?

Martin T m4rtntns at gmail.com
Wed Jun 26 04:58:23 EDT 2013


Hi,

I have now :) However, it had no effect. On the other hand, unmounting
/var is nowhere near the same as the HDD being removed completely, or
failing, on a working routing-engine.


Example with M20:

root@M20> show configuration chassis
routing-engine {
    on-disk-failure disk-failure-action reboot;
}

root@M20> show system processes brief
last pid:  1475;  load averages:  0.00,  0.12,  0.15  up 0+00:11:35    07:08:28
105 processes: 3 running, 86 sleeping, 16 waiting

Mem: 136M Active, 115M Inact, 32M Wired, 132M Cache, 69M Buf, 1580M Free
Swap: 2048M Total, 2048M Free




root@M20> start shell csh
root@M20% mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local)
devfs on /dev/ (devfs, local, noatime, noexec, read-only)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only)
/dev/md1 on /packages/mnt/jkernel-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md2 on /packages/mnt/jpfe-M40-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md4 on /packages/mnt/jroute-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md5 on /packages/mnt/jcrypto-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md6 on /packages/mnt/jpfe-common-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md7 on /tmp (ufs, local, noatime, soft-updates)
/dev/md8 on /mfs (ufs, local, noatime, soft-updates)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
/dev/ad1s1f on /var (ufs, local, noatime)
root@M20% umount -f /var
root@M20% mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local)
devfs on /dev/ (devfs, local, noatime, noexec, read-only)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only)
/dev/md1 on /packages/mnt/jkernel-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md2 on /packages/mnt/jpfe-M40-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md4 on /packages/mnt/jroute-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md5 on /packages/mnt/jcrypto-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md6 on /packages/mnt/jpfe-common-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md7 on /tmp (ufs, local, noatime, soft-updates)
/dev/md8 on /mfs (ufs, local, noatime, soft-updates)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
root@M20% exit
exit

root@M20> ?
No valid completions
root@M20>
error: unknown command: .noop-command


root@M20> Jun 26 07:09:49 init: can't chdir to /var/tmp/: No such file or directory
Jun 26 07:09:54 init: can't chdir to /var/tmp/: No such file or directory
Jun 26 07:09:59 init: can't chdir to /var/tmp/: No such file or directory
Jun 26 07:10:04 init: can't chdir to /var/tmp/: No such file or directory
Jun 26 07:10:04 init: can't chdir to /var/tmp/: No such file or directory



Example with M10i:

root@M10i> show configuration chassis
routing-engine {
    on-disk-failure disk-failure-action reboot;
}

root@M10i> show system processes brief
last pid:  1473;  load averages:  3.97,  1.22,  0.47  up 0+00:02:46    08:17:13
111 processes: 5 running, 89 sleeping, 17 waiting

Mem: 181M Active, 54M Inact, 33M Wired, 216M Cache, 69M Buf, 1012M Free
Swap: 2048M Total, 2048M Free




root@M10i> start shell csh
root@M10i% mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only, verified)
/dev/md1 on /packages/mnt/jkernel-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md2 on /packages/mnt/jpfe-M7i-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md4 on /packages/mnt/jroute-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md5 on /packages/mnt/jcrypto-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md6 on /packages/mnt/jpfe-common-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md7 on /packages/mnt/jruntime-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md8 on /tmp (ufs, asynchronous, local, noatime)
/dev/md9 on /mfs (ufs, asynchronous, local, noatime)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
/dev/ad1s1f on /var (ufs, local, noatime)
root@M10i% umount -f /var
root@M10i% mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only, verified)
/dev/md1 on /packages/mnt/jkernel-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md2 on /packages/mnt/jpfe-M7i-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md4 on /packages/mnt/jroute-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md5 on /packages/mnt/jcrypto-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md6 on /packages/mnt/jpfe-common-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md7 on /packages/mnt/jruntime-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md8 on /tmp (ufs, asynchronous, local, noatime)
/dev/md9 on /mfs (ufs, asynchronous, local, noatime)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
root@M10i% Jun 26 08:18:04 init: can't chdir to /var/tmp/: No such file or directory
exit
exit

root@M10i> Jun 26 08:18:09 init: can't chdir to /var/tmp/: No such file or directory
?
No valid completions
root@M10i> Jun 26 08:18:15 init: can't chdir to /var/tmp/: No such file or directory
Jun 26 08:18:20 init: can't chdir to /var/tmp/: No such file or directory
Jun 26 08:18:20 init: can't chdir to /var/tmp/: No such file or directory
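
In both transcripts the only entry that disappears from the `mount`
listing after `umount -f /var` is /dev/ad1s1f on /var. For anyone who
wants to repeat the test on other REs, the before/after comparison can
be done mechanically. The following is only a minimal sketch in Python;
the names parse_mounts and lost_mounts are made up for this example and
are not part of Junos or FreeBSD, and the parser assumes each mount
entry sits on a single line (entries wrapped by a mail client will be
skipped).

```python
import re

# Matches one line of FreeBSD-style `mount` output:
#   "<device> on <mountpoint> (<options>)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) \((.*)\)$")


def parse_mounts(text):
    """Return {mountpoint: device} for every well-formed line of `mount` output."""
    mounts = {}
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            device, mountpoint, _options = match.groups()
            mounts[mountpoint] = device
    return mounts


def lost_mounts(before, after):
    """Return {mountpoint: device} for mounts present in `before` but not in `after`."""
    old, new = parse_mounts(before), parse_mounts(after)
    return {mp: dev for mp, dev in old.items() if mp not in new}


# Shortened copies of the listings above (hypothetical helper input).
before = """\
/dev/ad0s1a on / (ufs, local, noatime)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
/dev/ad1s1f on /var (ufs, local, noatime)
"""
after = """\
/dev/ad0s1a on / (ufs, local, noatime)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
"""

print(lost_mounts(before, after))  # {'/var': '/dev/ad1s1f'}
```

This only shows which filesystem went away; it says nothing about
whether the underlying disk is still visible to the kernel, which is
exactly the difference between an unmount and a real HDD failure.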


Another important consequence of an HDD failure is that swap space is
lost. This is probably rather critical on, for example, an RE-333-256.
In addition, it looks like the RE-850 has no problem booting without
the HDD, while the RE-600 and RE-333 do not boot without it.
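
The swap figures can be read from the "Swap:" line of `show system
processes brief`, as in the outputs above. A minimal Python sketch for
pulling them out of captured output follows. The function name
swap_status is made up for this example, only the megabyte ("M") format
seen in this thread is handled, and what the line looks like once the
HDD is really gone was not captured here, so a missing line is simply
reported as None.

```python
import re

# Matches e.g. "Swap: 2048M Total, 2048M Free"; any fields between
# Total and Free (such as a "Used" figure) are skipped.
SWAP_LINE = re.compile(r"^Swap:\s*(\d+)M Total,.*?(\d+)M Free", re.MULTILINE)


def swap_status(output):
    """Return (total_mb, free_mb) from the Swap: line, or None if it is absent."""
    match = SWAP_LINE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Two lines copied from the M20 output above.
sample = (
    "Mem: 136M Active, 115M Inact, 32M Wired, 132M Cache, 69M Buf, 1580M Free\n"
    "Swap: 2048M Total, 2048M Free\n"
)

print(swap_status(sample))  # (2048, 2048)
```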


Still, what exactly makes the RE reboot when the HDD is lost?


regards,
Martin

2013/6/26, Per Granath <per.granath at gcc.com.cy>:
> Did you try it with this configuration?
>
> chassis {
>     redundancy {
>         failover {
>             on-loss-of-keepalives;
>             on-disk-failure;
>         }
>     }
> }
>

