[j-nsp] SSD disks high failure ratio ?

Pierre-Yves Maunier j-nsp at maunier.org
Tue Oct 8 10:50:40 EDT 2013


I confirmed just by serial number and also by the fact that during the
reboot after a software upgrade my filesystem died on the /var partition.

I'm still waiting for confirmation from the TAC.


On Tuesday, October 8, 2013, Paul Stewart wrote:

> Did you confirm by serial number that you were affected?  The reason I ask
> is we had a pair of RE1800s that matched on part number, but after JTAC
> ran the serial numbers they reassured us that we were not actually
> affected (which is kind of scary in itself).
>
> Paul
>
>
> On 2013-10-07 7:58 PM, "Pierre-Yves Maunier" <j-nsp at maunier.org>
> wrote:
>
> >Hello,
> >
> >I have affected REs. Before I knew about the KB, I found a
> >workaround to repair the filesystem, because the TAC was unable to tell me
> >anything about this KB.
> >
> >After an upgrade from 12.2R1.8 to 12.3R4.6 I got this:
> >
> >=================== Bootstrap installer starting ===================
> >Initialized the environment
> >Routing engine model is RE-S-1800x4
> >HW model is Intel(R) Xeon(R) CPU           C5518  @ 1.73GHz
> >[: kontron: unexpected operator
> >Discovered that flash disk = ad0 , hard disk = ad1
> >mount: /dev/ad1s1f : Invalid argument
> >ERROR: mount_partition: Mount /dev/ad1s1f /mnt failed
> >You are now in a debugging subshell (you may not see a prompt)...
> >#
> >
> >And after a reboot I got this:
> >
> >Automatic reboot in progress...
> >** /dev/ad1s1a
> >FILE SYSTEM CLEAN; SKIPPING CHECKS
> >clean, 1673532 free (124 frags, 209176 blocks, 0.0% fragmentation)
> >** /dev/ad1s1e
> >FILE SYSTEM CLEAN; SKIPPING CHECKS
> >clean, 201639 free (31 frags, 25201 blocks, 0.0% fragmentation)
> >Cannot find file system superblock
> >32 is not a file system superblock
> >28740192 is not a file system superblock
> >** /dev/ad1s1f
> >
> >
> >LOOK FOR ALTERNATE SUPERBLOCKS? yes
> >
> >
> >SEARCH FOR ALTERNATE SUPER-BLOCK FAILED. YOU MUST USE THE
> >-b OPTION TO FSCK TO SPECIFY THE LOCATION OF AN ALTERNATE
> >SUPER-BLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck(8).
> >tunefs: /var: could not read superblock to fill out disk
> >mount: /dev/ad1s1f : Invalid argument
> >WARNING:
> >WARNING: /var mount failed, building emergency /var
> >WARNING:
> >Creating initial configuration...mgd: commit complete
> >Setting initial options:  debugger_on_panic=NO debugger_on_break=NO.
> >Starting optional daemons:  usbd.
> >Doing initial network setup:
> >.
> >Initial interface configuration:
> >
> >
> >So the /var partition on /dev/ad1s1f (SSD) needed an fsck, but it failed
> >because of a 'bad superblock'.
> >
> >Going into the shell as root, I issued the following command to get a list
> >of 'backup' super-blocks:
> >
> >root at CORE-01% newfs -N /dev/ad1s1f
> >/dev/ad1s1f: 18342.8MB (37566076 sectors) block size 16384, fragment size 2048
> >     using 100 cylinder groups of 183.69MB, 11756 blks, 23552 inodes.
> >super-block backups (for fsck -b #) at:
> > 32, 376224, 752416, 1128608, 1504800, 1880992, 2257184, 2633376, 3009568,
> > 3385760, 3761952, 4138144, 4514336, 4890528, 5266720, 5642912, 6019104,
> > 6395296, 6771488, 7147680, 7523872, 7900064, 8276256, 8652448, 9028640,
> > 9404832, 9781024, 10157216, 10533408, 10909600, 11285792, 11661984, 12038176,
> > 12414368, 12790560, 13166752, 13542944, 13919136, 14295328, 14671520,
> > 15047712, 15423904, 15800096, 16176288, 16552480, 16928672, 17304864,
> > 17681056, 18057248, 18433440, 18809632, 19185824, 19562016, 19938208,
> > 20314400, 20690592, 21066784, 21442976, 21819168, 22195360, 22571552,
> > 22947744, 23323936, 23700128, 24076320, 24452512, 24828704, 25204896,
> > 25581088, 25957280, 26333472, 26709664, 27085856, 27462048, 27838240,
> > 28214432, 28590624, 28966816, 29343008, 29719200, 30095392, 30471584,
> > 30847776, 31223968, 31600160, 31976352, 32352544, 32728736, 33104928,
> > 33481120, 33857312, 34233504, 34609696, 34985888, 35362080, 35738272,
> > 36114464, 36490656, 36866848, 37243040
> >
> >Then this command fixed the problem (376224 is the first super-block after
> >'32', which seems to have an issue):
> >
> >root at CORE-01% fsck_ufs -y -b 376224 /dev/ad1s1f
> >
> >Does anyone know what the 'software solution' is that 'has also been
> >developed to correct the affected REs in the field', as the KB says?
> >
> >Pierre-Yves
> >
> >
> >
> >2013/10/4 Phil Mayers <p.mayers at imperial.ac.uk>
> >
> >> Saku Ytti <saku at ytti.fi> wrote:
> >> >On (2013-10-03 18:08 -0400), Paul Stewart wrote:
> >> >
> >> >> "Article is in review and not yet ready for viewing"
> >> >
> >> >http://kb.juniper.net/InfoCenter/index?page=content&id=TSB16210
> >> >
> >> >>
> >> >>
> >> >> http://kb.juniper.net/InfoCenter/index?page=content&id=S:TSB16164&smlogin=
> >> >
> >> >--
> >> >  ++ytti
> >> >_______________________________________________
> >> >juniper-nsp mailing list juniper-nsp at puck.nether.net
> >> >https://puck.nether.net/mailman/listinfo/juniper-nsp
> >>
> >> Thanks, this is very useful - does look like our new REs are affected :o(
> >
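
The recovery described in the quoted message (list the backup super-blocks with `newfs -N`, then hand one of them to `fsck_ufs -b`) could be sketched roughly as below. This is only an illustration, not a Juniper-provided tool: the function names `list_backup_superblocks` and `print_fsck_candidates` are made up here, and the parsing assumes the output layout matches the `newfs -N` output shown in the thread. The script only prints candidate commands; it does not run fsck itself.

```sh
#!/bin/sh
# Hypothetical helper (not part of Junos or FreeBSD): read `newfs -N` output
# on stdin and print the backup super-block locations, one per line.
# Assumes the list follows a line starting with "super-block backups".
list_backup_superblocks() {
    awk '
        # After the header line, strip commas and print every numeric field.
        seen { gsub(/,/, " "); for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) print $i }
        # Header line: start collecting from the next line onwards.
        /^super-block backups/ { seen = 1 }
    '
}

# Hypothetical helper: for each super-block number read on stdin, print the
# fsck_ufs command that would try it on device $1. Nothing is executed.
print_fsck_candidates() {
    dev=$1
    while read -r sb; do
        echo "fsck_ufs -y -b $sb $dev"
    done
}
```

On an affected RE this might be used as `newfs -N /dev/ad1s1f | list_backup_superblocks | print_fsck_candidates /dev/ad1s1f`, after which the printed commands can be tried by hand one at a time, starting with the first backup that is not the damaged one (376224 in the case above).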

