[j-nsp] EX 8200 deployment

Hoogen hoogen82 at gmail.com
Thu Mar 25 01:45:07 EDT 2010


I think flash isn't going to be considered... it has a finite number of
erase/write cycles. But yeah, the 8200 could have had more storage.

-Hoogen

On Wed, Mar 24, 2010 at 4:21 PM, Richard A Steenbergen <ras at e-gerbil.net> wrote:

> On Thu, Mar 25, 2010 at 12:31:15AM +0300, Pavel Lunin wrote:
> > Richard, one more thing. What do you do with the crash dumps when you
> > untar and unzip them on the router/switch itself? I have never done
> > anything with them except send them to JTAC. I believe it can make a
> > lot of sense to pick them apart and investigate yourself (though I've
> > never tried), but why on the switch itself? Am I missing something
> > important?
>
> You can run gdb on the coredump files locally and get a pretty good idea
> of what blew up and where, which is often quite helpful in working
> around the original problem. Also, JTAC is far too often surprisingly
> bad at working with coredumps, and without the ability to independently
> verify things myself and tell them they were confused I've had some
> cases which would probably never have been solved.
>
> The story that was explained to me was that JTAC has some point and
> click tool that they load the core into, which parses it and searches
> their PR database to find matching backtraces. The problem is I'm
> convinced at this point nobody in JTAC actually knows what a backtrace
> is or how to read it, they just match it to whatever their tool tells
> them, and surprisingly often their tool is very very wrong.
>
> The other big problem of course is file size and compression. Apparently
> their tool only works with .zip files not .tgz files (which is a small
> bit of a problem, seeing as how the router only has gzip :P), so they
> have to uncompress it locally first before they can load it. I've had
> JTAC not know what a .tgz file was, I've had Advanced JTAC spend days
> trying to figure out why they couldn't get any data out of a coredump
> when the problem turned out to be their local filesystem quota wasn't
> big enough to work with a large core file, etc, etc. Even when things
> work "right" it seems to take them 12-72 hours to parse a coredump even
> on a p1 case, and a healthy percentage of the time their analysis is
> just flat out wrong. Without the ability to look at the dump yourself,
> you'd never know they were barking up the wrong tree.
>
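On the .tgz versus .zip mismatch: one way around it is to repack the tarball into a .zip on a management host after copying it off the box. A minimal sketch in Python, assuming a workstation with Python 3.6 or later. The function name tgz_to_zip and the file names in the usage line are hypothetical, and this is not a Juniper tool:

```python
import shutil
import tarfile
import zipfile


def tgz_to_zip(tgz_path, zip_path):
    """Repack the regular files of a gzip'd tarball into a .zip archive.

    Returns the list of member names that were written.
    """
    written = []
    with tarfile.open(tgz_path, "r:gz") as tar, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, device nodes
            src = tar.extractfile(member)
            # force_zip64 allows members larger than 2 GiB, which large
            # core files can be.
            with src, zf.open(member.name, "w", force_zip64=True) as dst:
                shutil.copyfileobj(src, dst)  # stream in chunks
            written.append(member.name)
    return written
```

Members are streamed one at a time, so a multi-gigabyte core is never held in memory at once. Usage would be something like tgz_to_zip("core.tgz", "core.zip").
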
> Because EX uses PowerPC, it isn't even particularly easy to find a
> FreeBSD ppc box where you can actually do any useful analysis of the
> coredumps. That assumes of course that you have working connectivity on
> the box in question and can quickly copy the sometimes very large files
> off, which is often not the case because of the original problem that
> caused the crash. And where do they plan on writing a 2GB core dump
> when there is an EX kernel panic and you only have 600MB of free space
> on an "empty" box? You can bet there will be panics; I run into them at
> least 2 or 3 times a year on MX easily, it's just a fact of life. I mean
> seriously what does 32GB of flash cost, $100? Think about the amount of
> grief that will be caused by this in comparison, and tell me it was a
> smart move on their part. :)
>
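To put the 2GB versus 600MB numbers in concrete terms, here is a rough sketch of a free-space check, in Python. It assumes a host where Python is available (not necessarily the switch itself); the names fits and can_hold_core and the 10% headroom are my own assumptions, not anything Junos does:

```python
import shutil


def fits(free_bytes, core_bytes, headroom=0.10):
    """True if free_bytes covers the dump plus a fractional safety margin."""
    return free_bytes >= core_bytes * (1 + headroom)


def can_hold_core(path, core_bytes, headroom=0.10):
    """Apply the same check to the filesystem that holds `path`."""
    return fits(shutil.disk_usage(path).free, core_bytes, headroom)


# The case from the thread: 600MB free, 2GB core dump.
print(fits(600 * 10**6, 2 * 10**9))  # False
```

With 32GB of storage the same check passes with plenty of room to spare, which is really the whole argument.
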
> --
> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>
