[j-nsp] 10.0 or 10.4?
Richard A Steenbergen
ras at e-gerbil.net
Thu Mar 17 11:23:38 EDT 2011
On Tue, Mar 15, 2011 at 10:57:56AM -0700, Steve Feldman wrote:
>
> What sorts of bugs did you see in 10.4R2?
We were only testing 10.4 on MX, since EX features are under much more
active development, which makes major version jumps on EX much riskier.
For example, when we tried moving from 10.1 to 10.2 on EX8200, we
discovered a major SNMP issue, caused by some internal changes, that
broke polling and caused severe billing problems. There are a lot of
firewall-specific changes in 10.4 for EX which we wanted more time to
test properly before deploying, since this is an area that tends to
cause major outages when bugs do pop up. I already outlined a few of
the issues we found with 10.4R2 on MX in a previous post, and since it
seemed much less stable than 10.3R3 for nearly equivalent features and
bug fixes, I didn't see the point of pursuing it further right now.
> JTAC is recommending 10.4R2 on our EX8200s to fix a bug (PR581625 in
> 10.1R4) where some of our firewall filter rules were being silently
> ignored.
Well, here is what I can tell you about the reasons behind our most
recent move to 10.3R3 on EX8200 (where all of the following are fixed).
A couple of these were actually fixed in 10.2R3, but a few other nasty
issues were introduced at the same time, making 10.2R3 unusable in
production. We're heading toward 11.1 in the near future for other
reasons anyway, which is why we targeted 10.3R3 instead of 10.2R4 as our
next major code goal, but 10.4R2 was definitely not confidence-inspiring
based on the issues we saw on MX. :)
PR588115 - Changing the forwarding-table export policy twice in quick
succession (while the previous change is still being evaluated) will
cause rpd to core dump.
PR581139 - Similar to the above, but causes the FPC to crash too. Wait
several minutes after a forwarding-table export policy change before
you commit again.
PR523493 - Mysterious FPC crashes
PR509303 - Massive SNMP slowness and stalls, severely impacting polling
of 10.2R3 boxes with a decent number of interfaces (the more interfaces,
the worse the situation).
PR566782 PR566717 PR540577 - Some more mysterious rpd and pfem crashes,
with extra checks added to prevent them in the future.
PR559679 - Commit script transient-change issue, which sometimes causes
changes not to be picked up correctly unless you do a "commit full".
PR548166 - Sometimes most or all BGP sessions on a CPU-loaded box will
drop to Idle following a commit and take 30+ minutes to come back up.
PR554456 - Sometimes NETCONF connections to EX8200s will result in junk
error messages being logged to the XML stream, corrupting the NETCONF
session.
PR550902 - On a CPU-loaded box, BGP policy-statement evaluation will
sometimes simply stop working, requiring a hard clear of the neighbor
(or, ironically enough, sometimes just renaming the term in the policy
will fix it :P) to restore normal evaluation.
PR521993 - Ports on EX8200 FPCs will sometimes not initialize correctly,
resulting in situations where, for example, ports 4 and 5 on every FPC
can receive packets but never transmit them. If you keep trying to
transmit packets down a wedged port (as would happen if the port is
configured for L2), the FPC will crash.
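The PR588115/PR581139 workaround above (wait several minutes after a
forwarding-table export policy change before committing again) is easy
to forget in automated deployments. What follows is a minimal Python
sketch of a hold-down guard, assuming a hypothetical deploy script that
checks may_commit() before pushing each export policy change. The class
name and the 300-second default are illustrative assumptions (the only
guidance here is "several minutes"), and the guard does not talk to
Junos itself; it only tracks time.

```python
import time
from typing import Callable, Optional


class ExportPolicyCommitGuard:
    """Hypothetical hold-down timer for forwarding-table export policy commits.

    Refuses a new export-policy commit until `hold_down` seconds have passed
    since the previous one. The 300 s default is an assumed stand-in for
    "several minutes", not a Juniper-documented value.
    """

    def __init__(self, hold_down: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_down = hold_down
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_commit: Optional[float] = None

    def seconds_remaining(self) -> float:
        """Seconds left before another export-policy commit is allowed."""
        if self.last_commit is None:
            return 0.0
        elapsed = self.clock() - self.last_commit
        return max(0.0, self.hold_down - elapsed)

    def may_commit(self) -> bool:
        """True once the hold-down window since the last commit has expired."""
        return self.seconds_remaining() == 0.0

    def record_commit(self) -> None:
        """Call right after an export-policy change has been committed."""
        self.last_commit = self.clock()


if __name__ == "__main__":
    # Simulated clock so the example runs instantly.
    now = [0.0]
    guard = ExportPolicyCommitGuard(hold_down=300.0, clock=lambda: now[0])
    print(guard.may_commit())         # True: nothing committed yet
    guard.record_commit()
    now[0] = 60.0
    print(guard.may_commit())         # False: only 60 s elapsed
    print(guard.seconds_remaining())  # 240.0
    now[0] = 301.0
    print(guard.may_commit())         # True: hold-down expired
```

Injecting the clock keeps the guard testable; in a real script the
default time.monotonic is used so wall-clock adjustments (NTP steps)
cannot shorten the window.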
10.3R3 and 10.4R2 also introduced significant BGP convergence
performance improvements if you have a lot of routes with communities
on them. For us, this reduced convergence time on EX8200 with a few
transit and iBGP route-reflector feed views (~2 million paths) from
over an hour to 15-20 minutes. Still not "good" by any means, but
significantly "less bad".
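To put the convergence numbers in rough proportion: treating "1 hour+"
as a 60-minute floor (an assumption, since the real baseline was
longer), 15-20 minutes is at least a 3x to 4x improvement. A trivial
Python sketch of that arithmetic; the function name is hypothetical.

```python
def convergence_speedup(before_min: float, after_min: float) -> float:
    """Ratio of old to new convergence time (higher is better)."""
    if after_min <= 0:
        raise ValueError("after_min must be positive")
    return before_min / after_min


# Assumed baseline: "1 hour+" taken as a 60-minute floor, so these are lower bounds.
BEFORE_MIN = 60.0
for after in (15.0, 20.0):
    print(f"{BEFORE_MIN:.0f} min -> {after:.0f} min: "
          f"at least {convergence_speedup(BEFORE_MIN, after):.1f}x faster")
```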
Of course, there are a few other issues introduced in 10.3R3 too
(shocker, that NEVER happens :P), but I already discussed them in an
earlier post. Ultimately, I think 10.4 will be more interesting in its
R3 or R4 timeframe, with SRs past that, but I don't think we're going
to end up using it much on EX8200 (as I said, we're being forced into
11.1 to support specific hardware anyway).
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)