[j-nsp] L3VPN/RR/PE on Same router

Fri Aug 17 10:28:53 EDT 2018

> and that thing would then crash BGP on RRs, can't afford that happening.

Then the best thing is to run two or three RRs in parallel, each using a
different BGP code base, even for the same AFI/SAFI pair.

I am seeing a number of networks running single-vendor RRs, and when things
melt they run around and claim that the problem was really so rare and
unexpected :) Well, bugs are usually of an unexpected nature ...

Thx,
R.

On Fri, Aug 17, 2018 at 4:05 PM, <adamv0025 at netconsultings.com> wrote:

> > From: Saku Ytti [mailto:saku at ytti.fi]
> > Sent: Friday, August 17, 2018 2:38 PM
> > To: Mark Tinka
> > Cc: adamv0025 at netconsultings.com; tim tiriche; Juniper List
> > Subject: Re: [j-nsp] L3VPN/RR/PE on Same router
> >
> > Hey Mark,
> >
> > > > Yes a good practice is to separate internet routes from
> > > > internal/services l3vpn routes onto separate BGP control planes
> > > > (different sessions at least) so that malformed bgp msg will affect
> > > > just one part of your overall BGP infrastructure.
> > >
> > > I see you've been giving this advice for quite some time now.
> >
> > I'm siding with Adam here. His disaster scenario actually happened to me
> > in 3292. We ran VXR VPN route reflectors for years; after we changed them
> > to MX240 we added a lot more RRs, with some hard justifications to
> > management about why we needed more when we'd had no trouble with the
> > count we had.
> > After about 3 months of running MX240 reflectors, we got a bad BGP UPDATE
> > that crashed every reflector, which was an unprecedented outage in the
> > history of the network. And tough to explain to management, considering we
> > had just made the reflection more redundant with some significant
> > investment.
> > I'm sure they believed we had just cocked it up, as people don't really
> > believe in chance/randomness. That is evident in how people justify that
> > things can't be broken by explaining that at a previous moment in time
> > they weren't broken, implying that transitioning from non-broken to
> > broken is impossible.
> >
> > Note, this is not to trash on Juniper, all vendors have bad BGP
> > implementations and I'm sure one can fuzz any of them to find crash bugs.
> >
> Oh yeah for sure, the XR RRs too were crashing upon reception of malformed
> BGP updates in the past.
>
> Currently XR BGP is *somewhat protected by the "BGP Attribute Filter and
> Enhanced Attribute Error Handling" feature (now RFC 7606), which has already
> proved itself to me (I just got a log msg informing me that the malformed
> attribute was deleted, instead of an important transit session being reset).
> Unfortunately we can't enable it on Junos, as a bug in the code we run
> would crash rpd instead of resetting the session if the RFC 7606 feature
> were enabled.
>
> *But still I'd be haunted by what could happen if RFC 7606 would have
> missed something and that thing would then crash BGP on RRs, can't afford
> that happening.
>
>
> > Not only is the CAPEX of having separate RRs for IPv4 and IPv6
> > irrelevant, but you also get faster convergence, as each RR has more CPU
> > cycles, fewer BGP neighbours and fewer routes. I view it as cheap
> > insurance as well as very simple horizontal scaling.
> >
> And going virtual, this really is a marginal spend in the grand scheme of
> things.
>
> adam
>
> netconsultings.com
> ::carrier-class solutions for the telecommunications industry::