[j-nsp] L3VPN/RR/PE on Same router
adamv0025 at netconsultings.com
Fri Aug 17 10:05:52 EDT 2018
> From: Saku Ytti [mailto:saku at ytti.fi]
> Sent: Friday, August 17, 2018 2:38 PM
> To: Mark Tinka
> Cc: adamv0025 at netconsultings.com; tim tiriche; Juniper List
> Subject: Re: [j-nsp] L3VPN/RR/PE on Same router
>
> Hey Mark,
>
> > > Yes a good practice is to separate internet routes from
> > > internal/services l3vpn routes onto separate BGP control planes
> > > (different sessions at least) so that malformed bgp msg will affect
> > > just one part of your overall BGP infrastructure.
> >
> > I see you've been giving this advice for quite some time now.
>
> I'm siding with Adam here. His disaster scenario actually happened to me in
> 3292. We ran for years VXR VPN route-reflectors, after we changed them to
> MX240 we added lot more RR's, with some hard justifications to
> management why we need more when we've had no trouble with the count
> we had.
> After about 3 months of running MX240 reflectors, we got a bad BGP UPDATE
> that crashed each reflector, which was an unprecedented outage in the history
> of the network. And tough to explain to management, considering we just
> had made the reflection more redundant with some significant investment.
> I'm sure they believed we just had cocked it up, as people don't really
> believe in chance/randomness, evident how people justify that things can't
> be broken, by explaining how in previous moment in time it wasn't broken,
> implying that transitioning from non-broken to broken is impossible.
>
> Note, this is not to trash on Juniper, all vendors have bad BGP
> implementations and I'm sure one can fuzz any of them to find crash bugs.
>
Oh yeah, for sure; the XR RRs too used to crash on receipt of malformed BGP updates.
Currently XR BGP is *somewhat protected by the "BGP Attribute Filter and Enhanced Attribute Error
Handling" feature (now RFC 7606), which has already proved itself to me: I just got a log msg informing me that the malformed attribute was deleted, instead of an important transit session being reset.
Unfortunately we can't enable it on Junos: due to a bug in the code we run, enabling the RFC 7606 feature would make rpd crash instead of just resetting the session.
*But I'd still be haunted by what could happen if the RFC 7606 handling missed something and that then crashed BGP on the RRs; we can't afford that happening.
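
For readers unfamiliar with RFC 7606, here is a minimal sketch of the idea: rather than resetting the session on any malformed attribute, the receiver picks a less disruptive action per attribute, and when one UPDATE contains several errors the strongest action wins. The names `Action`, `MALFORMED_ACTION` and `handle_update` are hypothetical, the table covers only a few attributes as they apply on an iBGP session, and the default for unlisted attributes is an assumption of this sketch. It is not how XR or Junos implement the feature.

```python
from enum import IntEnum


class Action(IntEnum):
    # Ordered from least to most disruptive so max() picks the strongest.
    ATTRIBUTE_DISCARD = 1   # drop the bad attribute, keep the route
    TREAT_AS_WITHDRAW = 2   # handle the affected prefixes as withdrawn
    SESSION_RESET = 3       # classic pre-RFC 7606 behaviour


# Path attribute type code -> action when that attribute is malformed.
# Simplified subset, assuming an iBGP session.
MALFORMED_ACTION = {
    1: Action.TREAT_AS_WITHDRAW,   # ORIGIN
    2: Action.TREAT_AS_WITHDRAW,   # AS_PATH
    3: Action.TREAT_AS_WITHDRAW,   # NEXT_HOP
    6: Action.ATTRIBUTE_DISCARD,   # ATOMIC_AGGREGATE
    7: Action.ATTRIBUTE_DISCARD,   # AGGREGATOR
    8: Action.TREAT_AS_WITHDRAW,   # COMMUNITIES
    14: Action.SESSION_RESET,      # MP_REACH_NLRI that cannot be parsed
}


def handle_update(malformed_attr_types):
    """Return the action for one UPDATE, or None if nothing was malformed."""
    if not malformed_attr_types:
        return None
    # Assumption of this sketch: unlisted attributes default to treat-as-withdraw.
    return max(
        MALFORMED_ACTION.get(t, Action.TREAT_AS_WITHDRAW)
        for t in malformed_attr_types
    )
```

For example, `handle_update([7])` returns `Action.ATTRIBUTE_DISCARD`, which corresponds to the "attribute deleted, session stays up" log message described above.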
> Not only is it CAPEX irrelevant to have separate RR for IPv4 and IPv6, but you
> also get faster convergence, as more CPU cycles, fewer BGP neighbours, fewer
> routes. I view it as cheap insurance as well as very simple horizontal scaling.
>
And going virtual this really is a marginal spend in the grand scheme of things.
adam
netconsultings.com
::carrier-class solutions for the telecommunications industry::