[j-nsp] graceful failover and software upgrades

Fri Mar 18 17:54:24 EST 2005

Pekka, 

pls see inline. 

On 20:41 Fri 18 Mar, Pekka Savola wrote:
> Thanks for the procedures.  This is a nice start, because there isn't 
> anything published on this.
> 
> A few comments inline..
> 
> On Fri, 18 Mar 2005, Jeff C. Strahl wrote:
> > 1.   Enable graceful restart to minimize/eliminate forwarding plane
> > disruption.  You also need to enable graceful restart on peer nodes for it
> > to buy you anything.
> 
> This brings up an interesting point which hasn't been clearly 
> documented anywhere.
> 
> What's the relation of graceful restart support (at your neighbor 
> nodes) and graceful switchover?
> 
> That is, if we have graceful switchover enabled, what happens if
>   a) neighbor supports graceful restart
>   b) neighbor does not support graceful restart
> 
> (And currently the router is also configured to do graceful 
> restart.)
> 
> My hunch is that the protocol sessions don't flap with a), and do flap 
> with b). (The flap probably takes 2-10 seconds, the time it takes for 
> the other RE to take control using the switchover.)
> 
> Anyone have concrete ideas about the relation of graceful restart and 
> switchover?

Actually, in both scenarios the sessions flap. Here's what happens with
GRES (graceful RE switchover) enabled:
 - During normal operation the active RE synchronizes state with the
   backup. This includes things like interfaces, routes, etc.

 - When the active RE fails, the standby takes over. During this time
   the PFE (i.e. the forwarding complex) keeps forwarding with the state
   it had at the time of the failure.

 - GRES does not mirror TCP sessions or any other routing protocol
   neighborships, so they will go down.

 - This is where graceful restart comes into play. The original master
   had sessions with its neighboring routers. When graceful restart is
   enabled, you effectively tell your neighbors: if I disappear, don't
   worry, I'll be back (tm), and while I'm gone assume I can still forward
   traffic (i.e. don't withdraw my routes from the rest of the network).

 - When the new master comes up, it establishes new sessions/adjacencies
   and indicates that it came back after a graceful restart. The neighbors
   then resend all their routing info, since a routing update may have
   been missed right at the time of the failure.

 - The router now has all current info again and can update the PFE if
   there are any discrepancies between the PFE state and the RE's current
   view of the world.
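To make your two cases concrete, here is a rough Python toy model of the sequence above. It is only an illustration under simplifying assumptions, not how Junos implements GRES or graceful restart: the names (`Neighbor`, `graceful_switchover`, the interface labels) are made up, the backup RE's synced state isn't modeled separately, and routes from all neighbors are simply merged with no best-path selection. What it shows is that the sessions flap whether or not the neighbor supports graceful restart; the difference is whether that neighbor withdraws our routes while the new master comes up.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    # Hypothetical peer model: name, whether it acts as a graceful restart
    # helper, and the routes it currently advertises (prefix -> next hop).
    name: str
    graceful_restart: bool
    routes: dict[str, str]


@dataclass
class SwitchoverResult:
    flapped_sessions: list[str]  # every session drops; GRES does not mirror them
    withdrawn_by: list[str]      # peers that pulled our routes from the network
    pfe_fixes: dict[str, str]    # PFE entries rewritten during reconciliation
    removed: list[str]           # stale PFE entries deleted during reconciliation


def graceful_switchover(pfe: dict[str, str], neighbors: list[Neighbor]) -> SwitchoverResult:
    """Toy model of an RE switchover with GRES enabled.

    `pfe` (prefix -> next hop) is the forwarding state programmed by the old
    master. It keeps forwarding untouched during the switchover and is only
    reconciled at the end. It is mutated in place.
    """
    # Sessions/adjacencies are not mirrored to the backup RE: all of them go down.
    flapped = [n.name for n in neighbors]

    # Graceful restart helpers keep our routes while we're gone;
    # peers without graceful restart withdraw them.
    withdrawn_by = [n.name for n in neighbors if not n.graceful_restart]

    # The new master re-establishes sessions and each peer resends its full
    # routing info. Simplification: no best-path selection, later peers win.
    rib: dict[str, str] = {}
    for n in neighbors:
        rib.update(n.routes)

    # Reconcile the PFE with the RE's current view of the world.
    fixes = {p: nh for p, nh in rib.items() if pfe.get(p) != nh}
    removed = [p for p in pfe if p not in rib]
    for p in removed:
        del pfe[p]
    pfe.update(fixes)
    return SwitchoverResult(flapped, withdrawn_by, fixes, removed)


if __name__ == "__main__":
    pfe = {"10.0.0.0/8": "ge-0/0/0", "192.0.2.0/24": "ge-0/0/1"}
    peers = [
        Neighbor("r1", True, {"10.0.0.0/8": "ge-0/0/0"}),   # case a)
        # case b); this route changed right as the old master failed
        Neighbor("r2", False, {"192.0.2.0/24": "ge-0/0/2"}),
    ]
    result = graceful_switchover(pfe, peers)
    print(result.flapped_sessions)  # ['r1', 'r2']
    print(result.withdrawn_by)      # ['r2']
    print(result.pfe_fixes)         # {'192.0.2.0/24': 'ge-0/0/2'}
```

In case b), r2's session flaps just like r1's, but r2 also pulls our routes, so the rest of the network stops using us for them even though the PFE kept forwarding the whole time.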

HTH,
-Daniel.