[c-nsp] RANCID Spiking CPUs

Mon Jun 9 17:21:36 EDT 2008

Mon, Jun 09, 2008 at 03:56:08PM -0400, Nick Davey:
> Hi All,
> I've deployed RANCID on a fairly large metro network, and am seeing some
> pretty high CPU averages. When RANCID runs, the CPUs on a large number of
> our boxes spike to about 95% for several seconds. Although they have never
> hit 100% or caused any issues (dropped OSPF hellos, STP BPDUs), I'm
> concerned that under the right combination of events this could result
> in dropped OSPF neighbor adjacencies or other badness.
> 
> I've tried to replicate the high CPU issue by pasting the commands in
> manually; however, I haven't come anywhere close to the 95% I'm seeing when
> RANCID runs them. I'm assuming this is just down to the frequency at which the
> commands are run. Does anyone have any experience with this or any insight
> they can provide?

RANCID will submit many commands in less time than it'll take a human; you
will not likely be able to replicate it by hand.  However, displaying the
configuration, especially a large and compressed config, likely causes the
greatest CPU utilization of any of the commands that are used.
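
To make the pacing point concrete, here is a rough back-of-the-envelope
sketch.  The per-command CPU cost and the gaps between commands are invented
numbers for illustration only, not measurements from RANCID or any device,
and window_utilization is a hypothetical helper.  It assumes commands run
one after another with an idle gap in between.

```python
# Illustrative arithmetic only: the costs and gaps below are assumptions.
def window_utilization(cpu_cost_s, gap_s, n_commands):
    """Fraction of wall-clock time the CPU is busy while the commands run.

    cpu_cost_s: CPU seconds each command consumes (assumed equal for all)
    gap_s:      idle seconds between the end of one command and the next
    """
    busy = cpu_cost_s * n_commands
    elapsed = busy + gap_s * (n_commands - 1)
    return busy / elapsed

# A script sends the next command as soon as the prompt returns.
scripted = window_utilization(cpu_cost_s=0.5, gap_s=0.0, n_commands=10)
# A human pastes a command, reads the output, then pastes the next one.
by_hand = window_utilization(cpu_cost_s=0.5, gap_s=4.5, n_commands=10)

print(f"scripted: {scripted:.0%}  by hand: {by_hand:.0%}")
```

The same total work is done either way; only the idle time between commands
changes, which is why the averaged CPU figure looks so different.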

Any process should be able to consume all the available CPU.  However, your
device's scheduler should give higher priority to critical core processes,
such as routing, and lower priority to the user/CLI, so when there is a
resource deficit the critical bits get the time they need.  For example,
notice that when a device boots, BGP consumes all the CPU, yet OSPF
continues to manage its timers.
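
The scheduling argument above can be sketched with a toy strict-priority
model.  This is not any vendor's actual scheduler; the process names, tick
counts, and priority values are hypothetical and chosen only to show the
effect.

```python
# Toy strict-priority scheduler; all names and numbers are assumptions.
def run_ticks(demands, priorities, ticks):
    """Give each CPU tick to the most important process that still wants CPU.

    demands:    dict of name -> ticks of CPU wanted during the window
    priorities: dict of name -> int, where a lower number is more important
    Returns a dict of name -> ticks actually received.
    """
    remaining = dict(demands)
    received = {name: 0 for name in demands}
    for _ in range(ticks):
        runnable = [n for n in remaining if remaining[n] > 0]
        if not runnable:
            break  # CPU would sit idle for the rest of the window
        chosen = min(runnable, key=lambda n: priorities[n])
        remaining[chosen] -= 1
        received[chosen] += 1
    return received

# The CLI wants far more CPU than the window offers; OSPF wants very little.
demands = {"ospf_hello": 5, "cli_show_run": 200}
priorities = {"ospf_hello": 0, "cli_show_run": 10}
result = run_ticks(demands, priorities, ticks=100)
print(result)
```

In this model the CPU is fully busy for the whole window, yet the
higher-priority process still receives every tick it asked for; only the CLI
work is left waiting.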

I've only seen one case where it was a problem: a massive EoA config.  show
running-config took so long that it affected management.  But the same was
true when it was run by a human.  It was the wrong box for the job.