[c-nsp] IOS(-XE) 'configure replace' robustness

James Bensley jwbensley at gmail.com
Wed May 3 11:45:20 EDT 2017


On 4 April 2017 at 15:23, Saku Ytti <saku at ytti.fi> wrote:
>> soon. Ideally I want the configuration to be locked when doing a full
>> replace so that no-one else makes a configuration change at the same
>> time AND automatic rollback MUST also be active so that we have
>> guaranteed configuration state on the device (either the merge or
>> replace operation completed 100% without issue or we are 100% as
>> before the operation started). Otherwise automation is dead at the
>> door step.
>
> Call me naive, but I assume that when it boots and sees
> startup-config, it runs the startup-config through some parser
> function, which then outputs the actual config it'll load. If so, then
> they really should run the B/candidate config through same parser
> function, then try to do the replace.
> But I have no real information how Cisco does any of this.

TAC have confirmed there is "a" parser, but how/what/why is all still a
mystery (also my TAC engineer is on holiday now, but I will beat
them with a stick when they are back and try to get additional
clarification of what is happening on the router, as so far they haven't
helped at all...).

>> I've been testing with a CSR1000v 3.16.4 image

> Can you try to recreate on everest or denali easily?

Yeah, switched to 16.3 (denali). Same issue. To break it down:

We are generating a full device config using Jinja2 templates. When I
opened the ticket I was using a quick and dirty script that had extra
white space in the output and the candidate configuration wasn't
perfectly aligned in the usual way that IOS config is indented. For
example, having an interface configuration like the below would mean
the interface was not fully configured after the configuration replace
operation:

interface Gi1
description type:cust link:ucpe_br0 desc:Outside_VLANs
! no ip address
! no vrf forwarding
    no ip redirects
no ip unreachables
no ip proxy-arp
negotiation auto
ntp disable
no keepalive
no mop enabled
   no shutdown
           service instance 504 ethernet
    encapsulation dot1q 504
    rewrite ingress tag pop 1 symmetric
    bridge-domain 504
    exit
            service instance 503 ethernet
    encapsulation dot1q 503
    rewrite ingress tag pop 1 symmetric
    bridge-domain 503
    exit
     exit

I know, when you look at that you can see it is ugly, but if I copy
and paste that in via CLI it works fine. Also if I SCP the
candidate_config.txt to the device and replace the startup config with
that file then reboot the router, it also works fine. So the penny
didn’t drop immediately that the white space and indentation might be
an issue. For some reason the config parser used by the
configuration replace operation can't handle that. I’m pressing TAC to
find out why.
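A rough pre-flight check along these lines could catch the worst of the
white space before the candidate file is pushed. This is only a sketch:
the function name lint_candidate is made up for illustration, and the
rule "indentation may only grow one space at a time" is my assumption
based on how running-config output is normally laid out, not something
TAC has confirmed. It also cannot tell that a line at column 0 really
belongs under a parent (like the "description" line above), so a clean
result does not prove the replace will work.

```python
def lint_candidate(text):
    """Return (line_number, message) pairs for suspicious whitespace.

    Heuristic only: assumes nested config lines are indented exactly one
    space deeper than their parent, as in typical running-config output.
    """
    problems = []
    prev_indent = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored, not judged
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        indent = len(line) - len(line.lstrip(" "))
        if indent > prev_indent + 1:
            problems.append(
                (number, f"indent jumps from {prev_indent} to {indent}")
            )
        prev_indent = indent
    return problems
```

Running it over the rendered template output and refusing to push the
file if the list is non-empty would be one way to use it.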

OK, so I stopped being lazy and tidied up the script; now an interface
stanza looks like this:

interface Gi1
 description type:cust link:ucpe_br0 desc:Outside_VLANs
 no ip address
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 negotiation auto
 ntp disable
 no keepalive
 no mop enabled
 no shutdown
 service instance 504 ethernet
  encapsulation dot1q 504
  rewrite ingress tag pop 1 symmetric
  bridge-domain 504
 !
 service instance 503 ethernet
  encapsulation dot1q 503
  rewrite ingress tag pop 1 symmetric
  bridge-domain 503
 !
!

The config above is applied to the device in full. So our config is now
applying; however, when we run the configuration replace operation
the router indicates that the operation has failed even though it was
successful (i.e. diff “show run” against candidate_config.txt and all
the lines are there, in a slightly jumbled order, but they are there).
We see the same syslog messages five times though (it’s not clear, but I
believe the order of operations here is that the router tries to apply
the config, decides it has failed, then tries 5 times to roll it back,
decides it has failed to roll back 5 times in a row, and so just
gives up). I have pasted a partial output below for brevity; you get
the idea:

% The key modulus size is 4096 bits
% Generating 4096 bit RSA keys, keys will be non-exportable...
*May  3 14:33:25.958: %SYS-5-CONFIG_R: Config Replace is Done
*May  3 2017 15:33:26.821 BST: %BDI_IF-5-CREATE_DELETE: Interface BDI506 is created
*May  3 2017 15:33:26.913 BST: %LINK-3-UPDOWN: Interface BDI506, changed state to down

% You already have RSA keys defined named vm-vrtr-002.
% They will be replaced.

% The key modulus size is 4096 bits
% Generating 4096 bit RSA keys, keys will be non-exportable...
*May  3 2017 15:33:31.321 BST: %BDI_IF-5-CREATE_DELETE: Interface BDI506 is deleted
*May  3 2017 15:33:31.633 BST: %LINK-3-UPDOWN: Interface BDI506, changed state to down
*May  3 2017 15:33:34.686 BST: %LINK-3-UPDOWN: Interface BDI506, changed state to up
*May  3 2017 15:33:35.686 BST: %LINEPROTO-5-UPDOWN: Line protocol on Interface BDI506, changed state to up

% You already have RSA keys defined named vm-vrtr-002.
% They will be replaced.

% The key modulus size is 4096 bits
% Generating 4096 bit RSA keys, keys will be non-exportable...
*May  3 2017 15:33:38.124 BST: %BDI_IF-5-CREATE_DELETE: Interface BDI506 is deleted
*May  3 2017 15:33:38.252 BST: %LINK-5-CHANGED: Interface BDI506, changed state to administratively down
*May  3 2017 15:33:38.418 BST: %LINK-3-UPDOWN: Interface BDI506, changed state to down
*May  3 2017 15:33:39.254 BST: %LINEPROTO-5-UPDOWN: Line protocol on Interface BDI506, changed state to down

% The key modulus size is 4096 bits
% Generating 4096 bit RSA keys, keys will be non-exportable...


…and so on, 5 times over…



The final output on the screen is:

“Rollback aborted after 5 passes”



However “show run” shows me the full config is there. So that’s a
step forwards: all the config is being applied (it seems the config
parser for "configure replace" is very picky!), but the router
indicates it has failed, so once my TAC engineer is back from holiday
it is sticking time. Just before they went on holiday they said they had
been working with the BU regarding the config parser and an internal
defect had been opened, so a public bug ID may be created; we shall
wait and see. They didn't say why the internal defect was opened,
so that is another mystery (was it because the parser has leading
white space issues? Or because it shows an operation as failed
when it was successful? Or because it can't roll back when an
operation has failed?).
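
For anyone wanting to repeat the "all the lines are there, just in a
different order" check without eyeballing a diff, here is a minimal
sketch of an order-insensitive comparison. The function names are made
up for illustration. It assumes both texts use the usual one-space-per-
level indentation and treats "!" lines as separators. It knows nothing
about IOS defaults, so lines that the router accepts but does not
display in "show run" would be reported as missing.

```python
def config_entries(text):
    """Return a set of (ancestor, ..., line) tuples, one per config line."""
    entries = set()
    stack = []  # (indent, line) pairs for the current chain of parents
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue  # skip blank lines and "!" separators/comments
        indent = len(line) - len(line.lstrip(" "))
        # Drop parents that are at the same depth or deeper than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        entries.add(tuple(parent for _, parent in stack) + (stripped,))
        stack.append((indent, stripped))
    return entries


def compare_configs(candidate, running):
    """Return (missing, extra) relative to the candidate, ignoring order."""
    want = config_entries(candidate)
    have = config_entries(running)
    return sorted(want - have), sorted(have - want)
```

Feeding it the contents of candidate_config.txt and a saved copy of
"show run" gives two lists: lines (with their parents) that are in the
candidate but not on the box, and the reverse.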


Cheers,
James.

