[c-nsp] OSPF OOB Resync and peer stuck in EXSTART (SeqNumberMismatch)

John Neiberger jneiberger at gmail.com
Fri Feb 8 23:28:39 EST 2013


This is a new one on me. We had a situation where OSPF between a router and
a firewall seemed to go insane, and it involved something I had never heard
of before: out-of-band (OOB) resynchronization. Here are the logs from the
beginning of the event:

Feb  8 23:32:45.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from FULL to EXSTART, OOB-Resynchronization
Feb  8 23:32:50.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXCHANGE, Negotiation Done
Feb  8 23:34:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXCHANGE to DOWN, Neighbor Down: Too many retransmissions
Feb  8 23:35:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from DOWN to DOWN, Neighbor Down: Ignore timer expired
Feb  8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from DOWN to INIT, Received Hello
Feb  8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from INIT to 2WAY, 2-Way Received
Feb  8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from 2WAY to EXSTART, AdjOK?
Feb  8 23:35:50.810 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb  8 23:36:00.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb  8 23:36:10.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb  8 23:36:25.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb  8 23:36:30.818 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
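
To make the timing in these messages easier to see, here is a minimal Python sketch that parses ADJCHG lines and reports how long the neighbor sat in each state before the next logged transition. The names `parse_adjchg`, `dwell_times`, and `SAMPLE` are made up for illustration, the regular expression is written only against the message format shown above, and the year is an assumption because the log timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches one %OSPF-5-ADJCHG entry in the format shown in the log above.
ADJCHG = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\.\d{3}) UTC: %OSPF-5-ADJCHG: "
    r"Process (?P<pid>\d+), Nbr (?P<nbr>\S+) on (?P<intf>\S+) "
    r"from (?P<old>\w+) to (?P<new>\w+), (?P<reason>.+)"
)

# First three entries from the log, wrapped the way a mail client may wrap them.
SAMPLE = """\
Feb  8 23:32:45.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
from FULL to EXSTART, OOB-Resynchronization
Feb  8 23:32:50.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
from EXSTART to EXCHANGE, Negotiation Done
Feb  8 23:34:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
from EXCHANGE to DOWN, Neighbor Down: Too many retransmissions
"""


def parse_adjchg(text, year=2013):
    """Return (timestamp, old_state, new_state, reason) for each ADJCHG entry.

    The year is an assumption supplied by the caller; the log does not include it.
    """
    # Rejoin entries that were wrapped before the word "from".
    joined = re.sub(r"\n(?=from )", " ", text)
    events = []
    for line in joined.splitlines():
        m = ADJCHG.search(line)
        if m is None:
            continue  # not an ADJCHG line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
        events.append((ts, m["old"], m["new"], m["reason"].strip()))
    return events


def dwell_times(events):
    """Pair each state entered with the seconds until the next logged transition."""
    return [
        (cur[2], round((nxt[0] - cur[0]).total_seconds(), 3))
        for cur, nxt in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for state, seconds in dwell_times(parse_adjchg(SAMPLE)):
        print(f"{state:<10} {seconds:>8.3f} s")
```

Run against the first three entries, this shows roughly 5 seconds in EXSTART and about 119 seconds in EXCHANGE before the neighbor was declared down for too many retransmissions.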

Something happens to trigger an out-of-band resync and then the neighbor
gets stuck in EXSTART because of a sequence number mismatch. I first
thought we had an MTU mismatch, but the MTUs seem to check out. I read
somewhere that sequence number mismatches can be caused by a software
error. This just isn't something I've run into before.
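
For reference, the MTU check being ruled out here can be sketched roughly as follows. RFC 2328 section 10.6 says a Database Description packet is rejected when its Interface MTU field is larger than the receiving interface can accept without fragmentation. The `DbdPacket` class and `accept_dbd_mtu` function are hypothetical names for illustration only, and real implementations may be stricter or offer a knob to skip the check.

```python
from dataclasses import dataclass


@dataclass
class DbdPacket:
    """Hypothetical, stripped-down view of a Database Description packet."""
    interface_mtu: int  # Interface MTU field advertised by the sender
    dd_sequence: int    # DD sequence number


def accept_dbd_mtu(pkt: DbdPacket, local_ip_mtu: int) -> bool:
    """Apply the RFC 2328 section 10.6 MTU rule.

    Returns False when the sender advertises an MTU larger than the
    receiving interface can take unfragmented, in which case the packet
    is dropped and the adjacency cannot progress.
    """
    return pkt.interface_mtu <= local_ip_mtu


if __name__ == "__main__":
    # Matching MTUs pass; a jumbo-frame neighbor facing a 1500-byte interface does not.
    print(accept_dbd_mtu(DbdPacket(interface_mtu=1500, dd_sequence=1), 1500))
    print(accept_dbd_mtu(DbdPacket(interface_mtu=9000, dd_sequence=1), 1500))
```

If the MTUs on both sides really do match, this check passes and the cause has to be somewhere else.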

First, I don't know what OOB resynchronization is or what it entails, so
I'm going to read more about it to find out what triggers it and what it is
supposed to be doing under the hood. Second, why would a peer that had been
working just fine suddenly divebomb into the ground and then get stuck in
EXSTART?

We ultimately resolved the problem by clearing the OSPF process a couple of
times. Eventually all seemed to clear up and things are working fine. I
suspect a buggy OSPF implementation on the firewall but that's really just
a guess. The router is running 12.2(33)SRE3, which I think has pretty
mature OSPF code.

Any thoughts?

Thanks,
John

