[c-nsp] N3K: "VPC peer keep-alive receive has failed"
Manuel Guesdon
ml+cisco-nsp at oxymium.net
Thu Dec 27 05:57:06 EST 2018
Hi,
I have a strange problem with Nexus N3K switches and a QinQ tunnel.
I've configured 2 Nexus 3064 with VPC. This has worked well for months.
Recently I added a port-channel in dot1q-tunnel mode (the 1st one in this
mode).
Since then I have been getting this message:
"%$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer
keep-alive receive has failed" multiple times a day on the 2 switches.
Details:
BIOS: version 4.1.0
NXOS: version 7.0(3)I6(1)
new interface & port-channel:
interface Ethernet1/35
  switchport mode dot1q-tunnel
  switchport access vlan 72
  spanning-tree port type edge
  speed 10000
  channel-group 1035
interface port-channel1035
  switchport mode dot1q-tunnel
  switchport access vlan 72
  speed 10000
  vpc 1035
A "sh vlan id 72" only reports the peer-link ports/port-channels and
eth1/35 / po1035.
The tunnel has no other end for the moment.
The message appears at various times on each switch (i.e. not at the same
time on both switches) and not the same number of times per day. For example
today: 3 on one switch, 6 on the other.
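The per-switch, per-day counts above can be tallied from collected syslog with a short script. This is only a minimal sketch: the sample lines, the hostnames (n3k-a, n3k-b) and the "date time hostname message" layout are assumptions about how a syslog collector might store the messages, not actual output from these switches.

```python
import re
from collections import Counter

# Hypothetical sample log; hostnames and timestamps are made up for illustration.
SAMPLE_LOG = """\
2018 Dec 27 01:12:40 n3k-a %$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed
2018 Dec 27 03:47:02 n3k-b %$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed
2018 Dec 27 04:05:19 n3k-a %$ VDC-1 %$ %ETHPORT-5-IF_UP: Interface Ethernet1/35 is up
2018 Dec 27 05:30:11 n3k-a %$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed
"""

# Assumed layout: "<year> <month> <day> <hh:mm:ss> <hostname> ... <message>"
PATTERN = re.compile(
    r"^(\d{4} \w{3} +\d{1,2}) \d{2}:\d{2}:\d{2} (\S+) "
    r".*%VPC-2-PEER_KEEP_ALIVE_RECV_FAIL"
)


def count_failures(log_text):
    """Count keep-alive receive failures per (hostname, day)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            day, host = match.groups()
            counts[(host, day)] += 1
    return counts


if __name__ == "__main__":
    for (host, day), n in sorted(count_failures(SAMPLE_LOG).items()):
        print(f"{day} {host}: {n}")
```

Comparing the resulting timestamps between the two switches also makes it easy to confirm that the failures are not simultaneous.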
Switch load seems the same as before this new port-channel was added, and
there is no load peak around the message date/time (Cacti, 5-minute samples).
When I shut the port, the messages no longer appear. When I re-enable it,
they come back.
I've tried changing keep alive parameters:
--Keepalive interval : 500 msec
--Keepalive timeout : 10 seconds
--Keepalive hold timeout : 6 seconds
but same thing.
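As a rough sanity check on these timers, the sketch below only computes how many keep-alive intervals fit into each window. It makes no claim about how NX-OS actually uses the timeout and hold-timeout values internally; the function name is hypothetical.

```python
# Timer values taken from the settings listed above.
INTERVAL_MS = 500      # keepalive interval
TIMEOUT_S = 10         # keepalive timeout
HOLD_TIMEOUT_S = 6     # keepalive hold timeout


def intervals_in(window_s, interval_ms):
    """Number of whole keep-alive intervals that fit in a window of window_s seconds."""
    return (window_s * 1000) // interval_ms


if __name__ == "__main__":
    print("intervals per timeout:", intervals_in(TIMEOUT_S, INTERVAL_MS))            # 20
    print("intervals per hold timeout:", intervals_in(HOLD_TIMEOUT_S, INTERVAL_MS))  # 12
```

With a 500 msec interval, the 10-second timeout spans 20 intervals, so a single late or dropped keep-alive packet would not on its own be expected to cover the whole window.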
The keepalive link is on a dedicated 2-port port-channel; the IPs are set
directly on the port-channel, in a VRF.
1st switch:
vpc domain 1
  role priority 1
  peer-keepalive destination 10.0.6.3 source 10.0.6.2 vrf pkal \
    interval 500 timeout 10 hold-timeout 6
  peer-gateway
  auto-recovery
  ipv6 nd synchronize
  ip arp synchronize
2nd switch:
vpc domain 1
  role priority 2
  peer-keepalive destination 10.0.6.2 source 10.0.6.3 vrf pkal \
    interval 500 timeout 10 hold-timeout 6
  peer-gateway
  auto-recovery
  ipv6 nd synchronize
  ip arp synchronize
There's nothing in the logs except the "receive has failed" message.
There are no errors on the keep-alive interfaces.
In Cacti, I only notice a small drop in outgoing traffic on the keep-alive
ports around the times the message appears, so it seems to be a transmit
problem rather than a receive problem.
If I configure 2 other N3Ks with the same configuration (back-to-back
configuration) for the other end of the tunnel and propagate VLAN 72 toward
them, I start getting the same message on those switches, even if the
QinQ port on them is down. If I stop propagating the VLAN toward them, the
messages stop on these 2 switches (but continue on the first 2 switches).
Any ideas?
Manuel
--
______________________________________________________________________
Manuel Guesdon - OXYMIUM