[c-nsp] N3K: "VPC peer keep-alive receive has failed"

Thu Dec 27 06:18:21 EST 2018

Different VPC domains, yes?

On Thu, Dec 27, 2018, 02:58 Manuel Guesdon <ml+cisco-nsp at oxymium.net> wrote:

> Hi,
>
> I have a strange problem with Nexus N3K and QinQ tunnel.
>
>
> I've configured 2 Nexus 3064 with VPC. It has worked well for months.
>
> Recently I've added a port-channel in dot1q-tunnel mode (the 1st one in
> this mode).
> Since then I have been getting this message:
> "%$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer
> keep-alive receive has failed" multiple times a day on the 2 switches.
>
> Details:
>   BIOS: version 4.1.0
>   NXOS: version 7.0(3)I6(1)
>
>   new interface & port-channel:
>
>         interface Ethernet1/35
>           switchport mode dot1q-tunnel
>           switchport access vlan 72
>           spanning-tree port type edge
>           speed 10000
>           channel-group 1035
>
>         interface port-channel1035
>           switchport mode dot1q-tunnel
>           switchport access vlan 72
>           speed 10000
>           vpc 1035
>
>    A "sh vlan id 72" only reports peer-link ports/port-channels and
>    eth1/35 / po1035.
>
>    There's no other end for the moment for this tunnel.
>
>    The message appears at various times on each switch (i.e. not at the same
>    time on both switches) and not the same number of times per day. For
>    example, today: 3 on one switch, 6 on the other.
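
To compare how often the message shows up on each switch, the syslog lines
can be tallied per day. Below is a minimal Python sketch. It assumes each log
line starts with a "YYYY Mon DD HH:MM:SS" timestamp, which should be checked
against the actual logfile layout; the function name, the hostname "sw1" and
the sample lines are made up for illustration and are not real switch output.

```python
import re
from collections import Counter

# Assumed line layout: "2018 Dec 27 02:41:07 sw1 ... %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: ..."
LINE_RE = re.compile(
    r"^(?P<day>\d{4} \w{3} +\d{1,2}) \d{2}:\d{2}:\d{2} .*"
    r"%VPC-2-PEER_KEEP_ALIVE_RECV_FAIL"
)

def count_recv_fail_per_day(lines):
    """Return a Counter mapping 'YYYY Mon DD' to the number of failures logged."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("day")] += 1
    return counts

# Hypothetical sample lines, not real switch output.
sample = [
    "2018 Dec 26 23:10:02 sw1 %$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: "
    "In domain 1, VPC peer keep-alive receive has failed",
    "2018 Dec 27 02:41:07 sw1 %$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: "
    "In domain 1, VPC peer keep-alive receive has failed",
    "2018 Dec 27 04:02:55 sw1 %$ VDC-1 %$ %ETHPORT-5-IF_UP: "
    "Interface Ethernet1/35 is up",
    "2018 Dec 27 05:13:40 sw1 %$ VDC-1 %$ %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: "
    "In domain 1, VPC peer keep-alive receive has failed",
]

counts = count_recv_fail_per_day(sample)
for day, n in sorted(counts.items()):
    print(day, n)
```

Running the same function over the logfile of each switch gives two per-day
tallies that can be compared side by side.
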
>
>    Switch load seems the same as before this new port-channel was added, and
>    there's no load peak around the message date/time (cacti 5-minute
>    measurements).
>
>    When I shut the port, the messages no longer appear. When I re-enable it,
>    they come back.
>
>    I've tried changing the keepalive parameters:
>         --Keepalive interval            : 500 msec
>         --Keepalive timeout             : 10 seconds
>         --Keepalive hold timeout        : 6 seconds
>    but I get the same result.
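
For scale, the timer values quoted above can be turned into a count of send
intervals. This is arithmetic only, sketched in Python with hypothetical
variable and function names; it makes no claim about the exact point at which
NX-OS logs the "receive has failed" message.

```python
# Values taken from the keepalive settings quoted above.
interval_ms = 500
timeout_s = 10
hold_timeout_s = 6

def intervals_in(window_s, interval_ms):
    """Number of keepalive send intervals that fit in a window of window_s seconds."""
    return (window_s * 1000) // interval_ms

timeout_intervals = intervals_in(timeout_s, interval_ms)
hold_intervals = intervals_in(hold_timeout_s, interval_ms)
print(timeout_intervals, hold_intervals)
```

With these settings, 20 send intervals fit in the timeout window and 12 in the
hold-timeout window.
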
>
>    The keepalive link is on a dedicated 2-port port-channel; IPs are set
>    directly on the port-channel, in a VRF.
>
>    1st switch:
>         vpc domain 1
>           role priority 1
>           peer-keepalive destination 10.0.6.3 source 10.0.6.2 vrf pkal \
>              interval 500 timeout 10 hold-timeout 6
>           peer-gateway
>           auto-recovery
>           ipv6 nd synchronize
>           ip arp synchronize
>
>    2nd switch:
>         vpc domain 1
>           role priority 2
>           peer-keepalive destination 10.0.6.2 source 10.0.6.3 vrf pkal \
>              interval 500 timeout 10 hold-timeout 6
>           peer-gateway
>           auto-recovery
>           ipv6 nd synchronize
>           ip arp synchronize
>
>
>    There's nothing in the logs except the "receive has failed" message.
>
>    There are no errors on the keep-alive interfaces.
>
>    On cacti, I just notice a small drop in outgoing traffic on the keep-alive
>    ports around the time the message appears, so it seems to be a transmit
>    problem rather than a receive problem.
>
>    If I configure 2 other N3Ks with the same configuration (back-to-back
>    configuration) for the other end of the tunnel and propagate vlan 72
>    toward them, I start getting the same message on those switches too, even
>    if the QinQ port on them is down. If I stop propagating the vlan toward
>    them, the messages stop on these 2 switches (but continue on the first 2).
>
>    Any ideas?
>
>
>
> Manuel
>
> --
> ______________________________________________________________________
> Manuel Guesdon - OXYMIUM