[j-nsp] MC-LAG to EVPN migration triggering filter config bug?
Per Westerlund
p1 at westerlund.se
Mon Feb 17 16:12:37 UTC 2025
Hi everyone!
Possible filter programming bug?
Environment:
MC-LAG pair of QFX5120-48YM switches, all hosts attached with LAG/LACP.
One VLAN with an IRB on the switches for routing.
We are trying to convert this into a collapsed-core EVPN setup while the
site stays in service. This was done successfully at another site, but
that site carried almost no traffic. The problem is at this site.
Simplified plan:
- disable all host interfaces on node 2
- convert config on node 2 to EVPN-based instead of MC-LAG-based. Reboot
node 2.
- convert member interface of old ISL/ICL-link to be L2 trunk to carry
cross-switch traffic
- one host at a time, disable link on node 1 and enable on node 2
instead
- once all hosts are moved, tear down temporary L2 trunk, convert node 1
to EVPN and reboot.
- everybody happy
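To make the temporary trunk step concrete, here is a minimal sketch of
what such a trunk could look like in set-style configuration. The
interface name et-0/0/48, the VLAN name V100 and VLAN ID 100 are
placeholders for illustration only, not values from the real setup.

```
# Placeholder names: et-0/0/48, V100 and VLAN ID 100 are assumptions.
# VLAN with its IRB for routing
set vlans V100 vlan-id 100
set vlans V100 l3-interface irb.100
# Former ISL/ICL member interface repurposed as a plain L2 trunk
set interfaces et-0/0/48 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-0/0/48 unit 0 family ethernet-switching vlan members V100
```

Any MC-LAG related statements that referenced the old ICL interface
would have to be removed in the same commit; those are not shown here.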
In reality we hit strange behaviour. When troubleshooting we discovered
that we could not ping between node 1 and node 2 via the temporary
L2 trunk. Each IRB has its own unique unicast IP address, but nothing
appeared in the ARP table, and even the Ethernet switching (MAC) table
was suspiciously empty.
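For reference, checks of this kind can be done with the standard
operational commands sketched below. The interface and VLAN names are
the same kind of placeholders (et-0/0/48, irb.100, V100), not the real
ones, and no output is shown because none is reproduced here.

```
# Placeholder names: et-0/0/48, irb.100 and V100 are assumptions.
# ARP entries learned on the IRB
show arp no-resolve interface irb.100
# MAC addresses learned in the VLAN
show ethernet-switching table vlan-name V100
# L2 state of the trunk as seen by the switching process
show ethernet-switching interface et-0/0/48.0
```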
Out of good ideas (this was later, in a lab setup), we removed the lo0
input filter protecting the RE. With that, it started working the way it
should have been working from the beginning!
The lo0 input filter is applied only to family inet, so it should not be
able to influence MAC learning or ARP (L2 functions).
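As an illustration of what is meant, a filter attached this way, and a
reversible way to take it out of the picture for a test, could look like
the sketch below. The filter name PROTECT-RE is a placeholder, and the
filter terms themselves are omitted.

```
# Placeholder name: PROTECT-RE is an assumption; filter terms not shown.
# How an RE protection filter is typically attached to lo0
set interfaces lo0 unit 0 family inet filter input PROTECT-RE

# Temporarily disable the filter binding for a test,
# with automatic rollback after 5 minutes unless confirmed
deactivate interfaces lo0 unit 0 family inet filter input
commit confirmed 5
```

Using deactivate rather than delete keeps the statement in the
configuration, so it can be brought back with a single activate.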
Questions:
- Is it possible that interface programming related to MC-LAG ISL/ICL
(no mac-learning, no normal ARP handling) could have been left on the
interface I repurposed to a temporary L2-trunk?
- In case the above is possible: Is there a way to "flush" the
interface programming in a case like this?
The obvious solution of rebooting the node is unfortunately not
possible: several VMware clusters run their vSAN backend through the
switch, so planned downtime is not really an option.
/Per