[j-nsp] What is this ethernet switching trace telling us?

John Neiberger jneiberger at gmail.com
Sun Jun 9 11:59:41 EDT 2013


There are four 4-port NICs (called PHYs) in these chassis. PHY A and B are
a redundant pair connected to two different switches, as are PHY C and D.
If anything on PHY A fails, all four of its interfaces should fail over to
PHY B on the other switch. We're seeing the same MAC addresses appearing on
multiple interfaces on the primary switch, sometimes even on ports coming
from a different PHY! That shouldn't be happening.

We have several of these throughout our network and we're only seeing this
problem in a couple of cases. The rest work just fine. Most of the SBCs are
connected to two EX4200s set up as a virtual chassis. In this case, the two
switches are not a virtual chassis. I don't think that has any bearing
whatsoever since it has nothing to do with the MAC addresses arriving on
multiple interfaces. Acme wants us to reconfigure this pair of switches as
a VC, but I suspect that would just delay us from finding the real cause.
I've asked them what the underlying OS is for these SBCs, but I haven't
received a reply yet.
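
For anyone who wants to check the trace output mechanically rather than by
eye, here is a minimal Python sketch that lists every time a MAC shows up on
a different interface than the previous time. It assumes each "Attempt to
add vlan ... mac ..., ifname ..." line means the switch saw that source MAC
on the named interface, which is our reading of the trace and has not been
confirmed by Juniper. The function name find_mac_moves and the regular
expression are illustrative only and are not part of any Junos tooling.

```python
import re

# Matches the "Attempt to add" trace lines quoted further down in this thread.
# The pattern is an assumption based on those samples, not a documented format.
ADD_RE = re.compile(
    r"Attempt to add vlan (?P<vlan>\S+) mac (?P<mac>[0-9a-f:]{17}),\s*"
    r"ifname (?P<ifname>\S+?),"
)


def find_mac_moves(trace_text):
    """Return (vlan, mac, old_ifname, new_ifname) for each interface change."""
    # Collapse whitespace so an entry wrapped across lines still matches.
    flat = " ".join(trace_text.split())
    last_seen = {}  # (vlan, mac) -> interface the MAC was last seen on
    moves = []
    for match in ADD_RE.finditer(flat):
        key = (match.group("vlan"), match.group("mac"))
        ifname = match.group("ifname")
        previous = last_seen.get(key)
        if previous is not None and previous != ifname:
            moves.append((key[0], key[1], previous, ifname))
        last_seen[key] = ifname
    return moves


# Four "Attempt to add" lines copied from the Jun 7 trace in this thread.
SAMPLE = """
Jun  7 23:21:15.686871 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/8.0, pnac_status 0, 0
Jun  7 23:21:15.687172 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/5.0, pnac_status 0, 0
Jun  7 23:21:49.269317 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/5.0, pnac_status 0, 0
Jun  7 23:37:09.776588 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/8.0, pnac_status 0, 0
"""

for vlan, mac, old, new in find_mac_moves(SAMPLE):
    print(f"{vlan} {mac}: {old} -> {new}")
```

On those four sample lines it reports two moves, ge-0/0/8.0 to ge-0/0/5.0
and then back again. It only tracks the interface named in each entry and
says nothing about why the MAC moved.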


On Sun, Jun 9, 2013 at 7:43 AM, Serge Vautour <sergevautour at yahoo.ca> wrote:

>
> Our VoIP guys have these ACME SBCs. I do know they run in Active/Passive
> mode with a VRRP like protocol. This means a common MAC that floats between
> both boxes depending on which is primary. Maybe both boxes think they're
> primary and keep advertising the same MAC? This would cause a constant move
> of the same MAC.
>
> Serge
>
>   ------------------------------
>  *From:* John Neiberger <jneiberger at gmail.com>
> *To:* Georgios Vlachos <g.vlachos at kestrel-is.gr>
> *Cc:* juniper-nsp at puck.nether.net
> *Sent:* Saturday, June 8, 2013 9:32:19 PM
>
> *Subject:* Re: [j-nsp] What is this ethernet switching trace telling us?
>
> We added a firewall filter to count incoming frames with the wrong MAC
> addresses and we are seeing incrementing counters. Between that and the
> trace logs it seems clear that the addresses are jumping around. I'm not
> sure what OS the Acme SBC is running, but hopefully there is a tunable knob
> somewhere.
> On Jun 8, 2013 7:50 AM, "Georgios Vlachos" <g.vlachos at kestrel-is.gr>
> wrote:
>
> > You can use the MAC Move Limiting feature with action log to see what is
> > happening,
> >
> > Or just use an interface-specific FF with "from" the source MAC and action
> > count (with implicit accept)...
> >
> >
> >
> > -----Original Message-----
> > From: juniper-nsp [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf
> > Of
> > John Neiberger
> > Sent: Saturday, June 08, 2013 4:16 PM
> > To: Gavin Henry
> > Cc: juniper-nsp at puck.nether.net
> > Subject: Re: [j-nsp] What is this ethernet switching trace telling us?
> >
> > This is an Acme Packet chassis. I really have no idea what it has running
> > on it, but I'll find out from our voice team.
> >
> > Thanks!
> > On Jun 8, 2013 1:35 AM, "Gavin Henry" <ghenry at suretec.co.uk> wrote:
> >
> > > Hi John,
> > >
> > > We (SureVoIP) have seen this on some of our hosted SIP servers which
> > > run on Linux with multiple interfaces. This was connected to a Cisco
> > > > switch though. If the SBC is on Linux then install arpwatch and add
> > > > your email to /etc/aliases. We found that the Linux kernel doesn't
> > > > always send the ARP response for an IP out of the interface that owns
> > > > it. For example, one interface was a public IP and one was a private
> > > > IP. The kernel would send an "I'm on MAC blah" for the private IP out
> > > > of the public IP port!
> > >
> > > arptables is the solution, but in 10 years it's the first time I'd
> > > seen this. Google shows otherwise (me):
> > >
> > > http://www.gossamer-threads.com/lists/drbd/users/24805
> > >
> > >
> >
> >
> http://serverfault.com/questions/58146/what-can-cause-two-network-interfaces
> > -on-the-same-machine-to-flip-flop-their-ip
> > >
> > > arpwatch will report "flip flop" in the logs.
> > >
> > > If you're not on Linux then I'm not sure :-(
> > >
> > > Thanks.
> > >
> > >
> > > On 8 June 2013 01:49, John Neiberger <jneiberger at gmail.com> wrote:
> > > > > Here is another example of the same type of thing. In this case, a
> > > > > MAC address appears to be jumping from one four-port card to
> > > > > another on the same switch. Port 5 is connected to one NIC, while
> > > > > port 8 is on another four-port NIC and should never, ever use the
> > > > > MAC address we're learning on port 5. Do these logs really indicate
> > > > > that the MAC is being learned on those interfaces, or is it
> > > > > cryptically trying to tell us something else? I don't want to
> > > > > assume.
> > > >
> > > > > Jun  7 23:21:15.686871 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/8.0, pnac_status 0, 0
> > > > >
> > > > > Jun  7 23:21:15.686981 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif = ge-0/0/8.0: present in FDB
> > > > >
> > > > > Jun  7 23:21:15.687048 (3, 00:08:25:fa:3c:91) next-hop index change [1330 -> 1329]
> > > > >
> > > > > Jun  7 23:21:15.687172 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/5.0, pnac_status 0, 0
> > > > >
> > > > > Jun  7 23:21:15.687267 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif = ge-0/0/5.0: present in FDB
> > > > >
> > > > > Jun  7 23:21:15.687501 (3, 00:08:25:fa:3c:91) next-hop index change [1329 -> 1330]
> > > > >
> > > > > Jun  7 23:21:15.687672 KRT enqueue FDB (3, 00:08:25:fa:3c:91) nh-index 1330
> > > > >
> > > > > Jun  7 23:21:15.687732 l3nh_fdb_notify: FDB CHANGE vlan <sbc-core> mac 00:08:25:fa:3c:91
> > > > >
> > > > > Jun  7 23:21:49.269317 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/5.0, pnac_status 0, 0
> > > > >
> > > > > Jun  7 23:21:49.269427 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif = ge-0/0/5.0: present in FDB
> > > > >
> > > > > Jun  7 23:21:49.269583 KRT enqueue FDB (3, 00:08:25:fa:3c:91) nh-index 1330
> > > > >
> > > > > Jun  7 23:21:49.269646 krt_dequeue: type FDB op change 3, 00:08:25:fa:3c:91 Direct nh 1330
> > > > >
> > > > > Jun  7 23:21:49.270539 l3nh_fdb_notify: FDB CHANGE vlan <sbc-core> mac 00:08:25:fa:3c:91
> > > > >
> > > > > Jun  7 23:37:09.776588 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/8.0, pnac_status 0, 0
> > > > >
> > > > > Jun  7 23:37:09.776953 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif = ge-0/0/8.0: present in FDB
> > > > >
> > > > > Jun  7 23:37:09.777140 (3, 00:08:25:fa:3c:91) next-hop index change [1330 -> 1329]
> > > >
> > > >
> > > >
> > > > > On Fri, Jun 7, 2013 at 6:30 PM, John Neiberger <jneiberger at gmail.com> wrote:
> > > > >>
> > > > >> I just checked and we do not have spanning tree enabled on this
> > > > >> switch or its partner. We have two switches with a 10-gig link
> > > > >> between them. Each switch is connected to a different upstream
> > > > >> router. The device in question is a session border controller for
> > > > >> VoIP. It is a chassis with multiple four-port NICs that are in
> > > > >> redundant pairs. Two four-port cards connect to one switch and the
> > > > >> other two connect to the second switch. The cards use virtual IPs
> > > > >> and MAC addresses. If a failover is required, an entire four-port
> > > > >> card fails to the card connected to the other switch. At that
> > > > >> point the NIC is supposed to send gratuitous ARPs to repopulate
> > > > >> the MAC address table with the correct location. Based on the
> > > > >> ethernet switching trace logs, it looks to us like the virtual MAC
> > > > >> addresses on those NICs are regularly jumping around between
> > > > >> interfaces, which is definitely not supposed to be happening.
> > > > >> We're now stuck in a battle between Juniper and the SBC vendor
> > > > >> over whose equipment is misbehaving. I wanted to make sure we were
> > > > >> correctly interpreting those trace logs. I'm also still curious
> > > > >> about why the MAC learning log is not updating. There hasn't been
> > > > >> a new entry in the log in nearly two months, which just can't be
> > > > >> true.
> > > >>
> > > >> Thanks!
> > > >> John
> > > >>
> > > >>
> > > > >> On Fri, Jun 7, 2013 at 5:05 PM, Harold 'Buz' Dale <buz.dale at usg.edu> wrote:
> > > >>>
> > > >>> Are you running spanning tree ?
> > > >>>
> > > >>> Sent from my iPhone
> > > >>>
> > > > >>> On Jun 7, 2013, at 18:37, "Gavin Henry" <ghenry at suretec.co.uk> wrote:
> > > >>>
> > > >>> > Is this a server connected via two ports?
> > > >>> >
> > > >>> > Sent from my iPad 2
> > > >>> >
> > > > >>> > On 7 Jun 2013, at 23:12, John Neiberger <jneiberger at gmail.com> wrote:
> > > >>> >
> > > > >>> >> Also, another interesting thing about this is that the output
> > > > >>> >> of "show ethernet mac-learning-log" stops at April 13th. I
> > > > >>> >> have no idea why. If a MAC address were jumping around, we'd
> > > > >>> >> see it in the MAC learning log...if it were up to date. What
> > > > >>> >> would cause a Juniper switch to stop logging to the MAC
> > > > >>> >> learning log?
> > > >>> >>
> > > >>> >> By the way, this is an EX4200 running 10.4R6.5.
> > > >>> >>
> > > >>> >>
> > > > >>> >> On Fri, Jun 7, 2013 at 4:07 PM, John Neiberger <jneiberger at gmail.com> wrote:
> > > >>> >>
> > > > >>> >>> We're trying to troubleshoot an odd issue and this log output
> > > > >>> >>> makes it appear that a MAC address is flipping between
> > > > >>> >>> interfaces. There are other interfaces involved later in the
> > > > >>> >>> logs. I'm starting to think this isn't telling us what we
> > > > >>> >>> think it's telling us. Does this indicate that the MAC
> > > > >>> >>> address really is being learned from multiple interfaces? The
> > > > >>> >>> confusing thing about the logs is the mention of l3nh. Is
> > > > >>> >>> that layer three next hop? If so, why are we seeing that in
> > > > >>> >>> ethernet-level trace options and what is the significance?
> > > >>> >>>
> > > >>> >>> I'm a little confused. Here is an example:
> > > >>> >>>
> > > > >>> >>> Jun  4 13:07:22.953201 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:82, ifname ge-0/0/6.0, pnac_status 0, 0
> > > > >>> >>> Jun  4 13:07:22.953312 vlan sbc-core mac 00:08:25:fa:3c:82 (tag 40), iif = ge-0/0/6.0: present in FDB
> > > > >>> >>> Jun  4 13:07:22.953374 (3, 00:08:25:fa:3c:82) next-hop index change [1344 -> 1328]
> > > > >>> >>> Jun  4 13:07:22.953562 KRT enqueue FDB (3, 00:08:25:fa:3c:82) nh-index 1328
> > > > >>> >>> Jun  4 13:07:22.953712 krt_dequeue: type FDB op change 3, 00:08:25:fa:3c:82 Direct nh 1328
> > > > >>> >>> Jun  4 13:07:22.954372 l3nh_fdb_notify: FDB CHANGE vlan <sbc-core> mac 00:08:25:fa:3c:82
> > > > >>> >>> Jun  4 13:21:18.041160 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:82, ifname ge-0/0/5.0, pnac_status 0, 0
> > > > >>> >>> Jun  4 13:21:18.041271 vlan sbc-core mac 00:08:25:fa:3c:82 (tag 40), iif = ge-0/0/5.0: present in FDB
> > > > >>> >>> Jun  4 13:21:18.041332 (3, 00:08:25:fa:3c:82) next-hop index change [1328 -> 1327]
> > > > >>> >>> Jun  4 13:21:18.041670 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:82, ifname ge-0/0/6.0, pnac_status 0, 0
> > > > >>> >>> Jun  4 13:21:18.041767 vlan sbc-core mac 00:08:25:fa:3c:82 (tag 40), iif = ge-0/0/6.0: present in FDB
> > > > >>> >>> Jun  4 13:21:18.041807 (3, 00:08:25:fa:3c:82) next-hop index change [1327 -> 1328]
> > > > >>> >>> Jun  4 13:21:18.041962 KRT enqueue FDB (3, 00:08:25:fa:3c:82) nh-index 1328
> > > >>> >>>
> > > > >>> >>> It looks to me like the MAC address is jumping around. What
> > > > >>> >>> do you think?
> > > >>> >>>
> > > >>> >>> Thanks,
> > > >>> >>> John
> > > >>> >> _______________________________________________
> > > >>> >> juniper-nsp mailing list juniper-nsp at puck.nether.net
> > > >>> >> https://puck.nether.net/mailman/listinfo/juniper-nsp
> > > >>
> > > >>
> > > >
> > >
> > >
> > >
> > > --
> > > Kind Regards,
> > >
> > > Gavin Henry.
> > > Managing Director.
> > >
> > > T +44 (0) 1224 279484
> > > M +44 (0) 7930 323266
> > > F +44 (0) 1224 824887
> > > E ghenry at suretec.co.uk
> > >
> > > Open Source. Open Solutions(tm).
> > >
> > > http://www.suretecsystems.com/
> > >
> > > Suretec Systems is a limited company registered in Scotland. Registered
> > > number: SC258005. Registered office: 24 Cormack Park, Rothienorman,
> > > Inverurie,
> > > Aberdeenshire, AB51 8GL.
> > >
> > > Subject to disclaimer at http://www.suretecgroup.com/disclaimer.html
> > >
> > > Do you know we have our own VoIP provider called SureVoIP? See
> > > http://www.surevoip.co.uk
> > >
> > > Did you see our API? http://www.surevoip.co.uk/api
> > >
>
>
>

