[j-nsp] What is this ethernet switching trace telling us?

Gavin Henry ghenry at suretec.co.uk
Sat Jun 8 03:35:19 EDT 2013


Hi John,

We (SureVoIP) have seen this on some of our hosted SIP servers, which
run on Linux with multiple interfaces, although those were connected to
a Cisco switch. If the SBC runs on Linux, install arpwatch and add your
email to /etc/aliases. We found that the Linux kernel doesn't always
send an ARP reply out of the interface that owns the IP address. For
example, one interface had a public IP and one had a private IP. The
kernel would send an "I'm on MAC blah" for the private IP out of the
public IP port!
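
A quick way to see whether a Linux box is set up to behave like this is
to look at the kernel's per-interface ARP sysctls. The sketch below is
only an illustration: arp_ignore and arp_announce are standard kernel
settings but are not something we used in our fix (that was arptables),
and the function names are made up for this example. It assumes the
usual /proc/sys/net/ipv4/conf layout.

```python
from pathlib import Path


def read_arp_settings(conf_root="/proc/sys/net/ipv4/conf"):
    """Return {interface: {"arp_ignore": int, "arp_announce": int}}.

    conf_root is a parameter so the function can be pointed at a copy
    of the directory tree; the default is the standard Linux location.
    """
    settings = {}
    for iface_dir in sorted(Path(conf_root).iterdir()):
        if not iface_dir.is_dir():
            continue
        values = {}
        for name in ("arp_ignore", "arp_announce"):
            sysctl_file = iface_dir / name
            if sysctl_file.is_file():
                values[name] = int(sysctl_file.read_text().strip())
        if values:
            settings[iface_dir.name] = values
    return settings


def answers_for_any_local_ip(settings, iface):
    """True if iface may answer ARP for IPs configured on other interfaces.

    The kernel applies the larger of conf/all and conf/<iface> for
    arp_ignore; an effective value of 0 means "reply for any local IP,
    whichever interface it is configured on".
    """
    all_value = settings.get("all", {}).get("arp_ignore", 0)
    iface_value = settings.get(iface, {}).get("arp_ignore", 0)
    return max(all_value, iface_value) == 0


if __name__ == "__main__":
    if Path("/proc/sys/net/ipv4/conf").is_dir():
        current = read_arp_settings()
        for iface in sorted(current):
            flag = answers_for_any_local_ip(current, iface)
            print(f"{iface}: {current[iface]} answers_for_any_local_ip={flag}")
```

An interface that reports True here can reply to ARP requests for an
address that lives on a different port, which is the kind of behaviour
described above.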

arptables is the solution, but in 10 years it was the first time I'd
seen this. Google shows otherwise (me):

http://www.gossamer-threads.com/lists/drbd/users/24805
http://serverfault.com/questions/58146/what-can-cause-two-network-interfaces-on-the-same-machine-to-flip-flop-their-ip

arpwatch will report "flip flop" in the logs.

If you're not on Linux then I'm not sure :-(
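
Either way, on the switch side you can tally how often a MAC appears to
change ports in the trace. The sketch below is a rough illustration, not
a Juniper tool: it assumes each event is on one unwrapped line (the
lines quoted below are wrapped by email), and it assumes that an
"Attempt to add ... ifname X" line names the port the MAC was seen on,
which is the interpretation you are asking about and not something
confirmed in this thread. The function name is made up; the sample
lines are taken from your trace.

```python
import re

# Matches the "Attempt to add" trace lines in their unwrapped form.
LEARN_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+Attempt to add vlan (?P<vlan>\S+) "
    r"mac (?P<mac>[0-9A-Fa-f:]{17}), ifname (?P<ifname>[^,\s]+)"
)


def find_mac_moves(lines):
    """Return (timestamp, vlan, mac, old_ifname, new_ifname) tuples.

    A "move" is recorded whenever the same (vlan, mac) pair shows up
    with a different ifname than on its previous "Attempt to add" line.
    """
    last_seen = {}
    moves = []
    for line in lines:
        match = LEARN_RE.search(line)
        if not match:
            continue
        key = (match["vlan"], match["mac"].lower())
        ifname = match["ifname"]
        previous = last_seen.get(key)
        if previous is not None and previous != ifname:
            moves.append((match["ts"], key[0], key[1], previous, ifname))
        last_seen[key] = ifname
    return moves


# Sample input: four lines from the trace quoted in this thread, unwrapped.
SAMPLE = """\
Jun  7 23:21:15.686871 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/8.0, pnac_status 0, 0
Jun  7 23:21:15.687172 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/5.0, pnac_status 0, 0
Jun  7 23:21:49.269317 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/5.0, pnac_status 0, 0
Jun  7 23:37:09.776588 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91, ifname ge-0/0/8.0, pnac_status 0, 0
""".splitlines()

if __name__ == "__main__":
    for ts, vlan, mac, old, new in find_mac_moves(SAMPLE):
        print(f"{ts}  {vlan} {mac}: {old} -> {new}")
```

Comparing the timestamps of the reported moves with the times the SBC
says it failed over (if it did) should help show which side is
originating the frames.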

Thanks.


On 8 June 2013 01:49, John Neiberger <jneiberger at gmail.com> wrote:
> Here is another example of the same type of thing. In this case, a MAC
> address appears to be jumping from one four-port card to another on the same
> switch. Port 5 is connected to one NIC, while port 8 is on another four-port
> NIC and should never, ever use the MAC address we're learning on port 5. Do
> these logs really indicate that the MAC is being learned on those
> interfaces, or is it cryptically trying to tell us something else? I don't
> want to assume.
>
> Jun  7 23:21:15.686871 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91,
> ifname ge-0/0/8.0, pnac_status 0, 0
>
> Jun  7 23:21:15.686981 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif =
> ge-0/0/8.0: present in FDB
>
> Jun  7 23:21:15.687048 (3, 00:08:25:fa:3c:91) next-hop index change [1330 ->
> 1329]
>
> Jun  7 23:21:15.687172 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91,
> ifname ge-0/0/5.0, pnac_status 0, 0
>
> Jun  7 23:21:15.687267 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif =
> ge-0/0/5.0: present in FDB
>
> Jun  7 23:21:15.687501 (3, 00:08:25:fa:3c:91) next-hop index change [1329 ->
> 1330]
>
> Jun  7 23:21:15.687672 KRT enqueue FDB (3, 00:08:25:fa:3c:91) nh-index 1330
>
> Jun  7 23:21:15.687732 l3nh_fdb_notify: FDB CHANGE vlan <sbc-core> mac
> 00:08:25:fa:3c:91
>
> Jun  7 23:21:49.269317 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91,
> ifname ge-0/0/5.0, pnac_status 0, 0
>
> Jun  7 23:21:49.269427 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif =
> ge-0/0/5.0: present in FDB
>
> Jun  7 23:21:49.269583 KRT enqueue FDB (3, 00:08:25:fa:3c:91) nh-index 1330
>
> Jun  7 23:21:49.269646 krt_dequeue: type FDB op change 3, 00:08:25:fa:3c:91
> Direct nh 1330
>
> Jun  7 23:21:49.270539 l3nh_fdb_notify: FDB CHANGE vlan <sbc-core> mac
> 00:08:25:fa:3c:91
>
> Jun  7 23:37:09.776588 Attempt to add vlan sbc-core mac 00:08:25:fa:3c:91,
> ifname ge-0/0/8.0, pnac_status 0, 0
>
> Jun  7 23:37:09.776953 vlan sbc-core mac 00:08:25:fa:3c:91 (tag 40), iif =
> ge-0/0/8.0: present in FDB
>
> Jun  7 23:37:09.777140 (3, 00:08:25:fa:3c:91) next-hop index change [1330 ->
> 1329]
>
>
>
> On Fri, Jun 7, 2013 at 6:30 PM, John Neiberger <jneiberger at gmail.com> wrote:
>>
>> I just checked and we do not have spanning tree enabled on this switch or
>> its partner. We have two switches with a 10-gig link between them. Each
>> switch is connected to a different upstream router. The device in question
>> is a session border controller for VoIP. It is a chassis with multiple
>> four-port NICs that are in redundant pairs. Two four-port cards connect to
>> one switch and the other two connect to the second switch. The cards use
>> virtual IPs and MAC addresses. If a failover is required, an entire
>> four-port card fails to the card connected to the other switch. At that
>> point the NIC is supposed to send gratuitous ARPs to repopulate the MAC
>> address table with the correct location. Based on the ethernet switching
>> trace logs, it looks to us like the virtual MAC addresses on those NICs are
>> regularly jumping around between interfaces, which is definitely not
>> supposed to be happening. We're now stuck in a battle between Juniper and
>> the SBC vendor over whose equipment is misbehaving. I wanted to make sure we
>> were correctly interpreting those trace logs. I'm also still curious about
>> why the MAC learning log is not updating. There hasn't been a new entry in
>> the log in nearly two months, which just can't be true.
>>
>> Thanks!
>> John
>>
>>
>> On Fri, Jun 7, 2013 at 5:05 PM, Harold 'Buz' Dale <buz.dale at usg.edu>
>> wrote:
>>>
>>> Are you running spanning tree ?
>>>
>>> Sent from my iPhone
>>>
>>> On Jun 7, 2013, at 18:37, "Gavin Henry" <ghenry at suretec.co.uk> wrote:
>>>
>>> > Is this a server connected via two ports?
>>> >
>>> > Sent from my iPad 2
>>> >
>>> > On 7 Jun 2013, at 23:12, John Neiberger <jneiberger at gmail.com> wrote:
>>> >
>>> >> Also, another interesting thing about this is that the output of "show
>>> >> ethernet mac-learning-log" stops at April 13th. I have no idea why. If
>>> >> a
>>> >> MAC address were jumping around, we'd see it in the MAC learning
>>> >> log...if
>>> >> it were up to date. What would cause a Juniper switch to stop logging
>>> >> to
>>> >> the MAC learning log?
>>> >>
>>> >> By the way, this is an EX4200 running 10.4R6.5.
>>> >>
>>> >>
>>> >> On Fri, Jun 7, 2013 at 4:07 PM, John Neiberger <jneiberger at gmail.com>
>>> >> wrote:
>>> >>
>>> >>> We're trying to troubleshoot an odd issue and this log output makes
>>> >>> it
>>> >>> appear that a MAC address is flipping between interfaces. There are
>>> >>> other
>>> >>> interfaces involved later in the logs. I'm starting to think this
>>> >>> isn't
>>> >>> telling us what we think it's telling us. Does this indicate that the
>>> >>> MAC
>>> >>> address really is being learned from multiple interfaces? The
>>> >>> confusing
>>> >>> thing about the logs is the mention of l3nh. Is that layer three next
>>> >>> hop?
>>> >>> If so, why are we seeing that in ethernet-level trace options and
>>> >>> what is
>>> >>> the significance?
>>> >>>
>>> >>> I'm a little confused. Here is an example:
>>> >>>
>>> >>> Jun  4 13:07:22.953201 Attempt to add vlan sbc-core mac
>>> >>> 00:08:25:fa:3c:82,
>>> >>> ifname ge-0/0/6.0, pnac_status 0, 0
>>> >>> Jun  4 13:07:22.953312 vlan sbc-core mac 00:08:25:fa:3c:82 (tag 40),
>>> >>> iif =
>>> >>> ge-0/0/6.0: present in FDB
>>> >>> Jun  4 13:07:22.953374 (3, 00:08:25:fa:3c:82) next-hop index change
>>> >>> [1344
>>> >>> -> 1328]
>>> >>> Jun  4 13:07:22.953562 KRT enqueue FDB (3, 00:08:25:fa:3c:82)
>>> >>> nh-index 1328
>>> >>> Jun  4 13:07:22.953712 krt_dequeue: type FDB op change 3,
>>> >>> 00:08:25:fa:3c:82 Direct nh 1328
>>> >>> Jun  4 13:07:22.954372 l3nh_fdb_notify: FDB CHANGE vlan <sbc-core>
>>> >>> mac
>>> >>> 00:08:25:fa:3c:82
>>> >>> Jun  4 13:21:18.041160 Attempt to add vlan sbc-core mac
>>> >>> 00:08:25:fa:3c:82,
>>> >>> ifname ge-0/0/5.0, pnac_status 0, 0
>>> >>> Jun  4 13:21:18.041271 vlan sbc-core mac 00:08:25:fa:3c:82 (tag 40),
>>> >>> iif =
>>> >>> ge-0/0/5.0: present in FDB
>>> >>> Jun  4 13:21:18.041332 (3, 00:08:25:fa:3c:82) next-hop index change
>>> >>> [1328
>>> >>> -> 1327]
>>> >>> Jun  4 13:21:18.041670 Attempt to add vlan sbc-core mac
>>> >>> 00:08:25:fa:3c:82,
>>> >>> ifname ge-0/0/6.0, pnac_status 0, 0
>>> >>> Jun  4 13:21:18.041767 vlan sbc-core mac 00:08:25:fa:3c:82 (tag 40),
>>> >>> iif =
>>> >>> ge-0/0/6.0: present in FDB
>>> >>> Jun  4 13:21:18.041807 (3, 00:08:25:fa:3c:82) next-hop index change
>>> >>> [1327
>>> >>> -> 1328]
>>> >>> Jun  4 13:21:18.041962 KRT enqueue FDB (3, 00:08:25:fa:3c:82)
>>> >>> nh-index 1328
>>> >>>
>>> >>> It looks to me like the MAC address is jumping around. What do you
>>> >>> think?
>>> >>>
>>> >>> Thanks,
>>> >>> John
>>> >> _______________________________________________
>>> >> juniper-nsp mailing list juniper-nsp at puck.nether.net
>>> >> https://puck.nether.net/mailman/listinfo/juniper-nsp
>>
>>
>



-- 
Kind Regards,

Gavin Henry.
Managing Director.

T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 824887
E ghenry at suretec.co.uk

Open Source. Open Solutions(tm).

http://www.suretecsystems.com/

Suretec Systems is a limited company registered in Scotland. Registered
number: SC258005. Registered office: 24 Cormack Park, Rothienorman, Inverurie,
Aberdeenshire, AB51 8GL.

Subject to disclaimer at http://www.suretecgroup.com/disclaimer.html

Do you know we have our own VoIP provider called SureVoIP? See
http://www.surevoip.co.uk

Did you see our API? http://www.surevoip.co.uk/api

