[j-nsp] Strange behavior on directly connected interfaces?

Sat May 20 00:39:27 EDT 2006

 Input is absolutely welcome, late or not.  Thanks for offering your thoughts.

  I don't have administrative control over the "naughty" routers, but
their behavior does seem to correlate with what little I know about
proxy ARP.  They don't specifically support a proxy ARP feature,
though -- they are consumer-grade router/firewall devices.  I do have
one in a lab now and can reproduce the behavior, so I should be able
to answer questions about how they function.
Static mapping of MAC addresses is a possibility and would certainly
mitigate unicast flooding, but implementing this approach in every
affected case would be so arduous as to be prohibitive.
I've experimented extensively with adjusting CAM table timeouts to
match ARP expiration, but I have only been partially successful thus
far.
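
  To illustrate the ARP/CAM mismatch, here is a minimal Python sketch
of a toy learning switch.  It is a simplification under stated
assumptions, not a model of any particular vendor's gear: the class
and function names, the port number, the MAC address and the timeout
values are all made up for illustration, and it assumes the target
only becomes visible to the switch when it answers an ARP request.

```python
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    """Hypothetical learning switch whose CAM entries age out when idle."""
    cam_timeout: float
    cam: dict = field(default_factory=dict)  # mac -> (port, last_seen)

    def learn(self, src_mac, port, now):
        # A frame sourced from src_mac refreshes its CAM entry.
        self.cam[src_mac] = (port, now)

    def forward(self, dst_mac, now):
        """Return 'unicast' if dst_mac is known and fresh, else 'flood'."""
        entry = self.cam.get(dst_mac)
        if entry is None or now - entry[1] > self.cam_timeout:
            self.cam.pop(dst_mac, None)  # aged out: forget the entry
            return "flood"
        return "unicast"


def count_flooded(cam_timeout, arp_timeout, duration, send_interval=10):
    """Count frames flooded toward a silent host over `duration` seconds.

    Assumption: the host answers ARP requests (which lets the switch
    learn its MAC) but silently drops everything else, as a firewall
    would, so no other traffic ever refreshes the CAM entry.
    """
    switch = ToySwitch(cam_timeout)
    host_mac, host_port = "00:00:5e:00:53:01", 7  # illustrative values
    arp_learned_at = None
    flooded = 0
    for now in range(0, duration, send_interval):
        # The router re-ARPs only when its own cache entry has expired.
        if arp_learned_at is None or now - arp_learned_at >= arp_timeout:
            switch.learn(host_mac, host_port, now)  # switch sees the ARP reply
            arp_learned_at = now
        if switch.forward(host_mac, now) == "flood":
            flooded += 1
    return flooded
```

  In this model, calling count_flooded(300, 1200, 1200) (CAM timeout
shorter than the ARP timeout) floods most of the frames sent in the
window, while count_flooded(300, 300, 1200) floods none, because the
re-ARP refreshes the switch before the entry ages out.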

  -FC

On 5/19/06, Harry Reynolds <harry at juniper.net> wrote:
> Butting in late and too lazy to completely digest this thread now.
>
> After a quick glance:
>
> I wonder if you have proxy arp enabled on the "naughty" routers, and if
> so, whether turning it off might help mitigate? You mention they send
> icmp to the target node, and I think proxy ARP would generate ARP so
> perhaps not...
>
> Also, any chance of putting a manual/static mac entry in the switches
> that are flooding?
>
> Regards
>
>
> > -----Original Message-----
> > From: juniper-nsp-bounces at puck.nether.net
> > [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of
> > Frances Albemuth
> > Sent: Friday, May 19, 2006 9:23 AM
> > To: Hannes Gredler
> > Cc: juniper-nsp at puck.nether.net
> > Subject: Re: [j-nsp] Strange behavior on directly connected
> > interfaces?
> >
> >   Hi Hannes,
> >
> >  Some of these questions are easier to answer than others.
> >
> > On 5/17/06, Hannes Gredler <hannes at juniper.net> wrote:
> > > frances,
> > >
> > > > question 1: what is the MAC address of the device that
> > >              generates the 10MBit/s worth of traffic.
> > >
> >
> >  I'm not sure there's a single device responsible for this;
> > it sort of looks as if there are at least two culprits.  The
> > two being examined at the moment are both consumer grade
> > routing appliances.  I've since determined more about what is
> > occurring with these culprit devices:
> >
> >   * These routers transmit bogus traffic to destinations
> > located in the same subnet with spoofed source IPs but real
> > source MACs.
> >
> >   * Unicast is flooded because the ARP timeout exceeds the
> > CAM table timeout.  The CAM table never learns the MAC of the
> > "target" device because that device is discarding all of this
> > traffic and not generating any traffic of their own (at the
> > time this occurs -- the behavior is not constant).
> >
> > >   * Some of the destination IPs generate no traffic during
> > > certain periods of the day.
> >
> >   * The traffic the culprit devices transmit to other devices
> > in the broadcast domain will never meet the requirements of a
> > typical iptables or equivalent implementation so the traffic
> > is quietly dropped.
> >
> >   Net result? Bogus traffic is broadcast all over the place
> > because the switching infrastructure never has a cause to
> > learn the MAC(s) the culprit routers are trying to reach.
> > The culprit routers don't ARP for it, they just remember the
> > destination MAC, and the switches dutifully flood the unicast
> > frames in hopes of identifying the legitimate destination MAC
> > from a hypothetical return stream of traffic.  This never
> > happens, so these bursts of illegitimate traffic occur until
> > someone generates traffic from behind a target device.
> > Then the switching infrastructure learns the MAC and voila,
> > the unicast traffic stops getting flooded all over the place.
> >
> > > question 2: is your juniper router the only exit for your traffic
> > >
> >
> >  Indeed it is.
> >
> > > question 3: could it be that there are hidden backdoor(s)
> > >
> >
> >   As in loop-ish cross-connections "behind" our infrastructure?
> > Possible, but unlikely.
> >
> > > question 4: what traffic is being looped / unicast / broadcast
> > >
> >
> >  What's known about that traffic is largely articulated in
> > the answer to question 1, though if you've got more questions
> > about that traffic specifically I can probably find more answers...
> >
> > > > question 5: what is the destination MAC address of the looped traffic
> > >              (broadcast address / unicast address of the router)
> > >
> >
> >   Also covered largely in the answer to question 1, but to
> > expand on this a bit, there are two distinct behaviors.  I'll
> > call one "weirdness"  and the other "high weirdness".  In the
> > case of high weirdness, here's what happens to the best of my
> > ability to tell:
> >
> >   - Legitimate ICMP is transmitted from outside source and
> > arrives at router.
> > >   - Router figures packet should egress to directly connected
> > > network via specific logical interface (makes certain the filter
> > > criteria are satisfied, etc.).
> >   - Router finds the destination address in the ARP table and
> > fires off a frame into the "Ethernet cloud" with the
> > destination MAC culled from the ARP table.
> >  - The switches haven't heard a frame from the device
> > corresponding with the destination MAC for a while and have
> > forgotten the destination MAC, so they flood the frames.
> >   - Naughty routers (two of them) hear the frames and get in
> > on the action.  They spoof the source IP of the router (!!)
> > and transmit massive amounts of ICMP to the node which the
> > router is also trying to transmit to.
> >  - None of this traffic warrants a response from the target
> > node or the equipment behind it -- it's a firewall silently
> > discarding unwanted traffic.  So we still don't know how to
> > get to this MAC without flooding.
> >   - Since these naughty routers are spoofing the IP of the
> > real gateway but never ARP'ing for it, lots of routers are
> > receiving flooded unicast frames which they believe they
> > shouldn't be receiving and which they believe came from the
> > real gateway.  They send the gateway ICMP redirect host
> > messages (redirecting it to... itself).
> >   - For each ICMP echo that goes in, dozens of ICMP messages
> > with different purposes come out.
> >   - Some of these packets are getting their TTL decremented
> > (the only thing that slows the situation down) but others are
> > not.  Give it a good thirty seconds and you have a storm.
> > Often if you stop the introduction of ICMP to the network,
> > the TTL will decrement on enough of these packets to calm the
> > situation.
> >
> >   In the case of weirdness, we have a much less severe
> > version of the situation outlined above, wherein lots of
> > routers are getting frames that don't belong to them because
> > of the ARP/CAM synchronization issue, but it doesn't get out
> > of control because the two very naughty nodes don't get
> > involved and the TTL decrements as it should.
> >
> >   The other issue with the TTL exceeded messages coming back
> > on a different logical interface is a little bit of a red
> > herring - still interesting, but the situation above seems to
> > be the elephant in the living room here.
> >
> >  Let me know if you have thoughts, and thank you for your
> > time and consideration.
> >
> >   -FC
> >
> > > /hannes
> > >
> > > Frances Albemuth wrote:
> > > > >  The issue can thus far be mitigated (believe it or not) by
> > > > > filtering ICMP to and from the "mystery node", or by filtering ICMP
> > > > > to and from every network on interface "A".  I'm in possession of
> > > > > the MAC of the "mystery node" and I know exactly where it lives on
> > > > > the network, but it doesn't seem to correspond oddly with anything
> > > > > and I haven't identified anything quirky about the network
> > > > > configuration.  What else should I be keeping an eye out for?
> > > >
> > > >  -FC
> > > >
> > > > On 5/17/06, Hannes Gredler <hannes at juniper.net> wrote:
> > > >
> > > >> frances,
> > > >>
> > > >> to mitigate the problem while diagnosing you could configure a
> > > >> firewall that discards traffic from non-local-subnet sources.
> > > >>
> > > > >> but let's focus on the loop:
> > > > >>    what is the MAC address of the mystery node?
> > > >>
> > > >> /hannes
> > > >>
> > > >> Frances Albemuth wrote:
> > > >> >  Hi Hannes,
> > > >> >
> > > > >> >  Thanks for your response.  When I'm sniffing on the segment I
> > > > >> > see a massive stream of ICMP TTL exceeded messages being returned
> > > > >> > by the "mystery node".  The topology is definitely loop-free and
> > > > >> > the "loop-ish" behavior that we see only seems to occur when data
> > > > >> > is transmitted to unreachable destinations.
> > > >> >
> > > > >> > I assume by forwarding loop you mean an Ethernet loop? I would
> > > > >> > agree that it behaves this way in some respects.  Of course, if I
> > > > >> > had a genuine loop the problems would be more serious and would
> > > > >> > occur regardless of routed traffic (the Ethernet topology with a
> > > > >> > handful of hosts would cripple itself).
> > > >> >
> > > > >> >  Also interesting: the node returning the TTL exceeded "storm"
> > > > >> > lives behind a link with a maximum synchronous capacity of 10M.
> > > > >> > The "storm" itself results in 10M of traffic pushing consistently
> > > > >> > over all ports where the VLAN lives.  It thus only cripples other
> > > > >> > devices with a 10M maximum synchronous bandwidth.
> > > >> >
> > > >> > Thanks!
> > > >> >
> > > >> >  -FC
> > > >> >
> > > >> > On 5/16/06, Hannes Gredler <hannes at juniper.net> wrote:
> > > >> >
> > > >> >> frances,
> > > >> >>
> > > >> >> looks like you have a forwarding loop in your setup;
> > > >> >>
> > > > >> >> for further troubleshooting attach a packet-sniffer to the
> > > > >> >> subnet in question and look for the source MAC address that is
> > > > >> >> bouncing back your traffic.
> > > >> >>
> > > >> >> /hannes
> > > >> >>
> > > >> >>
> > > >> >> Frances Albemuth wrote:
> > > >> >> >  Hi,
> > > >> >> >
> > > > >> >> >  This is my first post to the list and I would like to preface
> > > > >> >> > this by stating that I doubt this problem is actually related
> > > > >> >> > specifically to Juniper equipment (perhaps a configuration error
> > > > >> >> > involving Juniper equipment, however). I'm hoping the issue I'm
> > > > >> >> > working on right now might ring bells in the heads of others,
> > > > >> >> > and in any case I figure this is as good a place as any to find
> > > > >> >> > yourself beaten by the clue stick.
> > > >> >> >
> > > > >> >> >   I have a directly connected interface facing a large, flat
> > > > >> >> > Ethernet infrastructure.  There are dozens of IPs mapped to the
> > > > >> >> > interface in question (this is a legacy aspect of the design,
> > > > >> >> > but migration to a more hierarchical infrastructure is a long
> > > > >> >> > process).  Periodically, when packets are transmitted with an
> > > > >> >> > unreachable destination IP residing on the directly connected
> > > > >> >> > interface, a massive series of ICMP TTL exceeded packets is
> > > > >> >> > returned by a different host residing on a different logical
> > > > >> >> > interface.  Traceroutes to the unreachable IP similarly show a
> > > > >> >> > one-node loop (the same IP responds until the TTL=0).
> > > > >> >> >  The node is always the same, but if unmitigated ICMP traffic
> > > > >> >> > is permitted to and from addresses on the logical interface,
> > > > >> >> > sniffing the wire shows this behavior occurring to and from a
> > > > >> >> > number of nodes.  I haven't managed to duplicate the multi-node
> > > > >> >> > behavior in a semi-controlled environment.
> > > >> >> >
> > > > >> >> >   When sniffing the segment in question, the ICMP is clearly
> > > > >> >> > visible, so for whatever reason it is universally broadcast,
> > > > >> >> > even though both nodes involved in the ICMP communication are
> > > > >> >> > legitimate unicast destinations.  If a ping is left running,
> > > > >> >> > these TTL exceeded messages will continue and accelerate ad
> > > > >> >> > nauseam until a de facto pseudo broadcast storm occurs,
> > > > >> >> > crippling access on every switching node where the VLAN in
> > > > >> >> > question is mapped.  Usually (but not always) the anomalies
> > > > >> >> > halt when the ping is killed.  The issue is largely mitigated
> > > > >> >> > by denying all ICMP to and from addresses mapped to the logical
> > > > >> >> > interface.
> > > >> >> >
> > > > >> >> >   That's all I'm comfortable asserting about the issue at this
> > > > >> >> > time.  What I'm really digging for here is an explanation as to
> > > > >> >> > why, when the Juniper tries to transmit to an unreachable node,
> > > > >> >> > it doesn't discover the node is unreachable due to a lack of
> > > > >> >> > response from an ARP request and return ICMP unreachables on
> > > > >> >> > its own.  I may have missed something obvious here (I'm sort of
> > > > >> >> > hoping so) and would appreciate any suggestions or experience
> > > > >> >> > from others.  If I've sent this message to a woefully
> > > > >> >> > inappropriate list I would greatly appreciate a suggestion as
> > > > >> >> > to a better place to bring my question(s).
> > > >> >> >
> > > >> >> >  Thanks,
> > > >> >> >
> > > >> >> >   -FC
> > > >> >> >
> > > >> >> > _______________________________________________
> > > >> >> > juniper-nsp mailing list juniper-nsp at puck.nether.net
> > > >> >> > http://puck.nether.net/mailman/listinfo/juniper-nsp
> > > >> >>
> > > >>
> > >
> >
> >
>