[j-nsp] MX80 - Control Plane, Bridge Domains

Tue Oct 14 11:39:57 EDT 2014

We have a particular legacy setup that involves a couple of multi-net
bridge domains on an MX80 with a fairly decent number of networks attached
to multiple interfaces.  Example:

Bridge Domain
===
> show configuration bridge-domains
CUSTOMER {
    description Customer;
    domain-type bridge;
    vlan-id XXX;
    interface ge-1/0/0.0;
    interface ge-1/0/1.0;
    (multiple interfaces omitted, roughly 25 total);
    routing-interface irb.xxx;
}

Interfaces
===
> show configuration interfaces ge-1/0/0
description Customer;
encapsulation ethernet-bridge;
unit 0;

(repeat above for each bridge domain member)

IRB
===
> show configuration interfaces irb.xxx
description Customer;
family inet {
    address 192.168.1.1/24;
    address 192.168.2.1/24;
    address 192.168.3.1/24;
    (additional addresses omitted);
}

The number of routed networks on the IRB is pretty large, >75.

The problem we're seeing is that when a largish amount of traffic gets
directed to non-existent hosts on these multi-nets (say, to non-existent
hosts numbered .100 to .200 in the 192.168.1/24 subnet, multiplied across
all the configured subnets), the control plane on the MX80 seems to grind
to a halt.  ICMP RTT goes way up, which alerts monitoring services, and
we've even seen the box drop IGP/BGP to its upstream device.
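
To put a rough number on that example, here is a minimal sketch of the
arithmetic.  It assumes exactly 75 subnets (the real count is somewhat
higher) and assumes they are numbered 192.168.1.0/24 through
192.168.75.0/24, which is purely illustrative; only the first three
match the config above.  The helper name unresolved_targets is made up
for this example.

```python
import ipaddress

def unresolved_targets(subnets, first_host, last_host):
    """List every address at host offsets first_host..last_host in each subnet."""
    targets = []
    for net in subnets:
        base = int(ipaddress.ip_network(net).network_address)
        for offset in range(first_host, last_host + 1):
            targets.append(ipaddress.ip_address(base + offset))
    return targets

# Assumption: 75 hypothetical /24s, 192.168.1.0/24 .. 192.168.75.0/24.
subnets = [f"192.168.{i}.0/24" for i in range(1, 76)]

# Hosts .100 through .200 inclusive, as in the example above.
targets = unresolved_targets(subnets, 100, 200)
print(len(targets))
```

Under those assumptions that is 101 addresses per subnet across 75
subnets, each of which is a separate destination the box has to try to
ARP for.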

We've already slapped in some interface-specific ARP policers, but ARP
policing won't really do anything for this issue, since the box itself is
sending ARP traffic *out* towards those non-existent hosts, and IIRC ARP
policing is an ingress interface function.  The box has a decent CoPP
filter, but this is data-plane traffic to the directly attached hosts in
the network, not necessarily to the interface itself, so I'm not sure we
can solve this with CoPP (though it *does* appear to result in excessive
control-plane traffic due to the ARP load).  We've also looked at
sticking in some BUM policers on the bridge domains (bridge-domain
blah/forwarding-options/flood), but that appears to be an ingress function
as well, and we haven't seen that help either.

Simply killing off and re-architecting this topology isn't currently an
option, though it's planned at some point.  I'm curious whether anyone has
run into this type of issue with this type of setup, and if so, how you
solved it.