[j-nsp] Interesting EX4200 gotcha and "resolution"

Graham Brown juniper-nsp at grahambrown.info
Sun Sep 25 16:55:13 EDT 2011


Excellent post Jeff, thanks for sharing.

On Sun, Sep 25, 2011 at 8:36 PM, Jeff Wheeler <jsw at inconcepts.biz> wrote:

> A colleague pointed out recently that some of the gotchas and fixes we
> run into are interesting to others, so in that spirit, I have another
> one to share with you today.
>
> In this case, a malfunctioning EX4200 (10.4R4.5) appears to have valid
> ARP entries for a few boxes, but when you try to ping them (or send
> them anything else), the boxes do not get any traffic.  In fact, they
> receive nothing from the switch except ARP who-has requests.  They
> respond, and after clearing the ARP entries on the EX4200, the same
> thing happens again.
>
> Upon investigating the PFE data, I found that the halp-nh arp-table
> was missing these ARP entries, even though they were present in the
> Junos CLI and indeed the correct MAC address is referenced in the PFE
> route table.  See below:
>
> PFEM0(vty)# show route ip prefix 192.0.2.39 detail
>
> IPv4 Route Table 0, default.0, 0x0:
> Destination   NH IP Addr      Type     NH ID Interface
> ------------  --------------- -------- ----- ---------
> 192.0.2.39    192.0.2.39      Unicast  2933  RT-ifl 197 vlan.1122 ifl 197
>
>  RT flags: 0x0000, Ignore: 0x00000000, COS index: 0
>  DCU id: 0, SCU id: 0,  RPF ifl list id: 0
>
>
>
> PFEM0(vty)# show nh id 2933 detail
> ID     Type      Interface      Next Hop Addr    Protocol    Encap         MTU   Flags       PFE internal Flags
> -----  --------  -------------  ---------------  ----------  ------------  ----  ----------  --------------------
> 2933   Unicast   vlan.1122      192.0.2.39       IPv4        Ethernet      0     0x00000000  0x00000000
>
>   Flags:             2       nh_idx:          3
>   CMD:           Route       Arp Idx:      1341
>   MTU Idx:           2       Num Tags:        0
>   Upd Cnt:           1       Tun Strt:    False
>   Chain_nh   3484:
>   Hw install:        1
>   Mac:         00 0e 0c a2 2d dc
>
>
>
> PFEM0(vty)# show halp-nh arp-table
> Device: 0
> ...hundreds and hundreds of lines...
>  ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
>  ArpEntry Idx 1342 : 00:25:90:2c:41:e5
> ...hundreds more, but where is Idx 1341?!
>
>
> Our "fix" is to remove 192.0.2.1/27 from the vlan.1122 configuration,
> commit, and then roll back.  This is obviously not good.  I would like
> to try installing a different ARP entry (by configuring this IP
> address on another machine), but I have not had an opportunity to test
> this yet.
>
> As I understand it, this happens because the ARP table held in PFE
> memory, in the ASIC vendor's format, is abstracted from the "Juniper
> ARP table."  It appears that simply refreshing the Juniper ARP table
> with an identical entry does not cause a missing entry to be put into
> the forwarding table.
>
> I would love to be able to reproduce this, but with hundreds to a few
> thousand machines each on many EX4200 stacks, it happens very rarely.
> I only mention it because "clear arp" from the CLI does not work, so
> this problem gets escalated until it reaches someone brave enough to
> temporarily break some unaffected boxes to fix a broken one.  It would
> be nice, though, if "clear arp" actually worked right.
>
> If you encounter this problem and do something different, I would be
> very interested to hear from you!
> --
> Jeff S Wheeler <jsw at inconcepts.biz>
> Sr Network Operator  /  Innovative Network Concepts
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>
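
For anyone who wants to check for this condition without reading
through hundreds of lines by eye, here is a minimal Python sketch. It
assumes you have saved the output of "show nh id <id> detail" and "show
halp-nh arp-table" as text, and that the output looks like the excerpts
quoted above. The function names are made up for illustration; nothing
here is a Juniper tool or API.

```python
import re

# Patterns are based only on the output format quoted in this thread;
# other Junos releases may print these lines differently (assumption).
ARP_ENTRY = re.compile(r"ArpEntry Idx\s+(\d+)\s*:\s*([0-9a-fA-F:]{17})")
NH_ARP_IDX = re.compile(r"Arp Idx:\s*(\d+)")


def parse_arp_table(text):
    """Return {index: mac} from 'show halp-nh arp-table' output."""
    return {int(m.group(1)): m.group(2).lower()
            for m in ARP_ENTRY.finditer(text)}


def parse_nh_arp_idx(text):
    """Return the 'Arp Idx' value from 'show nh id N detail', or None."""
    m = NH_ARP_IDX.search(text)
    return int(m.group(1)) if m else None


def missing_arp_index(nh_text, arp_text):
    """Return the next hop's ARP index if the hardware table lacks it."""
    idx = parse_nh_arp_idx(nh_text)
    if idx is None:
        return None  # no ARP index found in the next-hop output
    return idx if idx not in parse_arp_table(arp_text) else None


# Excerpts copied from the original post.
NH_SAMPLE = """
  Flags:             2       nh_idx:          3
  CMD:           Route       Arp Idx:      1341
  Mac:         00 0e 0c a2 2d dc
"""

ARP_SAMPLE = """
Device: 0
 ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
 ArpEntry Idx 1342 : 00:25:90:2c:41:e5
"""

if __name__ == "__main__":
    idx = missing_arp_index(NH_SAMPLE, ARP_SAMPLE)
    if idx is not None:
        print("next hop references ARP index", idx,
              "but it is not in the halp-nh arp-table")
```

Run against the two excerpts from Jeff's post, it flags index 1341,
which is the gap he spotted by hand. It only detects the mismatch; it
does nothing to repair the PFE state.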

