[j-nsp] Interesting EX4200 gotcha and "resolution"
Graham Brown
juniper-nsp at grahambrown.info
Sun Sep 25 16:55:13 EDT 2011
Excellent post, Jeff. Thanks for sharing.
On Sun, Sep 25, 2011 at 8:36 PM, Jeff Wheeler <jsw at inconcepts.biz> wrote:
> A colleague pointed out recently that some of the gotchas and fixes we
> run into are interesting to others, so in that spirit, I have another
> one to share with you today.
>
> In this case, a malfunctioning EX4200 (10.4R4.5) appears to have valid
> ARP entries for a few boxes, but when you ping them or send them any
> other traffic, the boxes receive nothing. In fact, all they ever get
> from the switch is ARP who-has. They respond, and if you clear the ARP
> entries on the EX4200, the same process repeats.
>
> Upon investigating the PFE data, I found that the halp-nh arp-table
> was missing these ARP entries, even though they were present in the
> Junos CLI and indeed the correct MAC address is referenced in the PFE
> route table. See below:
>
> PFEM0(vty)# show route ip prefix 192.0.2.39 detail
>
> IPv4 Route Table 0, default.0, 0x0:
> Destination  NH IP Addr      Type     NH ID Interface
> ------------ --------------- -------- ----- ---------
> 192.0.2.39   192.0.2.39      Unicast  2933  RT-ifl 197 vlan.1122 ifl 197
>
> RT flags: 0x0000, Ignore: 0x00000000, COS index: 0
> DCU id: 0, SCU id: 0, RPF ifl list id: 0
>
>
>
> PFEM0(vty)# show nh id 2933 detail
> ID    Type     Interface     Next Hop Addr   Protocol   Encap        MTU  Flags      PFE internal Flags
> ----- -------- ------------- --------------- ---------- ------------ ---- ---------- --------------------
> 2933  Unicast  vlan.1122     192.0.2.39      IPv4       Ethernet     0    0x00000000 0x00000000
>
> Flags: 2 nh_idx: 3
> CMD: Route Arp Idx: 1341
> MTU Idx: 2 Num Tags: 0
> Upd Cnt: 1 Tun Strt: False
> Chain_nh 3484:
> Hw install: 1
> Mac: 00 0e 0c a2 2d dc
>
>
>
> PFEM0(vty)# show halp-nh arp-table
> Device: 0
> ...hundreds and hundreds of lines...
> ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
> ArpEntry Idx 1342 : 00:25:90:2c:41:e5
> ...hundreds more, but where is Idx 1341?!
>
>
> Our "fix" is to remove 192.0.2.1/27 from the vlan.1122 configuration,
> commit, and then rollback. This is obviously not good. I would like
> to have tried installing a different ARP entry (by configuring this IP
> address on another machine) but I have not had an opportunity to test
> this yet.
>
> As I understand it, this happens because the ASIC-vendor-format ARP
> table in PFE memory is abstracted from the "Juniper ARP table." It
> appears that simply refreshing the Juniper ARP table with an identical
> entry does not cause a missing entry to be put into the forwarding
> table.
>
> I would love to be able to reproduce this, but with hundreds to a few
> thousand machines each on many EX4200 stacks, it happens very rarely.
> I only mention it because "clear arp" from the CLI does not work, so
> this problem gets escalated until it reaches someone brave enough to
> temporarily break some unaffected boxes to fix a broken one. It would
> be nice, though, if "clear arp" actually worked right.
>
> If you encounter this problem and do something different, I would be
> very interested to hear from you!
> --
> Jeff S Wheeler <jsw at inconcepts.biz>
> Sr Network Operator / Innovative Network Concepts
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>
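The manual check Jeff describes (take the "Arp Idx" from the next-hop
detail, then look for that index in the halp-nh arp-table dump) could be
scripted once the output has been captured to text. Below is a minimal
Python sketch of that comparison. It assumes the output looks like the
excerpts quoted above; the function names and sample strings are
hypothetical, and collecting the text from the PFE is out of scope here.

```python
import re

# Assumed line shape, taken from the quoted excerpt:
#   "ArpEntry Idx 1340 : 00:15:17:6b:a9:7c"
ARP_ENTRY_RE = re.compile(r"ArpEntry Idx\s+(\d+)\s*:\s*([0-9a-fA-F:]{17})")
# Assumed field shape in "show nh id <id> detail": "Arp Idx: 1341"
NH_ARP_IDX_RE = re.compile(r"Arp Idx:\s*(\d+)")


def parse_arp_table(text):
    """Return {index: mac} for every ArpEntry line found in the text."""
    return {int(idx): mac.lower() for idx, mac in ARP_ENTRY_RE.findall(text)}


def parse_nh_arp_index(text):
    """Return the Arp Idx referenced by a next hop, or None if absent."""
    match = NH_ARP_IDX_RE.search(text)
    return int(match.group(1)) if match else None


def missing_arp_index(nh_text, arp_table_text):
    """Return the next hop's Arp Idx if the ARP table dump lacks it, else None."""
    idx = parse_nh_arp_index(nh_text)
    if idx is None:
        return None
    return idx if idx not in parse_arp_table(arp_table_text) else None


# Sample text copied from the excerpts in the post (hypothetical capture).
NH_SAMPLE = """CMD: Route Arp Idx: 1341
Mac: 00 0e 0c a2 2d dc"""
ARP_SAMPLE = """ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
ArpEntry Idx 1342 : 00:25:90:2c:41:e5"""

if __name__ == "__main__":
    print(missing_arp_index(NH_SAMPLE, ARP_SAMPLE))
```

A return value of None means the index was found (or the next-hop text
had no Arp Idx field); any integer is an index the next hop references
but the dump does not contain.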