[j-nsp] Interesting EX4200 gotcha and "resolution"
Jeff Wheeler
jsw at inconcepts.biz
Sun Sep 25 15:36:14 EDT 2011
A colleague pointed out recently that some of the gotchas and fixes we
run into are interesting to others, so in that spirit, I have another
one to share with you today.
In this case, a malfunctioning EX4200 (10.4R4.5) appears to have valid
ARP entries for a few boxes, but when you try to ping them (or send
them anything else), the boxes do not get any traffic. In fact, the
only thing they receive from the switch is ARP who-has. They respond,
but the problem remains, and clearing the ARP entries on the EX4200
just repeats the same cycle.
Upon investigating the PFE data, I found that the halp-nh arp-table
was missing these ARP entries, even though they were present in the
Junos CLI and the correct MAC address was indeed referenced from the
PFE route table. See below:
PFEM0(vty)# show route ip prefix 192.0.2.39 detail
IPv4 Route Table 0, default.0, 0x0:
Destination   NH IP Addr       Type      NH ID  Interface
------------  ---------------  --------  -----  ---------
192.0.2.39    192.0.2.39       Unicast   2933   RT-ifl 197 vlan.1122 ifl 197
RT flags: 0x0000, Ignore: 0x00000000, COS index: 0
DCU id: 0, SCU id: 0, RPF ifl list id: 0
PFEM0(vty)# show nh id 2933 detail
ID     Type      Interface      Next Hop Addr    Protocol    Encap         MTU   Flags       PFE internal Flags
-----  --------  -------------  ---------------  ----------  ------------  ----  ----------  --------------------
2933   Unicast   vlan.1122      192.0.2.39       IPv4        Ethernet      0     0x00000000  0x00000000
Flags: 2 nh_idx: 3
CMD: Route Arp Idx: 1341
MTU Idx: 2 Num Tags: 0
Upd Cnt: 1 Tun Strt: False
Chain_nh 3484:
Hw install: 1
Mac: 00 0e 0c a2 2d dc
PFEM0(vty)# show halp-nh arp-table
Device: 0
...hundreds and hundreds of lines...
ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
ArpEntry Idx 1342 : 00:25:90:2c:41:e5
...hundreds more, but where is Idx 1341?!
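
Since the halp-nh arp-table output runs to hundreds of lines, the check
above (take the "Arp Idx" from "show nh id ... detail" and look for it
in "show halp-nh arp-table") could be done on saved output with a small
script. What follows is only a sketch: it assumes the output has been
captured to text and that the lines look exactly like the ones pasted
in this post. The function names are made up for illustration and are
not part of any Juniper tool.

```python
import re

# Matches lines such as "ArpEntry Idx 1340 : 00:15:17:6b:a9:7c"
# (format assumed from the output pasted in this post).
ARP_ENTRY = re.compile(
    r"ArpEntry Idx\s+(\d+)\s*:\s*((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)
# Matches the "Arp Idx: 1341" field in "show nh id <n> detail" output.
NH_ARP_IDX = re.compile(r"Arp Idx:\s*(\d+)")


def parse_halp_arp_table(text):
    """Return {index: mac} for every ArpEntry line in the captured text."""
    entries = {}
    for line in text.splitlines():
        m = ARP_ENTRY.search(line)
        if m:
            entries[int(m.group(1))] = m.group(2).lower()
    return entries


def nh_arp_idx(text):
    """Return the Arp Idx referenced by a captured 'show nh id' output, or None."""
    m = NH_ARP_IDX.search(text)
    return int(m.group(1)) if m else None


def hw_entry_for_nh(nh_text, arp_table_text):
    """Return (arp_idx, mac); mac is None if the hardware table lacks the index."""
    idx = nh_arp_idx(nh_text)
    return idx, parse_halp_arp_table(arp_table_text).get(idx)


# Sample input copied from the output in this post.
nh_sample = "CMD: Route Arp Idx: 1341\nMac: 00 0e 0c a2 2d dc\n"
arp_sample = (
    "Device: 0\n"
    "ArpEntry Idx 1340 : 00:15:17:6b:a9:7c\n"
    "ArpEntry Idx 1342 : 00:25:90:2c:41:e5\n"
)

idx, mac = hw_entry_for_nh(nh_sample, arp_sample)
if mac is None:
    print("Arp Idx", idx, "is referenced by the next hop but missing from halp-nh arp-table")
else:
    print("Arp Idx", idx, "->", mac)
```

With the sample above, the next hop points at index 1341 and the lookup
returns no MAC for it, which is the condition described in this post.
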
Our "fix" is to remove 192.0.2.1/27 from the vlan.1122 configuration,
commit, and then rollback. This is obviously not good. I would like
to have tried installing a different ARP entry (by configuring this IP
address on another machine) but I have not had an opportunity to test
this yet.
The reason this is happening, as I understand it, is that the ARP table
held in PFE memory in the ASIC vendor's format is abstracted from the
"Juniper ARP table." It appears that simply refreshing the Juniper ARP
table with an identical entry does not cause a missing entry to be put
back into the forwarding table.
I would love to be able to reproduce this, but even with hundreds to a
few thousand machines on each of many EX4200 stacks, it happens very
rarely. I only mention it because "clear arp" from the CLI does not fix
it, so this problem gets escalated until it reaches someone brave
enough to temporarily break some unaffected boxes to fix a broken one.
It would be nice, though, if "clear arp" actually worked right.
If you encounter this problem and do something different, I would be
very interested to hear from you!
--
Jeff S Wheeler <jsw at inconcepts.biz>
Sr Network Operator / Innovative Network Concepts