[c-nsp] Lan ETS - 6506 -

Laurent Dumont ldumont at coldnorthadmin.com
Mon Feb 13 21:47:25 EST 2017


Hi everyone,

I was hoping someone could help shed some light on an issue we 
experienced during one of our events in Montreal. I'm part of the team 
running the network at Lan ETS. We organize one of the biggest LAN events 
in North America, with around 1500 players attending over the 
weekend.

We run a 6506 as our core network device with SUP-720s (need to check if 
they are XL). We assign public IPs to 95% of the devices in the 
network so most of the load is purely routing packets to our upstream 
and back. This year, we made the decision to NAT slightly under 300 
devices on a /24 using the 6506. During the event, the CPU load on the 
6506 was extremely suspicious. We saw very high spikes of 80%-90% in the 
IP Input process, lasting 3-10 minutes each. Both times this 
happened, it resolved itself without us changing anything in our config. 
That said, both the in/out traffic and PPS were fairly low: usually 2-4 
Gbit with very quick spikes to 7-8 Gbit (associated with a 
normal CPU increase), at roughly 200k PPS. Our first guess was that NAT 
could be the cause of the traffic spikes, but the load during the rest of 
the event was very even, and that was before we reduced the NATed subnets.

Our next guess was that a very specific type of packet was being punted 
directly to the CPU, bypassing CEF. During the periods of high 
CPU use, no routed packets were dropped and the only direct effect was 
random drops of NATed sessions. Can anyone recommend a way to debug such 
an issue? For our next events, we are definitely looking into 
network/SPAN in order to get actual data on the type of traffic hitting our 
equipment.

This is one case of learning while everything is on fire around you!

Thanks

Laurent


