[f-nsp] Outbound NAT problem

Fri Feb 3 21:48:19 EST 2012

On Feb 3, 2012, at 9:12 PM, George B. wrote:

Hi George;

Thanks for responding!

> There have been some changes in the ServerIron NAT code since 10.2.01,
> so I would upgrade first.
> 
> Is access-list 199 your ONLY list?

No, it's one of quite a few.  It's the one with the most hosts on it though, probably 30 or so.

> Does it ONLY have host IPs after that first entry?

Yes.

> Is there more than one access-list entry that might be able to apply
> to a troublesome flow?

No

> The problem was that the NAT stuff gets cached in hardware, and if a
> new flow arrives that already matches a rule in the hardware, it won't
> go looking up through the access list, it will just use the cached
> rule.  So if some other access list got triggered somewhere with a
> deny rule that would also match your traffic, that deny might be
> getting re-used without looking at the actual access list.  What
> triggers this is simply the order in which flows arrive on the box.
> 
> Let's say I have something like:
> 
> access-list 198 permit ip host 10.1.0.1 any
> access-list 198 deny ip 10.1.0.0 0.0.0.255 any
> 
> access-list 199 permit ip 10.1.0.0 0.0.0.255 any
> 
> If you first get a packet through from host 10.1.0.2 that matches the
> NAT rule associated with list 199, and you then get a packet on the
> same port from 10.1.0.1, it will ALSO match 199 without the ServerIron
> looking at the access lists.  It sees that it has a NAT rule loaded in
> the hardware that allows 10.1.0.0/24 to anywhere, and because the
> arriving packet matches that rule, it doesn't bother looking in the
> access lists.

That's a fascinating bit of breakage, but I've got a simpler setup than that.

> But if you get a packet through from 10.1.0.1 FIRST and then a packet
> from 10.1.0.2, you're fine and the hosts match the expected NATs.
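
The ordering effect George describes can be reproduced with a toy model. This is only a sketch of the behavior as he explains it, not the ServerIron's actual implementation: the pool names (pool_a, pool_b), the binding of each list to a pool, and the class and function names are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Rule:
    acl: int
    action: str    # "permit" or "deny"
    network: str
    wildcard: str  # inverse mask, as in the access-list syntax

    def matches(self, src: str) -> bool:
        # Bits set in the wildcard are "don't care" bits.
        care = ~int(IPv4Address(self.wildcard)) & 0xFFFFFFFF
        return (int(IPv4Address(src)) & care) == (int(IPv4Address(self.network)) & care)


# The three entries from the example above.
RULES = [
    Rule(198, "permit", "10.1.0.1", "0.0.0.0"),
    Rule(198, "deny", "10.1.0.0", "0.0.0.255"),
    Rule(199, "permit", "10.1.0.0", "0.0.0.255"),
]

# Assumed bindings of access lists to NAT pools (hypothetical pool names).
NAT_BINDINGS = [(198, "pool_a"), (199, "pool_b")]


def slow_path(src):
    """Full lookup: walk each NAT binding, first matching ACL entry decides."""
    for acl, pool in NAT_BINDINGS:
        for rule in RULES:
            if rule.acl == acl and rule.matches(src):
                if rule.action == "permit":
                    return rule, pool
                break  # deny: this binding does not apply, try the next one
    return None


class Box:
    """Toy device that caches the matched rule and reuses it for later flows."""

    def __init__(self):
        self.cache = []  # (rule, pool) pairs, in the order they were installed

    def translate(self, src):
        for rule, pool in self.cache:
            if rule.matches(src):
                return pool  # cache hit: the access lists are never consulted
        hit = slow_path(src)
        if hit is None:
            return None
        self.cache.append(hit)
        return hit[1]


bad = Box()
bad.translate("10.1.0.2")                 # installs the broad 10.1.0.0/24 rule
wrong_pool = bad.translate("10.1.0.1")    # "pool_b", although list 198 says pool_a

good = Box()
right_pool = good.translate("10.1.0.1")   # "pool_a", host rule is cached first
other_pool = good.translate("10.1.0.2")   # "pool_b", as expected
```

In this model the only difference between the two runs is which host sends the first packet, which is the point of the example.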
> 
> Also, depending on the number of connections you are making per
> second, you can't rely on the output of sho ip nat trans to show you
> if it is NATing what you expect it to.  That is because a flow entry
> could change before it has finished printing to the screen and the
> actual source IP and NATed IP shown on one line as being
> representative of a given flow might be information for two different
> flows.  I'm not sure if that got fixed in 10.x or not or if it can be
> fixed.  You really need a sniffer on the traffic to see if it is being
> NATed to what you expect it to be NATed to.
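
If a sniffer is used as suggested, one way to summarize the capture is to group the translated source addresses by inside host and flag any host that shows up behind more than one address. The sketch below assumes the (inside source, translated source) pairs have already been correlated from captures on both sides of the box; the function name and the sample addresses (203.0.113.x is a documentation range) are hypothetical.

```python
from collections import defaultdict


def inconsistent_hosts(observations):
    """Return inside hosts seen with more than one translated source address.

    observations: iterable of (inside_src, translated_src) string pairs,
    assumed to have been matched up from packet captures beforehand.
    """
    seen = defaultdict(set)
    for inside_src, translated_src in observations:
        seen[inside_src].add(translated_src)
    return {host: sorted(addrs) for host, addrs in seen.items() if len(addrs) > 1}


# Made-up sample data, not taken from any real capture.
sample = [
    ("192.168.12.11", "203.0.113.8"),
    ("192.168.12.12", "203.0.113.8"),
    ("192.168.12.11", "203.0.113.9"),
]
flagged = inconsistent_hosts(sample)
# flagged == {"192.168.12.11": ["203.0.113.8", "203.0.113.9"]}
```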
> 
> If your traffic were being NATed to an incorrect IP address, would
> that have caused the symptoms that you show?

Only if it were inconsistent.  It could NAT out as almost any address and still reach the outside OK.  When this was broken I could get from one of the affected hosts to the 'outside world' maybe one time in three.  By 'getting out' I mean ssh, or telnet to a web server.  It sometimes worked after a delay of some tens of seconds; I thought at first the remote end was doing a DNS lookup before allowing the connection, but not everything 'outside' that we connected to is set up that way.

Other hosts on the same NAT lists were working fine, as did the affected hosts after a reboot.

Thanks,

--- David

> 
> On Tue, Jan 31, 2012 at 12:02 PM, David Miller <dmiller at metheus.org> wrote:
>> All;
>> 
>> I have a ServerIron 4G SSL with Version 10.2.01oTI4 on it.  Other than very infrequently dropping the ability to ssh to it, it's been extremely reliable.
>> 
>> Yesterday something really weird happened to it.  I have a group of web servers (apache/php) that the SI load balances across.  Each has to establish a TCP connection to an external service to process a certain type of request.  Each of the web servers could only make this connection intermittently.
>> 
>> Other hosts on the same network had no problem, even if they used the same outbound NAT rule.
>> 
>> The thing that gives me the willies is that a reboot seems to have 'fixed' it - the whole group could make outbound connections anywhere after the reboot, something that makes me wonder if I should even bother looking at the config.
>> 
>> I'm looking for advice from the experienced pros here.  Should I:
>> 
>> 1) immediately upgrade to the current firmware
>> 2) ignore it, it's never going to happen again
>> 3) replace the hardware
>> 4) move to new load balancers
>> 
>> .... or something else?  They were expensive and have been very stable, so I'd like not to get too drastic.
>> 
>> 
>> TIA,
>> 
>> --- David
>> 
>> 
>> The config looks like this:
>> 
>> ip nat inside source list 199 pool default_pool overload
>> ip nat pool default_pool 6.a.b.c.8 a.b.c.8 netmask 255.255.255.255
>> ip nat pool default_pool port-pool-range 2
>> 
>> [...]
>> 
>> access-list 199 deny ip any 192.168.140.0 0.0.0.255
>> access-list 199 permit ip host 192.168.12.11 any
>> access-list 199 permit ip host 192.168.12.12 any
>> (etc, a bunch more use this outgoing address)
>> 
>> server vip-group 1
>> ip-nat-pool default_pool