[f-nsp] Outbound NAT problem

George B. georgeb at gmail.com
Fri Feb 3 21:12:41 EST 2012


There have been some changes to the ServerIron NAT code since 10.2.01;
I would upgrade first.

Is access-list 199 your ONLY list?
Does it ONLY have host IPs after that first entry?
Is there more than one access-list entry that might be able to apply
to a troublesome flow?

The problem I ran into was that the NAT state gets cached in hardware,
and if a new flow arrives that already matches a rule in the hardware,
it won't go looking through the access list; it will just use that
rule.  So if some other access list got triggered somewhere with a deny
rule that would also match your traffic, that deny might be getting
reused without the actual access list being consulted.  What triggers
this is simply the order in which flows arrive on the box.

Let's say I have something like:

access-list 198 permit ip host 10.1.0.1 any
access-list 198 deny ip 10.1.0.0 0.0.0.255 any

access-list 199 permit ip 10.1.0.0 0.0.0.255 any

If a packet from host 10.1.0.2 arrives first, it is denied by list 198
and matches the NAT rule associated with list 199.  If a packet from
10.1.0.1 then arrives on the same port, it will ALSO match 199.  The
ServerIron sees that it has a NAT rule loaded in the hardware that
allows 10.1.0.0/24 to anywhere, the arriving packet matches that rule,
and it never looks in the access lists, so the list 198 entry for
10.1.0.1 is skipped.

But if you get a packet through from 10.1.0.1 FIRST and then a packet
from 10.1.0.2, you're fine and the hosts match the expected NATs.
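To make the order dependence concrete, here is a toy Python model. It is not ServerIron code. It assumes the hardware keeps a list of the permit entries that have already matched a flow and checks that list before walking the access lists. The names Entry, acl_lookup and CachedNat, and the default destination 192.0.2.1, are made up for illustration. Lists 198 and 199 are the ones from the example.

```python
import ipaddress
from dataclasses import dataclass

ANY = ipaddress.ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class Entry:
    acl: int                            # access-list number this entry belongs to
    action: str                         # "permit" or "deny"
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network = ANY

    def matches(self, src, dst):
        return src in self.src and dst in self.dst


# Lists 198 and 199 from the example; list 198 is checked first.
ACLS = {
    198: [Entry(198, "permit", ipaddress.ip_network("10.1.0.1/32")),
          Entry(198, "deny", ipaddress.ip_network("10.1.0.0/24"))],
    199: [Entry(199, "permit", ipaddress.ip_network("10.1.0.0/24"))],
}


def acl_lookup(acls, src, dst):
    """Slow path: within each list the first matching entry decides."""
    for num in sorted(acls):
        for entry in acls[num]:
            if entry.matches(src, dst):
                if entry.action == "permit":
                    return entry
                break  # denied by this list, so fall through to the next list
    return None


class CachedNat:
    """Hypothetical model: cached permit entries are checked before any ACL."""

    def __init__(self, acls):
        self.acls = acls
        self.hw = []  # permit entries already loaded "in hardware"

    def classify(self, src, dst="192.0.2.1"):
        src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        for entry in self.hw:
            if entry.matches(src, dst):
                return entry.acl  # fast path: the access lists are never consulted
        entry = acl_lookup(self.acls, src, dst)
        if entry is None:
            return None
        self.hw.append(entry)
        return entry.acl


first_bad = CachedNat(ACLS)
print([first_bad.classify(h) for h in ("10.1.0.2", "10.1.0.1")])  # [199, 199]
first_good = CachedNat(ACLS)
print([first_good.classify(h) for h in ("10.1.0.1", "10.1.0.2")])  # [198, 199]
```

When 10.1.0.2 arrives first, 10.1.0.1 inherits list 199's NAT. When 10.1.0.1 arrives first, each host ends up on its own list.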

Also, depending on how many connections per second you are making, you
can't rely on the output of sho ip nat trans to show you whether it is
NATing what you expect.  A flow entry could change before it has
finished printing to the screen, so the source IP and NATed IP shown on
one line as representing a given flow might actually come from two
different flows.  I'm not sure whether that got fixed in 10.x, or
whether it can be fixed.  You really need a sniffer on the traffic to
see whether it is being NATed to what you expect.
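The following toy model shows how a single output line can pair a source and a NAT address that never belonged to the same flow. It assumes the display reads a flow entry one field at a time with no locking. That is an assumption about the internals, not documented ServerIron behaviour. FlowSlot, read_line_racy and the 198.51.100.x NAT addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowSlot:
    src: str      # inside source address
    nat_ip: str   # address it was translated to


def read_line_racy(slot, replacement):
    """Read a slot one field at a time while another flow overwrites it."""
    src = slot.src                        # display prints the first column
    slot.src, slot.nat_ip = replacement   # a new flow reuses the slot mid-print
    nat_ip = slot.nat_ip                  # display prints the second column
    return f"{src} -> {nat_ip}"


slot = FlowSlot("10.1.0.1", "198.51.100.1")
# Prints "10.1.0.1 -> 198.51.100.2", a pairing that no real flow ever had.
print(read_line_racy(slot, ("10.1.0.2", "198.51.100.2")))
```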

If your traffic were being NATed to an incorrect IP address, would
that have caused the symptoms you describe?

What I had to do was a massive pain in the hips if you have a lot of
NAT rules as I do:

access-list 101 permit ip host 10.1.0.1 any
access-list 101 deny ip any any

access-list 102 permit ip host 10.1.0.2 any
access-list 102 deny ip any any

access-list 103 permit ip host 10.1.0.3 any
access-list 103 deny ip any any

...

access-list 199 deny ip host 10.1.0.1 any
access-list 199 deny ip host 10.1.0.2 any
access-list 199 deny ip host 10.1.0.3 any
...
access-list 199 permit ip 10.1.0.0 0.0.0.255 any
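Typing this out by hand for many hosts is where most of the pain comes from, so it can be worth generating. The rough Python sketch below emits the per-host lists and the default list's denies in the same shape as the config above. The function name and its parameters are made up. It does not emit the matching "ip nat inside source list ... pool ..." statements, because those depend on your pool names.

```python
def build_nat_acls(hosts, subnet="10.1.0.0", wildcard="0.0.0.255",
                   first_acl=101, default_acl=199):
    """Return config lines: one single-host list per host, then the default list."""
    if first_acl + len(hosts) > default_acl:
        raise ValueError("per-host list numbers would run into the default list")
    lines = []
    for num, host in enumerate(hosts, start=first_acl):
        lines.append(f"access-list {num} permit ip host {host} any")
        lines.append(f"access-list {num} deny ip any any")
    # Every host NATed by its own list must also be denied in the default list.
    for host in hosts:
        lines.append(f"access-list {default_acl} deny ip host {host} any")
    lines.append(f"access-list {default_acl} permit ip {subnet} {wildcard} any")
    return lines


print("\n".join(build_nat_acls(["10.1.0.1", "10.1.0.2", "10.1.0.3"])))
```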

It gets even worse if your NAT rules differ by destination:

access-list 101 permit ip host 10.1.0.1 10.5.0.0 0.0.0.255

access-list 199 permit ip host 10.1.0.1 any

If the first packet through is from 10.1.0.1 but to some destination
outside 10.5.0.0/24, then list 199's criteria are stored in the
hardware.  The next packet from that host, even if destined for
10.5.0.0/24, will still match the list 199 NAT rule and get the NAT IP
associated with list 199, not the one for list 101.
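The same toy-model idea can be extended to destinations. This sketch is also hypothetical. It assumes cached entries are matched on both source and destination before the access lists are walked, and it models only the two permit entries from the example. Under those assumptions, the first flow from 10.1.0.1 decides which list its later flows get. The destinations 203.0.113.5 and 10.5.0.9 are arbitrary examples.

```python
import ipaddress

# Permit entries from the example as (list, source, destination); list 101 is checked first.
ACLS = [
    (101, ipaddress.ip_network("10.1.0.1/32"), ipaddress.ip_network("10.5.0.0/24")),
    (199, ipaddress.ip_network("10.1.0.1/32"), ipaddress.ip_network("0.0.0.0/0")),
]


def first_match(entries, src, dst):
    for acl, src_net, dst_net in entries:
        if src in src_net and dst in dst_net:
            return (acl, src_net, dst_net)
    return None


def run(flows):
    """Return the list each (src, dst) flow is NATed by, checking the cache first."""
    hw, result = [], []  # hw: entries already loaded "in hardware"
    for src, dst in flows:
        src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        hit = first_match(hw, src, dst)
        if hit is None:
            hit = first_match(ACLS, src, dst)  # slow path: walk the access lists
            if hit is not None:
                hw.append(hit)
        result.append(hit[0] if hit else None)
    return result


print(run([("10.1.0.1", "203.0.113.5"), ("10.1.0.1", "10.5.0.9")]))  # [199, 199]
print(run([("10.1.0.1", "10.5.0.9"), ("10.1.0.1", "203.0.113.5")]))  # [101, 199]
```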

I know that was fixed for the ADX code, not sure about the other hardware.

I have an old ServerIron running 10.2.01iTG4, and its default NAT
access-list is a mess, with about 100 deny entries and maybe 4 permit
entries.  That is because *EVERYTHING* explicitly allowed in any other
NAT rule must be explicitly denied in the default rule whenever those
other rules allow traffic that the default NAT would also cover.  If I
don't do that, then once the default NAT has been hit by a flow,
traffic arriving from a host that belongs in another NAT pool, and
that the port has not seen yet, will just get the default NAT.
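Keeping about 100 denies in sync by hand is error-prone, so a crude audit may help. This Python sketch uses a made-up function name. It assumes single-host entries of the form shown above ("access-list N permit ip host A.B.C.D ...") and reports hosts that some other list permits but the default list never denies. It compares source hosts only and ignores destinations, so destination-specific rules still need checking by eye.

```python
def missing_default_denies(config, default_acl=199):
    """Hosts permitted by some other list but not explicitly denied in the default list."""
    permitted, denied = set(), set()
    for line in config.splitlines():
        words = line.split()
        # Only handles: access-list N permit|deny ip host A.B.C.D ...
        if (len(words) < 6 or words[0] != "access-list"
                or not words[1].isdigit() or words[3:5] != ["ip", "host"]):
            continue
        acl, action, host = int(words[1]), words[2], words[5]
        if acl == default_acl and action == "deny":
            denied.add(host)
        elif acl != default_acl and action == "permit":
            permitted.add(host)
    return sorted(permitted - denied)


config = """access-list 101 permit ip host 10.1.0.1 any
access-list 101 deny ip any any
access-list 102 permit ip host 10.1.0.2 any
access-list 102 deny ip any any
access-list 199 deny ip host 10.1.0.1 any
access-list 199 permit ip 10.1.0.0 0.0.0.255 any"""
print(missing_default_denies(config))  # ['10.1.0.2']
```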

It was fixed in 12.1.00d of the ADX code.  Don't know if it was ported
to the other hardware or not.




On Tue, Jan 31, 2012 at 12:02 PM, David Miller <dmiller at metheus.org> wrote:
> All;
>
> I have a ServerIron 4G SSL with Version 10.2.01oTI4 on it.  Other than very infrequently dropping the ability to ssh to it, it's been extremely reliable.
>
> Yesterday something really weird happened to it.  I have a group of web servers (apache/php) the SI load balances amongst.  Each has to establish a tcp connection to an external service to process a certain type of request. Each of the web servers could only make this connection intermittently.
>
> Other hosts on the same network had no problem, even if they used the same outbound NAT rule.
>
> The thing that gives me the willies is that a reboot seems to have 'fixed' it - the whole group could make outbound connections anywhere after the reboot, something that makes me wonder if I should even bother looking at the config.
>
> I'm looking for advice from the experienced pros here.  Should I:
>
> 1) immediately upgrade to the current firmware
> 2) ignore it, it's never going to happen again
> 3) replace the hardware
> 4) move to new load balancers
>
> .... or something else?  They were expensive and have been very stable, I'd like to not get too drastic.
>
>
> TIA,
>
> --- David
>
>
> The config looks like this:
>
> ip nat inside source list 199 pool default_pool overload
> ip nat pool default_pool 6.a.b.c.8 a.b.c.8 netmask 255.255.255.255
> ip nat pool default_pool port-pool-range 2
>
> [...]
>
> access-list 199 deny ip any 192.168.140.0 0.0.0.255
> access-list 199 permit ip host 192.168.12.11 any
> access-list 199 permit ip host 192.168.12.12 any
> (etc, a bunch more use this outgoing address)
>
> server vip-group 1
> ip-nat-pool default_pool
>
>
>
>
> _______________________________________________
> foundry-nsp mailing list
> foundry-nsp at puck.nether.net
> http://puck.nether.net/mailman/listinfo/foundry-nsp


