[c-nsp] help on NAT rate limiting

Ted Mittelstaedt tedm at toybox.placo.com
Wed Dec 29 05:33:18 EST 2004



> -----Original Message-----
> From: hag at linnaean.org [mailto:hag at linnaean.org]
> Sent: Wednesday, December 29, 2004 1:23 AM
> To: Ted Mittelstaedt
> Cc: Church, Chuck; cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] help on NAT rate limiting
> 
> 
> "Ted Mittelstaedt" <tedm at toybox.placo.com> writes:
> 
> > 24 hours is really pretty absurd anyhow.  Even more so
> > because very very few tcp protocols that have long sequences where
> > they don't send data don't use keepalives.  About
> 
>     Speak for yourself please.  I've been in situations where I've
> *upped* timers like that.  Just because you can't think of a reason
> for it doesn't mean reasons don't exist.
> 

For everyone?  You are saying that everyone on the Internet should
be running those 'situations'?  Like some kludgy thing like this
should be the -standard-?

Yeah, right, uhuh.  Please tell us what other commercial NAT device
you have done that with.  I was working with NAT long before Cisco
even knew what it was, back in the IOS 11.1 days.  This isn't a
typical situation.  Very few, if any, other NAT devices allow you
to adjust these timers.

This situation is only applicable to a session passing through a
NAT device.  It isn't applicable to any other scenario.

>     Some of us selectively disable keepalives because it's a choice
> between
> A) detecting dead hosts without involving the application (and in this
> day and age, informing intermediate manglers like NAT that the session
> isn't idle)
> or
> B) having an idling session survive transient failures.
> 

I understand that in some situations you can benefit from disabling
keepalives.  But there is a fundamental conflict with doing this
across a translator.  No translation device has an infinite amount of
memory, and if you design one so that idle sessions are never torn
down by a timeout timer - so that your non-keepalive session isn't
disrupted by a transient failure - then it will eventually consume
all memory and stop working.
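
To make the memory argument concrete, here is a toy sketch in Python.
It is not how IOS or any other real NAT device is implemented; the
NatTable class, its method names, the addresses, and the two-day sweep
are all made up for illustration.  It only shows that entries for
sessions that go silent without closing are reclaimed when there is an
idle timeout, and stay forever when there is none.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Flow = Tuple[str, int]  # (inside address, inside port); hypothetical key


@dataclass
class NatTable:
    """Toy translation table. Not how IOS or any real NAT is built."""

    idle_timeout: Optional[float]  # seconds; None means "never expire"
    last_seen: Dict[Flow, float] = field(default_factory=dict)

    def touch(self, flow: Flow, now: float) -> None:
        """Record traffic for a flow, creating its entry if needed."""
        self.last_seen[flow] = now

    def expire(self, now: float) -> int:
        """Drop entries idle longer than idle_timeout; return the count."""
        if self.idle_timeout is None:
            return 0
        stale = [f for f, t in self.last_seen.items()
                 if now - t > self.idle_timeout]
        for f in stale:
            del self.last_seen[f]
        return len(stale)


def entries_after_silent_clients(timeout: Optional[float], clients: int) -> int:
    """Open `clients` sessions at t=0 that never send again or close,
    then run the expiry sweep two days later and count what is left."""
    table = NatTable(idle_timeout=timeout)
    for port in range(clients):
        table.touch(("10.0.0.1", 1024 + port), now=0.0)
    table.expire(now=2 * 86400.0)
    return len(table.last_seen)
```

Calling entries_after_silent_clients(86400.0, 1000) leaves an empty
table, while entries_after_silent_clients(None, 1000) leaves all 1000
abandoned entries in place, and that number can only grow as more
clients come and go.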

Whoever said that translators are optimal network devices anyway?
Not I.

And in any case disabling keepalives is an extremely crude way of
allowing an idle session to survive transient failures.  If you have
a session that must survive transient failures then you must quantify
those failures.  If those transient failures last, for example,
5 minutes, then set the application keepalives so that they are
only issued every 5 minutes, and have the application wait for 3
keepalive failures before deciding the session is dead.
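
A rough way to check numbers like these is to simulate them.  The
Python sketch below is only an illustration under stated assumptions:
the function name and parameters are hypothetical, probes are assumed
to go out at fixed multiples of the interval, a probe is assumed to
fail exactly when it falls inside the outage window, and the failure
count is assumed to reset on any successful probe (that is, the 3
failures are taken to be consecutive).

```python
def session_survives(outage_start: float,
                     outage_end: float,
                     interval: float = 300.0,
                     max_failures: int = 3,
                     horizon: float = 3600.0) -> bool:
    """Return True if the application still considers the session
    alive at `horizon` seconds, given one outage window.

    Assumptions (not from any real protocol stack): probes are sent at
    t = interval, 2*interval, ...; a probe fails if and only if
    outage_start <= t < outage_end; a success resets the counter.
    """
    failures = 0
    t = interval
    while t <= horizon:
        if outage_start <= t < outage_end:
            failures += 1
            if failures >= max_failures:
                return False  # application declares the session dead
        else:
            failures = 0  # a reply arrived, so start counting again
        t += interval
    return True
```

Under these assumptions a 5 minute outage (say from t=250 to t=550)
costs one failed probe and the session survives, while an outage from
t=250 to t=1000 swallows three probes in a row and the session is
declared dead.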

>     Sometimes it's better to do B and remember to twiddle the session
> once every couple of days to keep the idiot middle boxes happy.
> 

Why are you even using idiot middle boxes in the first place?  If you
have such an esoteric situation you shouldn't.

A transient failure that lasts several days is not a transient
failure.  It is an outage.  And an application that uses TCP, opens
a connection, and expects it to remain open for days at a time without
passing any data is not well written if it doesn't use keepalives.
If such an application is intended to be used over a highly disrupted
circuit, it should use keepalives intelligently.  But much of this
makes little sense anyway, because an app that does not pass data for
several days at a time should not maintain an open TCP session.  It
should open a session when it has data to exchange, exchange the data,
then close the session.  This has nothing to do with dead host
detection; it is just good network programming practice.

In any case I don't have an objection to allowing someone to kludge up
their router if they want.  My objection is to institutionalizing a
kludge.  If it is only people who know what they are doing who are putting
in icky kludges like this 'several day no-host-detection timer thing' that
you are advocating, then we don't have a problem.  But when something
like that is turned on for everyone, then all the idiots that don't
know what they are doing are running the icky kludges.  And worse,
if it's done on the sly, then all the people that know what they are
doing are running them as well, without realizing it.

The same problem happens with the morons that block all ICMP types and
pretend that there is only one MTU size in the world.

Whatever you say, Cisco broke NAT in 12.3.  I don't know if it's because
someone at Cisco made a bunch of bad decisions, or if it's because
someone at Cisco made a mistake and put in a bug.  But twice now I have
seen configs that ran fine under 12.2 and lower, in completely
different environments and on different hardware, become big
problems under 12.3.  And I really cannot understand how this could
be a bug - if it is a bug, then nobody must be running 12.3, because
otherwise why didn't someone scream about it earlier?

Ted

