[c-nsp] asymmetric multihoming & nat

Wed Jan 26 07:57:18 EST 2011

Adam,

On Wed, 26 Jan 2011, Adam Greene wrote:

> Pete,
>
> Thanks ... we ran some tests this evening, disabling NAT entirely, and saw 
> the same results, so I think we can safely say that NAT is not causing the 
> issue.
>
> The situation we are facing is that the customer appears to be unable to 
> route asymmetric traffic. At least that's what we think the problem is so 
> far. They get full routes from two providers (one of them being us), and 
> announce their IP block through both providers. When traffic comes in through 
> us and goes out through their other provider, it's blocked. ICMP traffic 
> doesn't seem to mind. But TCP traffic is definitely not working.

"not working" as in "the connection does not complete and hangs" or as in 
"cannot even get the SYN through, in the direction from or to them"?

I'd probe at a more granular level to see whether the problem is something 
blocking due to a confused state machine on a middlebox somewhere, *or* 
whether it is indeed a genuine ACL.
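
A quick way to tell those failure modes apart from the customer's host is a 
bare TCP connect with a timeout. What follows is only a minimal sketch in 
Python; the function name and the result labels are mine, purely for 
illustration:

```python
import errno
import socket


def classify_tcp_connect(host, port, timeout=5.0):
    """Attempt one TCP connect and report how it ended.

    Returns one of:
      "connected" - the three-way handshake completed
      "refused"   - an RST came back (something actively rejected the SYN)
      "timeout"   - nothing came back in time (SYN or SYN-ACK silently lost)
      "error:X"   - any other OS-level failure, X being the errno name
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
```

Calling classify_tcp_connect("server.example.net", 80) (a placeholder host) 
and getting "timeout" points at a silent drop of the SYN or the SYN-ACK, 
"refused" at something sending a reset, and "connected" followed by a hung 
transfer at a problem later in the connection.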

>
> We allow their traffic through our network asymmetrically (we have ASA's at 
> our edges but have enabled tcp-state-bypass on them). I am suspecting the 
> customer's other upstream provider (or ours) may have some asymmetry block in 
> place. But ... that also seems unlikely, since I assume most IP carriers do 
> not.

The part that seems unlikely to me is the fact that ICMP behaves differently 
from TCP - if I were to put in uRPF-like measures, I'd have them identical for 
ICMP and TCP.
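
To illustrate why: a strict uRPF check only looks at the source address and 
the interface the packet arrived on, so the protocol never enters into it. A 
toy model in Python, with made-up interface names and documentation prefixes 
(this is not any vendor's implementation):

```python
import ipaddress


def strict_urpf_pass(routes, src_ip, in_iface):
    """Toy strict-uRPF check.

    routes:   list of (prefix, interface) pairs standing in for the FIB
    src_ip:   source address of the received packet
    in_iface: interface the packet arrived on

    The packet passes only if the longest-prefix route back to src_ip
    points out of the interface it came in on. Note that the protocol
    (ICMP, TCP, ...) is not an input at all.
    """
    src = ipaddress.ip_address(src_ip)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best is not None and best[1] == in_iface


# Hypothetical FIB: default route via one upstream, a /24 via the other.
routes = [("0.0.0.0/0", "isp-b"), ("192.0.2.0/24", "isp-a")]
print(strict_urpf_pass(routes, "192.0.2.10", "isp-a"))  # best route matches
print(strict_urpf_pass(routes, "192.0.2.10", "isp-b"))  # asymmetric arrival
```

With a check like this a ping and a SYN from the same source on the same 
interface get the same verdict, which is why the ICMP/TCP difference makes me 
think of something stateful instead.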

>
> So we're currently stumped.

I'd debug it this way: first, pick a server somewhere on the 'net that you can 
initiate connections to and from, and run tcpdump on (to see in detail what 
happens).

There are several points where it makes sense to put the probes:

1) customer's host (internal)
2) both their firewalls (internal)
3) both their firewalls (external)
4) your cloud closer to customer
5) your cloud closer to the server
6) the server itself

Then you can try a TCP connection from the customer to that server, and see 
where things break - and try the TCP connection in the other direction, and 
compare how the TCP connection looks at each point when sniffed.

The above might be a bit laborious to do all at once, so it can be split into 
multiple iterations with fewer probe points - this is just to outline the 
principle.
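
Once the captures are in hand, the comparison itself is mechanical: for each 
packet of the handshake, walk the probe points in the order the packet should 
have travelled and find the first one that did not see it. A rough sketch in 
Python, where the probe point labels and the function name are just 
placeholders of mine:

```python
def where_lost(path, seen):
    """Find where a packet disappears along a path.

    path: probe points in the order the packet should traverse them
    seen: set of probe points whose capture contains the packet

    Returns None if every probe point saw the packet, otherwise a tuple
    (last point that saw it, first point that did not). The first element
    is None when even the first probe point missed it.
    """
    last_seen = None
    for point in path:
        if point not in seen:
            return (last_seen, point)
        last_seen = point
    return None


# Customer-to-server direction, customer side first (illustrative labels).
syn_path = [
    "customer host",
    "customer firewall (inside)",
    "customer firewall (outside)",
    "provider cloud (customer side)",
    "provider cloud (server side)",
    "server",
]

# Example: the SYN shows up in the first three captures only.
syn_seen = {
    "customer host",
    "customer firewall (inside)",
    "customer firewall (outside)",
}
print(where_lost(syn_path, syn_seen))
```

Keep in mind that with asymmetric routing the return packets (the SYN-ACK 
and so on) follow a different list of probe points, so each direction needs 
its own path.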

I notice there's now also another potentially useful debugging tool on 
GitHub - https://github.com/enki/muXTCP - but that'd be a much longer story, 
as I suspect that code has been affected by bitrot :-)

cheers,
andrew

>
> Thanks,
> Adam
>
>
> On 1/24/2011 2:56 PM, Pete Lumbis wrote:
>> Adam,
>> 
>> I realized (with the help of an off-list post) I mis-read your
>> original post. I thought this was on two different devices, instead of
>> two connections on the same device.
>> 
>> For a single box the NAT lookups are done when traffic arrives on any
>> nat inside/outside interface*. If we create a translation for a packet
>> exiting f0/0 (for example) and the response arrives on f0/1, we will
>> see the packet arriving on a NAT outside interface, do the NAT lookup
>> and match the existing translation that was created by the first
>> outbound packet.
>> 
>> What kind of problems are you seeing? Is traffic slow or not arriving at all?
>> 
>> *based on NAT order of ops when traffic arrives on a NAT interface and
>> is destined for a NAT interface
>> 
>> -Pete
>> 
>> On Fri, Jan 21, 2011 at 6:05 PM, Pete Lumbis<alumbis at gmail.com>  wrote:
>>> NAT could definitely be causing issues. Generally you could use
>>> something like Stateful NAT (SNAT) between the two BGP speakers to
>>> make sure they sync their NAT tables, but this feature has had a
>>> number of challenges/issues, and development has started moving it to
>>> end of life.
>>> 
>>> 
>>> 
>>> On Fri, Jan 21, 2011 at 4:09 PM, Adam Greene<maillist at webjogger.net> wrote:
>>>> Hi guys,
>>>> 
>>>> I have a multihomed customer who receives full BGP routes from both us and
>>>> another provider and load balances between the two connections. Things are
>>>> working fine until the traffic becomes asymmetric (i.e. inbound through one
>>>> provider, outbound through the other).
>>>> 
>>>> The block they are announcing to their providers is NATed on their BGP
>>>> router. In other words, all their internal hosts are on private IP space.
>>>> The internal interface is designated "ip nat inside" and both WAN interfaces
>>>> are designated "ip nat outside". The actual NAT configurations do not
>>>> reference any interfaces, just pools.
>>>> 
>>>> Could the NAT be prohibiting asymmetric traffic in this case? i.e. if the
>>>> inbound traffic is NATed coming in on one interface, will the router refuse
>>>> to NAT the outbound traffic through the other interface?
>>>> 
>>>> If the NAT is the problem, I suppose they could do the NAT on a loopback
>>>> interface instead ... but I understand that the traffic will all be
>>>> process-switched if we do that, and performance will probably suffer.
>>>> 
>>>> Thanks for your insight,
>>>> Adam
>>>> 
>>>> _______________________________________________
>>>> cisco-nsp mailing list  cisco-nsp at puck.nether.net
>>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>>> 
>> 