[c-nsp] 7606 to 6509 [BGP hold time issue]

Thu May 3 12:11:50 EDT 2012

We matched the MTU; it was one of the first things we tried. We also lowered the MTU to 1280 on both ends, with no change.

-----Original Message-----
From: Scott Granados [mailto:scott at granados-llc.net] 
Sent: 03 May 2012 16:14
To: Scantlebury, Kieron
Cc: cisco-nsp at puck.nether.net
Subject: Re: [c-nsp] 7606 to 6509 [BGP hold time issue]

You have an MTU mismatch. :)

This is my guess, anyway, because it closely matches your issue.

I ran into this with almost the same setup, using larger MTU sizes for the Ethernet header plus tags. I had to use the ip mtu command under the actual interface (or subinterface, depending on the setup) and set it to 1500.

You can easily tell by running:
show ip bgp nei a.b.c.d | inc data

Look at the max data segment size and make sure that it matches the MTU you have set, including overhead.

In my case, I was getting a number greater than 1460, which I knew wouldn't fly in my setup.
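To make the arithmetic concrete: with no IP or TCP options, the TCP maximum segment size is the IP MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header, so 1460 corresponds to a 1500-byte IP MTU, and the 1280-byte test mentioned above gives 1240. Each 802.1Q tag or MPLS label adds another 4 bytes on the wire. The sketch below is only an illustration; it assumes the neighbor output contains a line of the form "Datagrams (max data segment is 1460 bytes):", and the helper names are hypothetical.

```python
import re
from typing import Optional

IPV4_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20       # TCP header without options
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType (FCS not counted)
TAG_SIZE = 4          # one 802.1Q VLAN tag or one MPLS label stack entry


def mss_for_ip_mtu(ip_mtu: int) -> int:
    """Largest TCP payload that fits in a single IP packet of ip_mtu bytes."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


def parse_max_data_segment(line: str) -> Optional[int]:
    """Extract N from a line like 'Datagrams (max data segment is 1460 bytes):' (assumed format)."""
    match = re.search(r"max data segment is (\d+) bytes", line)
    return int(match.group(1)) if match else None


def segment_fits(segment: int, path_ip_mtu: int) -> bool:
    """True if a segment of this payload size, plus IP and TCP headers, fits the path's IP MTU."""
    return segment + IPV4_HEADER + TCP_HEADER <= path_ip_mtu


def frame_bytes(segment: int, tags: int = 0) -> int:
    """Ethernet frame size (no FCS) carrying one full segment behind `tags` 4-byte tags/labels."""
    return segment + IPV4_HEADER + TCP_HEADER + ETHERNET_HEADER + TAG_SIZE * tags


if __name__ == "__main__":
    mss = parse_max_data_segment("Datagrams (max data segment is 1460 bytes):")
    print(mss, mss_for_ip_mtu(1500), mss_for_ip_mtu(1280))  # 1460 1460 1240
    print(segment_fits(mss, 1300))   # False: a 1460-byte segment needs a 1500-byte IP MTU
    print(frame_bytes(mss, tags=2))  # 1522: two tags/labels on top of a full segment
```

If the reported segment size plus 40 bytes exceeds what the path can actually carry, full-size segments are the ones that disappear, which is the pattern Scott describes.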

Hope that helps.

Thanks
Scott

On May 3, 2012, at 10:55 AM, Scantlebury, Kieron wrote:

> Hi Gurus,
> 
> I have a Cisco 7606-S with a 1 Gig DIA link to our customer's Cisco 6509 switch.
> This is directly connected just a few cabinets down in the same COLO.
> 
> The link is stable and we have no errors.
> 
> The problem we are seeing is that the BGP session is torn down every 3
> minutes (hold timer expiry). See below:
> 
> FYI: the obvious has been checked. Hold timers match on each device, and we have tried adjusting the MTU, etc.
> 
> May  3 05:00:10.490 BST: %BGP-5-ADJCHANGE: neighbor ***.***.***.*** Up
> May  3 05:03:11.302 BST: %BGP-5-ADJCHANGE: neighbor ***.***.***.*** Down BGP Notification sent
> May  3 05:00:00.866 BST: %BGP-3-NOTIFICATION: sent to neighbor ***.***.***.*** 4/0 (hold time expired) 0 bytes
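Two details in these log lines can be checked mechanically. The "4/0" is the BGP NOTIFICATION error code and subcode; RFC 4271 defines code 4 as Hold Timer Expired. And the session was Up from 05:00:10.490 to 05:03:11.302, about 181 seconds, which is roughly one hold interval if the Cisco IOS default of 180 seconds is in use (an assumption; the post only says the timers match). That is consistent with the 7606 receiving nothing from the peer for a full hold time after the session came up. A small sketch, with hypothetical helper names:

```python
from datetime import datetime

# BGP NOTIFICATION error codes defined in RFC 4271, section 4.5.
NOTIFICATION_ERRORS = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Assumed: Cisco IOS default hold time in seconds (the keepalive default is 60).
DEFAULT_HOLD_TIME = 180


def decode_notification(code_subcode: str) -> str:
    """Turn the 'code/subcode' pair from %BGP-3-NOTIFICATION (e.g. '4/0') into text."""
    code, subcode = (int(part) for part in code_subcode.split("/"))
    return f"{NOTIFICATION_ERRORS.get(code, 'Unknown error')} (code {code}, subcode {subcode})"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two same-day syslog times such as '05:00:10.490'."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


if __name__ == "__main__":
    print(decode_notification("4/0"))  # Hold Timer Expired (code 4, subcode 0)
    up_to_down = seconds_between("05:00:10.490", "05:03:11.302")
    print(f"{up_to_down:.3f} s up; hold time {DEFAULT_HOLD_TIME} s")
```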
> 
> I have done a few packet captures. It appears that the customer's 6509 isn't acknowledging the BGP UPDATE packets, so our ASR tries to retransmit the packets (those of roughly 1300 bytes and above) again and again. The customer is unable to take a packet capture, so I can't guarantee that the UPDATE packets are reaching his device.
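The capture check described here can be sketched as: list which segments the sender keeps resending and how large they are. If only full-size segments repeat while small ones, such as 19-byte BGP KEEPALIVEs, get through, that points at something on the path dropping large packets. The input format, a list of (sequence number, payload length) pairs for the 7606-to-6509 direction, is an assumption about how the capture was exported, and the sample data is hypothetical, not taken from the thread.

```python
from collections import Counter
from typing import List, Tuple

Segment = Tuple[int, int]  # (TCP sequence number, TCP payload length in bytes)


def retransmitted(segments: List[Segment]) -> List[Segment]:
    """Return the segments that appear more than once, sorted by sequence number.

    A repeated (seq, length) pair in one direction of a capture is treated as a
    retransmission; sequence-number wraparound is ignored in this sketch.
    """
    counts = Counter(segments)
    return sorted(seg for seg, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    # Hypothetical capture of the 7606 -> 6509 direction, not real data from the thread.
    capture = [(1, 19), (20, 1460), (20, 1460), (1480, 1460), (20, 1460), (1480, 1460), (2940, 19)]
    repeats = retransmitted(capture)
    print(repeats)  # [(20, 1460), (1480, 1460)]
    print("smallest retransmitted payload:", min(length for _, length in repeats))
```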
> 
> The customer requires the full routing table, some 39000+ routes. BGP only drops when we send this table. If I limit the advertisements to, say, 1.23.0.0/16 le 32, the session stays stable. If I advertise the full 1.0.0.0/8 le 32, it becomes unstable again.
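The two filters differ only in how many prefixes they let through: a prefix-list entry such as 1.0.0.0/8 le 32 matches any route inside 1.0.0.0/8 whose length is between 8 and 32, while 1.23.0.0/16 le 32 matches only routes inside 1.23.0.0/16. Fewer matching routes means less UPDATE traffic and fewer full-size segments on the wire. The sketch below models the usual ge/le matching rules with Python's ipaddress module; it is an illustration, not Cisco's implementation, and the route list is hypothetical.

```python
import ipaddress
from typing import Optional


def prefix_list_match(route: str, entry: str, ge: Optional[int] = None, le: Optional[int] = None) -> bool:
    """Return True if `route` matches a prefix-list style entry such as '1.0.0.0/8 le 32'.

    Modeled rules: the route must sit inside the entry's prefix, and its length
    must equal the entry's length when neither ge nor le is given; otherwise it
    must lie between ge (default: entry length) and le (default: 32 for IPv4).
    """
    net = ipaddress.ip_network(route)
    base = ipaddress.ip_network(entry)
    if net.version != base.version or not net.subnet_of(base):
        return False
    if ge is None and le is None:
        return net.prefixlen == base.prefixlen
    low = ge if ge is not None else base.prefixlen
    high = le if le is not None else net.max_prefixlen
    return low <= net.prefixlen <= high


if __name__ == "__main__":
    # Hypothetical routes, used only to compare the two filters from the post.
    routes = ["1.0.0.0/24", "1.23.4.0/24", "1.80.0.0/13", "8.8.8.0/24"]
    narrow = [r for r in routes if prefix_list_match(r, "1.23.0.0/16", le=32)]
    wide = [r for r in routes if prefix_list_match(r, "1.0.0.0/8", le=32)]
    print("1.23.0.0/16 le 32:", narrow)  # ['1.23.4.0/24']
    print("1.0.0.0/8 le 32:", wide)      # ['1.0.0.0/24', '1.23.4.0/24', '1.80.0.0/13']
```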
> 
> We do have a known issue on our 7606 at the moment: the TCAM is full. This
> is being resolved this coming weekend. However, I don't believe it is the
> cause of our BGP issue, unless of course it is affecting the overall
> performance of our router. (That's for another team to worry about :))
> 
> An idea that's been thrown around is that the customer's 6509 doesn't have enough memory to support the full routing table. Here are some outputs from his switch.
> 
> #####show proc mem | i BGP Router
> 396   0 4160646648 3294746732  284982876          0          0 BGP Router
> 
> #####show ip bgp summary
> 408443 network entries using 48196274 bytes of memory
> 
> 
> #####show version
> cisco WS-C6503-E (R7000) processor (revision 1.3) with 983008K/65536K bytes of memory.
> Processor board ID FOX1350GH8H
> SR71000 CPU at 600Mhz, Implementation 0x504, Rev 1.2, 512KB L2 Cache 
> Last reset from s/w reset
> 9 Virtual Ethernet interfaces
> 66 Gigabit Ethernet interfaces
> 1917K bytes of non-volatile configuration memory.
> 8192K bytes of packet buffer memory.
> 
> 65536K bytes of Flash internal SIMM (Sector size 512K).
> Configuration register is 0x2102
> 
> Also, there is nothing in their logs to indicate a memory issue.
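The memory figures above can be turned into rough numbers. Assuming the usual IOS column order for show processes memory (PID, TTY, Allocated, Freed, Holding, Getbufs, Retbufs, Process) and treating the first figure in "983008K/65536K" as main processor memory, the BGP Router process holds about 272 MiB, roughly 28% of main memory, and the BGP table works out to 118 bytes per network entry. On these numbers alone the switch does not look out of memory, though this ignores other processes and per-path attributes. A sketch of the arithmetic, with hypothetical helper names:

```python
from typing import Dict, Union


def parse_proc_mem_row(row: str) -> Dict[str, Union[int, str]]:
    """Split one row of 'show processes memory' (column order is an assumption).

    Expected columns: PID, TTY, Allocated, Freed, Holding, Getbufs, Retbufs,
    Process; the process name may contain spaces.
    """
    fields = row.split()
    names = ["pid", "tty", "allocated", "freed", "holding", "getbufs", "retbufs"]
    parsed: Dict[str, Union[int, str]] = {name: int(value) for name, value in zip(names, fields[:7])}
    parsed["process"] = " ".join(fields[7:])
    return parsed


def bytes_per_entry(total_bytes: int, entries: int) -> float:
    """Average memory per BGP network entry, from 'show ip bgp summary'."""
    return total_bytes / entries


def share_of_memory(holding_bytes: int, memory_kbytes: int) -> float:
    """Fraction of a memory pool (given in KB, as show version reports it) held by a process."""
    return holding_bytes / (memory_kbytes * 1024)


if __name__ == "__main__":
    row = "396   0 4160646648 3294746732  284982876          0          0 BGP Router"
    bgp = parse_proc_mem_row(row)
    holding = int(bgp["holding"])
    print(f"{bgp['process']}: {holding / 2**20:.1f} MiB held")        # about 271.8 MiB
    print(f"{share_of_memory(holding, 983008):.1%} of main memory")  # about 28.3%
    print(bytes_per_entry(48196274, 408443), "bytes per network entry")  # 118.0
```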
> 
> Any further ideas would be appreciated.
> 
> Many thanks in advance.
> Kieron
> 
> 
> 
> 
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net 
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/