[c-nsp] Cisco 3550-12G VSI stops routing traffic

Tue Apr 22 18:26:08 EDT 2008

Hi Randal,

It is really a weird problem, but I can suggest 2 possible causes:

1- It might be due to a VSI interface or ARP table limitation.
2- If you are running PVST, it might be due to the PVST instance limit in
this IOS release.
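
For the second cause, a quick way to sanity-check the numbers is to count the
active VLANs and compare that with the spanning-tree instance limit for your
platform and release. The sketch below is only an illustration: it assumes the
usual column layout of "show vlan brief" output, the function names are made up
for this example, and the limit is something you must look up yourself in the
documentation for your IOS release.

```python
import re

# Assumed layout of "show vlan brief": VLAN id, name, status, ports.
# Only rows whose status column is exactly "active" are counted.
VLAN_LINE = re.compile(r"^(\d+)\s+\S+\s+active\b")


def active_vlans(show_vlan_brief):
    """Return the VLAN ids reported as active in the pasted output."""
    vlans = []
    for line in show_vlan_brief.splitlines():
        match = VLAN_LINE.match(line.strip())
        if match:
            vlans.append(int(match.group(1)))
    return vlans


def stp_headroom(vlans, limit):
    """Instances left before reaching `limit` (hypothetical helper).

    `limit` is the per-VLAN spanning-tree instance limit for the
    platform; it is deliberately not hard-coded here.
    """
    return limit - len(vlans)
```

If the headroom is at or near zero, the PVST theory becomes more plausible; if
there is plenty of room, it can probably be set aside.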

But to make it clear, let's gather some logs and statistics:

1- You mentioned that you transferred the affected VLANs to another
distribution switch. What is the number of active VLANs on this switch?
2- As I understand from your description, the switch always drops the traffic
of only 2 VLANs, chosen at random. Get the output of "show vlan" and
"show spanning-tree" while the problem is occurring.
3- You mentioned that the problem was solved when you cleared the ARP table.
Can you get "show arp | include Incomplete" to see which entries are
incomplete before and after the clearing, and which VLAN each one belongs to?
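
To make the before/after ARP comparison in point 3 easier, something like the
following rough sketch could be used on the two captured outputs. It assumes
the usual "show arp" column layout (Protocol, Address, Age, Hardware Addr,
Type, Interface) and that incomplete entries may not list an interface, so the
VLAN is looked up by subnet instead. The function names and the VLAN-to-subnet
mapping are hypothetical; you would fill the mapping in from your own VSI
addressing.

```python
import ipaddress
import re

# Matches rows such as: "Internet  10.1.22.5   0   Incomplete   ARPA"
ARP_INCOMPLETE = re.compile(
    r"^Internet\s+(\d+\.\d+\.\d+\.\d+)\s+\S+\s+Incomplete\b"
)


def incomplete_ips(show_arp):
    """Return the set of IPs whose ARP entry is marked Incomplete."""
    ips = set()
    for line in show_arp.splitlines():
        match = ARP_INCOMPLETE.match(line.strip())
        if match:
            ips.add(match.group(1))
    return ips


def vlan_for(ip, vlan_subnets):
    """Map an IP to a VLAN id using a {vlan: "a.b.c.d/len"} dict."""
    addr = ipaddress.ip_address(ip)
    for vlan, subnet in vlan_subnets.items():
        if addr in ipaddress.ip_network(subnet):
            return vlan
    return None  # IP is not inside any listed VSI subnet


def compare_incomplete(before, after, vlan_subnets):
    """Group incomplete entries into cleared, new, and persisting."""
    b, a = incomplete_ips(before), incomplete_ips(after)
    groups = {"cleared": b - a, "new": a - b, "persisting": a & b}
    return {
        name: sorted((ip, vlan_for(ip, vlan_subnets)) for ip in ips)
        for name, ips in groups.items()
    }
```

Feeding it the output captured before and after "clear arp" should show whether
the incomplete entries simply move from one VLAN to another, which is what the
symptoms suggest.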

I hope to hear from you soon.

Best regards,
--Abo Zaid

On 4/22/08, randal k <cisconsp at data102.com> wrote:
>
> Hey guys,
> I've run into a ridiculous problem that has me completely stumped.
>
> Network is a standard edge/core/access/distribution network comprised of
> 7206, 6509-sup7203bxls, 3550s & 3750s, and 3550s/2950s, respectively.
> Distribution is pure OSPF, with 226 routes currently in area 0, while the
> cores & edges run full mesh BGP. The cores originate defaults for the
> distribution layer; the distribution layer carries all of the customer
> gateways and communicates those networks to OSPF.
>
> The distribution 3550-12G in question is running
> c3550-ipservices-mz.122-25.SEB4.bin. It's configured with 22 VSIs, carries
> all of Area 0 (226 routes), and has 354 MAC addresses listed and just shy
> of 300 ARP entries. Average traffic through the switch is approximately
> 120 Mbps. Not very loaded.
>
> This switch decided to randomly stop routing traffic to two completely
> separate VSIs (vlan 602 & vlan 149). These two VLANs are attached to the
> same port & downstream access switch, G0/4 and a 2960. The Internet can
> see the VSI IP addresses without issue, OSPF still advertises the routes
> without issue, everything is great up to the switch. Hosts attached to the
> 3550-12G are able to see their appropriate VSI gateway IP, but cannot see
> anything past it. Attached hosts are, however, able to see all of the other
> 21 VSI IP addresses on the switch -- just nothing off of the switch. No
> traffic is able to pass from off-switch/Internet to affected attached
> hosts, period. Resolution was to move the VSI/customer gateway to a
> different distribution switch. Although the affected/broken 3550-12G is
> still in the switching path, it does Layer 2 forwarding without issue --
> those 2 VSIs simply stopped forwarding traffic.
>
> So this morning, we lost two more networks, the primary and secondary IP
> address on a VSI for a completely different customer (vlan 609). On a
> lark, I clear arp'd and the two networks came back, but two other
> different VSIs went down (vlan 122, 167)!
>
> The only thing that all of the VSIs have in common is that they are all
> servicing customers attached to the 3550-12G's port G0/4. As mentioned
> earlier, there was a 2960 switch attached to G0/4, which has been replaced
> to no avail. Host configuration on affected VSI makes no difference -
> swapping in different servers, my laptop, etc, all yield the same problem.
> However, as of right now, if I plug my laptop into an access switch on
> g0/7 configured for the same now-broken vlan 167, it works just fine. It's
> almost as if the VSIs dealing specifically with g0/4 were having problems.
>
> Fearing a broken g0/4 <-> 2960 trunk, my config has been reduced to 4
> lines, no change in service:
> !
> interface GigabitEthernet0/4
> description down_acc12.fac01.cos
> switchport trunk encapsulation dot1q
> switchport mode trunk
> load-interval 30
> !
>
>
> If I move the VSI & gateway to a different distribution switch, it works
> fine. If I move the access to a different port, it works fine. I have not
> reloaded the switch yet, as there is some other stuff on there that I
> don't want to incur 3-4 minutes of downtime on -- but I am fearing that
> the problem may jump and cause more harm. Am I dealing with a randomly
> screwed up g0/4 that's smoking VSIs (how?), a buggy IOS that does this,
> or ??? I've been searching the Internet the world over and would love to
> hear some ideas and anecdotes.
>
> Thanks for reading my wall of text,
> Randal
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>