[nsp] VLAN Trunking problems

Sam Stickland sam_ml at spacething.org
Thu Oct 9 13:31:42 EDT 2003


Hi,

I ended up with a really bizarre state of affairs today.

I was converting a normal ethernet cable link to a dot1q trunk. It's between
a 3550 (12.1(11)EA1a) and a 2924XL (12.0(5)WC5). I've previously never had
any problem trunking either of these devices with other equipment, but this
is the first time I've tried creating a trunk between the two of them.

The final setup looked like this:

Cisco 6509
    |
    |
Cisco 3550
    |
    |
Cisco 2924

The 6509 is the core router and is connected to a number of transit
providers and peers. There's also a 7206 connected to the 3550 as a backup
for the 6509, and a little 2651 (used as a demo route-server). All these
links are dot1q trunks.

Once the link between the 3550 and the 2924 became a dot1q trunk, network
traffic across that link started to get weirdly corrupted.

One of the webservers attached to the 2924 was accessible via some routes on
the 6509 but not via others. On the routes that were accessible, traceroutes
to the server were often weird (sometimes the listed penultimate hop was
_another_ machine attached to the 2924).

ARP entries ended up all over the place (i.e. the MAC address for the Cisco
2651 was sometimes listed where the 6509 should have been, and a lot simply
failed to resolve).

While all the services for machines attached to the could be accessed from
machines attached to the 2924, not a single traceroute worked from the 2924
machines to any of them (sometimes they timed out after locating the 6509,
sometimes they didn't even get that far).

The most bizarre case was a webserver, and I kid you not, that was able to
serve any type of content EXCEPT jpegs. HTML pages, gifs, pngs, everything
all worked fine - but jpegs, not a chance. A little sniffing of the network
traffic showed the jpeg traffic simply disappearing into the ether.

I spent a little time wondering what the hell was going on, and then
reverted the trunk back to an access link and every single one of the
symptoms listed above simultaneously disappeared.

It sounds like I'm hitting an IOS bug (with the esoteric version of IOS in
use here, I'm not overly surprised). However, the link between the 3550
and the 2924 is actually a layer 2 interconnect (supplied by the
datacentre), and it goes through a number of switches.

Is it possible that something they are doing on the interconnect could cause
problems like this? Stripping of VLAN frames perhaps? Am I grasping at
straws?
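
For reference, here is a minimal Python sketch of what dot1q tagging does to
an Ethernet frame, and what a device in the path that strips tags would undo.
The function names, the all-zero MAC addresses and VLAN 10 are made up for
illustration; this is not anything taken from the switches themselves.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_dot1q_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    # Tag control field: priority (3 bits), DEI (1 bit, left 0), VLAN ID (12 bits)
    tci = (priority << 13) | vlan_id
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_dot1q_tag(frame: bytes) -> bytes:
    """Remove the tag if one is present; otherwise return the frame unchanged."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == TPID_8021Q:
        return frame[:12] + frame[16:]
    return frame


# Hypothetical full-size IPv4 frame (no FCS): two zeroed MACs, EtherType, payload
untagged = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + bytes(1500)
tagged = add_dot1q_tag(untagged, vlan_id=10)

# The tagged frame is 4 bytes longer than the untagged one
print(len(untagged), len(tagged))
```

The point of the sketch is only that a tagged frame carries 4 extra bytes
between the source MAC and the original EtherType, and that a device which
removes them hands on a frame with no VLAN information left in it.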

Sam


