[c-nsp] Prove it's not the network!
Jeff Fitzwater
jfitz at Princeton.EDU
Thu May 15 08:55:43 EDT 2008
I sure hope Justin lets us know what the problem really was, after all
this.
Jeff Fitzwater
OIT Network Systems
Princeton University
On May 15, 2008, at 3:56 AM, Whisper wrote:
> Justin, I have always been under the impression that a Network Engineer's
> primary role was going around constantly proving that the Network is
> not the problem. :)
>
> Your rant, I suspect, is more or less repeated on a daily basis by
> Network Engineers all around the world.
>
> On Thu, May 15, 2008 at 3:41 PM, Justin Shore <justin at justinshore.com>
> wrote:
>
>> Nathan wrote:
>>
>>> Proceed by elimination. If there is someone else in the office (I
>>> suppose the T1 is not just for one person) whose Outlook is *not*
>>> slow, and especially if "someone else" can be extended to "everybody
>>> else" then the problem is not the network.
>>>
>>> Outlook can have severe speed/response problems when not kept
>>> healthy;
>>> most notably there's something called PST files that have to be kept
>>> at a reasonable size, or re-indexed or something, and people who
>>> like
>>> to keep all their mail tend to run into that.
>>
>> Here's a long account of a similar battle over PSTs that I fought.
>>
>> I fought a 'blame-the-network' battle at a customer's site a couple
>> years ago. We built a brand-new GigE greenfield network in a new
>> building and helped the customer move into their new digs. Shortly
>> thereafter a certain group of users started complaining that their
>> computers were horribly slow, most especially Outlook. This reached
>> upper management before it came back down to us contractors so it
>> was a
>> huge deal when it landed at our feet.
>>
>> First thing we did was narrow down exactly who had the problem and
>> who
>> didn't. 95% of the complaints were "me too!" complaints and weren't
>> legitimate. The remaining 5% were isolated to one group of users
>> in one
>> specific area of the new building. Their IT staff that was working
>> on
>> this problem with us immediately blamed us again because "it had to
>> be
>> the network's fault because all the users are in the same physical
>> vicinity". I showed them graph after graph of the network I/O from
>> the
>> Exchange servers through the core and down through the uplinks to
>> distribution. In the end we ended up graphing every affected user's
>> port. The graphs did not help; we were still to blame.
>>
>> Finally one day I sat down with the squeakiest user and had her
>> show me
>> exactly what was slow and the steps she took to make that happen from
>> minute 1 of her walking into her office. I had her shut down and
>> start
>> from a cold boot. She commented that the login process was faster
>> than
>> normal and asked what I'd done to fix it (grrr). She fired up
>> Outlook
>> and I noticed that it was very slow. She said that it was faster
>> than
>> normal. Finally Outlook came up and she started scrolling through
>> her
>> email. She selected a message and waited 10 seconds or so for the
>> message to come up. Then she'd try to save the attachment to the
>> desktop and it would take 4-5 minutes (for a 20MB attachment). She
>> continued on with her daily routine and started scrolling down
>> through her
>> Outlook folders. I stopped her when I saw "Inbox, Sent, Drafts, etc"
>> scroll by more than once. This was the sign I was looking for. I
>> took
>> the wheel at this point and started counting. She had 8 (count them,
>> EIGHT) sets of default Outlook folders because she had 8 PSTs
>> mounted in
>> Outlook. She explained that she hits the Exchange PST hard limit
>> of 2GB
>> every 8-10 months. The company's IT folks would export everything
>> to a
>> new PST to give her a fresh inbox. Then they'd mount it in Outlook
>> so
>> she could have access to it (it was tax stuff so Legal wouldn't let
>> her
>> delete anything, literally). I started hunting for the PSTs and
>> found
>> them on an old file server, one that we had no idea was related to
>> the
>> mail system. She was mounting 8 roughly 2GB PSTs across the
>> network to
>> Outlook on a PC running XP w/ 128MB of RAM. Wonderful.
>>
>> But it gets better. I noticed that her inbox wasn't on the server
>> but
>> was instead in a PST on the same file server and her email was set to
>> deliver to PST, not Exchange directly. The way Exchange works in this
>> situation, email is held on the server for PST users until they
>> bring their Outlook online. OL then downloads the queued up email
>> and
>> stuffs it into the PST. Well, the PST was stored on the server so
>> the
>> client would have to manipulate the PST on the server.
>>
>> Oh, but it gets better still. A few days later one of the sys admins was
>> looking at the newly discovered file server that was apparently
>> critical to
>> the function of the mail server. From across the room we heard loud
>> profanity and ran over to see what happened. He discovered that the
>> idiot IT staff set up Windows to compress the non-RAIDed drive that
>> contains all the user PSTs and home directories because they ran
>> low on
>> drive space about a year earlier. Before a user's OL client can
>> modify
>> the PST the server has to decompress the entire PST, then write the
>> changes for the client, and recompress the PST and then write it
>> back to
>> disk. The server was a low-end MS box with 256MB of RAM with no RAID
>> and a backup that usually failed. Oh, and that sys admin also
>> discovered shortly thereafter that all of the users created in the
>> past
>> year and a half were set to deliver to PST because of, you guessed
>> it,
>> another drive space issue. Isn't that nice.
>>
>> All the users that reported this problem turned out to be users that
>> handled tax data and couldn't delete any email. That's why that
>> group
>> of users all experienced the problem. Every single one of these
>> users
>> was mounting 2-8 2GB PSTs across the network. Those that shut down
>> at
>> night would come in at 8am and fire up their computers. A couple
>> dozen
>> different users would all try to pull down their PSTs from the
>> compressed file system of the poor server. So it wasn't the
>> network's
>> fault. The network was running like a champ. The POS server put
>> into
>> mission critical service by incompetent IT staff was to blame. We
>> spent
>> weeks troubleshooting the problem and trying to convince management
>> that
>> the network was fine. In the end I had to sit down with a user,
>> watch
>> everything that they did and then analyze their steps to figure out
>> what
>> was causing the problem. Oh, and the reason it was faster the day I
>> worked with her was because we did this mid-morning, not at 8am. Did
>> anyone ever apologize (even figuratively) to the network folks?
>> Nope.
>> Of course not.
>>
>>
>> As a network engineer I've found that the vast majority of my job is
>> helping other people find their problems. The network seldom
>> breaks and
>> when it does it's not subtle; it's catastrophic. Even highly skilled
>> technical people still blame the network when their stuff doesn't
>> work
>> right (after all my network is just a bunch of tubes, right?).
>> Networking is like mysterious dark magic that no one seems to
>> understand. It's the gremlins on the wire that cause Windows to
>> crash,
>> not poor programming and a lack of QA. Networking is simply not
>> understood by most people and it's human nature to fear and loathe
>> what
>> they don't understand. To be able to do my job effectively I have to
>> know my shit and everyone else's well enough to know how something
>> works
>> when it inevitably breaks. Had I not come into networking with a
>> systems background and were I not a quick study under fire I would
>> not
>> be good at what I do. Did something "suddenly" break that must have
>> been caused by the network maintenance I did last week? No, it's the
>> fact that it never worked to begin with and you never actually
>> tested it
>> when you deployed it a year ago. It wasn't until a user tested it
>> for
>> you that you became aware of the fact that it wasn't working. It
>> just
>> happened to come a week after I did maintenance on an unrelated
>> device
>> on an unrelated network. But I'm going to spend all morning sniffing
>> and decoding traffic to help you realize that this device off to the
>> side over here couldn't possibly be involved. *sigh* Story of my
>> life.
>>
>> </OT RANT>
>>
>> Justin
>> _______________________________________________
>> cisco-nsp mailing list cisco-nsp at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>