[cisco-voip] high I/O Wait on one core

Erick Wellnitz ewellnitzvoip at gmail.com
Mon May 20 12:08:42 EDT 2013


Just to follow up on this....

We made a network routing change over the weekend, and the I/O wait
magically fell back to normal. It may be a coincidence, but the problem
seems to have resolved itself somehow.

I don't like things that 'magically' fix themselves.


On Fri, Apr 5, 2013 at 12:41 PM, Erick Wellnitz <ewellnitzvoip at gmail.com> wrote:

> I have a bunch of LSIESG_AlertIndication messages: consistency checks and
> battery relearns. Only recently have the consistency checks found
> inconsistent parity.
>
>
> On Fri, Apr 5, 2013 at 11:02 AM, Tom Piscitell (tpiscite) <tpiscite at cisco.com> wrote:
>
>> Yes, that is the CAR DB. In addition to checking CDR retention, I would
>> pull the syslogs and look for signs of HDD failure. The drives may be
>> failing, which would cause high IOWait even during normal disk usage.
>> Here is a helpful command, since you are running on an IBM server:
>>
>> $ grep -R LSIESG_AlertIndication --include='messages*' *
>>
>> HTH,
>> -Tom
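If you have pulled the syslog files off the server (for example through RTMT) and want a per-file tally instead of raw grep output, the following is a minimal Python sketch of the same search. It assumes the logs sit in a local directory tree and are named `messages*`, as in Tom's command; `find_lsi_alerts` and `count_by_file` are hypothetical helper names, not part of any Cisco tool.

```python
import fnmatch
import os
from collections import Counter


def find_lsi_alerts(root, pattern="LSIESG_AlertIndication", name_glob="messages*"):
    """Return (path, line) pairs for every line containing `pattern` in files
    under `root` whose base name matches `name_glob`.

    Roughly equivalent to: grep -R PATTERN --include='messages*' ROOT
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # grep --include matches the glob against the file's base name
            if not fnmatch.fnmatch(name, name_glob):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if pattern in line:
                        hits.append((path, line.rstrip("\n")))
    return hits


def count_by_file(hits):
    """Count how many matching lines each file contributed."""
    return Counter(path for path, _line in hits)
```

Usage: `count_by_file(find_lsi_alerts("/path/to/downloaded/logs")).most_common()` lists the files with the most alert lines first, which makes it easier to see whether the alerts cluster in recent logs.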
>>
>> On Apr 5, 2013, at 11:49 AM, "Haas, Neal" <nhaas at co.fresno.ca.us> wrote:
>>
>> > Jumping in on the thread: what is your CDR retention set to? Do you
>> > redirect to a third-party CDR system such as ISI? I believe we have only
>> > 30-day retention on our servers. We never use the CDR from the server.
>> >
>> > We had a lot of IO when our CDR retention was set to a few months. The IO
>> > came from deleting the old CDRs at the end of the month. It has been a few
>> > years since we changed the settings.
>> >
>> >
>> > Neal Haas
>> >
>> > From: cisco-voip-bounces at puck.nether.net [mailto:cisco-voip-bounces at puck.nether.net] On Behalf Of Erick Wellnitz
>> > Sent: Friday, April 05, 2013 8:21 AM
>> > To: Tom Piscitell (tpiscite)
>> > Cc: cisco-voip
>> > Subject: Re: [cisco-voip] high I/O Wait on one core
>> >
>> > caroninit seems to be the biggest offender (by about 50x) in both disk
>> > writes and CPU usage. Am I correct in assuming this has something to do
>> > with call detail records?
>> >
>> >
>> > On Fri, Apr 5, 2013 at 9:02 AM, Tom Piscitell (tpiscite) <tpiscite at cisco.com> wrote:
>> > Erick,
>> >
>> > You can use the FIOR utility from the CLI to identify which processes
>> > are writing to the disk.
>> >
>> > admin:utils fior
>> >       utils fior disable
>> >       utils fior enable
>> >       utils fior list
>> >       utils fior start
>> >       utils fior status
>> >       utils fior stop
>> >       utils fior top
>> >
>> > Here is a typical use case:
>> >
>> > 1. Enable the FIOR utility before/during a time of High IO Wait
>> >         admin:utils fior enable
>> >         File I/O Statistics has been enabled.
>> >         admin:utils fior start
>> >         Loading fiostats module: ok
>> >         Enabling fiostats : ok
>> >         File I/O Statistics has been started.
>> >
>> > 2. Wait a couple of minutes (I believe FIOR polls for data every 5
>> > seconds), then use utils fior top to see what is hitting the disk the hardest:
>> >
>> > admin:utils fior top ?
>> > Syntax:
>> > utils fior top n sort_by [start=date-time] [stop=date-time]
>> >
>> >          n:            number of processes
>> >          sort_by:      read, write, read-rate, write-rate
>> >          date-time:    of the form %H:%M, %H:%M:%S
>> >                                    %a,%H:%M, %a,%H:%M:%S
>> >                                    %Y-%m-%d,%H:%M, %Y-%m-%d %H:%M:%S
>> > Example:
>> > admin:utils fior top 10 write start=2010-04-20 10:00:00 stop=2010-04-20 10:30:00
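The help text lists the accepted date-time forms as strftime-style patterns. As a quick way to check a start/stop value against that list before typing it into the CLI, here is a minimal Python sketch. It assumes the patterns mean what they do in Python's `strptime`; `matching_format` is a hypothetical helper, and this is not how the CUCM CLI itself parses arguments.

```python
from datetime import datetime

# The forms listed in the `utils fior top` help text, copied as printed there.
FIOR_DATETIME_FORMATS = [
    "%H:%M",
    "%H:%M:%S",
    "%a,%H:%M",
    "%a,%H:%M:%S",
    "%Y-%m-%d,%H:%M",
    "%Y-%m-%d %H:%M:%S",
]


def matching_format(value):
    """Return the first listed format that `value` parses under, or None."""
    for fmt in FIOR_DATETIME_FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # Not this form; try the next one.
            continue
        return fmt
    return None
```

One thing this makes visible: `2010-04-20` on its own, which is what a whitespace-splitting parser would see for `start=2010-04-20 10:00:00`, matches none of the listed forms, while `2010-04-20,10:00` does. Treat that as a hint to try the comma form if the example is rejected, not as a statement about how the CLI tokenizes arguments.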
>> >
>> > This of course won't tell you *why* a process is hitting the disk, but
>> > it will at least show you which processes have the most reads/writes.
>> > To answer the *why* question, you would need to look at the traces for
>> > the offending process/service.
>> >
>> > HTH,
>> > -Tom
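As Tom notes, `utils fior top` ranks processes by how much they read or write. As a rough mental model of that ranking (not Cisco's implementation, whose internals are not described in this thread), the Python sketch below sums per-process byte counts from repeated polls and returns the top n. The `Sample` tuple layout, the function name, and the sample data in the usage note are all hypothetical, and the `read-rate`/`write-rate` sort options are not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical poll record: (process name, bytes read, bytes written).
Sample = Tuple[str, int, int]


def top_processes(samples: Iterable[Sample], n: int, sort_by: str = "write") -> List[Tuple[str, int]]:
    """Sum bytes per process across all samples and return the n largest by `sort_by`."""
    if sort_by not in ("read", "write"):
        raise ValueError("sort_by must be 'read' or 'write'")
    # name -> [total bytes read, total bytes written]
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for name, read_bytes, write_bytes in samples:
        totals[name][0] += read_bytes
        totals[name][1] += write_bytes
    idx = 0 if sort_by == "read" else 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1][idx], reverse=True)
    return [(name, counts[idx]) for name, counts in ranked[:n]]
```

For example, with made-up polls `[("caroninit", 100, 5000), ("proc_b", 50, 100), ("caroninit", 0, 5000)]`, `top_processes(polls, 1)` returns `[("caroninit", 10000)]`. Like the real command, a ranking like this says who is writing, not why.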
>> >
>> > On Apr 4, 2013, at 5:43 PM, Erick Wellnitz <ewellnitzvoip at gmail.com> wrote:
>> >
>> > > Hello all!
>> > >
>> > > I have a dual quad-core IBM 7835I3, which is my publisher. On one core
>> > > of the first CPU, the I/O Wait is through the roof. RTMT shows writes
>> > > to the hard drives of between 600 and 700 MB/s, which is far higher
>> > > than on the subscriber running the same model of hardware.
>> > >
>> > > Short of calling TAC, is there any way to figure out what is causing
>> > > the extremely high volume of writes to the drives? I have already stopped
>> > > most traces, and looking at the processes doesn't give any clues.
>> > >
>> > > Thanks again!
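To make the symptom concrete: on Linux, iowait is the time a CPU spends waiting for I/O to complete, and per-core figures come from the `cpuN` lines of `/proc/stat`. The sketch below computes per-core iowait percentages from two snapshots of that file. It is a generic Linux illustration, not a CUCM command; the CUCM admin CLI does not normally expose a shell, so assume the snapshots were captured some other way. The function names are hypothetical.

```python
from typing import Dict, List


def parse_cpu_lines(stat_text: str) -> Dict[str, List[int]]:
    """Map 'cpu0', 'cpu1', ... to their tick counters from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines are 'cpuN ...'; skip the aggregate 'cpu' line and non-CPU lines.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(x) for x in fields[1:]]
    return counters


def iowait_percent(before: str, after: str) -> Dict[str, float]:
    """Percentage of each core's elapsed ticks spent in iowait between two snapshots."""
    b, a = parse_cpu_lines(before), parse_cpu_lines(after)
    result = {}
    for core, now in a.items():
        if core not in b:
            continue
        deltas = [x - y for x, y in zip(now, b[core])]
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Sum only the first 8, since guest time is already counted in user/nice.
        total = sum(deltas[:8])
        result[core] = 100.0 * deltas[4] / total if total else 0.0
    return result
```

A single core showing high iowait while the others stay near zero is consistent with the symptom described above.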
>> > >
>> > >
>> > > _______________________________________________
>> > > cisco-voip mailing list
>> > > cisco-voip at puck.nether.net
>> > > https://puck.nether.net/mailman/listinfo/cisco-voip
>> >
>>
>>
>

