<div dir="ltr"><div>Just to follow up on this...</div><div> </div><div>We made a network routing change over the weekend and magically the I/O wait fell back to normal.  It may be a coincidence, but the problem seems to have resolved itself. </div>

<div> </div><div>I don't like things that 'magically' fix themselves.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Apr 5, 2013 at 12:41 PM, Erick Wellnitz <span dir="ltr"><<a href="mailto:ewellnitzvoip@gmail.com" target="_blank">ewellnitzvoip@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I have a bunch of LSIESG_AlertIndication messages: consistency checks and battery relearns.  Only recently have the consistency checks found inconsistent parity.</div>

<div class="gmail_extra"><br><br><div class="gmail_quote">

On Fri, Apr 5, 2013 at 11:02 AM, Tom Piscitell (tpiscite) <span dir="ltr"><<a href="mailto:tpiscite@cisco.com" target="_blank">tpiscite@cisco.com</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">


Yes, that is the CAR DB. In addition to checking CDR retention, I would pull syslogs and look for signs of HDD failure. The drives may be failing, which would cause high I/O wait even during normal disk usage. Here is a helpful command since you are running on an IBM server:<br>


<br>

$ grep -R LSIESG_AlertIndication --include 'messages*' *<br>

<br>
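If you would rather see a per-file tally than raw grep output, here is a minimal Python sketch of the same search. It is an illustration only: the function name count_alert_lines and its parameters are made up for this example, it assumes the log files are plain text with names starting with "messages", and it does not read compressed rotated logs.<br>
<br>
<pre>
```python
import os
from collections import Counter

# Marker string taken from the grep command above.
MARKER = "LSIESG_AlertIndication"


def count_alert_lines(root, marker=MARKER, name_prefix="messages"):
    """Return a Counter mapping file path to the number of lines containing marker.

    Hypothetical helper: walks root recursively and only opens files whose
    names start with name_prefix, mirroring grep -R with --include 'messages*'.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith(name_prefix):
                continue
            path = os.path.join(dirpath, name)
            # Syslog files can contain stray bytes, so decode leniently.
            with open(path, "r", errors="replace") as handle:
                hits = sum(1 for line in handle if marker in line)
            if hits:
                counts[path] = hits
    return counts
```
</pre>
Calling count_alert_lines(".").most_common() from the directory holding the collected logs lists the files with the most alert lines first.<br>
<br>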

HTH,<br>

-Tom<br>

<br>

On Apr 5, 2013, at 11:49 AM, "Haas, Neal" <<a href="mailto:nhaas@co.fresno.ca.us" target="_blank">nhaas@co.fresno.ca.us</a>> wrote:<br>

<br>

> Jumping in on the thread: what is your CDR retention set to? Do you redirect to a 3rd party CDR such as ISI? We have only a 30-day retention on our servers, I believe. We never use the CDR from the server.<br>

><br>

> We had a lot of I/O when our CDR retention was set to a few months. The I/O came from the deletion of the old CDR at the end of the month. It has been a few years since we changed the settings.<br>

><br>

><br>

> Neal Haas<br>

><br>

> From: <a href="mailto:cisco-voip-bounces@puck.nether.net" target="_blank">cisco-voip-bounces@puck.nether.net</a> [mailto:<a href="mailto:cisco-voip-bounces@puck.nether.net" target="_blank">cisco-voip-bounces@puck.nether.net</a>] On Behalf Of Erick Wellnitz<br>


> Sent: Friday, April 05, 2013 8:21 AM<br>

> To: Tom Piscitell (tpiscite)<br>

> Cc: cisco-voip<br>

> Subject: Re: [cisco-voip] high I/O Wait on one core<br>

><br>

> caroninit seems to be the biggest offender (by about 50x) in both disk writes and CPU usage.  Am I correct in assuming this has something to do with call detail records?<br>

><br>

><br>

> On Fri, Apr 5, 2013 at 9:02 AM, Tom Piscitell (tpiscite) <<a href="mailto:tpiscite@cisco.com" target="_blank">tpiscite@cisco.com</a>> wrote:<br>

> Erick,<br>

><br>

> You can use the FIOR utility from the CLI to identify which processes are writing to the disk.<br>

><br>

> admin:utils fior<br>

>       utils fior disable<br>

>       utils fior enable<br>

>       utils fior list<br>

>       utils fior start<br>

>       utils fior status<br>

>       utils fior stop<br>

>       utils fior top<br>

><br>

> Here is a typical use case:<br>

><br>

> 1. Enable the FIOR utility before/during a time of High IO Wait<br>

>         admin:utils fior enable<br>

>         File I/O Statistics has been enabled.<br>

>         admin:utils fior start<br>

>         Loading fiostats module: ok<br>

>         Enabling fiostats : ok<br>

>         File I/O Statistics has been started.<br>

><br>

> 2. Wait a couple of minutes. FIOR polls for data every 5 seconds, I believe. Then use utils fior top to see what's hitting the disk the hardest:<br>

><br>

> admin:utils fior top ?<br>

> Syntax:<br>

> utils fior top n sort_by [start=date-time] [stop=date-time]<br>

><br>

>          n:            number of processes<br>

>          sort_by:      read, write, read-rate, write-rate<br>

>          date-time:    of the form %H:%M, %H:%M:%S<br>

>                                    %a,%H:%M, %a,%H:%M:%S<br>

>                                    %Y-%m-%d,%H:%M, %Y-%m-%d %H:%M:%S<br>

> Example:<br>

> admin:utils fior top 10 write start=2010-04-20 10:00:00 stop=2010-04-20 10:30:00<br>

><br>

> This of course won't tell you *why* a process is hitting the disk, but it will at least show you which processes have the most reads/writes. To answer the "why" question you would need to look at traces for the offending process/service.<br>


><br>

> HTH,<br>

> -Tom<br>

><br>

> On Apr 4, 2013, at 5:43 PM, Erick Wellnitz <<a href="mailto:ewellnitzvoip@gmail.com" target="_blank">ewellnitzvoip@gmail.com</a>> wrote:<br>

><br>

> > Hello all!<br>

> ><br>

> > I have a dual 4-core IBM 7835I3 which is my publisher.   On one core of the first CPU the I/O wait is through the roof.  RTMT shows that writes to the hard drives are at between 600 and 700 MB/s, which is far higher than the subscriber on the same model of hardware.<br>


> ><br>

> > Short of calling TAC is there any way to figure out what is causing the extremely high volume of writes to the drives?  I already stopped most traces and looking at the processes doesn't give any clues.<br>


> ><br>

> > Thanks again!<br>

> ><br>

> ><br>

> > _______________________________________________<br>

> > cisco-voip mailing list<br>

> > <a href="mailto:cisco-voip@puck.nether.net" target="_blank">cisco-voip@puck.nether.net</a><br>

> > <a href="https://puck.nether.net/mailman/listinfo/cisco-voip" target="_blank">https://puck.nether.net/mailman/listinfo/cisco-voip</a><br>

><br>

<br>

</blockquote></div><br></div>

</blockquote></div><br></div>