One more thing: I saw &quot;CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1&quot; in the SDI traces, but I can&#39;t find retry #2 or #3, which would cause the MGCP gateway to reset.<br><br>Thanks,<br>Wil<br><br><div class="gmail_quote">

On Wed, Nov 4, 2009 at 12:47 AM, Wilson Hew <span dir="ltr">&lt;<a href="mailto:wilsonhew@gmail.com">wilsonhew@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Hello Wes,<br><br>Thank you so much for the information. It really helps me!<br><br>By the way, when you say the AUEP below is not &#39;normal&#39;, could you please elaborate?<br><br>----------------------------------------------------------------------------------<br>

AUEP 76267 AALN/S2/SU0/0@MLP-VG-01 MGCP 0.1<br>F: X<br>|&lt;CLID::StandAloneCluster&gt;&lt;NID::X.X.X.X&gt;&lt;CT::1,100,132,1.204039&gt;&lt;IP::X.X.X.X&gt;&lt;DEV::&gt;&lt;LVL::Significant&gt;&lt;MASK::2000&gt;<br>----------------------------------------------------------------------------------<br>

<br>I found that in my SDI traces, and I can see the AUEP ACK received. However, I was shocked when I saw this (more than 10 messages received together within one second):<br><br>----------------------------------------------------------------------------------<br>

NTFY 129359098 aaln/S2/SU0/3@MLP-VG-01 MGCP 0.1<br>N: <a href="http://ca@172.22.7.1:2427" target="_blank">ca@172.22.7.1:2427</a><br>X: 69<br>O: L/hd<br>|&lt;CLID::StandAloneCluster&gt;&lt;NID::172.22.7.1&gt;&lt;CT::1,100,132,1.204114&gt;&lt;IP::172.23.8.251&gt;&lt;DEV::&gt;&lt;LVL::Significant&gt;&lt;MASK::2000&gt;<br>

11/03/2009 15:25:38.443 CCM|&lt;CLID::StandAloneCluster&gt;&lt;NID::172.22.7.1&gt;&lt;CT::1,100,132,1.204114&gt;&lt;MN::MGCPEndPoint&gt;&lt;MV::aaln/S2/SU0/3@MLP-VG-01&gt;&lt;DEV::&gt;&lt;LVL::All&gt;&lt;MASK::ffff&gt;<br>

11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251<br><br>NTFY 129359097 *@MLP-VG-01 MGCP 0.1<br>X: 0<br>O: <br>|&lt;CLID::StandAloneCluster&gt;&lt;NID::172.22.7.1&gt;&lt;CT::1,100,132,1.204115&gt;&lt;IP::172.23.8.251&gt;&lt;DEV::&gt;&lt;LVL::Significant&gt;&lt;MASK::2000&gt;<br>

11/03/2009 15:25:38.443 CCM|&lt;CLID::StandAloneCluster&gt;&lt;NID::172.22.7.1&gt;&lt;CT::1,100,132,1.204115&gt;&lt;MN::MGCPEndPoint&gt;&lt;MV::*@MLP-VG-01&gt;&lt;DEV::&gt;&lt;LVL::All&gt;&lt;MASK::ffff&gt;<br>11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251<br>

----------------------------------------------------------------------------------<br><br>This was followed by the lines below (phones hitting keepalive timeouts):<br><br>----------------------------------------------------------------------------------<br>

11/03/2009 15:25:38.445 CCM|StationInit:   TCPPid=[ 1.100.9.210] Keep alive timeout.|&lt;CLID::StandAloneCluster&gt;&lt;NID::<br><br>and then this (does it mean the MGCP gateway is restarting?):<br><br>11/03/2009 15:25:38.485 CCM|MGCPInit - //// RSIP &lt;restart&gt; from *@MLP-VG-01|&lt;CLID::StandAloneCluster&gt;&lt;NID::<br>

<br>11/03/2009 15:25:38.490 CCM|MGCPManager received DUPLICATE message with TransId: 129359097|&lt;CLID::StandAloneCluster&gt;&lt;NID::<br>----------------------------------------------------------------------------------<br>

<br>Lastly, CUCM sends these messages to the MGCP gateway (more than 10 messages together within one second):<br><br>----------------------------------------------------------------------------------<br>|&lt;CLID::StandAloneCluster&gt;&lt;NID::172.22.7.1&gt;&lt;CT::1,100,132,1.204110&gt;&lt;IP::172.23.8.251&gt;&lt;DEV::&gt;&lt;LVL::Significant&gt;&lt;MASK::2000&gt;<br>

11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 172.23.8.251<br>200 129359097 <br><br>|&lt;CLID::StandAloneCluster&gt;&lt;NID::172.22.7.1&gt;&lt;CT::1,100,132,1.204111&gt;&lt;IP::172.23.8.251&gt;&lt;DEV::&gt;&lt;LVL::Significant&gt;&lt;MASK::2000&gt;<br>

11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 172.23.8.251<br>200 129359097 <br>----------------------------------------------------------------------------------<br><br>It looks to me as though the network was unstable or inconsistent during that period.<br>

<br>Any feedback from the gurus out there is very much appreciated!<br><br>Thanks,<br>Wil<div><div></div><div class="h5"><br><br><br><div class="gmail_quote">On Tue, Nov 3, 2009 at 10:52 PM, Wes Sisk <span dir="ltr">&lt;<a href="mailto:wsisk@cisco.com" target="_blank">wsisk@cisco.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div bgcolor="#ffffff" text="#000000">

blah! I forgot to mention the service parameter.  CM Service parameter

&quot;MGCP Retry Timeout Handling&quot; configures the behavior when a timeout is

observed.  This allows marking the endpoint oos, resetting just the

port, or unregistering the entire gateway.  Unregistering the entire

gateway is the default value.<br><font color="#888888">

<br>

/wes</font><div><div></div><div><br>

<br>

On Tuesday, November 03, 2009 9:29:24 AM, Wes Sisk

<a href="mailto:wsisk@cisco.com" target="_blank">&lt;wsisk@cisco.com&gt;</a> wrote:<br>

<blockquote type="cite">

timely question.<br>

  <br>

MGCP gateway can be viewed as:<br>

  <br>

MGCP Gateway<br>

    mgcp/udp based registration and keepalives<br>

    analog endpoints<br>

       mgcp/udp based registration and transactions<br>

    digital endpoints<br>

       backhaul/tcp based<br>

  <br>

on CM if you see the alarm:<br>

MGCPGatewayLostComm then the top level mgcp process stopped

communicating with CM.  Usually the GW sends keepalives to CM similar

to:<br>

12/27/2005 10:16:40.173 CCM|MGCPHandler received msg from: 10.10.33.250<br>

NTFY 333382 *@HQ-VG224-3rdFlr MGCP 0.1<br>

X: 0<br>

O:<br>

|&lt;CLID::MFCU-CM-1-Cluster&gt;&lt;NID::10.10.200.11&gt;&lt;CT::2,100,66,1.23017474&gt;&lt;IP::10.10.33.250&gt;&lt;DEV::&gt;

  <br>

  <br>

If CM does not receive a keepalive from the gateway, CM will attempt to query

the gw with this message:<br>

12/27/2005 10:17:07.002 CCM|MGCPHandler send msg SUCCESSFULLY to:

10.10.31.250<br>

AUEP 13561613 AALN/S2/0@HQ-VG224-1stFlr MGCP 0.1<br>

F: X<br>

|&lt;CLID::MFCU-CM-1-Cluster&gt;&lt;NID::10.10.200.11&gt;&lt;CT::2,100,66,1.23017448&gt;&lt;IP::10.10.31.250&gt;&lt;DEV::&gt;

  <br>

  <br>

This AUEP is not &#39;normal&#39;.<br>

F = RequestedInfo<br>

X = RequestIdentifier<br>

Normal AUEP requests much more information. This is a special &quot;hello,

are you there&quot; type exchange.<br>

  <br>

the gateway should respond:<br>

12/27/2005 10:17:07.002 CCM|MGCPHandler received msg from: 10.10.31.250<br>

200 13561613<br>

X: 2<br>

|&lt;CLID::MFCU-CM-1-Cluster&gt;&lt;NID::10.10.200.11&gt;&lt;CT::2,100,66,1.23017648&gt;&lt;IP::10.10.31.250&gt;&lt;DEV::&gt;

  <br>

  <br>

  <br>

This is getting very close to unregistration.  Another way to look at

this is to look for indications of lost messages to the gateway.  Each

MGCP transaction is retransmitted up to 3 times if not ack&#39;d.  You can

see retries in the CM SDI traces:<br>

01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout.

Retry#1<br>

  <br>

If you see frequent retries then you are intermittently dropping or

excessively delaying the UDP packets carrying the MGCP payload.<br>
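  <br>

A minimal sketch of that check in Python, in case it helps: it scans trace text for the "Timeout. Retry#" lines and groups the retry numbers by transaction ID, so you can see whether any transaction reached a second or third retry. The function name retries_by_transaction and the regular expression are assumptions based only on the single trace line quoted above; adjust them if your traces are laid out differently.<br>

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the one trace line quoted in this thread:
#   01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1
# \s also matches a newline, so a wrapped "Retry#1" is still picked up.
RETRY_RE = re.compile(r"MGCPHandler TransId:\s*(\d+)\s+Timeout\.\s*Retry#(\d+)")


def retries_by_transaction(trace_text):
    """Return {transaction ID: sorted retry numbers logged for it}."""
    seen = defaultdict(set)
    for match in RETRY_RE.finditer(trace_text):
        seen[int(match.group(1))].add(int(match.group(2)))
    return {trans_id: sorted(numbers) for trans_id, numbers in seen.items()}


if __name__ == "__main__":
    sample = (
        "01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout.\n"
        "Retry#1\n"
    )
    for trans_id, numbers in retries_by_transaction(sample).items():
        print(trans_id, numbers)
```

Pass it the whole trace text rather than one line at a time, so that a "Retry#" which wrapped onto the next line is still matched.<br>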

  <br>

  <br>

There is also an issue where endpoints may stop responding to CM.  CM

will retry the transaction 3 times and then unregister the gateway. 

This looks similar to the retries tracked above.  The main difference

is that you will see valid exchanges with other endpoints on the

gateway or you will see successful keepalives with the top level

gateway MGCP process.  This was historically caused by CSCsf26617 and

similar.  The signature of this failure is repeated retransmits of the

DLCX, RQNT, or CRCX messages from CM to the gateway while other

endpoints are responding.  If this is happening then the gateway is

having an internal error such as resource allocation or dsp hang.<br>
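  <br>

The signature described above could be looked for with something like the following rough Python sketch. It counts how often CM sends the same DLCX, RQNT, or CRCX transaction to the same endpoint and reports the extra sends per endpoint. It assumes the trace layout seen in the excerpts in this thread (a "send msg SUCCESSFULLY to:" line followed by the MGCP command line) and that a retransmit reuses the original transaction ID; retransmits_per_endpoint is a made-up name, not anything from CM.<br>

```python
import re
from collections import Counter

# Marker CM logs just before the MGCP message it sent (as in the excerpts above).
SEND_MARK = "MGCPHandler send msg SUCCESSFULLY to:"
# MGCP command line: verb, transaction ID, endpoint name, protocol version.
CMD_RE = re.compile(r"^(DLCX|RQNT|CRCX)\s+(\d+)\s+(\S+)\s+MGCP\s+\d+\.\d+")


def retransmits_per_endpoint(lines):
    """Return {endpoint: number of repeated sends of an already-sent transaction}."""
    sends = Counter()
    after_send = False
    for raw in lines:
        line = raw.strip()
        if SEND_MARK in line:
            after_send = True
            continue
        if not after_send:
            continue
        match = CMD_RE.match(line)
        if match:
            verb, trans_id, endpoint = match.groups()
            sends[(endpoint.lower(), verb, int(trans_id))] += 1
            after_send = False
        elif line.startswith("|") or "CCM|" in line:
            # Trace trailer or next trace entry: the sent message was not
            # one of the three verbs of interest (for example a 200 reply).
            after_send = False
    extra = Counter()
    for (endpoint, _verb, _trans_id), count in sends.items():
        if count != 1:
            extra[endpoint] += count - 1
    return dict(extra)
```

An endpoint that shows up in the result while other endpoints on the same gateway do not is the pattern to look for; it says nothing by itself about the cause.<br>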

  <br>

HTH.<br>

  <br>

/Wes<br>

  <br>

  <br>

  <br>

On Tuesday, November 03, 2009 3:39:24 AM, Wilson Hew

  <a href="mailto:wilsonhew@gmail.com" target="_blank">&lt;wilsonhew@gmail.com&gt;</a>

wrote:<br>

  <blockquote type="cite">Bob/Ryan, appreciate your feedback. Thanks.<br>

    <br>

Guess I need to look at the connection between my MGCP gateway and

CUCM. Any idea what else I may need to check? I am looking at the SDI

traces, but have no idea what to look at.<br>

    <br>

Thanks,<br>

Wil<br>

    <br>

    <div class="gmail_quote">On Tue, Nov 3, 2009 at 2:35 AM, Bob Fronk <span dir="ltr">&lt;<a href="mailto:bob@btrfronk.com" target="_blank">bob@btrfronk.com</a>&gt;</span>

wrote:<br>

    <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

      <div link="blue" vlink="purple" lang="EN-US">

      <div>

      <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">I

had this happening and found out it was an MPLS circuit going down. 

Due

to location of this particular site, our 12mbps MPLS circuit is

supplied by

multiple T1s bonded with MLPPP.</span></p>

      <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span></p>

      <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">One

of the T1s was going up/down several times a day (telco problem) and

each time,

the MLPPP would reset for a couple seconds.   The MGCP gateway

responded by going into SRST and the PRI would go down for a moment.</span></p>

      <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span></p>

      <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Just

something to check</span></p>

      <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span></p>

      <div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0in 0in;">

      <p class="MsoNormal"><b><span style="font-size: 10pt;">From:</span></b><span style="font-size: 10pt;"> <a href="mailto:cisco-voip-bounces@puck.nether.net" target="_blank">cisco-voip-bounces@puck.nether.net</a>

[mailto:<a href="mailto:cisco-voip-bounces@puck.nether.net" target="_blank">cisco-voip-bounces@puck.nether.net</a>]

      <b>On

Behalf Of </b>Wilson Hew<br>

      <b>Sent:</b> Monday, November 02, 2009 11:47 AM<br>

      <b>To:</b> <a href="mailto:cisco-voip@puck.nether.net" target="_blank">cisco-voip@puck.nether.net</a><br>

      <b>Subject:</b> [cisco-voip] MGCP - Fallback to SRST very often

even though

connectivity to CUCM is fine</span></p>

      </div>

      <div>

      <div>

      <p class="MsoNormal"> </p>

      <p class="MsoNormal">Hello there,<br>

      <br>

Greetings. I am having a problem with my MGCP gateway, and I need a little

help and

advice. My MGCP gateway is running as SRST, and it falls back to

SRST very

often (twice a day). It then returns to normal operation from

fallback just

after that. The connectivity from my MGCP gateway (remote site) to CUCM

is

fine.<br>

      <br>

I noticed my E1 goes down every time it falls back to SRST - is

it

considered normal?<br>

      <br>

My gateway is running 12.4(24)T1 and CUCM version 7.0.2.<br>

      <br>

In &#39;sh ccm-manager&#39;, I have the below:<br>

--------------------------------------------------------------<br>

TFTP retry count to shut Ports: 2<br>

      <br>

Statistics:<br>

    Packets recvd:   857<br>

    Recv failures:   1<br>

    Packets xmitted: 852<br>

    Xmit failures:   0<br>

--------------------------------------------------------------<br>

In &#39;sh mgcp stats&#39;:<br>

      <br>

 UDP pkts rx 557379, tx 558783<br>

 Unrecognized rx pkts 0, MGCP message parsing errors 0<br>

 Duplicate MGCP ack tx 9, Invalid versions count 0<br>

 CreateConn rx 36256, successful 36249, failed 7<br>

 DeleteConn rx 36274, successful 36101, failed 173<br>

 ModifyConn rx 66178, successful 66126, failed 52<br>

 DeleteConn tx 154, successful 154, failed 0<br>

 NotifyRequest rx 54652, successful 54516, failed 136<br>

 AuditConnection rx 3, successful 3, failed 0<br>

 AuditEndpoint rx 14887, successful 8080, failed 6807<br>

 RestartInProgress tx 6248, successful 6248, failed 0<br>

 Notify tx 342779, successful 342779, failed 0<br>

 ACK tx 201075, NACK tx 7191<br>

 ACK rx 349100, NACK rx 0<br>

 Collisions: Passive 0, Active 0<br>

--------------------------------------------------------------<br>

      <br>

Can you tell what is wrong from the above? Apart from that, I see the number

of

slips in controllers e1 increasing, and I have

network-clock-participate

configured.<br>

      <br>

Would appreciate if you could give me your feedback about this. Any

feedback is

very much appreciated.<br>

      <br>

Thanks,<br>

Wil</p>

      </div>

      </div>

      </div>

      </div>

    </blockquote>

    </div>

    <br>

    <pre><hr size="4" width="90%">

_______________________________________________

cisco-voip mailing list

<a href="mailto:cisco-voip@puck.nether.net" target="_blank">cisco-voip@puck.nether.net</a>

<a href="https://puck.nether.net/mailman/listinfo/cisco-voip" target="_blank">https://puck.nether.net/mailman/listinfo/cisco-voip</a>

  </pre>

  </blockquote>

  <br>


</blockquote>

<br>

</div></div></div>

</blockquote></div><br>

</div></div></blockquote></div><br>