[cisco-voip] MGCP - Fallback to SRST very often even though connectivity to CUCM is fine

Tue Nov 3 11:47:16 EST 2009

Hello Wes,

Thank you so much for the information. It really benefits me!

By the way, when you say the AUEP below is not 'normal', could you please
elaborate?

----------------------------------------------------------------------------------
AUEP 76267 AALN/S2/SU0/0 at MLP-VG-01 MGCP 0.1
F: X
|<CLID::StandAloneCluster><NID::X.X.X.X><CT::1,100,132,1.204039><IP::X.X.X.X><DEV::><LVL::Significant><MASK::2000>
----------------------------------------------------------------------------------

I found that in my SDI traces, and I can see the AUEP ACK being received.
However, I was shocked to see this (more than 10 messages received together
within one second):

----------------------------------------------------------------------------------
NTFY 129359098 aaln/S2/SU0/3 at MLP-VG-01 MGCP 0.1
N: ca at 172.22.7.1:2427
X: 69
O: L/hd
|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
11/03/2009 15:25:38.443
CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><MN::MGCPEndPoint><MV::aaln/S2/SU0/3 at MLP-VG-01
><DEV::><LVL::All><MASK::ffff>
11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251

NTFY 129359097 *@MLP-VG-01 MGCP 0.1
X: 0
O:
|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
11/03/2009 15:25:38.443
CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><MN::MGCPEndPoint><MV::*@MLP-VG-01><DEV::><LVL::All><MASK::ffff>
11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
----------------------------------------------------------------------------------

This is followed by keepalive timeouts from the phones:

----------------------------------------------------------------------------------
11/03/2009 15:25:38.445 CCM|StationInit:   TCPPid=[ 1.100.9.210] Keep alive
timeout.|<CLID::StandAloneCluster><NID::

and then this (does it mean the MGCP gateway is restarting?):

11/03/2009 15:25:38.485 CCM|MGCPInit - //// RSIP <restart> from
*@MLP-VG-01|<CLID::StandAloneCluster><NID::

11/03/2009 15:25:38.490 CCM|MGCPManager received DUPLICATE message with
TransId: 129359097|<CLID::StandAloneCluster><NID::
----------------------------------------------------------------------------------

Lastly, CUCM sends messages to the MGCP gateway (more than 10 messages
together within one second):

----------------------------------------------------------------------------------
|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204110><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to:
172.23.8.251
200 129359097

|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204111><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to:
172.23.8.251
200 129359097
----------------------------------------------------------------------------------

It looks to me as if the network was unstable or inconsistent during that period.
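
To put a number on the "more than 10 messages within a second" observation, here is a minimal sketch that buckets the "MGCPHandler received msg from" lines by second and source IP. It assumes the SDI timestamp layout shown in the excerpts above; the function names and the threshold of 10 are my own choices, not anything defined by CUCM.

```python
import re
from collections import Counter

# Matches SDI lines such as:
# 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
# (layout assumed from the trace excerpts in this thread)
RECV = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d{3} "
    r"CCM\|MGCPHandler received msg from: (\S+)"
)


def messages_per_second(lines):
    """Count received MGCP messages per (second, source IP) bucket."""
    counts = Counter()
    for line in lines:
        m = RECV.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def bursts(lines, threshold=10):
    """Return the buckets holding more than `threshold` messages."""
    return {k: n for k, n in messages_per_second(lines).items() if n > threshold}


if __name__ == "__main__":
    sample = [
        "11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251",
        "11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251",
    ]
    for (second, ip), n in sorted(messages_per_second(sample).items()):
        print(second, ip, n)
```

A burst right after a quiet gap would be consistent with packets being held up and then delivered together, but the counts alone do not prove that.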

Any feedback from the gurus out there is very much appreciated!

Thanks,
Wil

On Tue, Nov 3, 2009 at 10:52 PM, Wes Sisk <wsisk at cisco.com> wrote:

>  blah! I forgot to mention the service parameter.  CM Service parameter
> "MGCP Retry Timeout Handling" configures the behavior when a timeout is
> observed.  This allows marking the endpoint oos, resetting just the port,
> or unregistering the entire gateway.  Unregistering the entire gateway is the
> default value.
>
> /wes
>
>
> On Tuesday, November 03, 2009 9:29:24 AM, Wes Sisk <wsisk at cisco.com> wrote:
>
> timely question.
>
> MGCP gateway can be viewed as:
>
> MGCP Gateway
>     mgcp/udp based registration and keepalives
>     analog endpoints
>        mgcp/udp based registration and transactions
>     digital endpoints
>        backhaul/tcp based
>
> on CM if you see the alarm:
> MGCPGatewayLostComm then the top level mgcp process stopped communicating
> with CM.  Usually the GW sends keepalives to CM similar to:
> 12/27/2005 10:16:40.173 CCM|MGCPHandler received msg from: 10.10.33.250
> NTFY 333382 *@HQ-VG224-3rdFlr MGCP 0.1
> X: 0
> O:
> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017474><IP::10.10.33.250><DEV::>
>
>
> If CM does not receive a keepalive from the gateway, CM will attempt to
> query the gateway with this message:
> 12/27/2005 10:17:07.002 CCM|MGCPHandler send msg SUCCESSFULLY to:
> 10.10.31.250
> AUEP 13561613 AALN/S2/0 at HQ-VG224-1stFlr MGCP 0.1
> F: X
> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017448><IP::10.10.31.250><DEV::>
>
>
> This AUEP is not 'normal'.
> F = RequestedInfo
> X = RequestIdentifier
> Normal AUEP requests much more information. This is a special "hello, are
> you there" type exchange.
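
To make the "F: X" reading concrete, here is a minimal sketch that splits an MGCP command as it appears in these traces and flags the hello-style AUEP. Only the two codes explained above (F and X) are mapped; the function names are hypothetical, and the " at " in endpoint names is kept exactly as the list archive rewrites "@".

```python
# Only the codes explained in this thread are named here.
PARAM_NAMES = {"F": "RequestedInfo", "X": "RequestIdentifier"}


def parse_mgcp_command(text):
    """Split an MGCP command from an SDI trace into header fields and parameters."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    tokens = lines[0].split()
    verb, trans_id = tokens[0], tokens[1]
    version = " ".join(tokens[-2:])       # e.g. "MGCP 0.1"
    endpoint = " ".join(tokens[2:-2])     # tolerates the archive's "x at y" form
    params = {}
    for ln in lines[1:]:
        if ln.startswith("|"):            # trailing SDI metadata, not MGCP
            break
        key, _, value = ln.partition(":")
        params[key.strip()] = value.strip()
    return {"verb": verb, "trans_id": trans_id, "endpoint": endpoint,
            "version": version, "params": params}


def is_hello_auep(msg):
    """True when an AUEP asks for nothing but the RequestIdentifier (F: X)."""
    return msg["verb"] == "AUEP" and msg["params"].get("F") == "X"


if __name__ == "__main__":
    sample = """AUEP 13561613 AALN/S2/0 at HQ-VG224-1stFlr MGCP 0.1
F: X
|<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11>"""
    msg = parse_mgcp_command(sample)
    print(msg["endpoint"], is_hello_auep(msg))
    for code, value in msg["params"].items():
        print(PARAM_NAMES.get(code, code), "=", PARAM_NAMES.get(value, value))
```

An AUEP whose F line lists more items than X would not be flagged, which matches the distinction drawn above between a normal audit and the "are you there" probe.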
>
> the gateway should respond:
> 12/27/2005 10:17:07.002 CCM|MGCPHandler received msg from: 10.10.31.250
> 200 13561613
> X: 2
> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017648><IP::10.10.31.250><DEV::>
>
>
>
> This is getting very close to unregistration.  Another way to look at this
> is to look for indications of lost messages to the gateway.  Each MGCP
> transaction is retransmitted up to 3 times if not ack'd.  You can see
> retries in the CM SDI traces:
> 01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1
>
> If you see frequent retries then you are intermittently dropping or
> excessively delaying the UDP packets carrying the MGCP payload.
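
The retry lines described above can be tallied per transaction with a short script. This is only a sketch: it assumes that later attempts are logged in the same shape as the "Retry#1" line quoted above (Retry#2, Retry#3), and the function names are made up for illustration. The limit of 3 comes from the retransmit count mentioned in this thread.

```python
import re
from collections import defaultdict

# Shape taken from: "CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1"
RETRY = re.compile(r"MGCPHandler TransId: (\d+) Timeout\. Retry#(\d+)")


def max_retries_by_transaction(lines):
    """Map each transaction ID to the highest retry number seen for it."""
    worst = defaultdict(int)
    for line in lines:
        m = RETRY.search(line)
        if m:
            trans_id, attempt = m.group(1), int(m.group(2))
            worst[trans_id] = max(worst[trans_id], attempt)
    return dict(worst)


def exhausted(lines, limit=3):
    """Transaction IDs that reached the retry limit."""
    return sorted(t for t, n in max_retries_by_transaction(lines).items() if n >= limit)


if __name__ == "__main__":
    sample = [
        "01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1",
    ]
    print(max_retries_by_transaction(sample))
    print(exhausted(sample))
```

Many transactions with one or two retries would fit the dropped or delayed UDP picture; transactions that reach the limit are the ones to line up against the unregistration times.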
>
>
> There is also an issue where endpoints may stop responding to CM.  CM will
> retry the transaction 3 times and then unregister the gateway.  This looks
> similar to the retries tracked above.  The main difference is that you will
> see valid exchanges with other endpoints on the gateway or you will see
> successful keepalives with the top level gateway MGCP process.  This was
> historically caused by CSCsf26617 and similar.  The signature of this
> failure is repeated retransmits of the DLCX, RQNT, or CRCX messages from CM
> to the gateway while other endpoints are responding.  If this is happening
> then the gateway is having an internal error such as resource allocation or
> dsp hang.
>
> HTH.
>
> /Wes
>
>
>
> On Tuesday, November 03, 2009 3:39:24 AM, Wilson Hew <wilsonhew at gmail.com> wrote:
>
> Bob/Ryan, appreciate your feedback. Thanks.
>
> Guess I need to look at the connection between my MGCP gateway and CUCM.
> Any idea what else I may need to check? I am looking at the SDI traces, but
> have no idea what to look at.
>
> Thanks,
> Wil
>
> On Tue, Nov 3, 2009 at 2:35 AM, Bob Fronk <bob at btrfronk.com> wrote:
>
>>  I had this happening and found out it was an MPLS circuit going down.
>> Due to location of this particular site, our 12mbps MPLS circuit is supplied
>> by multiple T1s bonded with MLPPP.
>>
>>
>>
>> One of the T1s was going up/down several times a day (telco problem) and
>> each time, the MLPPP would reset for a couple seconds.   The MGCP gateway
>> responded by going into SRST and the PRI would go down for a moment.
>>
>>
>>
>> Just something to check
>>
>>
>>
>> *From:* cisco-voip-bounces at puck.nether.net [mailto:
>> cisco-voip-bounces at puck.nether.net] *On Behalf Of *Wilson Hew
>> *Sent:* Monday, November 02, 2009 11:47 AM
>> *To:* cisco-voip at puck.nether.net
>> *Subject:* [cisco-voip] MGCP - Fallback to SRST very often even though
>> connectivity to CUCM is fine
>>
>>
>>
>> Hello there,
>>
>> Greetings. I am having a problem with my MGCP gateway, and I need a little
>> help and advice. My MGCP gateway also runs SRST, and it falls back to
>> SRST very often (twice a day), then returns to normal operation shortly
>> afterwards. The connectivity from my MGCP gateway (remote
>> site) to CUCM is fine.
>>
>> I noticed my E1 goes down every time it falls back to SRST - is that
>> considered normal?
>>
>> My gateway is running 12.4(24)T1 and CUCM version 7.0.2.
>>
>> In 'sh ccm-manager', I have the below:
>> --------------------------------------------------------------
>> TFTP retry count to shut Ports: 2
>>
>> Statistics:
>>     Packets recvd:   857
>>     Recv failures:   1
>>     Packets xmitted: 852
>>     Xmit failures:   0
>> --------------------------------------------------------------
>> In 'sh mgcp stats':
>>
>>  UDP pkts rx 557379, tx 558783
>>  Unrecognized rx pkts 0, MGCP message parsing errors 0
>>  Duplicate MGCP ack tx 9, Invalid versions count 0
>>  CreateConn rx 36256, successful 36249, failed 7
>>  DeleteConn rx 36274, successful 36101, failed 173
>>  ModifyConn rx 66178, successful 66126, failed 52
>>  DeleteConn tx 154, successful 154, failed 0
>>  NotifyRequest rx 54652, successful 54516, failed 136
>>  AuditConnection rx 3, successful 3, failed 0
>>  AuditEndpoint rx 14887, successful 8080, failed 6807
>>  RestartInProgress tx 6248, successful 6248, failed 0
>>  Notify tx 342779, successful 342779, failed 0
>>  ACK tx 201075, NACK tx 7191
>>  ACK rx 349100, NACK rx 0
>>  Collisions: Passive 0, Active 0
>> --------------------------------------------------------------
>>
>> Is it possible to tell what is wrong from the above? Apart from that, I see
>> the slip counters under 'controllers e1' increasing, and I have
>> network-clock-participate configured.
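
One way to read the 'sh mgcp stats' counters above is to turn each "rx/tx N, successful N, failed N" line into a failure ratio. A minimal sketch, assuming the output layout pasted above; the function name is hypothetical and quote markers are stripped so the text can be pasted straight from this mail.

```python
import re

# Matches lines such as: "AuditEndpoint rx 14887, successful 8080, failed 6807"
STAT = re.compile(r"^\s*(\w+) (rx|tx) (\d+), successful (\d+), failed (\d+)\s*$")


def failure_rates(text):
    """Return {"<Counter> <rx|tx>": failed / total} for every matching line."""
    rates = {}
    for line in text.splitlines():
        line = line.lstrip("> ")          # drop email quote markers and indent
        m = STAT.match(line)
        if m:
            name, direction, total, _ok, failed = m.groups()
            total, failed = int(total), int(failed)
            if total:                     # skip counters that never fired
                rates[f"{name} {direction}"] = failed / total
    return rates


if __name__ == "__main__":
    sample = """
 CreateConn rx 36256, successful 36249, failed 7
 DeleteConn rx 36274, successful 36101, failed 173
 AuditEndpoint rx 14887, successful 8080, failed 6807
 Notify tx 342779, successful 342779, failed 0
"""
    for name, rate in sorted(failure_rates(sample).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {rate:.2%}")
```

On the figures above, AuditEndpoint stands out: 6807 of 14887 failed, roughly 46%, while every other counter fails well under 1% of the time.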
>>
>> Would appreciate if you could give me your feedback about this. Any
>> feedback is very much appreciated.
>>
>> Thanks,
>> Wil
>>
>
> ------------------------------
>
> _______________________________________________
> cisco-voip mailing list
> cisco-voip at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-voip
>