[cisco-voip] MGCP - Fallback to SRST very often even though connectivity to CUCM is fine

Tue Nov 3 11:56:20 EST 2009

One more thing: I saw "CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1" in
the SDI traces, but I can't find the Retry#2 or Retry#3 entries that would
lead to the MGCP gateway reset.
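One way to check this systematically is to group the retry lines by TransId and keep the highest retry number seen for each. This Python sketch assumes that Retry#2 and Retry#3 entries, if they exist, use the same wording as the Retry#1 line quoted above; the function name max_retry_per_transaction is hypothetical.

```python
import re
from collections import defaultdict

# Matches SDI lines such as:
#   01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1
# Assumption: Retry#2 / Retry#3 lines, if present, use the same wording.
RETRY_RE = re.compile(r"MGCPHandler TransId: (\d+) Timeout\. Retry#(\d+)")


def max_retry_per_transaction(trace_lines):
    """Return {TransId: highest Retry# seen} for every transaction that timed out."""
    highest = defaultdict(int)
    for line in trace_lines:
        match = RETRY_RE.search(line)
        if match:
            trans_id, retry = match.group(1), int(match.group(2))
            highest[trans_id] = max(highest[trans_id], retry)
    return dict(highest)


if __name__ == "__main__":
    sample = [
        "01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1",
        "11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251",
    ]
    print(max_retry_per_transaction(sample))
```

An open trace file can be passed in place of the sample list, since iterating a file yields its lines. A TransId whose highest value stays at 1 was presumably answered after the first retransmit; one that reaches 3 would be the candidate for the unregistration Wes describes.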

Thanks,
Wil

On Wed, Nov 4, 2009 at 12:47 AM, Wilson Hew <wilsonhew at gmail.com> wrote:

> Hello Wes,
>
> Thank you so much for the information. It really helps!
>
> Btw, when you say the AUEP below is not 'normal', could you please
> elaborate?
>
>
> ----------------------------------------------------------------------------------
> AUEP 76267 AALN/S2/SU0/0@MLP-VG-01 MGCP 0.1
> F: X
>
> |<CLID::StandAloneCluster><NID::X.X.X.X><CT::1,100,132,1.204039><IP::X.X.X.X><DEV::><LVL::Significant><MASK::2000>
>
> ----------------------------------------------------------------------------------
>
> I found that in my SDI traces, and I can see the AUEP ACK being received.
> However, I was shocked when I saw this (more than 10 messages received
> together within one second):
>
>
> ----------------------------------------------------------------------------------
> NTFY 129359098 aaln/S2/SU0/3@MLP-VG-01 MGCP 0.1
> N: ca@172.22.7.1:2427
> X: 69
> O: L/hd
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.443 CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><MN::MGCPEndPoint><MV::aaln/S2/SU0/3@MLP-VG-01><DEV::><LVL::All><MASK::ffff>
> 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
>
> NTFY 129359097 *@MLP-VG-01 MGCP 0.1
> X: 0
> O:
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.443 CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><MN::MGCPEndPoint><MV::*@MLP-VG-01><DEV::><LVL::All><MASK::ffff>
> 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
>
> ----------------------------------------------------------------------------------
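To confirm the "more than 10 messages within a second" observation across a whole trace file rather than one excerpt, a small Python sketch can bucket the "MGCPHandler received msg from:" lines by source IP and wall-clock second. It assumes the timestamp format shown in the excerpt above; the threshold of 10 and the function name bursts are illustrative choices, not CUCM settings.

```python
import re
from collections import Counter
from datetime import datetime

# Matches SDI lines such as:
#   11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
RX_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d{3} "
    r"CCM\|MGCPHandler received msg from: (\S+)"
)


def bursts(trace_lines, threshold=10):
    """Return sorted (second, source_ip, count) tuples for every source that sent
    more than `threshold` messages within the same wall-clock second."""
    counts = Counter()
    for line in trace_lines:
        match = RX_RE.search(line)
        if match:
            second = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            counts[(second, match.group(2))] += 1
    return sorted(
        (second, ip, n) for (second, ip), n in counts.items() if n > threshold
    )


if __name__ == "__main__":
    # Synthetic input: 12 receptions from one gateway within 15:25:38.
    sample = [
        f"11/03/2009 15:25:38.{ms:03d} CCM|MGCPHandler received msg from: 172.23.8.251"
        for ms in range(400, 412)
    ]
    for second, ip, n in bursts(sample):
        print(second, ip, n)
```

Bucketing by calendar second keeps the code short but splits a burst that straddles a second boundary; a sliding window would catch those at the cost of a little more bookkeeping.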
>
> Followed by this (phones hitting keepalive timeouts):
>
>
> ----------------------------------------------------------------------------------
> 11/03/2009 15:25:38.445 CCM|StationInit:   TCPPid=[ 1.100.9.210] Keep alive timeout.|<CLID::StandAloneCluster><NID::
>
> and then this (is it telling me the MGCP gateway is restarting?):
>
> 11/03/2009 15:25:38.485 CCM|MGCPInit - //// RSIP <restart> from *@MLP-VG-01|<CLID::StandAloneCluster><NID::
>
> 11/03/2009 15:25:38.490 CCM|MGCPManager received DUPLICATE message with TransId: 129359097|<CLID::StandAloneCluster><NID::
>
> ----------------------------------------------------------------------------------
>
> Lastly, CUCM is sending messages to the MGCP gateway (more than 10 messages
> within one second):
>
>
> ----------------------------------------------------------------------------------
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204110><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 172.23.8.251
> 200 129359097
>
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204111><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 172.23.8.251
> 200 129359097
>
> ----------------------------------------------------------------------------------
>
> It looks to me like the network was unstable or inconsistent during that period.
>
> Any feedback from the gurus out there is very much appreciated!
>
> Thanks,
> Wil
>
>
>
> On Tue, Nov 3, 2009 at 10:52 PM, Wes Sisk <wsisk at cisco.com> wrote:
>
>> blah! I forgot to mention the service parameter.  CM service parameter
>> "MGCP Retry Timeout Handling" configures the behavior when a timeout is
>> observed.  It allows marking the endpoint OOS (out of service), resetting
>> just the port, or unregistering the entire gateway.  Unregistering the
>> entire gateway is the default value.
>>
>> /wes
>>
>>
>> On Tuesday, November 03, 2009 9:29:24 AM, Wes Sisk <wsisk at cisco.com> wrote:
>>
>> timely question.
>>
>> MGCP gateway can be viewed as:
>>
>> MGCP Gateway
>>     mgcp/udp based registration and keepalives
>>     analog endpoints
>>        mgcp/udp based registration and transactions
>>     digital endpoints
>>        backhaul/tcp based
>>
>> On CM, if you see the alarm
>> MGCPGatewayLostComm, then the top-level MGCP process has stopped
>> communicating with CM.  Normally the GW sends keepalives to CM similar to:
>> 12/27/2005 10:16:40.173 CCM|MGCPHandler received msg from: 10.10.33.250
>> NTFY 333382 *@HQ-VG224-3rdFlr MGCP 0.1
>> X: 0
>> O:
>> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017474><IP::10.10.33.250><DEV::>
>>
>>
>> If CM does not receive a keepalive from the gateway, CM will attempt to
>> query the GW with this message:
>> 12/27/2005 10:17:07.002 CCM|MGCPHandler send msg SUCCESSFULLY to: 10.10.31.250
>> AUEP 13561613 AALN/S2/0@HQ-VG224-1stFlr MGCP 0.1
>> F: X
>> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017448><IP::10.10.31.250><DEV::>
>>
>>
>> This AUEP is not 'normal'.
>> F = RequestedInfo
>> X = RequestIdentifier
>> so "F: X" asks the endpoint only for its current RequestIdentifier. A normal
>> AUEP requests much more information. This is a special "hello, are you
>> there" type exchange.
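The difference Wes describes comes down to the F: (RequestedInfo) parameter, so it can be expressed as a check on that one line. This Python sketch parses a raw MGCP command (first line plus "Name: value" parameter lines), drops the CUCM trace suffix starting at "|<CLID", and flags an AUEP that asks only for X. parse_mgcp_command and is_keepalive_style_auep are hypothetical helper names, and the input is assumed to be raw trace text, not the ">"-quoted copy in this thread.

```python
def parse_mgcp_command(text):
    """Split an MGCP command into (verb, transaction_id, endpoint, params).

    Line 1 is '<verb> <transaction-id> <endpoint> MGCP <version>'; each later
    line is a 'Name: value' parameter. Anything from a CUCM trace suffix
    ('|<CLID...') onward is dropped first."""
    text = text.split("|<CLID", 1)[0]
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    verb, trans_id, endpoint = lines[0].split()[:3]
    params = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        params[name.strip().upper()] = value.strip()
    return verb.upper(), trans_id, endpoint, params


def is_keepalive_style_auep(text):
    """True for an AUEP whose RequestedInfo (F:) asks only for X (RequestIdentifier)."""
    verb, _, _, params = parse_mgcp_command(text)
    return verb == "AUEP" and params.get("F", "").replace(" ", "").upper() == "X"


if __name__ == "__main__":
    probe = "AUEP 13561613 AALN/S2/0@HQ-VG224-1stFlr MGCP 0.1\nF: X"
    print(parse_mgcp_command(probe))
    print(is_keepalive_style_auep(probe))
```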
>>
>> the gateway should respond:
>> 12/27/2005 10:17:07.002 CCM|MGCPHandler received msg from: 10.10.31.250
>> 200 13561613
>> X: 2
>> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017648><IP::10.10.31.250><DEV::>
>>
>>
>>
>> This is getting very close to unregistration.  Another way to look at this
>> is to look for indications of lost messages to the gateway.  Each MGCP
>> transaction is retransmitted up to 3 times if not ack'd.  You can see
>> retries in the CM SDI traces:
>> 01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1
>>
>> If you see frequent retries then you are intermittently dropping or
>> excessively delaying the UDP packets carrying the MGCP payload.
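A rough way to see why scattered loss shows up as Retry#1 lines but rarely as an unregistration: if each attempt (the original send plus each of the 3 retries) were lost independently with probability p, the transaction fails only when all 4 attempts are lost, i.e. with probability p**4. This Python sketch computes that under the independence assumption, which is an idealization rather than a model of CUCM's actual timers.

```python
def transaction_failure_probability(loss_per_attempt, retries=3):
    """Probability that an MGCP transaction is never acknowledged.

    Idealized model: the original send plus each retry is lost independently
    with probability `loss_per_attempt` (losing either the command or its ack),
    so all `retries + 1` attempts must fail."""
    if not 0.0 <= loss_per_attempt <= 1.0:
        raise ValueError("loss_per_attempt must be between 0 and 1")
    return loss_per_attempt ** (retries + 1)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1, 0.3):
        print(f"loss {p:.0%} per attempt -> failure {transaction_failure_probability(p):.2e}")
```

With p = 0.1 the result is about 1 in 10,000. Independence is exactly what a short outage breaks: if the link is down for longer than the whole retry sequence, every attempt is lost together, so frequent retries that still end in unregistration point more toward brief outages than toward steady random loss.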
>>
>>
>> There is also an issue where endpoints may stop responding to CM.  CM will
>> retry the transaction 3 times and then unregister the gateway.  This looks
>> similar to the retries tracked above.  The main difference is that you will
>> see valid exchanges with other endpoints on the gateway, or you will see
>> successful keepalives with the top-level gateway MGCP process.  This was
>> historically caused by CSCsf26617 and similar defects.  The signature of
>> this failure is repeated retransmits of the DLCX, RQNT, or CRCX messages
>> from CM to the gateway while other endpoints are responding.  If this is
>> happening, then the gateway is having an internal error such as a
>> resource-allocation failure or a DSP hang.
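The signature described above (one endpoint keeps getting DLCX/RQNT/CRCX retransmits while the rest of the gateway answers) can be checked mechanically once the traces are reduced to events. This Python sketch assumes a hypothetical (endpoint, verb, kind) tuple per event, which would have to be built from the SDI traces by correlating TransIds (not shown); the min_retries threshold of 2 is an illustrative choice.

```python
from collections import defaultdict

# Verbs whose repeated retransmission is the "stuck endpoint" signature described above.
SUSPECT_VERBS = {"DLCX", "RQNT", "CRCX"}


def gateway_of(endpoint):
    """'aaln/S2/SU0/3@MLP-VG-01' -> 'MLP-VG-01' (also works for '*@MLP-VG-01')."""
    return endpoint.split("@", 1)[-1]


def stuck_endpoints(events, min_retries=2):
    """events: iterable of (endpoint, verb, kind) tuples, where kind is 'retry'
    for a retransmit and 'ack' for any successfully completed exchange.

    Returns endpoints with at least `min_retries` retransmits of DLCX/RQNT/CRCX,
    but only when some *other* endpoint on the same gateway (including the '*'
    keepalive endpoint) completed an exchange, i.e. the gateway is reachable."""
    retries = defaultdict(int)
    acked = defaultdict(set)
    for endpoint, verb, kind in events:
        if kind == "retry" and verb.upper() in SUSPECT_VERBS:
            retries[endpoint] += 1
        elif kind == "ack":
            acked[gateway_of(endpoint)].add(endpoint)
    return sorted(
        endpoint
        for endpoint, count in retries.items()
        if count >= min_retries and acked[gateway_of(endpoint)] - {endpoint}
    )


if __name__ == "__main__":
    sample = [
        ("aaln/S2/SU0/3@MLP-VG-01", "RQNT", "retry"),
        ("aaln/S2/SU0/3@MLP-VG-01", "RQNT", "retry"),
        ("*@MLP-VG-01", "NTFY", "ack"),
    ]
    print(stuck_endpoints(sample))
```

If retries appear on every endpoint and nothing on the gateway completes an exchange, the function returns nothing, which matches the other case in the thread: a transport problem rather than a stuck endpoint.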
>>
>> HTH.
>>
>> /Wes
>>
>>
>>
>> On Tuesday, November 03, 2009 3:39:24 AM, Wilson Hew
>> <wilsonhew at gmail.com> wrote:
>>
>> Bob/Ryan, I appreciate your feedback. Thanks.
>>
>> I guess I need to look at the connection between my MGCP gateway and CUCM.
>> Any idea what else I should check? I am looking at the SDI traces, but I
>> have no idea what to look for.
>>
>> Thanks,
>> Wil
>>
>> On Tue, Nov 3, 2009 at 2:35 AM, Bob Fronk <bob at btrfronk.com> wrote:
>>
>>> I had this happening and found out it was an MPLS circuit going down.
>>> Due to the location of this particular site, our 12 Mbps MPLS circuit is
>>> supplied by multiple T1s bonded with MLPPP.
>>>
>>>
>>>
>>> One of the T1s was going up/down several times a day (telco problem), and
>>> each time the MLPPP bundle would reset for a couple of seconds. The MGCP
>>> gateway responded by going into SRST, and the PRI would go down for a moment.
>>>
>>>
>>>
>>> Just something to check
>>>
>>>
>>>
>>> *From:* cisco-voip-bounces at puck.nether.net [mailto:
>>> cisco-voip-bounces at puck.nether.net] *On Behalf Of *Wilson Hew
>>> *Sent:* Monday, November 02, 2009 11:47 AM
>>> *To:* cisco-voip at puck.nether.net
>>> *Subject:* [cisco-voip] MGCP - Fallback to SRST very often even though
>>> connectivity to CUCM is fine
>>>
>>>
>>>
>>> Hello there,
>>>
>>> Greetings. I am having a problem with my MGCP gateway, and I need a little
>>> help and advice. My MGCP gateway is also configured for SRST, and it falls
>>> back to SRST very often (twice a day), then returns to normal operation
>>> shortly afterward. The connectivity from my MGCP gateway (remote site) to
>>> CUCM is fine.
>>>
>>> I noticed my E1 goes down every time it falls back to SRST. Is that
>>> considered normal?
>>>
>>> My gateway is running 12.4(24)T1 and CUCM version 7.0.2.
>>>
>>> In 'sh ccm-manager', I see the following:
>>> --------------------------------------------------------------
>>> TFTP retry count to shut Ports: 2
>>>
>>> Statistics:
>>>     Packets recvd:   857
>>>     Recv failures:   1
>>>     Packets xmitted: 852
>>>     Xmit failures:   0
>>> --------------------------------------------------------------
>>> In 'sh mgcp stats':
>>>
>>>  UDP pkts rx 557379, tx 558783
>>>  Unrecognized rx pkts 0, MGCP message parsing errors 0
>>>  Duplicate MGCP ack tx 9, Invalid versions count 0
>>>  CreateConn rx 36256, successful 36249, failed 7
>>>  DeleteConn rx 36274, successful 36101, failed 173
>>>  ModifyConn rx 66178, successful 66126, failed 52
>>>  DeleteConn tx 154, successful 154, failed 0
>>>  NotifyRequest rx 54652, successful 54516, failed 136
>>>  AuditConnection rx 3, successful 3, failed 0
>>>  AuditEndpoint rx 14887, successful 8080, failed 6807
>>>  RestartInProgress tx 6248, successful 6248, failed 0
>>>  Notify tx 342779, successful 342779, failed 0
>>>  ACK tx 201075, NACK tx 7191
>>>  ACK rx 349100, NACK rx 0
>>>  Collisions: Passive 0, Active 0
>>> --------------------------------------------------------------
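To see which counters stand out, this Python sketch parses the "<Name> rx|tx <total>, successful <ok>, failed <bad>" lines from 'sh mgcp stats' and computes a failure ratio for each. The regex is written against the output pasted above and may need adjusting for other IOS releases; failure_rates is a hypothetical helper name.

```python
import re

# Matches 'sh mgcp stats' counters of the form
#   <Name> rx|tx <total>, successful <ok>, failed <bad>
STAT_RE = re.compile(r"(\w+) (rx|tx) (\d+), successful (\d+), failed (\d+)")


def failure_rates(stats_text):
    """Return {(name, direction): (total, failed, failed / total)} per counter line."""
    rates = {}
    for line in stats_text.splitlines():
        match = STAT_RE.search(line)
        if match:
            name, direction = match.group(1), match.group(2)
            total, failed = int(match.group(3)), int(match.group(5))
            rates[(name, direction)] = (total, failed, failed / total if total else 0.0)
    return rates


if __name__ == "__main__":
    stats = """\
 CreateConn rx 36256, successful 36249, failed 7
 DeleteConn rx 36274, successful 36101, failed 173
 AuditEndpoint rx 14887, successful 8080, failed 6807
 DeleteConn tx 154, successful 154, failed 0"""
    ranked = sorted(failure_rates(stats).items(), key=lambda kv: kv[1][2], reverse=True)
    for (name, direction), (total, failed, ratio) in ranked:
        print(f"{name} {direction}: {failed}/{total} failed ({ratio:.1%})")
```

On the numbers above, AuditEndpoint stands out: 6807 of 14887 (about 46%) failed, versus 7 of 36256 for CreateConn. That alone does not identify the cause, but since AUEP is what CM uses to probe the gateway, it seems worth correlating with the AUEP exchanges in the SDI traces.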
>>>
>>> Can anyone tell what is wrong from the above? Apart from that, I see the
>>> slip count on controller e1 increasing, and I have network-clock-participate
>>> configured.
>>>
>>> I would appreciate any feedback on this.
>>>
>>> Thanks,
>>> Wil
>>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> cisco-voip mailing list
>> cisco-voip at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-voip
>>
>>
>>
>>
>