[cisco-voip] MGCP - Fallback to SRST very often even though connectivity to CUCM is fine

Tue Nov 3 11:56:46 EST 2009

AUEP 13561613 AALN/S2/0 at HQ-VG224-1stFlr MGCP 0.1
F: X

This AUEP is not 'normal'.
F = RequestedInfo
X = RequestIdentifier
Normal AUEP requests much more information.

CM only sends this type of AUEP after 2 missed keepalives from the MGCP 
gateway.
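
If you want to pick these probes out of a trace programmatically, here is a
rough sketch. It assumes the MGCP text has already been pulled out of the SDI
trace as shown above; the function names are made up for illustration, and the
"F: A, R, X" message in the usage note is only an example of a fuller AUEP,
not something taken from a real trace.

```python
def parse_mgcp_command(text):
    """Split an MGCP command into (verb, transaction id, endpoint, params)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # Command line layout: VERB TRANSID ENDPOINT MGCP VERSION
    verb, trans_id, rest = lines[0].split(None, 2)
    endpoint = rest.rsplit(" MGCP ", 1)[0]
    params = {}
    for ln in lines[1:]:
        if ln.startswith("|"):  # SDI trace trailer, not part of the MGCP message
            break
        key, _, value = ln.partition(":")
        params[key.strip().upper()] = value.strip()
    return verb.upper(), trans_id, endpoint, params


def is_keepalive_probe(text):
    """True if this is an AUEP whose RequestedInfo (F) asks only for X."""
    verb, _trans_id, _endpoint, params = parse_mgcp_command(text)
    requested = [c.strip().upper() for c in params.get("F", "").split(",") if c.strip()]
    return verb == "AUEP" and requested == ["X"]
```

Feeding it the AUEP above returns True; an AUEP asking for more, e.g.
"F: A, R, X", returns False.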

/Wes

On Tuesday, November 03, 2009 11:47:16 AM, Wilson Hew 
<wilsonhew at gmail.com> wrote:
> Hello Wes,
>
> Thank you so much for the information. It really benefits me!
>
> Btw, when you say the below AUEP is not 'normal', could you please 
> elaborate?
>
> ----------------------------------------------------------------------------------
> AUEP 76267 AALN/S2/SU0/0 at MLP-VG-01 MGCP 0.1
> F: X
> |<CLID::StandAloneCluster><NID::X.X.X.X><CT::1,100,132,1.204039><IP::X.X.X.X><DEV::><LVL::Significant><MASK::2000>
> ----------------------------------------------------------------------------------
>
> I found that in my SDI traces and I can see the AUEP ACK received. 
> However, I was shocked to see this (more than 10 msgs received 
> together within a second):
>
> ----------------------------------------------------------------------------------
> NTFY 129359098 aaln/S2/SU0/3 at MLP-VG-01 MGCP 0.1
> N: ca at 172.22.7.1:2427
> X: 69
> O: L/hd
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.443 
> CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><MN::MGCPEndPoint><MV::aaln/S2/SU0/3 at MLP-VG-01><DEV::><LVL::All><MASK::ffff>
> 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
>
> NTFY 129359097 *@MLP-VG-01 MGCP 0.1
> X: 0
> O:
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.443 
> CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><MN::MGCPEndPoint><MV::*@MLP-VG-01><DEV::><LVL::All><MASK::ffff>
> 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
> ----------------------------------------------------------------------------------
>
> Followed by this (seeing phones keep alive timeout):
>
> ----------------------------------------------------------------------------------
> 11/03/2009 15:25:38.445 CCM|StationInit:   TCPPid=[ 1.100.9.210] Keep 
> alive timeout.|<CLID::StandAloneCluster><NID::
>
> and the below (is this telling me the MGCP gateway is restarting?):
>
> 11/03/2009 15:25:38.485 CCM|MGCPInit - //// RSIP <restart> from 
> *@MLP-VG-01|<CLID::StandAloneCluster><NID::
>
> 11/03/2009 15:25:38.490 CCM|MGCPManager received DUPLICATE message 
> with TransId: 129359097|<CLID::StandAloneCluster><NID::
> ----------------------------------------------------------------------------------
>
> Lastly, CUCM is sending messages to the MGCP gateway (more than 10 msgs 
> together within a second):
>
> ----------------------------------------------------------------------------------
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204110><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 
> 172.23.8.251
> 200 129359097
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204111><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 
> 172.23.8.251
> 200 129359097
> ----------------------------------------------------------------------------------
>
> It looks to me like the network was not stable or consistent during that 
> period.
>
> Any feedback from the gurus out there is very much appreciated!
>
> Thanks,
> Wil
>
>
> On Tue, Nov 3, 2009 at 10:52 PM, Wes Sisk <wsisk at cisco.com> wrote:
>
>     blah! I forgot to mention the service parameter.  CM Service
>     parameter "MGCP Retry Timeout Handling" configures the behavior
>     when a timeout is observed.  This allows marking the endpoint oos,
>     resetting just the port, or unregistering the entire gateway.
>     Unregistering the entire gateway is the default value.
>
>     /wes
>
>
>     On Tuesday, November 03, 2009 9:29:24 AM, Wes Sisk
>     <wsisk at cisco.com> wrote:
>>     timely question.
>>
>>     MGCP gateway can be viewed as:
>>
>>     MGCP Gateway
>>         mgcp/udp based registration and keepalives
>>         analog endpoints
>>            mgcp/udp based registration and transactions
>>         digital endpoints
>>            backhaul/tcp based
>>
>>     on CM if you see the alarm:
>>     MGCPGatewayLostComm then the top level mgcp process stopped
>>     communicating with CM.  Usually the GW sends keepalives to CM
>>     similar to:
>>     12/27/2005 10:16:40.173 CCM|MGCPHandler received msg from:
>>     10.10.33.250
>>     NTFY 333382 *@HQ-VG224-3rdFlr MGCP 0.1
>>     X: 0
>>     O:
>>     |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017474><IP::10.10.33.250><DEV::>
>>
>>
>>     If CM does not receive keepalive from gateway CM will attempt to
>>     query the gw with this message:
>>     12/27/2005 10:17:07.002 CCM|MGCPHandler send msg SUCCESSFULLY to:
>>     10.10.31.250
>>     AUEP 13561613 AALN/S2/0 at HQ-VG224-1stFlr MGCP 0.1
>>     F: X
>>     |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017448><IP::10.10.31.250><DEV::>
>>
>>
>>     This AUEP is not 'normal'.
>>     F = RequestedInfo
>>     X = RequestIdentifier
>>     Normal AUEP requests much more information. This is a special
>>     "hello, are you there" type exchange.
>>
>>     the gateway should respond:
>>     12/27/2005 10:17:07.002 CCM|MGCPHandler received msg from:
>>     10.10.31.250
>>     200 13561613
>>     X: 2
>>     |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017648><IP::10.10.31.250><DEV::>
>>
>>
>>
>>     This is getting very close to unregistration.  Another way to
>>     look at this is to look for indications of lost messages to the
>>     gateway.  Each MGCP transaction is retransmitted up to 3 times if
>>     not ack'd.  You can see retries in the CM SDI traces:
>>     01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout.
>>     Retry#1
>>
>>     If you see frequent retries then you are intermittently dropping
>>     or excessively delaying the UDP packets carrying the MGCP payload.
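
To count those retries across a whole trace file, something like the sketch
below might help. It assumes the retry lines look exactly like the
"MGCPHandler TransId: ... Timeout. Retry#N" sample quoted above; the function
name is made up for illustration.

```python
import re

# Matches e.g. "MGCPHandler TransId: 1097943 Timeout. Retry#1".
# \s* before "Retry" tolerates the line wrap seen in the mail archive.
RETRY_RE = re.compile(r"MGCPHandler TransId:\s*(\d+)\s+Timeout\.\s*Retry#(\d+)")


def retries_by_transaction(trace_text):
    """Map each transaction ID to the highest retry number seen for it."""
    worst = {}
    for trans_id, retry in RETRY_RE.findall(trace_text):
        worst[trans_id] = max(worst.get(trans_id, 0), int(retry))
    return worst
```

Transactions that reach the last retry are the ones that lead to
unregistration; a steady stream of Retry#1 entries points at intermittent
loss or delay.
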
>>
>>
>>     There is also an issue where endpoints may stop responding to
>>     CM.  CM will retry the transaction 3 times and then unregister
>>     the gateway.  This looks similar to the retries tracked above. 
>>     The main difference is that you will see valid exchanges with
>>     other endpoints on the gateway or you will see successful
>>     keepalives with the top level gateway MGCP process.  This was
>>     historically caused by CSCsf26617 and similar.  The signature of
>>     this failure is repeated retransmits of the DLCX, RQNT, or CRCX
>>     messages from CM to the gateway while other endpoints are
>>     responding.  If this is happening then the gateway is having an
>>     internal error such as resource allocation or dsp hang.
>>
>>     HTH.
>>
>>     /Wes
>>
>>
>>
>>     On Tuesday, November 03, 2009 3:39:24 AM, Wilson Hew
>>     <wilsonhew at gmail.com> wrote:
>>>     Bob/Ryan, appreciate your feedback. Thanks.
>>>
>>>     Guess I need to look at the connection between my MGCP gateway
>>>     and CUCM. Any idea what else I may need to check? I am looking
>>>     at the SDI traces, but have no idea what to look at.
>>>
>>>     Thanks,
>>>     Wil
>>>
>>>     On Tue, Nov 3, 2009 at 2:35 AM, Bob Fronk <bob at btrfronk.com>
>>>     wrote:
>>>
>>>         I had this happening and found out it was an MPLS circuit
>>>         going down.  Due to location of this particular site, our
>>>         12mbps MPLS circuit is supplied by multiple T1s bonded with
>>>         MLPPP.
>>>
>>>          
>>>
>>>         One of the T1s was going up/down several times a day (telco
>>>         problem) and each time, the MLPPP would reset for a couple
>>>         seconds.   The MGCP gateway responded by going into SRST and
>>>         the PRI would go down for a moment.
>>>
>>>          
>>>
>>>         Just something to check
>>>
>>>          
>>>
>>>         *From:* cisco-voip-bounces at puck.nether.net
>>>         [mailto:cisco-voip-bounces at puck.nether.net] *On Behalf Of
>>>         *Wilson Hew
>>>         *Sent:* Monday, November 02, 2009 11:47 AM
>>>         *To:* cisco-voip at puck.nether.net
>>>         *Subject:* [cisco-voip] MGCP - Fallback to SRST very often
>>>         even though connectivity to CUCM is fine
>>>
>>>          
>>>
>>>         Hello there,
>>>
>>>         Greetings. I am having a problem with my MGCP gateway, and I
>>>         need a little help and advice. My MGCP gateway also runs
>>>         SRST, and it falls back to SRST very often (twice a day).
>>>         It returns to normal operation from fallback shortly
>>>         afterwards. The connectivity from my MGCP gateway (remote
>>>         site) to CUCM is fine.
>>>
>>>         I noticed my E1 goes down every time it falls back
>>>         to SRST - is that considered normal?
>>>
>>>         My gateway is running 12.4(24)T1 and CUCM version 7.0.2.
>>>
>>>         In 'sh ccm-manager', I have the below:
>>>         --------------------------------------------------------------
>>>         TFTP retry count to shut Ports: 2
>>>
>>>         Statistics:
>>>             Packets recvd:   857
>>>             Recv failures:   1
>>>             Packets xmitted: 852
>>>             Xmit failures:   0
>>>         --------------------------------------------------------------
>>>         In 'sh mgcp stats':
>>>
>>>          UDP pkts rx 557379, tx 558783
>>>          Unrecognized rx pkts 0, MGCP message parsing errors 0
>>>          Duplicate MGCP ack tx 9, Invalid versions count 0
>>>          CreateConn rx 36256, successful 36249, failed 7
>>>          DeleteConn rx 36274, successful 36101, failed 173
>>>          ModifyConn rx 66178, successful 66126, failed 52
>>>          DeleteConn tx 154, successful 154, failed 0
>>>          NotifyRequest rx 54652, successful 54516, failed 136
>>>          AuditConnection rx 3, successful 3, failed 0
>>>          AuditEndpoint rx 14887, successful 8080, failed 6807
>>>          RestartInProgress tx 6248, successful 6248, failed 0
>>>          Notify tx 342779, successful 342779, failed 0
>>>          ACK tx 201075, NACK tx 7191
>>>          ACK rx 349100, NACK rx 0
>>>          Collisions: Passive 0, Active 0
>>>         --------------------------------------------------------------
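
One way to read those counters is to turn them into failure rates. This is
only a sketch: it assumes the "Name rx|tx N, successful N, failed N" layout
shown above, the function name is made up, and the sample holds three lines
copied from the output in this post.

```python
import re

# Matches counter lines such as
# "AuditEndpoint rx 14887, successful 8080, failed 6807".
COUNTER_RE = re.compile(r"(\w+) (rx|tx) (\d+), successful (\d+), failed (\d+)")


def failure_rates(stats_text):
    """Return {(counter name, direction): failed / total} for each counter line."""
    rates = {}
    for name, direction, total, _ok, failed in COUNTER_RE.findall(stats_text):
        total, failed = int(total), int(failed)
        rates[(name, direction)] = failed / total if total else 0.0
    return rates


SAMPLE = """
 CreateConn rx 36256, successful 36249, failed 7
 AuditEndpoint rx 14887, successful 8080, failed 6807
 Notify tx 342779, successful 342779, failed 0
"""

if __name__ == "__main__":
    ranked = sorted(failure_rates(SAMPLE).items(), key=lambda kv: -kv[1])
    for (name, direction), rate in ranked:
        print(f"{name} {direction}: {rate:.1%} failed")
```

In the numbers above, AuditEndpoint stands out: 6807 of 14887 received
audits failed, far more than any other counter.
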
>>>
>>>         Can you tell what is wrong from the above? Apart from that, I
>>>         see the number of slips under controllers e1 increasing, and
>>>         I have network-clock-participate configured.
>>>
>>>         Would appreciate if you could give me your feedback about
>>>         this. Any feedback is very much appreciated.
>>>
>>>         Thanks,
>>>         Wil
>>>
>>>
>>>     ------------------------------------------------------------------------
>>>
>>>     _______________________________________________
>>>     cisco-voip mailing list
>>>     cisco-voip at puck.nether.net
>>>     https://puck.nether.net/mailman/listinfo/cisco-voip
>>>       
