[cisco-voip] MGCP - Fallback to SRST very often even though connectivity to CUCM is fine

Wed Nov 4 09:35:43 EST 2009

No, there are no multiple paths back to CUCM.

On Wed, Nov 4, 2009 at 7:07 PM, Philip Walenta <pwalenta at wi.rr.com> wrote:

>  By any chance are there multiple paths to your CUCM systems?
>
>
>
> *From:* cisco-voip-bounces at puck.nether.net [mailto:
> cisco-voip-bounces at puck.nether.net] *On Behalf Of* Wilson Hew
> *Sent:* Tuesday, November 03, 2009 10:56 AM
> *To:* Wes Sisk
> *Cc:* cisco-voip at puck.nether.net
> *Subject:* Re: [cisco-voip] MGCP - Fallback to SRST very often even though
> connectivity to CUCM is fine
>
>
>
> One more thing: I saw "CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1"
> in the SDI traces, but I can't find the Retry#2 or Retry#3 entries that
> would lead to the MGCP gateway reset.
>
> Thanks,
> Wil
>
> On Wed, Nov 4, 2009 at 12:47 AM, Wilson Hew <wilsonhew at gmail.com> wrote:
>
> Hello Wes,
>
> Thank you so much for the information. It really helps!
>
> By the way, when you say the AUEP below is not 'normal', could you please
> elaborate?
>
>
> ----------------------------------------------------------------------------------
> AUEP 76267 AALN/S2/SU0/0@MLP-VG-01 MGCP 0.1
> F: X
>
> |<CLID::StandAloneCluster><NID::X.X.X.X><CT::1,100,132,1.204039><IP::X.X.X.X><DEV::><LVL::Significant><MASK::2000>
>
> ----------------------------------------------------------------------------------
>
> I found that in my SDI traces, and I can see the AUEP ACK being received.
> However, I was shocked when I saw this (more than 10 messages received
> together within one second):
>
>
> ----------------------------------------------------------------------------------
> NTFY 129359098 aaln/S2/SU0/3@MLP-VG-01 MGCP 0.1
> N: ca@172.22.7.1:2427
> X: 69
> O: L/hd
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.443
> CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204114><MN::MGCPEndPoint><MV::aaln/S2/SU0/3@MLP-VG-01><DEV::><LVL::All><MASK::ffff>
> 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
>
> NTFY 129359097 *@MLP-VG-01 MGCP 0.1
> X: 0
> O:
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.443
> CCM|<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204115><MN::MGCPEndPoint><MV::*@MLP-VG-01><DEV::><LVL::All><MASK::ffff>
> 11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251
>
> ----------------------------------------------------------------------------------
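Wil's "more than 10 messages within one second" observation can be checked mechanically rather than by eye. A minimal Python sketch, not a Cisco tool: it assumes each SDI entry sits on one unwrapped line beginning with the `MM/DD/YYYY HH:MM:SS.mmm CCM|MGCPHandler received msg` prefix seen in the trace above, and reports the densest one-second window of received MGCP messages. The function name `densest_window` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp of SDI lines such as
# "11/03/2009 15:25:38.443 CCM|MGCPHandler received msg from: 172.23.8.251"
RECV_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) CCM\|MGCPHandler received msg"
)


def densest_window(trace_lines, window=timedelta(seconds=1)):
    """Return (count, start_time) of the window holding the most received messages."""
    times = sorted(
        datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S.%f")
        for m in (RECV_RE.match(line) for line in trace_lines)
        if m
    )
    best_count, best_start = 0, None
    left = 0
    for right, t in enumerate(times):
        # Slide the left edge forward until the window spans less than `window`.
        while t - times[left] >= window:
            left += 1
        if right - left + 1 > best_count:
            best_count, best_start = right - left + 1, times[left]
    return best_count, best_start
```

Comparing the count from a failure period against the same figure from a quiet period gives a rough sense of how unusual the burst is.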
>
> This is followed by phone keepalive timeouts:
>
>
> ----------------------------------------------------------------------------------
> 11/03/2009 15:25:38.445 CCM|StationInit:   TCPPid=[ 1.100.9.210] Keep alive timeout.|<CLID::StandAloneCluster><NID::
>
> and then the following (is this telling me the MGCP gateway is restarting?):
>
> 11/03/2009 15:25:38.485 CCM|MGCPInit - //// RSIP <restart> from *@MLP-VG-01|<CLID::StandAloneCluster><NID::
>
> 11/03/2009 15:25:38.490 CCM|MGCPManager received DUPLICATE message with TransId: 129359097|<CLID::StandAloneCluster><NID::
>
> ----------------------------------------------------------------------------------
>
> Lastly, CUCM sends messages back to the MGCP gateway (again, more than 10
> messages together within one second):
>
>
> ----------------------------------------------------------------------------------
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204110><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 172.23.8.251
> 200 129359097
>
>
> |<CLID::StandAloneCluster><NID::172.22.7.1><CT::1,100,132,1.204111><IP::172.23.8.251><DEV::><LVL::Significant><MASK::2000>
> 11/03/2009 15:25:38.756 CCM|MGCPHandler send msg SUCCESSFULLY to: 172.23.8.251
> 200 129359097
>
> ----------------------------------------------------------------------------------
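One plausible reading of the last two excerpts: the gateway retransmitted NTFY 129359097 because its first acknowledgment was lost or late, CUCM logged the copy as a DUPLICATE, and each copy was answered with the same `200 129359097`. RFC 3435 asks MGCP entities to remember responses to recent transactions for this case, so a retransmitted command is re-acknowledged rather than executed twice. The sketch below models that idea in Python; `ResponseCache` is a hypothetical name, and bounding the cache by entry count is a simplification (the RFC ages entries out by time).

```python
from collections import OrderedDict


class ResponseCache:
    """Hypothetical receiver-side record of recent MGCP transactions.

    A retransmitted command (same transaction ID) is answered with the stored
    response instead of being executed a second time.
    """

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._responses = OrderedDict()

    def handle(self, trans_id, execute):
        """Return (response, is_duplicate); `execute` runs only for new transactions."""
        if trans_id in self._responses:
            return self._responses[trans_id], True
        response = execute()
        self._responses[trans_id] = response
        if len(self._responses) > self.max_entries:
            self._responses.popitem(last=False)  # drop the oldest entry
        return response, False
```

Under this model, the two identical `200 129359097` lines are expected behaviour for a duplicate, not a second, separate problem.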
>
> It looks to me as though the network was unstable or inconsistent during that period.
>
> Any feedback from the gurus out there is very much appreciated!
>
> Thanks,
> Wil
>
>
>
>  On Tue, Nov 3, 2009 at 10:52 PM, Wes Sisk <wsisk at cisco.com> wrote:
>
> Blah! I forgot to mention the service parameter. The CM service parameter
> "MGCP Retry Timeout Handling" configures the behavior when a timeout is
> observed. It can mark the endpoint out of service (OOS), reset just the
> port, or unregister the entire gateway. Unregistering the entire gateway is
> the default value.
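To make the timeout behaviour concrete, here is a minimal Python model of one transaction: an initial send plus up to three retransmissions (the limit Wes gives further down), after which the configured timeout action applies. The enum values are hypothetical labels for the three behaviours Wes lists, not the literal option strings in the CUCM service parameter page. In this model, a trace with a lone `Retry#1` and no `Retry#2` means the first retransmission was acknowledged, so no timeout action fires.

```python
from enum import Enum


class RetryTimeoutAction(Enum):
    # Hypothetical labels for the three behaviours Wes describes;
    # they are not the literal option strings shown in CUCM.
    MARK_ENDPOINT_OOS = "mark endpoint out of service"
    RESET_PORT = "reset port"
    UNREGISTER_GATEWAY = "unregister gateway"  # the default, per Wes


def run_transaction(send, max_retries=3,
                    on_timeout=RetryTimeoutAction.UNREGISTER_GATEWAY):
    """Send once, then retry up to `max_retries` times while no ack arrives.

    `send(attempt)` returns True when the transaction is acknowledged.
    Returns (action, log): action is None on success, otherwise `on_timeout`.
    """
    log = []
    for attempt in range(max_retries + 1):
        if attempt:
            log.append(f"Timeout. Retry#{attempt}")
        if send(attempt):
            return None, log
    log.append("Retries exhausted")
    return on_timeout, log
```

For example, `run_transaction(lambda attempt: attempt == 1)` returns `(None, ["Timeout. Retry#1"])`, which mirrors what Wil sees in his trace.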
>
> /wes
>
>
>
> On Tuesday, November 03, 2009 9:29:24 AM, Wes Sisk <wsisk at cisco.com> wrote:
>
> Timely question.
>
> MGCP gateway can be viewed as:
>
> MGCP Gateway
>     mgcp/udp based registration and keepalives
>     analog endpoints
>        mgcp/udp based registration and transactions
>     digital endpoints
>        backhaul/tcp based
>
> On CM, if you see the alarm MGCPGatewayLostComm, then the top-level MGCP
> process has stopped communicating with CM. Usually the GW sends keepalives
> to CM similar to:
> 12/27/2005 10:16:40.173 CCM|MGCPHandler received msg from: 10.10.33.250
> NTFY 333382 *@HQ-VG224-3rdFlr MGCP 0.1
> X: 0
> O:
> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017474><IP::10.10.33.250><DEV::>
>
>
> If CM does not receive a keepalive from the gateway, CM will attempt to
> query the GW with this message:
> 12/27/2005 10:17:07.002 CCM|MGCPHandler send msg SUCCESSFULLY to: 10.10.31.250
> AUEP 13561613 AALN/S2/0@HQ-VG224-1stFlr MGCP 0.1
> F: X
> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017448><IP::10.10.31.250><DEV::>
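The keepalive-then-probe behaviour can be sketched as a small watchdog. This is a hypothetical Python model, not CUCM code: the `timeout` value is an assumption (the thread does not give CUCM's actual interval), timestamps are plain seconds supplied by the caller, the probed endpoint name is passed in (Wes's trace shows CM probing `AALN/S2/0`), and the probe is built with a bare newline between lines for simplicity.

```python
class KeepaliveWatchdog:
    """Hypothetical CM-side watchdog: probe the gateway when keepalives stop.

    `timeout` is an assumed value in seconds; the thread does not say what
    interval CUCM actually uses.
    """

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_keepalive = now

    def on_keepalive(self, now):
        # Called for each "NTFY <id> *@<gateway> MGCP 0.1 / X: 0 / O:" keepalive.
        self.last_keepalive = now

    def check(self, now, trans_id, endpoint):
        """Return an AUEP probe like the one in the trace, or None if keepalives are current."""
        if now - self.last_keepalive <= self.timeout:
            return None
        return f"AUEP {trans_id} {endpoint} MGCP 0.1\nF: X"
```

With an assumed 15-second timeout, `check(20.0, 13561613, "AALN/S2/0@HQ-VG224-1stFlr")` on a watchdog last fed at 0 seconds returns the same AUEP text as the trace.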
>
>
> This AUEP is not 'normal':
> F = RequestedInfo
> X = RequestIdentifier
> So "F: X" asks only for the current RequestIdentifier. A normal AUEP
> requests much more information. This is a special "hello, are you there?"
> type of exchange.
>
> The gateway should respond:
> 12/27/2005 10:17:07.002 CCM|MGCPHandler received msg from: 10.10.31.250
> 200 13561613
> X: 2
> |<CLID::MFCU-CM-1-Cluster><NID::10.10.200.11><CT::2,100,66,1.23017648><IP::10.10.31.250><DEV::>
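The messages in Wes's traces share one text layout: a first line (verb, transaction ID, endpoint and `MGCP 0.1`, or a response code and transaction ID) followed by `Name: value` parameter lines. A minimal Python sketch that parses that layout and labels the keepalive NTFY, the bare `F: X` AUEP probe, and responses; the labelling rules are heuristics inferred from these traces, not a complete MGCP parser, and the function names are hypothetical.

```python
def parse_mgcp(text):
    """Split an MGCP message into header tokens and a dict of parameter lines."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    params = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line would start an SDP body; ignored here
        name, _, value = line.partition(":")
        params[name.strip().upper()] = value.strip()
    return header, params


def classify(text):
    """Heuristic labels for the message types discussed in this thread."""
    header, params = parse_mgcp(text)
    if header[0].isdigit():
        return "response"  # e.g. "200 13561613"
    verb, endpoint = header[0].upper(), header[2]
    if verb == "NTFY" and endpoint.startswith("*@") and params == {"X": "0", "O": ""}:
        return "gateway keepalive"
    if verb == "AUEP" and params == {"F": "X"}:
        return "are-you-there probe"
    return verb.lower()
```

An endpoint event such as Wil's `NTFY ... aaln/S2/SU0/3@MLP-VG-01` with `O: L/hd` falls through to the plain `"ntfy"` label, which keeps it apart from the gateway-level keepalives.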
>
>
>
> This is getting very close to unregistration. Another approach is to look
> for indications of lost messages to the gateway. Each MGCP transaction is
> retransmitted up to 3 times if not ack'd. You can see the retries in the CM
> SDI traces:
> 01/13/2005 10:34:33.603 CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1
>
> If you see frequent retries then you are intermittently dropping or
> excessively delaying the UDP packets carrying the MGCP payload.
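Because each transaction is retried at most three times, grouping the `Timeout. Retry#N` lines by transaction ID separates the patterns Wes describes: many transactions stopping at `Retry#1` suggest intermittent loss or delay, while a transaction that reaches `Retry#3` is the one that ends in the configured timeout action. A Python sketch, assuming unwrapped trace lines in the format quoted above; the function names are hypothetical.

```python
import re

# Pattern taken from the SDI line quoted above:
# "CCM|MGCPHandler TransId: 1097943 Timeout. Retry#1"
RETRY_RE = re.compile(r"MGCPHandler TransId: (\d+) Timeout\. Retry#(\d+)")


def highest_retry_per_transaction(trace_lines):
    """Map each transaction ID to the highest retry number seen in the trace."""
    highest = {}
    for line in trace_lines:
        m = RETRY_RE.search(line)
        if m:
            trans_id, retry = int(m.group(1)), int(m.group(2))
            highest[trans_id] = max(retry, highest.get(trans_id, 0))
    return highest


def exhausted(highest, max_retries=3):
    """Transactions that used every retry (assumes the 3-retry limit Wes describes)."""
    return sorted(t for t, n in highest.items() if n >= max_retries)
```

If `exhausted(...)` comes back empty while single `Retry#1` entries are common, the traces point at lossy or slow transport rather than at the retry limit being hit.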
>
>
> There is also an issue where endpoints may stop responding to CM. CM will
> retry the transaction 3 times and then unregister the gateway. This looks
> similar to the retries described above. The main difference is that you
> will see valid exchanges with other endpoints on the gateway, or successful
> keepalives with the top-level gateway MGCP process. This was historically
> caused by CSCsf26617 and similar bugs. The signature of this failure is
> repeated retransmissions of DLCX, RQNT, or CRCX messages from CM to the
> gateway while other endpoints are responding. If this is happening, then
> the gateway has an internal error such as a resource allocation failure or
> a DSP hang.
>
> HTH.
>
> /Wes
>
>
>
> On Tuesday, November 03, 2009 3:39:24 AM, Wilson Hew <wilsonhew at gmail.com> wrote:
>
> Bob/Ryan, I appreciate your feedback. Thanks.
>
> I guess I need to look at the connection between my MGCP gateway and CUCM.
> Any idea what else I should check? I am looking at the SDI traces, but I
> don't know what to look for.
>
> Thanks,
> Wil
>
> On Tue, Nov 3, 2009 at 2:35 AM, Bob Fronk <bob at btrfronk.com> wrote:
>
> I had this happen and found out it was an MPLS circuit going down. Due to
> the location of this particular site, our 12 Mbps MPLS circuit is supplied
> by multiple T1s bonded with MLPPP.
>
>
>
> One of the T1s was going up/down several times a day (a telco problem), and
> each time the MLPPP bundle would reset for a couple of seconds. The MGCP
> gateway responded by going into SRST, and the PRI would go down for a
> moment.
>
> Just something to check.
>
>
>
> *From:* cisco-voip-bounces at puck.nether.net [mailto:
> cisco-voip-bounces at puck.nether.net] *On Behalf Of* Wilson Hew
> *Sent:* Monday, November 02, 2009 11:47 AM
> *To:* cisco-voip at puck.nether.net
> *Subject:* [cisco-voip] MGCP - Fallback to SRST very often even though
> connectivity to CUCM is fine
>
>
>
> Hello there,
>
> Greetings. I am having a problem with my MGCP gateway and need a little help
> and advice. My MGCP gateway also runs SRST, and it falls back to SRST very
> often (twice a day), then returns to normal operation shortly afterwards.
> Connectivity from my MGCP gateway (remote site) to CUCM is fine.
>
> I noticed that my E1 goes down every time the gateway falls back to SRST.
> Is that normal?
>
> My gateway is running IOS 12.4(24)T1, and CUCM is version 7.0.2.
>
> The output of 'sh ccm-manager' includes:
> --------------------------------------------------------------
> TFTP retry count to shut Ports: 2
>
> Statistics:
>     Packets recvd:   857
>     Recv failures:   1
>     Packets xmitted: 852
>     Xmit failures:   0
> --------------------------------------------------------------
> In 'sh mgcp stats':
>
>  UDP pkts rx 557379, tx 558783
>  Unrecognized rx pkts 0, MGCP message parsing errors 0
>  Duplicate MGCP ack tx 9, Invalid versions count 0
>  CreateConn rx 36256, successful 36249, failed 7
>  DeleteConn rx 36274, successful 36101, failed 173
>  ModifyConn rx 66178, successful 66126, failed 52
>  DeleteConn tx 154, successful 154, failed 0
>  NotifyRequest rx 54652, successful 54516, failed 136
>  AuditConnection rx 3, successful 3, failed 0
>  AuditEndpoint rx 14887, successful 8080, failed 6807
>  RestartInProgress tx 6248, successful 6248, failed 0
>  Notify tx 342779, successful 342779, failed 0
>  ACK tx 201075, NACK tx 7191
>  ACK rx 349100, NACK rx 0
>  Collisions: Passive 0, Active 0
> --------------------------------------------------------------
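Dividing `failed` by the `rx`/`tx` total for each counter makes the `sh mgcp stats` output easier to read: AuditEndpoint failed 6807 of 14887 (about 46%), while every other command counter above fails in well under 1% of cases. The thread does not establish why those AUEPs failed, so treat this as a pointer for further tracing rather than a diagnosis. The Python sketch assumes the line format shown above; other IOS releases may print it differently, and `failure_rates` is a hypothetical name.

```python
import re

# Matches lines such as "AuditEndpoint rx 14887, successful 8080, failed 6807"
STAT_RE = re.compile(r"^\s*(\w+) (rx|tx) (\d+), successful (\d+), failed (\d+)")


def failure_rates(stats_text):
    """Return {"<Counter> <rx|tx>": failed / total} for each counter line."""
    rates = {}
    for line in stats_text.splitlines():
        m = STAT_RE.match(line)
        if m:
            total, failed = int(m.group(3)), int(m.group(5))
            rates[f"{m.group(1)} {m.group(2)}"] = failed / total if total else 0.0
    return rates
```

Lines with a different shape, such as `UDP pkts rx 557379, tx 558783`, are simply skipped.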
>
> Can anyone tell what is wrong from the above? Apart from that, I see the
> slip counters on the E1 controller increasing, and I have
> network-clock-participate configured.
>
> I would appreciate any feedback you can give me on this.
>
> Thanks,
> Wil
>
>
>
> ------------------------------
>
> _______________________________________________
> cisco-voip mailing list
> cisco-voip at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-voip