[cisco-voip] Traffic Issues with 7900 Series Phones

Wes Sisk (wsisk) wsisk at cisco.com
Tue Nov 22 09:45:12 EST 2016


The device must remain listening on a port in case another device on the subnet is upgrading and needs the firmware.



On Nov 22, 2016, at 9:02 AM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

I just want to ask this – and it may be unrelated – but it can’t hurt.

When we had attempted to use “Peer Firmware Sharing” in the past, it was a total disaster. We applied a new load and a large percentage of devices just sat there and did nothing on the firmware download screen until they were reset. That was ... several years ago.

Since it was suggested earlier that we go and look at settings to see what is going on, I noted that a number of these devices have PFS enabled in the configurations somehow. I plan to turn it off, but, before I do – my understanding is that this mechanism is idle once the phone is booted and is only applicable when an upgrade is taking place. There are a couple of bugs out there for UDP traffic to the PFS port causing issues with the phone, as well as the phones getting stuck in configuring IP and not functioning (we have seen some at this site just bomb off and refuse to do anything other than arp for the gateway, despite a successful DHCP transaction. They won’t communicate with anything on the same subnet either until power cycled). Some of these bugs seem to be open or in 9.4(2) in general.

Is this mechanism still “alive” while the phone is operating?  Again I plan to turn it off, but, just curious.

Adam



From: bmeade90 at gmail.com [mailto:bmeade90 at gmail.com] On Behalf Of Brian Meade
Sent: Monday, November 21, 2016 10:55 AM
To: Wes Sisk (wsisk)
Cc: Pawlowski, Adam; cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Traffic Issues with 7900 Series Phones

Also are you running any XML applications on the phones that may be bogging down the CPU?

On Tue, Nov 15, 2016 at 11:31 AM, Wes Sisk (wsisk) <wsisk at cisco.com> wrote:
Adam,

Are you using dot1x? There are some interesting things in that space.

Otherwise, maybe get 9.4.2es3 to pick up the fix for
CSCuq88325    7965 7945 excessive core files cause phone stability problems


-w


On Nov 15, 2016, at 8:50 AM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

All,

We’re still looking at this with TAC, though the initial response was that the 7941, 7961, etc. are done with hardware and software support. There was an announcement on October 20th that said software maintenance ended immediately (oops). Our timers and such are consistent across our network, all defaults, and we don’t have this problem elsewhere. I went looking for MAC change traps and didn’t run into anything going down that road. The phones don’t log any VLAN changes in their logs either. The phones are going out of service for UCM Closed TCP or UCM Reset TCP, and we see what looks like the UCM not responding with the proper SCCP KeepAliveAck, which causes the phone to sort of do nothing for 60 seconds. By then, since the phone and the UCM are each waiting 60 seconds to hear from the other, the connection is reset and closed.
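One way to make the missing KeepAliveAck pattern visible is to export the SCCP messages for a single phone from a capture and flag every keepalive that went unanswered. The following is only a rough sketch under stated assumptions: the event list, the labels "KeepAlive" and "KeepAliveAck", and the function name are hypothetical stand-ins for whatever your capture export produces, and the 60 second window is taken from the behavior described above.

```python
from typing import Iterable, List, Tuple

# Hypothetical input: (seconds since capture start, message label) for ONE phone.
Event = Tuple[float, str]


def unanswered_keepalives(events: Iterable[Event], timeout: float = 60.0) -> List[float]:
    """Return timestamps of keepalives that saw no ack within `timeout` seconds."""
    missed: List[float] = []
    pending = None  # timestamp of the oldest keepalive still waiting for an ack
    for ts, label in sorted(events):
        # Any later event past the window means the pending keepalive timed out.
        if pending is not None and ts - pending > timeout:
            missed.append(pending)
            pending = None
        if label == "KeepAlive":
            if pending is None:
                pending = ts
        elif label == "KeepAliveAck":
            pending = None
    # A keepalive still pending at the end of the capture is not reported,
    # because the capture may simply have stopped before the ack arrived.
    return missed


if __name__ == "__main__":
    sample = [
        (0.0, "KeepAlive"), (0.1, "KeepAliveAck"),
        (30.0, "KeepAlive"), (60.0, "KeepAlive"),
        (95.0, "KeepAlive"), (95.2, "KeepAliveAck"),
    ]
    print(unanswered_keepalives(sample))
```

Comparing the flagged timestamps against the out-of-service times on the phone would show whether the resets line up with unanswered keepalives.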

Phones that are not sharing the data VLAN have been fine, but, we cannot implement that across this entire area due to the needed cabling, switchports, etc.

In another location we have these phones going into what appears to be high CPU – the latency on the phone goes way up, both in ICMP response and in the response of the phone to buttons and actions, and calls suffer from high jitter and broken conversations. Oddly enough, when we capture with SPAN enabled on the phone, the data looks fine going through it. Power cycling the phone clears this temporarily.

Everyone thus far has wanted to go down the road of loss somewhere on the network, but, as we continue to take captures, we see the conversation complete at the UCM, and beyond the phone via “SPAN to PC port”, or at it with SPAN at the edge – the phone application itself is simply not responding in a timely manner, at least based on initial observation.

Given the earlier response that these devices are now done with support, this does not bode well, but, we are still looking.

Regards,

Adam Pawlowski
SUNYAB NCS

From: Wes Sisk (wsisk) [mailto:wsisk at cisco.com]
Sent: Tuesday, November 08, 2016 12:50 PM
To: Pawlowski, Adam
Cc: Tommy Schlotterer; cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Traffic Issues with 7900 Series Phones

Not much visibility into L1/L2 on those phones; drop counters on the webpage or phone UI are about all you get.

Are the phones randomly unregistering? This is a good baseline: https://supportforums.cisco.com/document/52176/understanding-sccp-phone-unregistration-and-failover-networks-perspective

If some sort of frame issue, correct, not many options.

What is the nature of the messages being retransmitted?
Also, anything interesting looking in the log files?

One age-old odd one is CDP timers out of sync between phone and switch. Phone keeps its IP but gets dumped into the data vlan.  Your choice on how to approach that.

One possibility: If phones are unregistering then check the lastoutofservice reason on the phone, in the CM traces, or in the RTMT reports if you’re on a new enough version. I *think* we got these phones fixed to say ‘vlan change’ or ‘cdp timeout’ or ‘ip change’, something like that, if there were changes in the network interface.

Alternatively, take a few phones and stick them in a port that is not trunked but is in the voice vlan… do these exhibit the same problem?

The next ‘heuristic’ guess after that is possibly ARP cache refresh on the switch. I have seen several issues where the ARP cache timeout was set low, the switch re-ARPed for many devices concurrently, and the ARP responses were dropped by input queue overflow. Net result: the switch ‘forgets’ which port that phone is on.

So…. where do the packets/frames EXIST and NOT EXIST in the network?

-Wes

On Nov 4, 2016, at 4:32 PM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

Wes,

Thanks, that's good to know about ICMP. We've seen phones that get into a state where they reply with response times all over the board, lossy, which, Reset/Restart from the UCM does not rectify. Powering the device down does clear the condition - the set is otherwise idle. I need to get into one of those via SSH and pull the CPU to see if it is up at that time, to see if there's an identifiable process that covers this.

We did get some captures from in front of the firewall where the UCM resides, and from a monitor session on the switch out at the edge where the phone is connected. We can see the UCM sending retransmissions to the phone, and the phone eventually replying some time later. Unless there is a reason for us to try and get a copper tap on the segment between the switch and the phone, it would seem that there is some reason the phone is not replying to the UCM. There is nothing behind the phone, nor any output buffer drops. Our delay in reply here is some number of seconds, so I don't believe there's any buffering involved that would account for that.
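To put numbers on "the UCM retransmits and the phone eventually replies", the capture can be reduced to one record per packet and summarized per segment. This is a minimal sketch, not a parser for any real capture format: the record layout, the direction labels "ucm_to_phone" and "phone_to_ucm", and the `key` field (assumed to tie a UCM segment to the phone's reply that covers it, for example the expected ACK number) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (timestamp in seconds, direction label, matching key).
Record = Tuple[float, str, int]


def summarize_replies(records: Iterable[Record]) -> Dict[int, Tuple[int, Optional[float]]]:
    """Map each key to (retransmit count, seconds from first send to first reply)."""
    first_sent: Dict[int, float] = {}
    retransmits: Dict[int, int] = defaultdict(int)
    delay: Dict[int, float] = {}
    for ts, direction, key in sorted(records):
        if direction == "ucm_to_phone":
            if key not in first_sent:
                first_sent[key] = ts          # first transmission
            elif key not in delay:
                retransmits[key] += 1         # resent before any reply was seen
        elif direction == "phone_to_ucm":
            if key in first_sent and key not in delay:
                delay[key] = ts - first_sent[key]
    # A delay of None means no reply was seen in the capture.
    return {k: (retransmits[k], delay.get(k)) for k in first_sent}


if __name__ == "__main__":
    sample = [
        (0.0, "ucm_to_phone", 1001), (0.3, "ucm_to_phone", 1001),
        (0.9, "ucm_to_phone", 1001), (4.5, "phone_to_ucm", 1001),
        (5.0, "ucm_to_phone", 2002), (5.02, "phone_to_ucm", 2002),
    ]
    for key, (count, secs) in summarize_replies(sample).items():
        print(key, count, secs)
```

Running the same summary on the firewall-side and edge-side captures and comparing the delays per key is one way to show whether the time is being lost in the network or at the phone.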

What I fear is that if we get to a point where we can determine there is some frame that is an issue, these devices are past the point of any patching being done.... as of a few weeks ago. But, since replacing phones is not free and takes a bunch of time, I still have to come up with something. I only saw a bug for large sized ICMPv6 with nothing particularly helpful in the wording and the workaround of "don't do that" so I'm not hopeful.

We have our AM and SE aware of what is going on, and they've offered to help, so I'm hopeful we can eventually confirm the reason we're having trouble, even if we can't directly fix it.


Adam

-----Original Message-----
From: Wes Sisk (wsisk) [mailto:wsisk at cisco.com]
Sent: Friday, November 04, 2016 12:52 PM
To: Pawlowski, Adam
Cc: Tommy Schlotterer; cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Traffic Issues with 7900 Series Phones

Phones process ICMP traffic with low priority and throttling. This was
implemented to stem DoS attempts. Consider looking more at Voice Quality
effects, retransmits in packet captures, or parsing CCM traces for round
trip times. As you state these phones are relatively late in life and
therefore relatively stable.
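
Since ICMP timing is not a reliable signal on these phones, one voice-quality measure that can be computed directly from an RTP capture is the interarrival jitter estimator defined in RFC 3550. The sketch below assumes a hypothetical input of (arrival time, RTP send time) pairs already converted to seconds; the function name is illustrative only.

```python
from typing import Iterable, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Running jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    prev = None
    for arrival, sent in packets:
        if prev is not None:
            # D is the change in transit time between consecutive packets.
            d = (arrival - prev[0]) - (sent - prev[1])
            jitter += (abs(d) - jitter) / 16.0
        prev = (arrival, sent)
    return jitter


if __name__ == "__main__":
    # Two packets on schedule, then one arriving 50 ms late.
    print(interarrival_jitter([(0.0, 0.0), (0.02, 0.02), (0.09, 0.04)]))
```

A stream from an affected phone could be compared against one from a freshly power-cycled phone on the same switch.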

-Wes


On Nov 2, 2016, at 2:42 PM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

Tommy,

Sorry about that. These are a mixed bag. 41/61 both G and G-GE
phones, with the gigabit ones primarily. Some SCCP, some SIP, mostly
9.4.2SR1-1, but seen on 9.4.2SR2-2. PC attached or not, no difference, the
only difference we've been able to create that stops this, is changing the
data VLAN that runs through the phone to a different one, or something
null (with no PC).

Adam

-----Original Message-----
From: Tommy Schlotterer [mailto:tschlotterer at presidio.com]
Sent: Wednesday, November 02, 2016 2:37 PM
To: Pawlowski, Adam; cisco-voip at puck.nether.net
Subject: RE: Traffic Issues with 7900 Series Phones

What specific models of phones, e.g. 41s/61s or 40s/60s?

Thanks

Tommy

Tommy Schlotterer | Systems Engineer
Presidio | www.presidio.com
20 N. Saint Clair, 3rd Floor, Toledo, OH 43604
D: 419.214.1415 | C: 419.706.0259 | tschlotterer at presidio.com

-----Original Message-----
From: cisco-voip [mailto:cisco-voip-bounces at puck.nether.net] On Behalf
Of Pawlowski, Adam
Sent: Wednesday, November 02, 2016 2:23 PM
To: cisco-voip at puck.nether.net
Subject: [cisco-voip] Traffic Issues with 7900 Series Phones

After much hair pulling and frustration, I wanted to ask the group
here in case anyone has seen this or has any thought on what we should
be looking for.

We have a number of 7900 series phones that have been exhibiting
issues that appear to me to be that the phone is getting hung up on
something.
Some sort of frame or packet is screwing with the network chip/board
or the OS which is causing it trouble. I see missed traffic, missed
responses, high ICMP echo times - and phones that eventually get stuck
with their ICMP echo response times being all over the board - with
some report of call trouble and CMR showing crazy jitter. If I power
cycle the phone that clears and it works fine for a while.

I realize these items are pretty much end of useful life, pretty much
all done with software support, and are going to drop off of the
compatibility matrix and probably functional support in the near
future. But, while we still have a ton of them - has anyone noted any
particular type of traffic that causes the 7900 series phones grief?

I don't have loss on the network, there do not seem to be any
transient broadcast storms rolling by. We do see an increased amount
of mDNS, IPv6 (phones are v4 only) etc, but nothing stands out as
causing a particular problem. It just seems that whatever this is, is
causing a memory leak or something, wherein it gets bad enough that
things go to hell eventually.

Any thoughts?

Adam P
SUNYAB
_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip





