[cisco-voip] Traffic Issues with 7900 Series Phones

Pawlowski, Adam ajp26 at buffalo.edu
Mon Nov 21 11:05:35 EST 2016


Good morning,

We are not using dot1x at this site. Our network guys have been working on an implementation of that within this building as a pilot, and we haven’t had any trouble. I did warn them that there were some caveats, so we would have to test, and that they should be aware of the CDP link-down signaling for the PC port: if we are operating in a non-CDP environment, they shouldn’t expect this.

Regarding XML – nothing is running on the phone. At one point we ran a thing that polled devices for their network information, checked our port map database, and prompted users to confirm their location. That was system wide and has been disabled for some number of years.

“show proc” on the phone does not suggest that the CPU is loaded or that anything unusual is happening on the phone. Presuming the figures are averaged rather than instantaneous, 80% idle and 9-10% on the JVM seems to be what we see everywhere.

Adam

From: bmeade90 at gmail.com [mailto:bmeade90 at gmail.com] On Behalf Of Brian Meade
Sent: Monday, November 21, 2016 10:55 AM
To: Wes Sisk (wsisk)
Cc: Pawlowski, Adam; cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Traffic Issues with 7900 Series Phones

Also, are you running any XML applications on the phones that may be bogging down the CPU?

On Tue, Nov 15, 2016 at 11:31 AM, Wes Sisk (wsisk) <wsisk at cisco.com> wrote:
Adam,

Are you using dot1x? There are some interesting things in that space.

Otherwise, maybe get 9.4.2es3 to pick up the fix for
CSCuq88325    7965 7945 excessive core files cause phone stability problems


-w


On Nov 15, 2016, at 8:50 AM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

All,

We’re still looking at this with TAC, though the initial response was that the 7941, 7961, etc. are done with hardware and software support. There was an announcement on October 20th that said software maintenance ended immediately (oops). Our timers and such are consistent across our network, all defaults, and we don’t have this problem elsewhere. I looked for MAC change traps and didn’t find anything down that road. The phones don’t log any VLAN changes either.

The phones are going out of service with UCM Closed TCP or UCM Reset TCP, and we see what looks like the UCM not responding with the proper SCCP KeepAliveAck, which causes the phone to more or less do nothing for 60 seconds. Since the phone is waiting 60.0 seconds and the UCM is likewise waiting to hear from the phone, by the time that expires the connection is reset and closed.
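The keepalive pattern described above can be checked mechanically against a capture. This is a minimal sketch, assuming the SCCP KeepAlive (phone to UCM) and KeepAliveAck (UCM to phone) messages for one phone have already been exported as (timestamp, message name) pairs; the record format and the function name `find_unanswered_keepalives` are hypothetical, and the 60-second window is the figure observed in this thread, not a documented default.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical record format: (seconds since capture start, SCCP message name).
Event = Tuple[float, str]


def find_unanswered_keepalives(events: List[Event], window: float = 60.0) -> List[float]:
    """Return send times of KeepAlive messages not acknowledged within `window` seconds.

    Assumes events are in time order and that each KeepAliveAck answers the
    oldest KeepAlive still outstanding (SCCP acks carry no sequence number,
    so this pairing is a simplification). KeepAlives still outstanding when
    the capture ends are not reported.
    """
    outstanding: Deque[float] = deque()
    missed: List[float] = []
    for ts, msg in events:
        # Age out any KeepAlive that has waited longer than the window.
        while outstanding and ts - outstanding[0] > window:
            missed.append(outstanding.popleft())
        if msg == "KeepAlive":
            outstanding.append(ts)
        elif msg == "KeepAliveAck" and outstanding:
            outstanding.popleft()
    return missed
```

On a healthy phone the result should be empty; on an affected phone each entry marks a KeepAlive after which, if the behavior described above holds, the session would be torn down.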

Phones that are not sharing the data VLAN have been fine, but we cannot implement that across this entire area because of the cabling, switchports, etc. it would require.

In another location we have these phones going into what appears to be a high-CPU state: the latency on the phone goes way up (ICMP responses, the phone’s response to buttons and actions), and calls suffer from high jitter and broken conversations. Oddly enough, when we capture with SPAN enabled on the phone, the data looks fine going through it. Power cycling the phone clears this temporarily.

Everyone thus far has wanted to go down the road of loss somewhere on the network, but as we continue to take captures, we see the conversation complete at the UCM, beyond the phone via “SPAN to PC port”, and at the phone with SPAN at the edge. The phone application itself is simply not responding in a timely manner, at least based on initial observation.

Given the earlier response that these devices are now done with support, this does not bode well, but we are still looking.

Regards,

Adam Pawlowski
SUNYAB NCS

From: Wes Sisk (wsisk) [mailto:wsisk at cisco.com]
Sent: Tuesday, November 08, 2016 12:50 PM
To: Pawlowski, Adam
Cc: Tommy Schlotterer; cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Traffic Issues with 7900 Series Phones

Not much visibility into L1/L2 on those phones; drop counters on the webpage or phone UI are about all you get.

Are the phones randomly unregistering? This is a good baseline: https://supportforums.cisco.com/document/52176/understanding-sccp-phone-unregistration-and-failover-networks-perspective

If it is some sort of frame issue, then correct, there are not many options.

What is the nature of the messages being retransmitted?
Also, is there anything interesting in the log files?

One age-old odd one is CDP timers out of sync between the phone and the switch: the phone keeps its IP address but gets dumped into the data VLAN. Your choice on how to approach that.

One possibility: if phones are unregistering, check the lastoutofservice reason on the phone, in the CM traces, or in the RTMT reports if you’re on a new enough version. I *think* we got these phones fixed to report ‘vlan change’, ‘cdp timeout’, ‘ip change’, or something like that if there were changes in the network interface.

Alternatively, take a few phones and stick them in a port that is not trunked but is in the voice VLAN. Do these exhibit the same problem?

The next ‘heuristic’ guess after that is possibly an ARP cache refresh on the switch. I have seen several issues where the ARP cache timeout was set low, the switch re-ARPed for many devices concurrently, and the ARP responses were dropped by input queue overflow (input queue drops). The net result is that the switch ‘forgets’ which port the phone is on.
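To make the ARP hypothesis concrete, here is a toy model, not a description of any particular switch’s queueing: it assumes the ARP replies triggered by a synchronized refresh arrive in discrete ticks, that the switch processes a fixed number of queued packets per tick, and that arrivals beyond the queue depth are tail-dropped. The function name and all parameter values are hypothetical and chosen only for illustration.

```python
from typing import List


def arp_replies_dropped(arrivals_per_tick: List[int], queue_depth: int, drain_per_tick: int) -> int:
    """Toy model: count ARP replies tail-dropped by a fixed-depth input queue.

    Each tick, new arrivals are enqueued first (anything that does not fit
    is dropped), then up to `drain_per_tick` queued packets are processed.
    """
    queued = 0
    dropped = 0
    for arrivals in arrivals_per_tick:
        accepted = min(arrivals, queue_depth - queued)
        dropped += arrivals - accepted
        queued += accepted
        queued -= min(queued, drain_per_tick)
    return dropped


# 300 replies to one synchronized re-ARP vs. the same 300 replies spread out.
print(arp_replies_dropped([300, 0, 0, 0, 0, 0], queue_depth=75, drain_per_tick=50))  # 225
print(arp_replies_dropped([50] * 6, queue_depth=75, drain_per_tick=50))  # 0
```

The same number of replies is lost or kept depending only on how tightly they are bunched, which is the mechanism the hypothesis relies on.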

So… where do the packets/frames EXIST and NOT EXIST in the network?

-Wes

On Nov 4, 2016, at 4:32 PM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

Wes,

Thanks, that's good to know about ICMP. We've seen phones get into a state where they reply with response times all over the board and with loss, which a Reset/Restart from the UCM does not rectify. Powering the device down does clear the condition; the set is otherwise idle. I need to get into one of those via SSH and pull the CPU figures to see whether it is elevated at that time, and whether there's an identifiable process that accounts for this.
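One way to put numbers on “response times all over the board” is to summarize a run of ping results. This is a minimal sketch, assuming RTTs in milliseconds with `None` marking an echo that got no reply; the function name is hypothetical, and “jitter” here is simply the mean absolute change between consecutive replies, not the RFC 3550 interarrival jitter estimator. Given Wes’s note in this thread that these phones process ICMP with low priority and throttling, treat the result as a symptom indicator rather than a measure of the voice path.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize_rtts(rtts_ms: List[Optional[float]]) -> Dict[str, float]:
    """Summarize ping results; None marks an echo request that got no reply."""
    if not rtts_ms:
        raise ValueError("no samples")
    replies = [r for r in rtts_ms if r is not None]
    summary = {"loss_pct": 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)}
    if replies:
        # Mean absolute change between consecutive replies (lost echoes skipped).
        diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
        summary.update(
            min_ms=min(replies),
            avg_ms=mean(replies),
            max_ms=max(replies),
            jitter_ms=mean(diffs) if diffs else 0.0,
        )
    return summary
```

Comparing the summary for an affected phone before and after a power cycle would show whether the condition is reflected in these numbers.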

We did get some captures from in front of the firewall where the UCM resides, and from a monitor session on the edge switch where the phone is connected. We can see the UCM sending retransmissions to the phone, and the phone eventually replying some time later. Unless there is a reason for us to try to get a copper tap on the segment between the switch and the phone, it would seem that for some reason the phone is not replying to the UCM. There is nothing connected behind the phone, and there are no output buffer drops. The reply delay here is on the order of seconds, so I don't believe buffering could account for a delay of that size.
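The capture comparison described above can be reduced to a per-segment delay figure. This is a minimal sketch, assuming the TCP segments of one UCM-to-phone conversation have already been exported (for example with a packet analyzer such as Wireshark) into the hypothetical `Segment` records below; it flags UCM segments that were resent while still unacknowledged and reports how long the phone took to acknowledge them. Field and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Segment:
    time: float     # seconds since capture start
    from_ucm: bool  # True for UCM -> phone, False for phone -> UCM
    seq: int        # TCP sequence number of the first payload byte
    length: int     # TCP payload length in bytes (0 for a bare ACK)
    ack: int        # TCP acknowledgment number


def retransmission_delays(segments: List[Segment]) -> List[Tuple[int, float]]:
    """Return (seq, seconds from first transmission to the phone's ACK)
    for every UCM segment that was resent before being acknowledged.

    Retransmitted segments the phone never acknowledges within the capture
    are omitted. Sequence-number wraparound is ignored.
    """
    first_sent: Dict[int, Tuple[float, int]] = {}  # seq -> (time, end seq)
    retransmitted: Set[int] = set()
    delays: List[Tuple[int, float]] = []
    highest_ack = -1
    for seg in sorted(segments, key=lambda s: s.time):
        if seg.from_ucm and seg.length > 0:
            if seg.seq not in first_sent:
                first_sent[seg.seq] = (seg.time, seg.seq + seg.length)
            elif first_sent[seg.seq][1] > highest_ack:
                # Resent while still unacknowledged: a genuine retransmission.
                retransmitted.add(seg.seq)
        elif not seg.from_ucm:
            highest_ack = max(highest_ack, seg.ack)
            # TCP ACKs are cumulative: seg.ack covers every byte before it.
            for seq in sorted(retransmitted):
                sent_at, end = first_sent[seq]
                if seg.ack >= end:
                    delays.append((seq, seg.time - sent_at))
                    retransmitted.discard(seq)
    return delays
```

Running this on both capture points (in front of the firewall and at the edge switch) and getting similar delays would support the reading that the wait is at the phone rather than in the network.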

What I fear is that if we get to a point where we can determine that some frame is the issue, these devices are past the point of any patching being done, as of a few weeks ago. But since replacing phones is not free and takes a lot of time, I still have to come up with something. I only saw a bug for large ICMPv6 packets, with nothing particularly helpful in the wording and a workaround of "don't do that", so I'm not hopeful.

We have our AM and SE aware of what is going on, and they've offered to help, so I'm hopeful we can eventually confirm the reason we're having trouble, even if we can't directly fix it.


Adam

-----Original Message-----
From: Wes Sisk (wsisk) [mailto:wsisk at cisco.com]
Sent: Friday, November 04, 2016 12:52 PM
To: Pawlowski, Adam
Cc: Tommy Schlotterer; cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Traffic Issues with 7900 Series Phones

Phones process ICMP traffic with low priority and throttling. This was
implemented to stem DoS attempts. Consider looking more at Voice Quality
effects, retransmits in packet captures, or parsing CCM traces for round
trip times. As you state, these phones are relatively late in life and
therefore relatively stable.

-Wes


On Nov 2, 2016, at 2:42 PM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

Tommy,

Sorry about that. These are a mixed bag: 41/61, both G and G-GE
phones, primarily the gigabit ones. Some SCCP, some SIP, mostly
9.4.2SR1-1, but also seen on 9.4.2SR2-2. PC attached or not makes no
difference; the only change we've been able to make that stops this is
moving the data VLAN that runs through the phone to a different one, or
to something null (with no PC).

Adam

-----Original Message-----
From: Tommy Schlotterer [mailto:tschlotterer at presidio.com]
Sent: Wednesday, November 02, 2016 2:37 PM
To: Pawlowski, Adam; cisco-voip at puck.nether.net
Subject: RE: Traffic Issues with 7900 Series Phones

What specific models of phones, e.g. 41s/61s or 40s/60s?

Thanks

Tommy

Tommy Schlotterer | Systems Engineer
Presidio | www.presidio.com
20 N. Saint Clair, 3rd Floor, Toledo, OH 43604
D: 419.214.1415 | C: 419.706.0259 | tschlotterer at presidio.com

-----Original Message-----
From: cisco-voip [mailto:cisco-voip-bounces at puck.nether.net] On Behalf
Of Pawlowski, Adam
Sent: Wednesday, November 02, 2016 2:23 PM
To: cisco-voip at puck.nether.net
Subject: [cisco-voip] Traffic Issues with 7900 Series Phones

After much hair pulling and frustration, I wanted to ask the group
here in case anyone has seen this or has any thought on what we should
be looking for.

We have a number of 7900 series phones that have been exhibiting
issues that look to me like the phone is getting hung up on
something: some sort of frame or packet is screwing with the network
chip/board or the OS, which is causing it trouble. I see missed traffic,
missed responses, and high ICMP echo times, and phones that eventually
get stuck with their ICMP echo response times all over the board, with
some reports of call trouble and CMR showing crazy jitter. If I power
cycle the phone, that clears and it works fine for a while.

I realize these items are pretty much end of useful life, pretty much
all done with software support, and are going to drop off of the
compatibility matrix and probably functional support in the near
future. But, while we still have a ton of them - has anyone noted any
particular type of traffic that causes the 7900 series phones grief?

I don't have loss on the network, and there do not seem to be any
transient broadcast storms rolling by. We do see an increased amount
of mDNS, IPv6 (the phones are v4 only), etc., but nothing stands out as
causing a particular problem. It just seems that whatever this is, it is
causing a memory leak or something, wherein it gets bad enough that
things go to hell eventually.

Any thoughts?

Adam P
SUNYAB
_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip



