[c-nsp] ASR1K forwarding failures on 10G SPA's

Fri Oct 7 04:21:42 EDT 2016

Cc: Back to list for documentation purposes

Hi Stephen,

Sorry for the late reply. Yes, there is a new IOS XE version with the
NTP bug fixed. Otherwise, they claim an NTP access-list might help as a workaround.

The bug is:

 CSCva52489 Input queue wedge. NTP packets destined to networks configured
on router
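
For anyone checking their own boxes: the symptom in the quoted thread below is an
input queue holding one packet more than its maximum (376/375). A minimal Python
sketch, assuming you have already captured the "Input queue:" line from
show interfaces as text. The function names are made up for illustration, and
size > max is only the heuristic discussed in this thread, not an official check:

```python
import re

# Matches the counter line printed by "show interfaces", e.g.
#   Input queue: 376/375/872/0 (size/max/drops/flushes); Total output drops: 0
INPUT_QUEUE_RE = re.compile(
    r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)\s*\(size/max/drops/flushes\)"
)


def parse_input_queue(line):
    """Return (size, max, drops, flushes) as ints, or None if the line does not match."""
    match = INPUT_QUEUE_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def looks_wedged(line):
    """Heuristic from this thread: the queue holds more packets than its maximum."""
    counters = parse_input_queue(line)
    if counters is None:
        return False
    size, maximum, _drops, _flushes = counters
    return size > maximum


# Sample line taken from the quoted message below.
sample = "  Input queue: 376/375/872/0 (size/max/drops/flushes); Total output drops: 0"
print(parse_input_queue(sample))  # (376, 375, 872, 0)
print(looks_wedged(sample))       # True
```

A queue that is merely full (size equal to max) is not flagged; only the
"one more than the maximum" pattern is.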

Good luck!
Sascha

On 04.10.2016 at 23:58, Stephen Fulton wrote:
> Hi Sascha,
> 
> That's it exactly!
> 
>   Input queue: 376/375/872/0 (size/max/drops/flushes); Total output drops: 0
> 
> (I had cleared the interface a few moments prior).
> 
> Thanks for the SR.  Was your case resolved?
> 
> -- Stephen
> 
> On 2016-10-04 5:15 PM, Sascha E. Pollok wrote:
>> (Not replying to the list but all folks who joined the discussion)
>>
>> Hi Stephen,
>>
>> were the drops high, or the input queue? What we've seen before was that the queue was
>> filled to exactly one packet more than the maximum. Our case was SR ***.
>>
>> Let me know if this matched your case.
>>
>> Cheers
>> Sascha
>>
>> On 04.10.2016 at 15:55, Stephen Fulton wrote:
>>> Gentlemen,
>>>
>>> Interesting, I checked this morning and the input drops were very high, despite being
>>> cleared 12 hours ago on a router no longer in production. If anyone has a TAC case they
>>> can reference (privately or otherwise) I'd appreciate it, as I have a TAC case open now.
>>> I'll wait on updating IOS-XE from 3.16.3.S until TAC is ready.
>>>
>>> Thanks,
>>>
>>> -- Stephen
>>>
>>> On 2016-10-04 1:57 AM, Sascha Pollok wrote:
>>>> Exactly. OP might try to raise hold-queue xx in on those interfaces. If
>>>> it solves the problem temporarily (!) he found it.
>>>>
>>>> If so, show buffers input-interface should give a hint.
>>>> The NTP bug came up pretty recently (2 months or so?) so it could
>>>> actually be the cause.
>>>>
>>>> -Sascha
>>>>
>>>> On 4 October 2016 at 07:45:36, Mark Tees <marktees at gmail.com> wrote:
>>>>
>>>>> That sounds like what I experienced in ASR920 land recently with bad
>>>>> packets filling up interface input queues causing a wedge.
>>>>>
>>>>> When it happens check the interface input queues and save the output.
>>>>>
>>>>> The resolution for us so far has been tight CoPP with discards, iACLs,
>>>>> and the like to only allow things towards the boxes that are as
>>>>> trusted as possible.
>>>>>
>>>>> On Tuesday, 4 October 2016, Sascha Pollok <sp at iphh.net> wrote:
>>>>>
>>>>>     Just to make sure: latest IOS XE version? It's not the NTP
>>>>>     processing bug filling up interface queues? How does the input
>>>>>     queue look on the affected interfaces?
>>>>>
>>>>>     Cheers
>>>>>     Sascha
>>>>>
>>>>>
>>>>>     On 4 October 2016 at 05:33:39, Stephen Fulton
>>>>>     <sf at lists.esoteric.ca> wrote:
>>>>>
>>>>>         ISIS adjacencies drop, as do BGP sessions on neighboring
>>>>>         devices.
>>>>>
>>>>>         Issue just reoccurred.
>>>>>
>>>>>         -- Stephen
>>>>>
>>>>>         On 2016-10-03 10:59 PM, Scott Granados wrote:
>>>>>
>>>>>             Anything logged while this happens?
>>>>>
>>>>>                 On Oct 3, 2016, at 10:52 PM, Stephen Fulton
>>>>>                 <sf at lists.esoteric.ca> wrote:
>>>>>
>>>>>                 Hi all,
>>>>>
>>>>>                 I have run into a number of forwarding failure events
>>>>>                 on ASR1K's with 10G SPA's.  These have occurred across
>>>>>                 a range of IOS-XE versions, using various ROMMON
>>>>>                 versions and across two different ASR1K platforms
>>>>>                 (1002's and 1004's).  Multiple SPA's have been
>>>>>                 replaced, IOS-XE versions and ROMMON versions upgraded
>>>>>                 and in the case of the ASR1004's, SIP's replaced (both
>>>>>                 SIP10 and SIP40's).  TAC cases have been opened
>>>>>                 several times.
>>>>>
>>>>>                 What occurs is forwarding across an interface fails
>>>>>                 completely.  The easiest way to find it is the lack of
>>>>>                 ARP entries on the interface/sub-interface, due to
>>>>>                 time-outs, but traffic is still attempting to traverse
>>>>>                 the interface.  When I ping the IP address associated
>>>>>                 with the failed interface, it fails.  ARP resolution
>>>>>                 of any neighbors fails, and neighboring devices on the
>>>>>                 same broadcast domain cannot reach it - though they will
>>>>>                 see its MAC in the ARP table.
>>>>>
>>>>>                 In all cases, ISIS and MPLS were configured on the
>>>>>                 interfaces.  BFD has been on some, not on others.
>>>>>
>>>>>                 I recently learned of another organization that
>>>>>                 saw the same behavior on an ASR1006 with 10G SPA's.
>>>>>                 SPA's and SIP's were replaced and the last advice they
>>>>>                 received from TAC was that if it occurred again the
>>>>>                 chassis would need to be replaced.  It did, but they chose
>>>>>                 not to replace the chassis and simply stopped using
>>>>>                 10G entirely.
>>>>>
>>>>>                 Has anyone else seen this?
>>>>>
>>>>>                 -- Stephen
>>>>