[c-nsp] Bug with IOS-XR and SPAN ports?

Thu Jan 12 12:01:16 EST 2017

On 22 December 2016 at 13:58,  <adamv0025 at netconsultings.com> wrote:
> Hi James,
>
>> James Bensley
>> Sent: Thursday, December 22, 2016 10:06 AM
>>
>> On 15 December 2016 at 14:00,  <adamv0025 at netconsultings.com> wrote:
>> > Hi,
>> >
>> >> James Bensley
>> >> Sent: Thursday, December 15, 2016 12:29 PM
>> >>
>> >> Is it something to do with the central arbiter counting the packets
>> >> at ingress twice (once for the "normal" transit packet and once for
>> >> the duplicate of the packet made for the SPAN session) so when the
>> >> SPAN destination port becomes congested it's dropping the ingress
>> >> traffic even though the original ingress packet is destined for a
>> >> different non-congested port?
>> >>
>>
>> > I think that the central arbiter actually has to count the packets
>> > twice to keep track of how much is being sent over the fabric as well
>> > as to both, the original NPU and the SPAN NPU.
>> >
>> > The problem however I'd see in how the ingress FIA queues the SPAN
>> > traffic -SPAN packets should have the lowest fabric priority so should
>> > be sitting with other/legit low priority packets in the low priority
>> > VOQ for a given egress NPU.
>>
>> I think the SPAN traffic should be sitting below legit unmarked traffic in
>> fact (in an ideal world) as the lowest of the three priorities for most
>> people will be Internet traffic which customers are paying for so it
>> shouldn't be dropped because of congestion caused by a SPAN session (in my
>> opinion).
>>
>> In the scenario that Hank is describing, I think you are correct that the
>> arbiter needs to count the packets entering the FIA twice so that the switch
>> fabric doesn't become congested however it needs to differentiate between
>> packets that have come from a SPAN source and those that haven't. In this
>> case with congestion on the SPAN egress port, dropping the legitimate
>> traffic at ingress is not "right", there needs to be backpressure within the
>> SPAN session to stop duplicating the packets (somehow?) until the egress
>> SPAN port has more capacity.
>> Perhaps another queue priority across the switch fabric?
>>

Hi Adam,

Super late response, festive period and all that :)

> I was reading the original post again and now I think the VOQs should have
> nothing to do with this.
> ASR9k has a very granular VOQ architecture -that is, VOQs/VQIs per egress
> port(10GE entity)&fabric-priority-level(3 Levels).
>
> Case1:
> If the SPAN traffic is destined to the same egress physical port and has the
> same fabric priority as regular traffic, only then the regular traffic will
> compete with SPAN traffic for a specific FIA VOQ buffer during egress port
> congestion.
> (would have to be same egress port just different sub-interface).
> - Expected behaviour.
>
> Case2:
> If the SPAN traffic is destined to the same egress NPU as regular traffic
> (just a different port).
> And the egress NPU gets congested, for example high pps in combination with
> a lot of features enabled, QOS, ABF, NetFlow...
> Then all traffic (starting from lowest fabric priority first) will
> experience the effects of backpressure (till the egress NPU can cope with
> the pps rate).
> (NPU will also issue WAN backpressure in this case).
> - Expected behaviour.
>
> Case3: (most likely)
> Ingress NPU is overloaded (for example high pps in combination with a lot of
> features enabled, QOS, NetFlow and SPAN).
> In this case NPU will initiate WAN backpressure and the EFD will start
> dropping packets before they get to NPU's pipeline starting from
> packets(heads) placed by pre-classifier(ICU TOP) into low priority ICFD
> queues.
> (NPU will also issue Fabric backpressure in this case).
> - Expected behaviour.
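
To make the VOQ granularity described above concrete, here is a minimal
sketch in Python. It only models the idea that a VOQ is selected per egress
port and fabric priority level (three levels); the port names and the
VoqKey/shares_voq names are made up for illustration and are not anything
from IOS-XR.

```python
from typing import NamedTuple


class VoqKey(NamedTuple):
    """Hypothetical VOQ selector: one queue per (egress port, fabric priority)."""
    egress_port: str      # e.g. one 10GE port; the names used below are made up
    fabric_priority: int  # 0..2, i.e. the three fabric priority levels


def shares_voq(a: VoqKey, b: VoqKey) -> bool:
    """Two flows compete for the same VOQ buffer only if both fields match."""
    return a == b


regular = VoqKey("lc1/port0", 2)
span_same_port = VoqKey("lc1/port0", 2)   # Case 1: same port, same priority
span_other_port = VoqKey("lc1/port1", 2)  # same priority, different port

print(shares_voq(regular, span_same_port))   # True
print(shares_voq(regular, span_other_port))  # False
```

In this toy view only the Case 1 combination lands in the same VOQ as the
regular traffic, which is why the VOQs alone would not explain drops towards
a different, non-congested port.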

I don't think this is "case 3", an ingress NPU issue (although it's
really up to Hank and his TAC support to confirm). Reading his
original email, in monitor session 1 he has 11Gbps of source ports
SPAN'ed to 10Gbps destination port(s).
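
As a back-of-envelope check on those numbers, a rough Python sketch. I am
assuming the 11Gbps is a sustained offered rate and that it all has to fit
into 10Gbps of SPAN destination capacity; the function name is just for
illustration.

```python
def span_oversubscription(offered_gbps, capacity_gbps):
    """Return (excess in Gbps, fraction of the offered load that cannot fit).

    Assumes a sustained rate and that every mirrored packet is the same
    size as the original.
    """
    excess = max(0.0, offered_gbps - capacity_gbps)
    fraction = excess / offered_gbps if offered_gbps > 0 else 0.0
    return excess, fraction


excess, fraction = span_oversubscription(11.0, 10.0)
print(f"excess={excess:.1f} Gbps, cannot fit={fraction:.1%}")
```

So under those assumptions roughly 1Gbps, or about one packet in eleven of
the mirrored traffic, has nowhere to go on the SPAN destination.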

So I am reading it as Case 4:
The regular traffic ingressing the SPAN source ports is routed to
"some" destination ports/NPUs. The SPAN traffic is sent to a port on a
potentially different line card or NPU to the routed regular traffic.
When the SPAN port or NPU becomes congested, backpressure is triggered
across the switch fabric to the ingress NPU, which does not
differentiate between SPAN traffic and regular traffic.
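
A toy model of what I mean by Case 4, in Python. This is purely my guess at
the behaviour, not how the ASR9k fabric arbitration is actually implemented:
every ingress packet produces one regular copy and one SPAN replica, the
regular egress port never congests, and the only variable is whether
backpressure from the SPAN egress is applied to the replicas alone or to the
ingress traffic as a whole. All names and numbers are hypothetical.

```python
def simulate(ticks, offered_per_tick, span_capacity_per_tick, differentiate):
    """Count delivered/dropped packets for a congested SPAN destination.

    differentiate=True : backpressure only stops the SPAN replicas.
    differentiate=False: backpressure hits the ingress as a whole, so the
                         original packet is lost along with its replica.
    """
    regular_delivered = regular_dropped = span_dropped = 0
    for _ in range(ticks):
        span_fits = min(offered_per_tick, span_capacity_per_tick)
        overflow = offered_per_tick - span_fits
        span_dropped += overflow
        if differentiate:
            regular_delivered += offered_per_tick
        else:
            regular_delivered += span_fits
            regular_dropped += overflow
    return {
        "regular_delivered": regular_delivered,
        "regular_dropped": regular_dropped,
        "span_dropped": span_dropped,
    }


# 11 units offered per tick into a SPAN port that drains 10 per tick.
print(simulate(10, 11, 10, differentiate=False))
print(simulate(10, 11, 10, differentiate=True))
```

With differentiate=False the regular traffic takes drops even though its own
egress port is idle, which is the symptom being described; with
differentiate=True only the SPAN replicas are lost.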

I agree with your statement, another fabric priority is needed for SPAN traffic.
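
To sketch what an extra, lowest fabric priority for SPAN could look like,
here is a small strict-priority scheduler in Python. It is an illustration
only: the fourth "span" level, the queue depth and the class name are all my
own assumptions, and nothing here reflects an existing IOS-XR feature.

```python
from collections import deque

# Served in this order; "span" is the hypothetical extra level at the bottom.
PRIORITIES = ("high", "medium", "low", "span")


class StrictPriorityScheduler:
    def __init__(self, depth):
        self.depth = depth  # per-queue limit, tail drop when full
        self.queues = {p: deque() for p in PRIORITIES}
        self.drops = {p: 0 for p in PRIORITIES}

    def enqueue(self, priority, packet):
        queue = self.queues[priority]
        if len(queue) >= self.depth:
            self.drops[priority] += 1
            return False
        queue.append(packet)
        return True

    def dequeue(self):
        # Always serve the highest non-empty priority first.
        for priority in PRIORITIES:
            if self.queues[priority]:
                return priority, self.queues[priority].popleft()
        return None


sched = StrictPriorityScheduler(depth=4)
for tick in range(10):
    # Offer 2 "low" (e.g. Internet) and 2 SPAN packets, but only serve 3.
    for n in range(2):
        sched.enqueue("low", (tick, n))
        sched.enqueue("span", (tick, n))
    for _ in range(3):
        sched.dequeue()

print(sched.drops)
```

In this run only the "span" class records drops; the "low" class is always
served ahead of it, so the paying traffic is not affected by the SPAN
congestion.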

Hopefully Hank will be able to confirm what's what soon.

Cheers,
James.