[j-nsp] Juniper PTX1000

Adam Vitkovsky Adam.Vitkovsky at gamma.co.uk
Wed Dec 21 05:50:56 EST 2016


> Saku Ytti
> Sent: Tuesday, December 20, 2016 7:22 PM
>
> On 20 December 2016 at 18:42,  <adamv0025 at netconsultings.com> wrote:
>
> > Both CRS-X and NCS6k are powered by nPower X1e NPU.
> > And my understanding is that it's a homogeneous (same PPE type) MPSoC,
> > i.e. symmetric multiprocessing (SMP), much like all the chips out there
> > (used in ASR9k or MX and PTX, ...).
> > The difference I understand is in the instruction set that the PPE is running.
> > And my guess is that threads on each PPE are using run-to-completion
> > scheduling.
> > Let me know your thoughts please.
> >
> > And by pipeline with regards to NPU design I understand pipelining of
> > arrays of PPEs where each array in the pipeline consists of PPEs dedicated
> > to a specific function (parse, search, modify), like in ASR9k.
>
> Current gen ASR9k, EZchip, is like Trio, ALU FP or Huawei Solar: many identical
> cores, fully programmable, essentially you're only limited by time in what you
> can do. Whereas NCS5k/Arista/Jericho, PTX are ASIC/pipelines, with much
> more specialised hardware and a lot less flexibility, but what they do do, they
> do far more efficiently, which means denser boxes are pragmatic.
> Roughly speaking, pipeline/ASIC is great for core and DC; in Edge you often may
> require the richer features offered by NPU designs, and density isn't that crucial.
>
With regard to raw processing speed, I don't think it matters that much whether it's SMP (a single PPE completely processes the packet head) or a pipeline (the packet head is processed through a pipeline of PPE stages, each specialized for a different function, i.e. a different instruction set).
I think what matters most is how much data the PPE gets (the size of the packet head that will be processed) and the number of instructions in the set (the number of computations/lookups and the resulting memory accesses).
That is apart from the clock rate and the number of threads on each PPE, of course.
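
To make that a bit more concrete, here is a back-of-the-envelope sketch in Python. It is only an illustrative model under my own assumptions, not vendor data: the function name, the parameters and all the numbers are hypothetical, each PPE is assumed to issue one instruction per cycle, and extra threads are assumed to hide memory stalls until the PPE is fully busy.

```python
def estimated_pps(num_ppes, threads_per_ppe, clock_hz,
                  instr_per_packet, stall_cycles_per_packet=0.0):
    """Rough packets-per-second estimate for an NPU (illustrative model only)."""
    # Assumption: one instruction issues per cycle, so busy cycles == instructions.
    busy_cycles = float(instr_per_packet)
    # Assumption: extra threads overlap memory stalls until the PPE is fully busy.
    utilization = min(1.0, threads_per_ppe * busy_cycles
                      / (busy_cycles + stall_cycles_per_packet))
    return num_ppes * clock_hz * utilization / busy_cycles


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    rich = estimated_pps(40, 4, 1e9, instr_per_packet=400,
                         stall_cycles_per_packet=1200)
    lean = estimated_pps(40, 4, 1e9, instr_per_packet=100,
                         stall_cycles_per_packet=300)
    print(f"rich feature set: {rich / 1e6:.0f} Mpps")
    print(f"lean feature set: {lean / 1e6:.0f} Mpps")
```

In this model the per-packet instruction count and the memory stalls drive the result; how the PPEs are arranged does not appear in the formula at all.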

A good example is QFP (ASR1K) and QFA (CRS3).
Both use the same SMP architecture, but the QFP PPE gets whole packet bodies and executes a massive instruction set on each, resulting in very limited pps performance, whereas the QFA PPE gets only packet heads and executes a limited instruction set, resulting in a massive improvement in pps performance.
Another good example is hyper-mode on the MX PFE: by reducing the instruction set that each PPE executes on every packet head it needs to process, you gain some extra pps performance.

What I'm trying to say is that it doesn't matter that much how the PPEs are organized on the NPU chip (SMP, pipeline or even SIMD architecture).
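
A small Python sketch of why the organization can wash out, under idealized assumptions of mine (no memory stalls, no inter-stage hand-off cost, one instruction per cycle). The function names, stage costs and PPE counts are hypothetical. With the same total number of PPEs and the same per-packet work, a pipeline whose stages are balanced against their cost matches the SMP figure; it only falls behind when one stage becomes the bottleneck.

```python
def smp_pps(num_ppes, clock_hz, stage_cycles):
    """Every PPE runs all stages of the packet to completion."""
    return num_ppes * clock_hz / sum(stage_cycles)


def pipeline_pps(clock_hz, stage_cycles, ppes_per_stage):
    """Each stage has its own PPE array; the slowest stage sets the rate."""
    return min(n * clock_hz / c
               for n, c in zip(ppes_per_stage, stage_cycles))


if __name__ == "__main__":
    clock_hz = 1e9
    # Hypothetical per-packet cycle cost of parse, search, modify.
    stage_cycles = [20, 60, 20]

    smp = smp_pps(10, clock_hz, stage_cycles)
    balanced = pipeline_pps(clock_hz, stage_cycles, [2, 6, 2])
    unbalanced = pipeline_pps(clock_hz, stage_cycles, [4, 2, 4])

    print(f"SMP, 10 PPEs:             {smp / 1e6:.1f} Mpps")
    print(f"Pipeline 2/6/2 (balanced): {balanced / 1e6:.1f} Mpps")
    print(f"Pipeline 4/2/4:            {unbalanced / 1e6:.1f} Mpps")
```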

adam


        Adam Vitkovsky
        IP Engineer

T:      0333 006 5936
E:      Adam.Vitkovsky at gamma.co.uk
W:      www.gamma.co.uk
