[j-nsp] interpreting 10Gb interface "PCS statistics" values

Michael Loftis mloftis at wgops.com
Fri Oct 21 15:23:18 EDT 2016


Was hoping someone who knew more would chime in, but here goes. It's
measured in seconds because the PCS (physical coding sublayer) does NOT
keep detailed statistics. The "Seconds" value means there were X
distinct seconds in which an error was flagged in that category. I
think the previous response describing bit errors vs. errored blocks is
wrong. As I understand it, the PCS layer can repair single-bit errors,
so a second with one or more single-bit (but correctable!) errors is a
"bit errored second". If it is unable to correct the error and recover
a valid PCS block, then you get an "errored block" second.
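
To make the "errored seconds" idea concrete, here's a rough Python sketch.
It's mine, not anything Junos runs; the function name and the timestamps
are made up for illustration. It buckets error events into one-second
windows and counts the windows, which is why a burst of a thousand errors
inside one second and a single error both add just 1 to the counter.

```python
import math

def errored_seconds(event_times):
    """Count distinct one-second windows containing at least one error.

    event_times: iterable of error timestamps in seconds (floats).
    """
    return len({math.floor(t) for t in event_times})

# 1000 errors packed into one second still count as a single errored second.
burst = [5.0 + i / 1000 for i in range(1000)]
# Three isolated errors in three different seconds count as three.
sparse = [1.2, 7.9, 30.5]

print(errored_seconds(burst))   # 1
print(errored_seconds(sparse))  # 3
```

So the raw error count behind a given "Seconds" value can be anywhere
from that value up to something enormous.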

It's not a raw count of those errors, just the number of ~1s windows in
which at least one occurred.  You can easily get PCS errors from
unplugging an optic or otherwise shutting down the remote end.  You can
also get spurious PCS errors from a marginal link that shows PLENTY of
light (the SNR is low, or the cable is marginal).  On MX specifically
it *can*, in very rare circumstances, indicate a problem between the
optic and the MIC.  Most of the time my suggestion for PCS errors is to
clear the counters and check again after 1h and 24h.  If you see a
significant number of errored seconds in a 24h period, then check/clean
the ends and patches, and maybe replace the optics.
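
For the clear-and-recheck routine, something like this Python sketch can
diff two saved snapshots of the PCS block. It assumes the text layout
shown in the original post (a "PCS statistics ... Seconds" header followed
by "Bit errors" and "Errored blocks" lines) and one interface per
snapshot. The sample values and the threshold of 10 errored seconds per
24h are placeholders, not a Juniper recommendation.

```python
import re

COUNTER = re.compile(r"\s*(Bit errors|Errored blocks)\s+(\d+)\s*$")

def parse_pcs(text):
    """Pull the PCS errored-second counters out of saved CLI output."""
    counters = {}
    in_block = False
    for line in text.splitlines():
        if line.strip().startswith("PCS statistics"):
            in_block = True
            continue
        if in_block:
            match = COUNTER.match(line)
            if match is None:
                break  # first non-counter line ends the PCS block
            counters[match.group(1)] = int(match.group(2))
    return counters

def pcs_delta(before, after):
    """Errored seconds accumulated between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}

# Illustrative snapshots: right after clearing, and 24h later.
after_clear = """
  PCS statistics                      Seconds
    Bit errors                             0
    Errored blocks                         0
"""
after_24h = """
  PCS statistics                      Seconds
    Bit errors                            61
    Errored blocks                        69
"""

THRESHOLD = 10  # assumed cutoff for "significant"; tune for your network

delta = pcs_delta(parse_pcs(after_clear), parse_pcs(after_24h))
needs_attention = any(count >= THRESHOLD for count in delta.values())
print(delta, needs_attention)
```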

Also beware: lots of DOM bugs in various JunOS releases cause the DOM
values to get stuck, and it can be hard or impossible to check for that
without causing an outage (sometimes you can safely bend the patch cable
and watch the loss increase to verify your DOM values aren't stuck).
I've hit this most often on DPC cards but have also seen it on MPC
cards.  The DOM data is also highly dependent on the optic itself, and
there's a LOT of buggy stuff out there, so it's not all Juniper's fault.
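
One cheap way to at least notice a possibly stuck DOM reading is to poll
it and see whether it ever moves. A rough Python sketch follows. It
assumes you already collect Rx power samples somehow (the lists below are
made up), and it assumes a live reading normally wobbles a little from
poll to poll, which may not hold for every optic. An unchanging value is
a hint to go do the bend test, not proof of a bug.

```python
def looks_stuck(samples_dbm, min_samples=10):
    """True if there are enough samples and every one is identical."""
    return len(samples_dbm) >= min_samples and len(set(samples_dbm)) == 1

# Hypothetical Rx power readings in dBm, one per polling interval.
frozen = [-3.73] * 12
lively = [-3.73, -3.74] * 6
too_few = [-3.73] * 3

print(looks_stuck(frozen))   # True
print(looks_stuck(lively))   # False
print(looks_stuck(too_few))  # False
```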


On Fri, Oct 21, 2016 at 11:07 AM, David B Funk
<dbfunk at engineering.uiowa.edu> wrote:
> Thanks guys but this isn't what I was asking.
>
> The optical power is similar (within a few tenths of a dB) at my end, down
> by about 2.5 dB at the far end of the link that is having issues (-6.23 dBm
> as opposed to -3.73 dBm) but not enough to explain what I'm seeing.
>
> The big question I have is: What does "30 Seconds" mean for an attribute
> that by description of the docs is supposed to be number of PCS blocks with
> invalid Sync headers?
> Particularly when the guy on the Cisco at the other end says his error
> counters are going up like crazy (and packets are being dropped) while the
> stats my end stays constant at "30 Seconds".
> What does that mean?
>
> The particularly frustrating thing is that data streams are dropping packets
> (EG iperf3 showing retries and seriously degraded performance) but none of
> the interface stats are showing any values that indicate an issue other than
> that "30 Seconds".
>
> Can anybody tell me what "30 Seconds" means (in the context of an error
> counter)?
>
>
>
>
> On Fri, 21 Oct 2016, Christopher Costa wrote:
>
>> Here are my notes from a JTAC review of these a couple of years ago:
>>
>>
>>
>> [pcs] encoding is continually transmitting to keep the line in sync. The
>> PCS layer is directly below the MAC layer so for MX, it’s on the MIC. PCS
>> errors can be caused by anything MIC or lower, i.e. transceiver, fiber,
>> line equipment, etc.
>>
>>
>>
>> PCS functionality:
>> ===================
>> IEEE 802.3ae 10GbE interfaces use a 64B/66B encoder/decoder in the
>> PHY-PCS (Physical Coding Sublayer) to allow reasonable clock recovery
>> and facilitate alignment of the data stream at the receiver.
>> As the scheme name suggests, 64 bits of data on the MAC layer are
>> transmitted as a 66-bit code block on the PHY layer, which makes
>> clock/timing synchronization easier. A 66-bit code block contains a
>> 2-bit Sync header + 8 octets data/control field.
>>   If the Sync header is '01', the 8 octets are entirely data.
>>   If the Sync header is '10', an 8-bit Type field follows, plus 56 bits
>>   of data/control field.
>> The 8 octets data/control field is scrambled by using a
>> self-synchronous scrambler to achieve complete DC-balance on the
>> serial line.
>> PCS statistics displays PCS fault conditions by checking for valid Sync
>> headers received at every 66-bit interval, so that we can monitor
>> 10Gbps high-speed transmission line quality.
>> If the 64B/66B receiver does not detect the 2-bit Sync header at
>> regular 66-bit intervals and it estimates a high BER (Bit Error Rate
>> of >10^-4), PCS statistics will report a problem.
>>
>> PCS statistics:
>> ================
>> - "Bit errors" indicates the number of PCS blocks with invalid Sync
>>   headers.
>> - "Errored blocks" indicates the number of PCS blocks with a valid Sync
>>   header but invalid block format.
>>
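
For reference, the sync header check described in these notes boils down
to something like the Python sketch below. The string-of-bits
representation and the function name are just for illustration; a real
PHY does this in hardware. Only the 2-bit header is inspected here, since
the notes say the 8-octet payload is scrambled on the line.

```python
def classify_sync_header(block66):
    """Classify a 66-bit block given as a string of '0'/'1' characters."""
    if len(block66) != 66 or set(block66) - {"0", "1"}:
        raise ValueError("expected 66 characters of 0/1")
    sync = block66[:2]
    if sync == "01":
        return "data"      # all 8 octets are data
    if sync == "10":
        return "control"   # 8-bit Type field plus 56 bits data/control
    return "invalid"       # '00' and '11' are not valid sync headers

# Made-up payload bits; only the first two bits matter to this check.
payload = "0" * 64
print(classify_sync_header("01" + payload))  # data
print(classify_sync_header("10" + payload))  # control
print(classify_sync_header("11" + payload))  # invalid
```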
>>
>> On Fri, Oct 21, 2016 at 9:37 AM, Michael Carey <mcarey at kinber.org> wrote:
>>       David,
>>
>>       When I've seen PCS statistical errors before, it pointed to either a
>>       failing optic that needed to be replaced in our MX or a drastic
>>       change in optical light levels caused by an OSP fiber issue.  How do
>>       your "show interface diagnostic optic" levels look?
>>
>>       On Wed, Oct 19, 2016 at 7:40 PM, David B Funk
>>       <dbfunk at engineering.uiowa.edu> wrote:
>>
>>       > I've got a couple of 10Gig-eth interfaces (xe- on MX480) of which
>>       > I'm trying to interpret the "PCS statistics" values.
>>       >
>>       > One of them is pretty steady at:
>>       >
>>       >   PCS statistics                      Seconds
>>       >     Bit errors                             4
>>       >     Errored blocks                         4
>>       >
>>       > The other one seems to vary with the values ranging from 10 to 70.
>>       > EG:
>>       >
>>       >   PCS statistics                      Seconds
>>       >     Bit errors                            61
>>       >     Errored blocks                        69
>>       >
>>       > The second interface will trigger a number of error conditions
>>       > at the other end, which terminates in a Cisco router, without
>>       > showing any error conditions at my end (EG BPDU Error: None,
>>       > MAC-REWRITE Error: None, CRC/Align errors 0, FIFO errors 0,
>>       > etc.). During some of these times I'll see significant packet
>>       > loss, and at other times I see minimal problems.
>>       >
>>       > According to Juniper docs the PCS statistics should mean:
>>       >
>>       >  PCS statistics
>>       >   (10-Gigabit Ethernet interfaces) Displays Physical Coding
>>       >   Sublayer (PCS) fault conditions from the WAN PHY or the LAN
>>       >   PHY device.
>>       >
>>       >     Bit errors: High bit error rate. Indicates the number of bit
>>       >       errors when the PCS receiver is operating in normal mode.
>>       >     Errored blocks: Loss of block lock. The number of errored
>>       >       blocks when PCS receiver is operating in normal mode.
>>       >
>>       > But I don't know how to interpret a value of "16 seconds" with
>>       > that definition.
>>       > Can anybody shed some light on what those numbers mean?
>>       >
>>       > Thanks.
>>       >
>>       >
>>       > --
>>       > Dave Funk                                  University of Iowa
>>       > <dbfunk (at) engineering.uiowa.edu>        College of Engineering
>>       > 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
>>       > Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
>>       > #include <std_disclaimer.h>
>>       > Better is not better, 'standard' is better. B{
>>       > _______________________________________________
>>       > juniper-nsp mailing list juniper-nsp at puck.nether.net
>>       > https://puck.nether.net/mailman/listinfo/juniper-nsp
>>
>>
>>
>>
>> --
>>
>>
>> *Michael Carey*
>> Director of Operations, KINBER
>> 717-963-7490 | 814-777-5027 | mcarey at kinber.org
>> 5775 Allentown Blvd., Suite 101, Harrisburg, PA 17112
>>
>>
>>
>>
>> --
>> Chris Costa
>> ∆○×□
>>
>>
>
> --
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{



-- 

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler

