[VoiceOps] Adtran TA900 timing problems
Matt Yaklin
myaklin at g4.net
Mon Nov 4 16:54:20 EST 2013
On Mon, 4 Nov 2013, Jay Hennigan wrote:
> On 11/4/13 9:41 AM, Matt Yaklin wrote:
>>
>> I forgot to mention that the cisco T1s on the DS3 use line
>> timing in my setup. So line and line on both sides. The T1
>> and the Adtran.
>
> Don't do that then. :-)
>
> Every T1 span must have exactly one source of clocking.
>
Yea, I saw that as I was typing out my email but did not
change it. I just did now. Why was it set that way? I am unsure.
Probably because a mistake was made and then duplicated. Then
we probably followed the good old "if it ain't broke, don't fix it."
I am sure I will be punished with a few support calls over the
next day for touching it. ;-)
But... see below...
> My recommendation is to configure the T3 controller on the 7206VXR to
> internal clocking on all of its T1s, and clock the Adtrans from T1 0/1
> (line).
>
> If both sides are set to internal, you'll get clock slips as the
> oscillators drift slightly with respect to each other.
>
> If both sides are set to line, then there's no reference clock. The
> line will probably sync up initially as the free-running T1 will be
> close enough to sync. Over time the frequency will drift up or down
> until one side can no longer sync. In some cases they'll recover on
> their own, and in others you need to shut/no shut or otherwise force a
> resync.
>
> So, don't do that.
>
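To put rough numbers on the slip rate Jay describes: a controlled slip happens each time the two clocks have drifted apart by one full T1 frame (193 bits at 1.544 Mbit/s, i.e. 125 microseconds). The Python sketch below is only a back-of-the-envelope estimate, assuming a constant frequency offset and a simple one-frame slip buffer. The function names and example offsets are illustrative, not taken from any Cisco or Adtran documentation.

```python
# Back-of-the-envelope slip estimate for a T1 span.
# Assumption: constant frequency offset, one slip per full frame of drift.

FRAME_SECONDS = 193 / 1_544_000  # one T1 frame: 193 bits at 1.544 Mbit/s (125 us)


def seconds_between_slips(offset_ppm: float) -> float:
    """Time for two clocks offset_ppm apart to drift by one full frame."""
    if offset_ppm == 0:
        return float("inf")  # perfectly matched clocks never slip
    return FRAME_SECONDS / (abs(offset_ppm) * 1e-6)


def slips_per_day(offset_ppm: float) -> float:
    """Expected number of frame slips in 24 hours at a constant offset."""
    if offset_ppm == 0:
        return 0.0
    return 86_400 / seconds_between_slips(offset_ppm)


if __name__ == "__main__":
    # Hypothetical offsets between the two free-running oscillators.
    for ppm in (0.01, 1.0, 32.0):
        print(ppm, "ppm:", round(seconds_between_slips(ppm), 1), "s between slips,",
              round(slips_per_day(ppm), 1), "slips/day")
```

Under these assumptions a 1 ppm offset works out to about one slip every 125 seconds, while 0.01 ppm is closer to one slip every three and a half hours.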
But things get even more complicated than that if you wish to be
a clocking purist. Here we are talking about timing over a DS3. To
some that is "humorous".
This is a post from the Taqua mailing list from Dave Long. Pretty
interesting if you are talking about timing and DS3s! I obviously had
a mistake in my config but even when corrected it is still, technically,
wrong?
I would love to know what people's opinions are of the post below.
I bet Paul has read it a few times now.
------------------------------------------------------------------------------
david.long at taqua.com
Springtime for Switching
Ahhh, it's springtime! In the switch supplier business, that means two
things:
1) More release upgrades (since the holiday season is well behind us, we
get larger numbers of upgrade requests)
2) GR-303/PRI issues/outages due to network clocking problems.
Number 2 is somewhat related to number 1, but not always... But, that's
what I'm here to talk about...
When we deployed some of our first switches, we saw network clocking
reference problems due to not getting a valid clock reference to the
switch. We changed our installation process to check for this early in
the deployment process. The T7000 has a lot of logic to constantly
monitor and qualify multiple clock sources, and to switch to secondary and
tertiary references, issue logs/events/alarms as a result. So, we can
spot if we don't have a valid timing reference coming to the switch. We
will typically recommend a BITS clock if we see issues.
We now have 6 years of experience with the Broadband Interface Card (BIC).
And with the BIC came a different deployment model. Not only were the
DS3s deployed to the carrier side of the switch, they were also deployed
on the access side as well. The Digital Loop Carriers (DLCs) are
typically in remote areas. Their DS1s are muxed up with other DS1s by
M13s into a DS3, which is often carried over fiber/SONET transport to the
central office. There the DS3 is demuxed to DS1s, recombined with other
DS1s, remuxed to DS3s and connected to the T7000 BIC card. The DLCs will
then take their network clocking reference off of the DS1 that leads
(ultimately) to the T7000.
So, what's the problem? It's that a DS3 is not a valid clock transport.
That includes DS1s embedded within DS3s. Now, this is typically the
time where the confusion sets in... This is not a "Taqua rule" or a
"T7000 limitation." This is clearly laid out by Telcordia in the network
clocking standards. But, the Telcordia rule is not just some bureaucratic
decision, it's basically a statement of engineering and physics. The
problem is that a DS3 is not a synchronous transport, it is asynchronous.
When DS1s are packed into a DS3, transported and unpacked, the integrity
of the DS1 is not the same on the output as it is on the input. It is
close, but from a timing perspective, the process adds jitter and wander.
Throw in there Fiber/SONET transport, and bigger problems happen (pointer
adjustment/justification in particular). If I said "why not transport a
clocking reference as part of RTP in a VoIP network?", people would laugh.
But, DS3s have some of the same characteristics (asynchronous transport).
(2011 revision... IEEE 1588 is a protocol that does work over IP networks
to transport a clock synchronously... My point is still valid in that even
1588 does many things to adjust to jitter/wander and was designed to work
in an asynchronous network, where TDM clocking was not).
Keep in mind that clock reference transport requires much more precision
than voice traffic transport. Jitter and wander in the voice path may be
barely noticeable, but if two "ends" of the span are using this for
clocking, the results can be unpredictable. All bets are off when it
comes to High-Level Data Link Control (HDLC) channels. Being "a little
bit off" may mean that HDLC links do not want to sync up at all. For
GR-303, these links are the TMC and the EOC. All DLCs need a network
reference for clocking and that can't be an embedded DS1 from a DS3
(unless your M13 mux injects a clock signal on the DS1, but these are
rare). For PRI, these links are D-channels.
You may be saying "but Dave, I've been clocking from embedded DS1s for
years and I've never had a problem?" That is the reason I'm writing this
note. A phenomenon called plesiochronous (or "plesiosynchronous") timing
occurs in many cases. This means that the clocking in a network can
"coincidentally" or "almost"
be synchronized. Your network can work PERFECTLY, for years, right until
it does not work at all. Then, it seems like nothing you do can get the
two ends talking to each other again. Eventually, after power cycling
muxes, DLCs, switches, etc., finally the ends will talk to each other.
The blame will typically lie with whatever the last box was that was power
cycled, kicked, or yelled at rudely.
The other item that conspires in this problem is that DLCs typically have
little if any clocking qualification logic and circuitry. So, even if
they would detect slips or sags in the timing, many are hard pressed to
report it.
Back at Tekelec, when I had responsibility for R&D on the T9000 and T7000,
we had an 8 hour T9000 outage of this nature. We had a team of a dozen
engineers looking at the software, another dozen TAC engineers looking at
all the provisioning. There wasn't anything that seemed to help. The
RDTs were power cycled, the switch was rebooted. Finally some of the
muxes were rebooted, at the same time someone typed the "ls" (directory
listing) command and everything synced up and worked like nothing
happened. It took weeks to convince this customer that the "ls" command
was not the reason for the fix. That customer did not believe that the
network clocking issue was to blame. Some months later, after we formed
the T7000 business unit and I no longer had the T9000, I walked by TAC one
day when they were fighting the same problem with the same customer. I
noticed an almost frantic typing of "ls" by the customer to no avail...
How can this "plesiosynchronous" thing work at all? Well, if you have one
of those "atomic clock" alarm clocks or wristwatches, you have a good
example of how network synchronization works. Your clock or watch has its
own quartz oscillator to keep time. Every so often it looks at the atomic
clock radio broadcast and resynchronizes. If the atomic clock signal gets
a little "loopy" between updates, no one knows. Think of it as the
network gets "into a rhythm" when it is plesiosynchronized. Everything is
fine until something upsets that rhythm.
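The watch analogy can be put in numbers with a small Python sketch. It is a toy model, not how any particular DLC or switch disciplines its oscillator: it assumes a constant drift in ppm and a resync that removes all accumulated error, and every name and value in it is hypothetical. It shows the error growing quietly between resyncs, and growing further when a resync is missed.

```python
# Toy model of a free-running clock that is periodically resynchronized.
# Assumptions: constant drift, and a successful resync zeroes the error.

def time_error_trace(drift_ppm, sync_interval_h, hours, missed_syncs=()):
    """Return the accumulated time error (seconds) at the end of each hour.

    drift_ppm       -- constant oscillator error in parts per million
    sync_interval_h -- hours between resync attempts
    hours           -- how many hours to simulate
    missed_syncs    -- hours at which the resync attempt fails
    """
    drift_per_hour_s = drift_ppm * 1e-6 * 3600
    error_s = 0.0
    trace = []
    for hour in range(1, hours + 1):
        error_s += drift_per_hour_s
        if hour % sync_interval_h == 0 and hour not in missed_syncs:
            error_s = 0.0  # reference heard: snap back to the correct time
        trace.append(error_s)
    return trace


if __name__ == "__main__":
    normal = time_error_trace(10, 24, 48)
    one_miss = time_error_trace(10, 24, 48, missed_syncs={24})
    print("worst error, every resync heard:", round(max(normal), 3), "s")
    print("worst error, one resync missed: ", round(max(one_miss), 3), "s")
```

With the assumed 10 ppm drift and a daily resync, the error peaks a bit under a second before each resync; skip one resync and it reaches about 1.7 seconds before the next correction.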
"So, what upsets this 'rhythm' in my network?" you ask. A number of
things...
* Anytime any piece of gear is power cycled or rebooted (which
typically happens anytime one of these elements is upgraded). If an
upgrade was being performed, the thought is "something in the new software
load broke my network."
* Another cause is anytime significant change happens to the amount of
traffic over the DS3s. In particular, putting more DS1s into the DS3 can
cause disruption. This changes the packing in the DS3 and will change the
rhythm.
* And, the most hideous of them all is temperature fluctuations in a
collocation. Remember the quartz clock I was talking about? Most/all
DLCs use quartz oscillators of their own. The problem is quartz
oscillators are very susceptible to changing their frequency based on
temperature. And what happens in the spring? Many locations go through a
heat-at-night, cool-during-the-day HVAC cycle, where we see broad
temperature fluctuations in the office. There is one particular story
where timing slips occurred every time the CO tech used the door to go
outside for a smoke break.
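To get a feel for the temperature effect, here is a rough Python sketch. It assumes the parabolic frequency-versus-temperature curve commonly quoted for tuning-fork watch crystals (about -0.034 ppm per degree C squared around a 25 C turnover point). Those figures are assumptions for illustration only; the oscillator in a given DLC may use a different crystal cut with a different curve, so check the actual datasheet.

```python
# Rough quartz frequency-vs-temperature estimate.
# Assumption: parabolic curve typical of tuning-fork watch crystals;
# real DLC oscillators may behave differently.

def crystal_offset_ppm(temp_c, turnover_c=25.0, coeff_ppm_per_c2=-0.034):
    """Frequency offset in ppm at temp_c relative to the turnover point."""
    return coeff_ppm_per_c2 * (temp_c - turnover_c) ** 2


def offset_change_ppm(temp_from_c, temp_to_c):
    """Change in frequency offset (ppm) when the room moves between two temperatures."""
    return crystal_offset_ppm(temp_to_c) - crystal_offset_ppm(temp_from_c)


if __name__ == "__main__":
    # Hypothetical office temperatures over an HVAC cycle.
    for temp in (15, 20, 25, 30, 35):
        print(temp, "C:", round(crystal_offset_ppm(temp), 2), "ppm")
    print("20 C -> 35 C shift:", round(offset_change_ppm(20, 35), 2), "ppm")
```

With these assumed numbers, a 10 C swing away from the turnover point shifts the oscillator by about 3.4 ppm.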
So, while I've talked mostly about DLCs, the same issues can be found with
any signaling that uses an HDLC channel. In particular, we've seen this
with PRI and SS7. But, mostly the problem is with GR-303.
So hopefully, if you've read this far, I've convinced you that this is a
problem. So, if you have this situation in your network, what can you do
about it?
* If you are using fiber equipment, many units have special capabilities
for a separate clocking feed and transport.
* Another is to add BITS timing at each of your locations. There are
BITS vendors that have a variety of "wireless" references that work very
well. Anything from GPS based, to taking a CDMA clock signal over the
air. The only downside is that they can be expensive (a few thousand
dollars) and if you don't have roof access for a GPS and don't have a CDMA
signal, they won't work.
* "Rent" a valid clocking source from your collocation provider.
Hopefully, there is a reasonable charge for this.
* We have heard of an inexpensive network regeneration device that
takes a DS1 signal coming out of a mux and is able to "reformulate" the
network reference before feeding into the DLC. Maybe someone on the
mailing list has heard of this or uses one and can report on it.
We have worked with a few "timing consultants" that can also analyze your
network. If you think you have a need for this service, Larry Cooley
(larry.cooley at taqua.com) can refer
you to a couple of different shops.
"So Dave, I've read this far and I'm still awake! Do you have any
references that I can read to help me with this insomnia thing?" Why yes,
I do:
* Seriously, the one I like the best is "Engineering Networks for
Synchronization, CCS7, and ISDN" by P.K. Bhatnagar, IEEE Press, 1997 (Our
friend Whit Reeve is the Series Editor). I refer to this one a lot.
* If you have access to Bellcore (Telcordia) Standards:
  * "Clocks for the Synchronized Network: Common Generic Criteria."
    BELLCORE TA-NWT-001244, issue 2, Nov. 1992.
  * "Digital network synchronization plan." BELLCORE Technical
    Advisory TA-NWT-000436, issue 2, June 1993.
* American National Standard for Telecommunications. Synchronization
Interface Standard. ANSI T1.101-1994.
Hopefully, this information has been useful. If you have any questions,
send me email directly or reply to this message.
Thanks,
Dave
------------------------------------------------------------------------
> --
> Jay Hennigan - CCIE #7880 - Network Engineering - jay at impulse.net
> Impulse Internet Service - http://www.impulse.net/
> Your local telephone and internet company - 805 884-6323 - WB6RDV