[VoiceOps] Geographic redundancy
Mark R Lindsey
lindsey at e-c-group.com
Thu Aug 13 15:45:42 EDT 2009
It works in that (a) BroadWorks can synchronize between servers at
different sites; (b) when either of the sites fails, service is
restored automatically and quickly after the fault; and (c) when the
fault is corrected, BroadWorks servers re-synchronize gracefully
without interruption to subscribers.
But the hard part is not the BroadWorks servers -- it's the access
network.
In most cases, the geographic distribution of the VoIP servers
(BroadWorks application servers, in this discussion) means that you
have geographically-distributed SBCs. And such SBCs are usually not
call-state synchronized.
And typically that means that each SBC has a separate IP address
facing subscribers.
And most subscribers only have a single link to the access network.
The customer's device has to handle this properly, and the SBC has to
handle this properly.
For example, when a failure of one site occurs, you want all of your
devices to re-register through the secondary site. How long will it
take before they re-register? How will they detect that one site has
failed? What happens to calls headed toward the subscriber during the
period until the customer re-registers?
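To put rough numbers on those questions, here is a minimal sketch. It
assumes the device only notices the outage when its next periodic
REGISTER (or keepalive) goes unanswered, and that it is using SIP over
UDP with the default timers, so the request is abandoned after 32
seconds (Timer F = 64*T1, T1 = 500 ms). The function name and the
refresh intervals are illustrative, not taken from any product.

```python
def worst_case_unreachable_seconds(refresh_interval_s, detect_timeout_s):
    """Upper bound on how long inbound calls can miss the subscriber.

    Worst case: the site fails just after a successful refresh, so the
    device waits a full refresh interval, then waits for its request
    to time out before trying the secondary site.
    """
    return refresh_interval_s + detect_timeout_s


# Hourly REGISTER refresh versus a 60-second refresh or keepalive.
print(worst_case_unreachable_seconds(3600, 32))
print(worst_case_unreachable_seconds(60, 32))
```

Shortening the interval shrinks the window, but every device then
sends proportionally more traffic at the SBC.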
For the SBC, how will it handle the mass re-registration from
subscribers moving over from the other SBC? How will it protect the
core registrar (BroadWorks AS in our example) against attack?
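One common way to protect a registrar from a registration storm is
to rate-limit what gets forwarded to it. The following is only a
generic token-bucket sketch of that idea, not a description of how
any particular SBC does it; the class name, the rate, and the burst
size are all hypothetical.

```python
import time


class TokenBucket:
    """Admit at most rate_per_s requests per second, with a burst allowance."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = float(rate_per_s)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill according to elapsed time, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # forward this REGISTER to the core registrar
        return False      # hold it back at the edge


# Hypothetical limit: 200 REGISTERs/s toward the core, burst of 500.
to_core = TokenBucket(rate_per_s=200, burst=500)
print(to_core.allow())
```

What to do with a held-back REGISTER is a separate design choice; one
option is a 503 with Retry-After, so that devices spread out their
retries instead of all coming back at once.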
Nathan, "Zero" impact on customers is incredibly expensive to achieve.
You can, in fact, engineer capacity to make this switching (and even
route flapping) graceful, but it means you have orders-of-magnitude
more expense in your access network. An SBC that can handle 10,000
subscribers today might only be able to handle 100 subscribers if we
need to ensure zero new calls are lost, because each of those
subscribers has to hammer away at the SBC doing polling. It's probably less costly to
bring each subscriber to each of your two sites, then to put more
failover at the customer premise (like a smart ALG).
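The 10,000-versus-100 figure is just the polling arithmetic. As a
rough sketch, assume the SBC's limit is a fixed rate of polling
messages and ignore call traffic entirely; the 3600-second and
36-second intervals below are made-up numbers chosen only to show the
100x ratio.

```python
def subscribers_at_interval(baseline_subs, baseline_interval_s, new_interval_s):
    """Subscribers the SBC can carry at the same message rate when each
    device polls every new_interval_s instead of baseline_interval_s."""
    return baseline_subs * new_interval_s // baseline_interval_s


# 10,000 devices polling hourly generate the same load as
# 100 devices polling every 36 seconds.
print(subscribers_at_interval(10_000, 3600, 36))  # 100
```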
Nevertheless, without call state synchronization in the SBCs, it may
not be possible to achieve full site-to-site failover with Nathan's
"Zero" effect on customers. For example: If a call started on
SBC-site-A and then fails to SBC-site-B, SBC-site-B would normally
reject re-INVITEs for that dialog; therefore session audits, call
hold/resume, etc. can cause the standing call to drop.
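A toy model of why that happens, assuming each SBC keeps its own
dialog table keyed on Call-ID and the two tags, with nothing shared
between sites. In SIP the usual answer to a request for an unknown
dialog is 481 (Call/Transaction Does Not Exist). The class and method
names here are hypothetical, and real dialog matching is more
involved than this.

```python
class Sbc:
    """Toy SBC that only knows the dialogs it set up itself."""

    def __init__(self, name):
        self.name = name
        self.dialogs = set()

    def record_dialog(self, call_id, from_tag, to_tag):
        # Called once the initial INVITE has been answered.
        self.dialogs.add((call_id, from_tag, to_tag))

    def on_reinvite(self, call_id, from_tag, to_tag):
        # 200 if the dialog is known here, otherwise 481.
        if (call_id, from_tag, to_tag) in self.dialogs:
            return 200
        return 481


site_a = Sbc("SBC-site-A")
site_b = Sbc("SBC-site-B")
site_a.record_dialog("abc123@example.invalid", "tag-a", "tag-b")

print(site_a.on_reinvite("abc123@example.invalid", "tag-a", "tag-b"))
print(site_b.on_reinvite("abc123@example.invalid", "tag-a", "tag-b"))
```

With call-state synchronization, site B's table would already hold
the dialog and the hold/resume or session audit would succeed.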
Application Server / Call Server redundancy is great, and there's much
more to consider in fault-tolerant voip network designs.
On Aug 13, 2009, at 3:19 PM, Nathan Stratton wrote:
> On Thu, 13 Aug 2009, Mark Holloway wrote:
>
>> When you say "it works" - what is the impact to the customer?
>
> Zero
Mark R Lindsey lindsey at e-c-group.com http://e-c-group.com/~lindsey
+12293160013