[Outages-discussion] [outages] SMS issues

Jeremy Chadwick jdc at koitsu.org
Mon Jul 8 18:29:37 EDT 2013


On Mon, Jul 08, 2013 at 10:02:27PM +0000, Dobbins, Roland wrote:
> 
> On Jul 9, 2013, at 3:49 AM, Jared Geiger wrote:
> 
> >  These TCP connections are the ones that transfer all the messages. Usually there are hundreds per large carrier.
> 
> Does this traffic run across the public Internet (either raw or VPNned), or is it WAN traffic?

Moving to -discussion because of the nature of the convo at this stage.

The setup varies -- there is no consistency -- so there is no single
answer to your question.

I won't name any names for what should be obvious reasons, but the few
SMPP providers (that includes mobile carriers) I had the "pleasure" of
dealing with used either a) VPNs atop private transport (that they
themselves could provide), or b) VPNs atop Internet-based transport.
Most, in my experience, were (a), though even with (b) private
routes were sometimes exchanged.

The problems I saw were neither transport- nor VPN-related, but almost
always (~85% of the time) service-level: the SMPP servers the providers
ran would crap themselves, and the support staff responsible for
maintaining them were mindless or had no actual familiarity with
anything beyond "restart the service" or similar nonsense.  Other times those
production servers were placed in "cold standby" datacenters, which
engineering would forget (or choose to forget) during "datacenter testing"
events, operating under the mentality "nothing here is used in
production".

SMPP in general, in my experience, is treated very badly, meaning it's
considered "expendable" by some carriers during outage situations.
Didn't get those driving directions you wanted?  The attitude is "oh
well, the customer will surely try again..." -- which is sadly true.
Transient failures are considered "the norm" when it comes to any kind
of mobile technology at this point in people's lives -- just another one
of many reasons I don't own a mobile phone.

SMPP is an awful protocol to troubleshoot/debug as well, mainly because
many of the daemons hide the inner workings, or you're forced to
write/develop your own (and pray the programmers do proper logging).
This is further compounded by complex situations, such as the providers
intentionally throttling SMPP submissions (as in "you've sent too many
messages in the past N minutes, so we're not accepting any more until
later"), which makes for a great situation when there's an outage on
their side.  This can sometimes take 3-4 hours to fully recover from,
depending on how long the outage was to begin with (and depending upon
how many SMPP transactions there are).  Ever wonder why your SMS
messages take hours to arrive?  *cough*  This is even further compounded
by other complications, such as "behavioural incompatibilities" between
implementations, where after 4 hours on a phone call with engineers,
someone deep within the bowels of the company -- i.e. one of the guys
who actually knows the protocol -- will admit "that's not how we
actually behave in that kind of situation, we do it like this instead",
causing you to have to create one-off solutions to deal with that
circumstance.
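
For anyone stuck writing their own client, here is a minimal sketch of
the kind of logging that makes throttling visible.  It assumes SMPP v3.4
framing (a 16-byte header of four big-endian 32-bit fields) and the
standard status codes ESME_RTHROTTLED (0x58) and ESME_RMSGQFUL (0x14).
The names describe_pdu and backoff_seconds, and the backoff defaults,
are hypothetical and not taken from any provider's implementation.

```python
import struct

# SMPP v3.4 PDU header: command_length, command_id, command_status,
# sequence_number, each an unsigned 32-bit big-endian integer.
HEADER = struct.Struct(">IIII")

# A small subset of command IDs, enough for illustration.
COMMAND_NAMES = {
    0x00000004: "submit_sm",
    0x80000004: "submit_sm_resp",
    0x00000015: "enquire_link",
    0x80000015: "enquire_link_resp",
    0x80000000: "generic_nack",
}

# A small subset of command_status values.
STATUS_NAMES = {
    0x00000000: "ESME_ROK",
    0x00000014: "ESME_RMSGQFUL",
    0x00000058: "ESME_RTHROTTLED",
}

ESME_RTHROTTLED = 0x00000058


def describe_pdu(data: bytes) -> dict:
    """Decode the header of one PDU into a log-friendly dict (hypothetical helper)."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 16 bytes for an SMPP header")
    length, command_id, status, sequence = HEADER.unpack_from(data)
    return {
        "length": length,
        # Unknown values are shown as hex rather than dropped.
        "command": COMMAND_NAMES.get(command_id, f"0x{command_id:08x}"),
        "status": STATUS_NAMES.get(status, f"0x{status:08x}"),
        "sequence": sequence,
        "throttled": status == ESME_RTHROTTLED,
    }


def backoff_seconds(consecutive_throttles: int,
                    base: float = 1.0, cap: float = 300.0) -> float:
    """Capped exponential delay before resubmitting (assumed policy, not a spec rule)."""
    return min(cap, base * 2 ** consecutive_throttles)
```

Logging the decoded status of every submit_sm_resp at least shows
whether the far end is throttling you or rejecting you for some other
reason; what the provider actually does in edge cases still has to be
confirmed with them.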

Finally, there are some "intermediary" SMPP providers that act as common
hubs across multiple carriers -- think an Internet eXchange but for
SMPP, e.g. Company X peers/has SMPP arrangements with AT&T, Verizon,
Sprint, T-Mobile, and quite a few others, so you establish a
relationship with Company X rather than with each provider directly.
Draw your own conclusions as to whether this is a Good Thing or a
Bad Thing -- there are pros and cons to such an arrangement.

P.S. -- Do mobile providers in Europe and Asia even use SMS any more?
I've understood it to be a predominantly US thing at this point, but
this is just going off of what my European and Asian friends have
relayed to me over the years.

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


