[Outages-discussion] [outages] Discord: Server outages and Increased API errors

Carsten Bormann cabo at tzi.org
Sun Mar 22 12:30:14 EDT 2020


On 2020-03-21, at 05:04, Jay Ashworth via Outages <outages at outages.org> wrote:
> 
> Do I correctly understand, then, that the root cause was the broken golang library function / routine?

I wouldn’t put it this way.  The root cause seems to have been a protocol that relied on a TCP connection always transporting a message completely, and that did not check the transported message for completeness before putting it to use.  No library function on the receiving end can tell whether the sending application crashed mid-transmission and the connection merely looked like it had been closed in an orderly way.
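To make the failure mode concrete, here is a minimal Python sketch (Python only for brevity; the function name `read_until_close` and the payload are hypothetical). It shows a receiver that treats the peer's close as the end of the message. A `socket.socketpair()` stands in for the TCP connection, and closing the sending end halfway through stands in for a crashed sender.

```python
import socket

def read_until_close(sock: socket.socket) -> bytes:
    """Naive receiver: treats the peer's close as 'message complete'."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b'' means the peer closed its end; no error is raised
            break
        chunks.append(chunk)
    return b"".join(chunks)

message = b'{"members": [1, 2, 3]}'  # hypothetical payload

sender, receiver = socket.socketpair()
# Simulated crash: only half of the message is written, then the socket is
# closed (a dying process typically has its sockets closed the same way).
sender.sendall(message[: len(message) // 2])
sender.close()

received = read_until_close(receiver)
receiver.close()
print(received)  # a truncated message, delivered without any error
```

Nothing in `read_until_close` can distinguish this from a sender that finished normally. Both cases end with `recv` returning `b''`.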

The lesson for protocol design 101: do not use a TCP connection as the outer envelope; always use something transported inside that connection to properly delimit the end of the message.
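One common way to carry such a delimiter inside the connection is a length prefix. Below is a minimal Python sketch, assuming a hypothetical framing of a 4-byte big-endian length followed by the payload. It is only one possible scheme, not the protocol from the incident. The receiver knows how many bytes to expect, so a close before the frame is complete becomes a detectable error instead of a silently short message.

```python
import socket
import struct

# Hypothetical framing: 4-byte big-endian length prefix, then the payload.
HEADER = struct.Struct("!I")

def send_frame(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or fail loudly if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed before the frame was complete
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)

def recv_frame(sock: socket.socket) -> bytes:
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

sender, receiver = socket.socketpair()

send_frame(sender, b"hello")
complete = recv_frame(receiver)  # b'hello'

# A sender that dies mid-frame: the header promises 10 bytes, only 3 arrive.
sender.sendall(HEADER.pack(10) + b"hal")
sender.close()
truncated_error = None
try:
    recv_frame(receiver)
except ConnectionError as exc:
    truncated_error = exc  # "peer closed after 3 of 10 bytes"
receiver.close()
```

The same close that `read_until_close` would have accepted as success is now reported as an incomplete frame.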
(Ironically, the JSON encoding used in this instance does provide such delimiting in most of the ways it is used; apparently that delimiting just wasn’t checked here before the transfer was treated as successful.)
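As a minimal Python sketch of that check (the function name `decode_or_reject` and the payload are hypothetical): a top-level JSON object or array is self-delimiting, because a truncated one is missing its closing `}` or `]` and fails to parse. The "most ways" qualifier matters, though. A bare top-level number is not self-delimiting.

```python
import json

def decode_or_reject(data: bytes):
    """Use a message only if it parses as one complete JSON document."""
    try:
        return json.loads(data)
    except json.JSONDecodeError as exc:
        # A truncated object or array cannot parse: its closing } or ] is missing.
        raise ValueError(f"incomplete or malformed message: {exc.msg}") from exc

message = b'{"members": [1, 2, 3]}'  # hypothetical payload
print(decode_or_reject(message))      # {'members': [1, 2, 3]}

try:
    decode_or_reject(message[:12])    # b'{"members": ' (cut off mid-transfer)
except ValueError as exc:
    print("rejected:", exc)

# The caveat: a bare top-level number is NOT self-delimiting.
print(decode_or_reject(b"12345"[:3]))  # 123 (truncation goes unnoticed)
```

Checking the parse result catches truncation only when the top-level value has an explicit end marker. The length prefix sketched earlier does not depend on the payload format.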

That lesson has been learned the hard way in multiple spaces; e.g., look up how TLS gained the close_notify “closure alert”.

Grüße, Carsten


