<div dir="auto">Here is the latest RCA from Microsoft on the DNS issues<div dir="auto"><div style="font-family:sans-serif;font-size:12.8px" dir="auto"><div style="height:80px"></div><div style="width:352px;margin:16px 0px"><div><div><div dir="ltr"><div></div><div><div><div style="color:rgb(136,136,136)"></div><div style="color:rgb(136,136,136)"></div><div style="color:rgb(136,136,136)"></div><table align="center" style="margin:auto;font-family:'segoe ui','arial','verdana','helvetica',sans-serif;border-spacing:0px!important;border-collapse:collapse!important;table-layout:auto!important;width:320px!important"><tbody><tr></tr><tr><td colspan="2" style="padding:10px 24px;font-size:15px;line-height:20px"><span style="font-size:12px">STATUS: </span><br><span style="font-weight:bold">RCA</span></td></tr><tr><td colspan="2" style="padding:10px 24px;font-size:15px;line-height:20px"><span style="font-size:12px">COMMUNICATION: </span><br><span style="font-size:18px"><p><strong>Summary of Impact: </strong>Between 21:21 UTC and 22:00 UTC on 1 Apr 2021, Azure DNS experienced a service availability issue. This resulted in customers being unable to resolve domain names for services they use, which resulted in intermittent failures accessing or managing Azure and Microsoft services. Due to the nature of DNS, the impact of the issue was observed across multiple regions. Recovery time varied by service, but the majority of services recovered by <a dir="ltr" style="text-decoration-line:underline!important">22:30 UTC</a>.</p><p></p><p><strong>Root Cause:</strong> Azure DNS servers experienced an anomalous surge in DNS queries from across the globe targeting a set of domains hosted on Azure. Normally, Azure’s layers of caches and traffic shaping would mitigate this surge. In this incident, one specific sequence of events exposed a code defect in our DNS service that reduced the efficiency of our DNS Edge caches. 
As our DNS service became overloaded, DNS clients began frequent retries of their requests which added workload to the DNS service. Since client retries are considered legitimate DNS traffic, this traffic was not dropped by our volumetric spike mitigation systems. This increase in traffic led to decreased availability of our DNS service.</p><p></p><p><strong>Mitigation:</strong> The decrease in service availability triggered our monitoring systems and engaged our engineers. Our DNS services automatically recovered themselves by <a dir="ltr" style="text-decoration-line:underline!important">22:00 UTC</a>. This recovery time exceeded our design goal, and our engineers prepared additional serving capacity and the ability to answer DNS queries from the volumetric spike mitigation system in case further mitigation steps were needed. The majority of services were fully recovered by <a dir="ltr" style="text-decoration-line:underline!important">22:30 UTC</a>. Immediately after the incident, we updated the logic on the volumetric spike mitigation system to protect the DNS service from excessive retries.</p><p></p><p><strong>Next Steps:</strong> We apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. 
In this case, this includes (but is not limited to):</p><ul><li>Repair the code defect so that all requests can be efficiently handled in cache.</li><li>Improve the automatic detection and mitigation of anomalous traffic patterns.</li></ul><div style="color:rgb(136,136,136)"><p></p><p></p></div></span></td></tr></tbody></table><div style="color:rgb(136,136,136)"><br style="color:rgb(0,0,0)"><br></div></div><div style="color:rgb(136,136,136)"><div><br></div><div><div style="direction:ltr">Catherine Durig</div></div></div></div></div></div></div></div><div style="height:41px"></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 1, 2021, 6:16 PM Jason Kuehl via Outages <<a href="mailto:outages@outages.org" target="_blank" rel="noreferrer">outages@outages.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Kind of an extreme April fools' joke.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 1, 2021 at 6:24 PM Jeffrey Ollie via Outages <<a href="mailto:outages@outages.org" rel="noreferrer noreferrer" target="_blank">outages@outages.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Can confirm that all of our Azure DNS was offline for a few minutes. 
Seems to be coming back now.</div><div><br></div><div>Jeff</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 1, 2021 at 4:59 PM Cary Wiedemann via Outages <<a href="mailto:outages@outages.org" rel="noreferrer noreferrer" target="_blank">outages@outages.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Details scant so far, but there's something awful going on with DNS for a large portion of the Internet, especially Azure.</div><div><br></div><div>Microsoft just posted this to Twitter via @MSFT365Status:</div><div><a href="https://twitter.com/MSFT365Status/status/1377738432265396225" rel="noreferrer noreferrer" target="_blank">https://twitter.com/MSFT365Status/status/1377738432265396225</a></div><div><br></div><div>But I have vendors using UltraDNS (for example) that are also failing to resolve, and those servers aren't hosted in Azure.</div><div><br></div><div>Investigating now. Early warning. Further reports would be appreciated.</div><div><br></div><div>Thanks all!</div><div><br></div><div>- Cary<br></div></div>
_______________________________________________<br>
Outages mailing list<br>
<a href="mailto:Outages@outages.org" rel="noreferrer noreferrer" target="_blank">Outages@outages.org</a><br>
<a href="https://puck.nether.net/mailman/listinfo/outages" rel="noreferrer noreferrer noreferrer" target="_blank">https://puck.nether.net/mailman/listinfo/outages</a><br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr">Jeff Ollie<br>The majestik møøse is one of the mäni interesting furry animals in Sweden.</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr">Sincerely,<br> <br>Jason W Kuehl<br>Cell 920-419-8983<br><a href="mailto:jason.w.kuehl@gmail.com" rel="noreferrer noreferrer" target="_blank">jason.w.kuehl@gmail.com</a></div>
</blockquote></div>