[outages] ServiceNow DNSSEC issues?

Michael Sinatra
Fri Jan 27 19:54:35 EST 2017

Looks like ServiceNow may have recently botched a KSK roll.  Lookups are 
intermittently returning SERVFAIL responses, and my own resolvers had 
SERVFAILs (we are an SN shop), starting around 1545 US/Pacific.

Examining the cache (and DNSVIZ) shows that at some point within the 
last 3 hours, the DS record for service-now.com switched from having 
keytag 30126 to keytag 31893.  The DS record has a TTL of 24 hours.

It appears that at the same time, or a *short* (under 3 hours) time 
later, the DNSKEY record for KSK with tag 30126 was removed from 
service-now.com.  The DNSKEY records have a TTL of 2 hours.

This would have caused systems that still reference the cached DS 
record to fail validation once their cached DNSKEY RRset expired.  (The 
old DNSKEY would have needed to be retained for at least a full DS TTL, 
and probably 2xTTL to be on the safe side.)  Again, this is causing 
intermittent breakage of name resolution in service-now.com.
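
To make the timing concrete, here is a rough Python sketch of the worst 
case.  It assumes the new KSK was already in the DNSKEY RRset before the 
DS swap, and it uses the TTLs quoted above (24 hours for DS, 2 hours for 
DNSKEY).  The function names and the 3-hour removal offset are mine, for 
illustration only; this is a simplified model, not a description of what 
any particular resolver does.

```python
from datetime import timedelta

DS_TTL = timedelta(hours=24)      # TTL of the DS RRset at the parent
DNSKEY_TTL = timedelta(hours=2)   # TTL of the DNSKEY RRset in the zone


def earliest_safe_removal(ds_swap, ds_ttl=DS_TTL):
    """Earliest time the old KSK can leave the DNSKEY RRset.

    A cache may have fetched the old DS just before the swap, so the old
    DS can live in caches until ds_swap + ds_ttl.
    """
    return ds_swap + ds_ttl


def latest_first_failure(key_removed, dnskey_ttl=DNSKEY_TTL):
    """Time by which every cached DNSKEY RRset containing the old KSK is gone."""
    return key_removed + dnskey_ttl


def worst_case_failure_window(ds_swap, key_removed, ds_ttl=DS_TTL):
    """Return (start, end) during which some validator may SERVFAIL, or None.

    A validator holding the old DS fails as soon as it fetches a DNSKEY
    RRset without the old KSK, and recovers when its old DS expires.
    """
    last_old_ds_expiry = earliest_safe_removal(ds_swap, ds_ttl)
    if key_removed >= last_old_ds_expiry:
        return None  # old key outlived every cached copy of the old DS
    return key_removed, last_old_ds_expiry


if __name__ == "__main__":
    swap = timedelta(0)             # measure everything from the DS swap
    removed = timedelta(hours=3)    # assumed: old KSK pulled 3 hours later
    print(worst_case_failure_window(swap, removed))
    print(latest_first_failure(removed))
```

With those assumed numbers, some validators can be broken from the 
moment the key is removed until 24 hours after the DS swap, and any 
validator still holding the old DS is broken no later than 2 hours after 
the removal.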

Appendix: repeated queries of the DS record for service-now.com to

service-now.com.        86395   IN      DS      31893 7 1 

;; Query time: 6 msec
;; WHEN: Sat Jan 28 00:33:58 UTC 2017

service-now.com.        60788   IN      DS      30126 7 1 

;; Query time: 24 msec
;; WHEN: Sat Jan 28 00:33:59 UTC 2017
;; MSG SIZE  rcvd: 80
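
For what it's worth, the remaining TTL in the second answer tells you 
how long that particular cache will keep handing out the old DS.  A 
quick Python sketch of the arithmetic, using only the timestamp and TTL 
from the output above (the helper name is made up):

```python
from datetime import datetime, timedelta, timezone


def cache_expiry(observed_at, remaining_ttl_seconds):
    """When a cached record expires, given the TTL left at observation time."""
    return observed_at + timedelta(seconds=remaining_ttl_seconds)


# Second answer above: DS 30126 with 60788 seconds of TTL remaining.
observed = datetime(2017, 1, 28, 0, 33, 59, tzinfo=timezone.utc)
old_ds_expiry = cache_expiry(observed, 60788)

if __name__ == "__main__":
    print(old_ds_expiry.isoformat())
```

That works out to 17:27:07 UTC on Jan 28 for that one cache entry; other 
caches may hold the old DS for a different length of time.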

I'll spare you the dig output, but suffice it to say that the auth 
servers for service-now.com don't have KSK 30126 in their DNSKEY RRSET.
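
If you would rather check this without eyeballing dig output, the key 
tag can be computed from the DNSKEY RDATA with the algorithm in RFC 
4034, Appendix B.  A minimal Python sketch follows; the function names 
and the sample bytes are made up for illustration, and it does not 
handle the special case for the obsolete algorithm 1.

```python
import struct


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """Build DNSKEY RDATA in wire format: flags, protocol, algorithm, key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even offsets are the high byte of a 16-bit word, odd offsets the low byte.
        acc += (byte << 8) if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF   # fold the carry back in once
    return acc & 0xFFFF


if __name__ == "__main__":
    # Hypothetical key material, NOT the real service-now.com key.
    sample = dnskey_rdata(257, 3, 7, b"\x01\x02\x03\x04")
    print(key_tag(sample))
```

To use it for real, base64-decode the public key field of each DNSKEY 
record, build the RDATA as above, and compare the resulting tags against 
the key tag in the DS record.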


