[ednog] DNS server monitoring
michael at rancid.berkeley.edu
Mon Nov 28 17:29:39 EST 2005
What are EDU folks doing to monitor their nameservers? I know I have
posed this question before to individuals on this list, but I'd like to
survey the group. (I can summarize and post the summary if you just
want to reply to me.)
The question is basically in two parts:
o Monitoring actual queries, such as via syslog or some other method.
Do you do this and do you ship all of your queries to a central syslog
server? If not, how do you monitor DNS queries? How long do you save
the logged queries? What kinds of trawling do you do (looking for
naughty queries that might indicate compromise, botnets, etc.)?
o Aggregate statistics such as number and type of queries per second.
Do you have any (say, RRD-backed) script that either monitors the server
itself or goes through syslogs and generates aggregate statistics? Do
you use SNMP on your DNS servers? Any issues with that? (It might be
useful to mention the OS you're running.)
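
To make the second part concrete, here is a minimal sketch of the kind of script that could go through syslogs and produce aggregate counts per query type. It assumes BIND 9 style query log lines relayed through syslog (the post does not say which nameserver software is in use), and the names QUERY_RE, aggregate and per_second are made up for illustration. It only produces per-minute counts; feeding them into RRD is left out.

```python
import re
from collections import Counter, defaultdict

# Assumed line layout (BIND 9 style query log relayed through syslog):
#   Nov 28 17:25:01 ns1 named[1234]: client 192.0.2.10#32768: query: www.example.edu IN A +
QUERY_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d):\d\d\s+"  # month day HH:MM (seconds dropped)
    r"(?P<host>\S+)\s+named\[\d+\]:\s+"             # reporting host and named PID
    r"client\s+\S+:\s+query:\s+"                    # client address#port
    r"(?P<qname>\S+)\s+(?P<qclass>\S+)\s+(?P<qtype>\S+)"
)


def aggregate(lines):
    """Return {minute: Counter({qtype: count})} for lines that look like queries."""
    buckets = defaultdict(Counter)
    for line in lines:
        m = QUERY_RE.match(line)
        if m is None:
            continue  # not a query log line (e.g. sshd, cron); skip it
        buckets[m.group("stamp")][m.group("qtype")] += 1
    return buckets


def per_second(buckets):
    """Convert per-minute counts into average queries per second."""
    return {
        minute: {qtype: n / 60.0 for qtype, n in counts.items()}
        for minute, counts in buckets.items()
    }
```

Bucketing by the minute string keeps the sketch free of date parsing (syslog timestamps carry no year); a real script would probably want proper timestamps and a step size matching the RRD.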
I'm leaning toward a regime where I would log all my queries to a
dedicated syslog server, which would then have a script that would parse
the raw logs and generate RRD graphs of aggregate query statistics. Any
gotchas you can think of? (One that I know of is that the syslog server
can't be configured to do reverse lookups using one of the DNS servers
it's monitoring, or it will get into a rather nasty loop as it does a
lookup for every query, which generates a query log, which generates a
lookup, which generates a...) I also plan to test how the syslog
processes on the DNS servers deal with the syslog server going down for
an extended period of time. I don't think that should be a problem
because they're just throwing UDP at the server...
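
The UDP point can be checked with a small sketch: sending a datagram from an unconnected UDP socket hands the packet to the kernel and returns, whether or not anything is listening at the other end. The function name send_syslog_udp and the local0/info priority are assumptions for illustration; this does not show how any particular syslogd behaves, only the transport underneath it.

```python
import socket


def send_syslog_udp(message, host, port=514, timeout=1.0):
    """Send one syslog-style datagram; return the number of bytes handed to the kernel."""
    # <134> is the priority value: facility local0 (16) * 8 + severity info (6).
    payload = ("<134>" + message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # No connection is set up and no acknowledgement is awaited, so a
        # dead receiver does not stall the sender.
        return sock.sendto(payload, (host, port))
```

The flip side is that anything sent while the syslog server is down is silently lost, so an extended outage leaves a gap in the query logs rather than a backlog.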
So far, performance hasn't been an obvious problem, even with some of
the syslog testing I have been doing.
Anyone have any suggestions for relevant off-the-shelf (open-source)
tools that might help?
PS. Currently, I monitor queries locally on each box and then have a
script that sshes into each box every 5 minutes and scoops up the
queries and dumps them in a central location. That's clunky for a
variety of (probably obvious) reasons.