[sysmon-help] Do-Not-Contact Periods

Jason Huebel jhuebel at gmail.com
Thu Dec 13 10:31:25 EST 2007


I have to say, I like "silence-begin"/"silence-end" myself. But whatever you
decide sounds great to me. Just having the functionality will be a big plus.

Jason Huebel

"We must use time as a tool, not as a crutch." -John F. Kennedy (1917-1963)

-----Original Message-----
From: sysmon-help-bounces at puck.nether.net
[mailto:sysmon-help-bounces at puck.nether.net] On Behalf Of Morgan Aldridge
Sent: Thursday, December 13, 2007 9:26 AM
To: sysmon-help at puck.nether.net
Cc: jared at sysmon.org
Subject: Re: [sysmon-help] Do-Not-Contact Periods

On Jun 22, 2007 12:40 PM, Morgan Aldridge <morgant at makkintosshu.com> wrote:
>
> On Jun 20, 2007 10:18 AM, Morgan Aldridge <morgant at makkintosshu.com> wrote:
> >
> > I have a number of servers and/or services that "go down" during the
> > night for backups and other maintenance, but I obviously don't want
> > to get paged during those times. I actually "spawn" a specific script
> > to send the page, so I could do some parsing to ignore pages for
> > certain servers & services during their respective maintenance
> > periods, but that's a lot of logic to put in a script where it
> > doesn't belong (and it means more work whenever I modify my Sysmon
> > config).
> >
> > I propose the following configuration options (where "dnc" stands for
> > "do not contact" and "time" would be a 24-hour "HH:MM" string):
> >
> >        dnc-begin [ time ]
> >        dnc-end [ time ]
> >
> > In terms of functionality, the up/down status of an object would not
> > be tested (just as if the object it depended on were down) and,
> > naturally, no "contact" or "spawn" configuration options would be
> > acted upon during that time period. This way, if a server didn't come
> > back up at the end of the scheduled maintenance period, one would
> > still get paged, but no testing/paging would occur during the
> > maintenance period.
> >
> > Whether or not uptime would be affected is open for discussion.
> > I'd argue it should affect uptime, but then you'd still have to test
> > the object and simply not contact/spawn.
> >
> > How much work would this be to implement, realistically? If you can
> > point me in the right direction (e.g. where to add the new config
> > options and which function does the contact/spawn), then I'll gladly
> > implement it myself and send you the diffs (even if I have to do it
> > off the clock).
> >
> > BTW - Another related option that might be useful would be a
> > "timezone [ string ]" option for those of us that have servers in
> > different time zones. Obviously less important.
>
> The other night I did a quick exploration of the source and it
> appears that most of the modifications needed would be in
> handle_retval() in syswatch.c (or possibly in page_someone() and
> run_command_and_mail_output() in page.c). Unfortunately, I'm really
> not familiar with lex, so parsing the config file will take me a
> bit.
>
> Is anyone else interested in this functionality? Or am I the only one
> that has some services that actually go "down" for regular
> maintenance & backup?
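The time-window test implied by the dnc-begin/dnc-end proposal can be sketched in C, the language Sysmon is written in. This is a minimal sketch under stated assumptions: the helper names parse_hhmm(), in_dnc_window() and local_minute_of_day() are hypothetical and not part of Sysmon, and treating a window whose end is earlier than its begin (e.g. a backup window from 23:30 to 02:00) as wrapping past midnight is a design choice the proposal does not spell out.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: convert a 24-hour "HH:MM" string (the format
 * proposed for dnc-begin/dnc-end) into minutes since midnight.
 * Returns -1 if the string is malformed or out of range. */
static int parse_hhmm(const char *s)
{
    int h, m;
    char extra;

    /* Exactly two conversions must succeed; a third (%c) means there
     * is trailing junk, as in "01:30x". */
    if (sscanf(s, "%d:%d%c", &h, &m, &extra) != 2)
        return -1;
    if (h < 0 || h > 23 || m < 0 || m > 59)
        return -1;
    return h * 60 + m;
}

/* Hypothetical helper: is minute-of-day `now` inside [begin, end)?
 * If end is earlier than begin, the window wraps past midnight.
 * begin == end is treated as "no window". */
static int in_dnc_window(int now, int begin, int end)
{
    if (begin == end)
        return 0;
    if (begin < end)
        return now >= begin && now < end;
    return now >= begin || now < end;   /* wraps midnight */
}

/* Minute of day for time t in the process's local time zone,
 * or -1 if localtime() fails. */
static int local_minute_of_day(time_t t)
{
    const struct tm *tm = localtime(&t);

    if (tm == NULL)
        return -1;
    return tm->tm_hour * 60 + tm->tm_min;
}
```

A caller would evaluate something like in_dnc_window(local_minute_of_day(time(NULL)), begin, end) before testing or paging an object. The separate "timezone" idea would change only where the minute of day is computed, not the window test itself.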

Okay, I'm going to revisit this and see if anyone (Jared?) can point
me in the direction of adding an object configuration option or two to
parser.c, or wherever that goes. Unfortunately, this is now functionality
I can no longer live without, and I figure adding it will take about the
same effort as hacking around it.

The following are my notes from the previous exploration of the source
(sysmon-0.93-pre3) and are enough for me to add the functionality once
I know how to parse the configuration option and then access it.

- paging done by page_someone() in page.c
- 'spawn' done by run_command_and_mail_output() in page.c
- page_someone() called by handle_retval() &
  walk_periodic_page_checks() in syswatch.c
- handle_retval() handles the return value from service_this() in
  service_checks() in syswatch.c
- line 1213 of syswatch.c (in handle_retval()) does the following
  check to see if the service came back up:
	/* check to see if the service came up */
	if ((handle_this->retval == SYSM_OK) &&
	    (handle_this->checkent->contacted == TRUE) &&
	    (handle_this->checkent->lastcheck != SYSM_OK))
- line 1239 of syswatch.c (in handle_retval()) does the following
  check to see if the service is still down:
	/* if service is down, and state is same as last time */
	if ((handle_this->retval != SYSM_OK) &&
	    (handle_this->checkent->lastcheck == handle_this->retval))
- there are a couple of other checks that also end up sending pages; see
  lines 1263 & 1288 in syswatch.c (in handle_retval())
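The notes above suggest where a do-not-contact check could hook into handle_retval(). The following is a minimal, self-contained sketch that mirrors the two quoted conditions and returns early when the object is inside its window. Assumptions, all hypothetical: the struct layouts are stripped-down stand-ins for Sysmon's real ones, the value of SYSM_OK is invented, the dnc_begin/dnc_end fields (minutes since midnight, -1 when unset) do not exist yet, and decide() with its enum is invented for illustration. The real handle_retval() does the paging itself and has the additional checks around lines 1263 and 1288, which are not modeled. This follows the "still test, but don't contact/spawn" variant from the original proposal.

```c
/* Hypothetical values; the real constants live in Sysmon's headers. */
#define SYSM_OK 0
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Stripped-down stand-ins for Sysmon's structures. Only retval,
 * contacted and lastcheck appear in the excerpts above; dnc_begin and
 * dnc_end are the proposed (hypothetical) options, stored as minutes
 * since midnight, -1 when unset. */
struct checkent {
    int contacted;
    int lastcheck;
    int dnc_begin;
    int dnc_end;
};

struct handle {
    int retval;
    struct checkent *checkent;
};

/* Which notification branch of handle_retval() would be entered. */
enum action { ACT_NONE, ACT_CAME_UP, ACT_STILL_DOWN };

/* [begin, end) in minutes of day; wraps past midnight when end < begin. */
static int in_window(int now, int begin, int end)
{
    if (begin < 0 || end < 0 || begin == end)
        return 0;                   /* option unset or empty window */
    if (begin < end)
        return now >= begin && now < end;
    return now >= begin || now < end;
}

/* Hypothetical gate: skip every paging/spawn branch while the object is
 * inside its do-not-contact window, otherwise apply the two conditions
 * quoted from handle_retval(). now_min is the local minute of day. */
static enum action decide(const struct handle *handle_this, int now_min)
{
    const struct checkent *ce = handle_this->checkent;

    if (in_window(now_min, ce->dnc_begin, ce->dnc_end))
        return ACT_NONE;

    /* check to see if the service came up (cf. line 1213) */
    if ((handle_this->retval == SYSM_OK) &&
        (ce->contacted == TRUE) &&
        (ce->lastcheck != SYSM_OK))
        return ACT_CAME_UP;

    /* if service is down, and state is same as last time (cf. line 1239) */
    if ((handle_this->retval != SYSM_OK) &&
        (ce->lastcheck == handle_this->retval))
        return ACT_STILL_DOWN;

    return ACT_NONE;
}
```

In practice now_min would come from localtime() on the current time. Assuming contacted is only set when a page actually goes out, a service that goes down and recovers entirely inside its window never generates a page, while one that is still down once the window closes falls through to the "still down" branch, which is the behavior the proposal asks for.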

Does anyone have any additional input/suggestions on the names of the
object config options? dnc-begin/dnc-end (as originally proposed)?
silent-begin/silent-end? blackout-start/blackout-stop?

Morgan Aldridge
---
morgant at makkintosshu.com
http://www.makkintosshu.com/
_______________________________________________
Sysmon-help mailing list
Sysmon-help at puck.nether.net
https://puck.nether.net/mailman/listinfo/sysmon-help


