Posted to dev@httpd.apache.org by Nick Kew <ni...@webthing.com> on 2005/06/04 16:02:41 UTC

Watchdog code for Apache

A little while back, I hacked up a quick&dirty experimental watchdog 
module.  It forks a watchdog process in the pre_mpm hook, which then
watches the scoreboard and kills any process in which some request
has taken more than some predefined time.

Currently this is limited to killing processes, which puts it
at the same level of usefulness as mod_watchcat.  My attempts
to do anything better than that in a signal handler haven't
gone anywhere useful, and this looks like an unpromising
approach.  It'll work for prefork, but as far as other MPMs
are concerned it looks like a dead end.  What I'd like to do,
rather than kill the process, is terminate the errant thread,
including winding down its pools.

It also seems better for the watchdog code to run in Apache's
master process than to fork off a separate process.

My current thinking is to use ap_wait_or_timeout and:

* go through the scoreboard looking for threads tied up in a
  request that's gone on too long.
* send SIGUSR2 to the process
* return with pid.pid set to the process for the MPM to deal with
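
The first step in the list above (scanning for a request that has run too long) could be sketched roughly as follows. This is a minimal, self-contained illustration, not the real httpd scoreboard API: the `slot_t` layout, the `find_overdue_pid` name, and the `max_secs` parameter are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for one scoreboard slot.
 * The real httpd scoreboard has a different layout. */
typedef struct {
    int    pid;            /* process that owns the slot */
    int    busy;           /* nonzero while a request is in progress */
    time_t request_start;  /* when the current request began */
} slot_t;

/* Return the pid of the first process with a request that has been running
 * for more than max_secs, or -1 if no request is overdue. */
static int find_overdue_pid(const slot_t *slots, size_t n,
                            time_t now, double max_secs)
{
    assert(slots != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].busy
            && difftime(now, slots[i].request_start) > max_secs) {
            return slots[i].pid;
        }
    }
    return -1;
}
```

A caller would presumably pass the returned pid to `kill(pid, SIGUSR2)` and then hand it back to the MPM, as the remaining two steps describe.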

Then the per-process signal handler is reduced to setting a flag
for the MPM to deal with.  Now the MPM can at worst terminate it
cleanly, perhaps using the graceful restart code on the process.
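
A handler that only sets a flag might look like the sketch below. It assumes a POSIX system; the names `shutdown_requested`, `install_watchdog_handler` and `take_shutdown_request` are invented for illustration and are not part of httpd. The point is that nothing beyond a single store happens in signal context, and all real cleanup is left to ordinary code that polls the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop.  volatile sig_atomic_t
 * is the type that can be stored safely from inside a signal handler. */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_sigusr2(int sig)
{
    (void)sig;
    shutdown_requested = 1;  /* nothing else: no logging, no pool cleanup */
}

/* Install the SIGUSR2 handler; returns 0 on success, -1 on failure. */
static int install_watchdog_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Polled from the (hypothetical) main loop between units of work.
 * Returns 1 if a shutdown was requested since the last call, else 0. */
static int take_shutdown_request(void)
{
    if (shutdown_requested) {
        shutdown_requested = 0;
        return 1;
    }
    return 0;
}
```

When `take_shutdown_request()` returns 1, the process is in normal context and can wind down however the MPM prefers.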

Questions:
* Any objections in principle to adding watchdog code in this manner?
* Does this plan make sense?
* Is there a better plan that'll enable me to get down to thread level
  and stop just the errant thread?  Perhaps the above with an optional
  thread-shutdown hook somewhere?

-- 
Nick Kew


Re: Watchdog code for Apache

Posted by Jeff Trawick <tr...@gmail.com>.
On 6/6/05, Nick Kew <ni...@webthing.com> wrote:
> Jeff Trawick wrote:
> > On 6/4/05, Nick Kew <ni...@webthing.com> wrote:
> >
> >>It also seems better for the watchdog code to run in Apache's
> >>master process than to fork off a separate process.
> >
> > ...
> >
> >>My current thinking is to use ap_wait_or_timeout and:
> >
> >
> > create a hook that runs there
> 
> Agreed.  I've made a patch for that: it works in place of my previous
> watchdog code, and offers a better solution.  Any objections to
> my committing it?

not really

I think that something more generic than "watchdog" is appropriate,
but I don't have any good idea for a hook name.  "monitor" is only
slightly more generic.

Windows folks: Where would this hook have to run from the Windows MPM
in order for a scoreboard-monitoring-module to be able to access the
same type of information at appropriate intervals?

Re: Watchdog code for Apache

Posted by Nick Kew <ni...@webthing.com>.
Jeff Trawick wrote:
> On 6/4/05, Nick Kew <ni...@webthing.com> wrote:
> 
>>It also seems better for the watchdog code to run in Apache's
>>master process than to fork off a separate process.
> 
> ...
> 
>>My current thinking is to use ap_wait_or_timeout and:
> 
> 
> create a hook that runs there

Agreed.  I've made a patch for that: it works in place of my previous
watchdog code, and offers a better solution.  Any objections to
my committing it?

--- include/mpm_common.h        2005-04-12 21:39:50.649851544 +0100
+++ include/mpm_common.h.new    2005-06-06 23:41:46.778013008 +0100
@@ -296,6 +296,8 @@
                                              const char *arg);
 #endif

+AP_DECLARE_HOOK(int,watchdog,(apr_pool_t *p))
+
 #ifdef __cplusplus
 }
 #endif


--- server/mpm_common.c 2005-04-12 21:39:40.682366832 +0100
+++ server/mpm_common.c.new     2005-06-06 23:42:58.649086944 +0100
@@ -59,6 +59,21 @@
 #include <unistd.h>
 #endif

+APR_HOOK_STRUCT(
+#if AP_ENABLE_EXCEPTION_HOOK
+    APR_HOOK_LINK(fatal_exception)
+#endif
+    APR_HOOK_LINK(watchdog)
+)
+
+#if AP_ENABLE_EXCEPTION_HOOK
+AP_IMPLEMENT_HOOK_RUN_ALL(int, fatal_exception,
+                          (ap_exception_info_t *ei), (ei), OK, DECLINED)
+#endif
+AP_IMPLEMENT_HOOK_RUN_ALL(int, watchdog,
+                          (apr_pool_t *p), (p), OK, DECLINED)
+
+
 #ifdef AP_MPM_WANT_RECLAIM_CHILD_PROCESSES

 typedef enum {DO_NOTHING, SEND_SIGTERM, SEND_SIGKILL, GIVEUP} action_t;
@@ -275,6 +290,7 @@
     ++wait_or_timeout_counter;
     if (wait_or_timeout_counter == INTERVAL_OF_WRITABLE_PROBES) {
         wait_or_timeout_counter = 0;
+        ap_run_watchdog(p);
     }

     rv = apr_proc_wait_all_procs(ret, exitcode, status, APR_NOWAIT, p);
@@ -1028,13 +1044,6 @@
     return NULL;
 }

-APR_HOOK_STRUCT(
-    APR_HOOK_LINK(fatal_exception)
-)
-
-AP_IMPLEMENT_HOOK_RUN_ALL(int, fatal_exception,
-                          (ap_exception_info_t *ei), (ei), OK, DECLINED)
-
 static void run_fatal_exception_hook(int sig)
 {
     ap_exception_info_t ei = {0};
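
For readers without the httpd tree to hand, the behaviour the patch adds can be imitated in plain C. The sketch below is an assumption-laden model, not the APR hook machinery: the hook table, `idle_tick`, and `PROBE_INTERVAL` (standing in for INTERVAL_OF_WRITABLE_PROBES, whose real value is not shown in the patch) are made up for the example. It shows registered callbacks being run only on every Nth pass of the parent's wait loop, with run-all semantics where OK and DECLINED both let the chain continue.

```c
#include <assert.h>
#include <stddef.h>

#define OK              0
#define DECLINED       -1
#define MAX_HOOKS       8
#define PROBE_INTERVAL 10  /* hypothetical stand-in for the real interval */

typedef int (*watchdog_fn)(void *ctx);

static watchdog_fn hooks[MAX_HOOKS];
static size_t      n_hooks = 0;
static int         tick_counter = 0;

/* Register a callback; returns 0 on success, -1 if the table is full. */
static int hook_watchdog(watchdog_fn fn)
{
    assert(fn != NULL);
    if (n_hooks == MAX_HOOKS) {
        return -1;
    }
    hooks[n_hooks++] = fn;
    return 0;
}

/* Run-all: call every hook in order, continue past OK and DECLINED,
 * stop and return the first other value. */
static int run_watchdog(void *ctx)
{
    for (size_t i = 0; i < n_hooks; i++) {
        int rv = hooks[i](ctx);
        if (rv != OK && rv != DECLINED) {
            return rv;
        }
    }
    return OK;
}

/* One pass of the parent's wait loop.  The hooks fire only on every
 * PROBE_INTERVAL-th call.  Returns 1 if they ran on this pass. */
static int idle_tick(void *ctx)
{
    if (++tick_counter == PROBE_INTERVAL) {
        tick_counter = 0;
        run_watchdog(ctx);
        return 1;
    }
    return 0;
}

/* Example callback: counts how often it is invoked. */
static int count_calls(void *ctx)
{
    ++*(int *)ctx;
    return OK;
}
```

A module would register something like `count_calls` once at startup; the parent then calls `idle_tick` on each pass of its loop.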


> 
> maybe mpms implement an optional function that a module could call to
> do process management work?
> 
> even when the MPM doesn't implement that function, it is still useful
> to have a module identify stuck requests

That's the next step:-)

-- 
Nick Kew


Re: Watchdog code for Apache

Posted by Jeff Trawick <tr...@gmail.com>.
On 6/4/05, Nick Kew <ni...@webthing.com> wrote:
> It also seems better for the watchdog code to run in Apache's
> master process than to fork off a separate process.
...
> My current thinking is to use ap_wait_or_timeout and:

create a hook that runs there

> * go through the scoreboard looking for threads tied up in a
>   request that's gone on too long.
> * send SIGUSR2 to the process
> * return with pid.pid set to the process for the MPM to deal with

maybe mpms implement an optional function that a module could call to
do process management work?

even when the MPM doesn't implement that function, it is still useful
to have a module identify stuck requests
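
One way to picture this suggestion is a function pointer that stays NULL unless the MPM fills it in. The sketch below is a stand-alone model with invented names (`retire_child_fn`, `register_retire_child`, `handle_stuck_request`); in httpd itself the APR optional-function macros in apr_optional.h would presumably play this role. The module always reports the stuck request and only delegates the process management when the service exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical service an MPM might export: retire one child process.
 * Returns 0 if the MPM accepted the request. */
typedef int (*retire_child_fn)(int pid);

/* NULL means the running MPM offers no such service. */
static retire_child_fn mpm_retire_child = NULL;

/* Called by an MPM that does support the service. */
static void register_retire_child(retire_child_fn fn)
{
    mpm_retire_child = fn;
}

typedef enum { REPORTED_ONLY, HANDED_TO_MPM } outcome_t;

static int last_reported_pid = -1;  /* stands in for a log entry */

/* Module side: always identify the stuck request; delegate the dirty
 * work only if the MPM provides the optional function. */
static outcome_t handle_stuck_request(int pid)
{
    last_reported_pid = pid;
    if (mpm_retire_child != NULL && mpm_retire_child(pid) == 0) {
        return HANDED_TO_MPM;
    }
    return REPORTED_ONLY;
}

/* Example MPM-side implementation, for illustration only. */
static int example_retire(int pid)
{
    return pid > 0 ? 0 : -1;
}
```

With no registration the module still reports; after `register_retire_child(example_retire)` the same call hands the pid over to the MPM.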

> 
> Then the per-process signal handler is reduced to setting a flag
> for the MPM to deal with.  Now the MPM can at worst terminate it
> cleanly, perhaps using the graceful restart code on the process.

for a threaded MPM, graceful shutdown doesn't help, since you now have
an entire hung process which is doing no useful work instead of a
process where n-1 threads are doing useful work and 1 thread is hung

> 
> Questions:
> * Any objections in principle to adding watchdog code in this manner?

IMHO it should be in a separate module and invoke high-level services
optionally provided by the MPM in order to do the dirty work

> * Does this plan make sense?
> * Is there a better plan that'll enable me to get down to thread level
>   and stop just the errant thread?  Perhaps the above with an optional
>   thread-shutdown hook somewhere?

unclear to me what can be done in a threaded MPM

Re: Watchdog code for Apache

Posted by Stas Bekman <st...@stason.org>.
Nick Kew wrote:
> A little while back, I hacked up a quick&dirty experimental watchdog 
> module.  It forks a watchdog process in the pre_mpm hook, which then
> watches the scoreboard and kills any process in which some request
> has taken more than some predefined time.

Also check:
http://search.cpan.org/dist/Apache-Watchdog-RunAway/
which has existed since 2000 and works with both Apache 1.3 and 2.x.

Though with 2.0 it has the same issues, as Nick has mentioned.

-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: Watchdog code for Apache

Posted by Graham Leggett <mi...@sharp.fm>.
Nick Kew wrote:

> A little while back, I hacked up a quick&dirty experimental watchdog 
> module.  It forks a watchdog process in the pre_mpm hook, which then
> watches the scoreboard and kills any process in which some request
> has taken more than some predefined time.

Depends on what "taken more time" means. I am currently in the deepest 
backwater of the internet (in fact I think South Africa is officially 
the slowest area of the net), and many requests for us "take a long 
time". If the predetermined time is too short, we may find some parts of 
the net inaccessible to us.

Regards,
Graham
--