You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Chris Darroch <ch...@pearsoncmg.com> on 2006/05/04 20:16:37 UTC

[PATCH 5/6] hard restart on Linux #38737

Hi --

   An older but essentially identical version of this patch is
in Bugzilla PR #38737.

   Using the worker MPM (but not the event MPM), if Keep-Alives
are enabled and the timeout is reasonably long (e.g., 15 seconds),
then worker threads wait in poll() after handling a request
for any further requests on a Keep-Alive connection.

   On most Unix-like OSes (e.g., Solaris), when you perform a hard
restart or shutdown (not a graceful one), the close_worker_sockets()
function successfully alerts all worker threads that they should
exit quickly.  In particular, if the worker threads are polling
on the socket, closing the socket in the main thread has the
side-effect of causing the worker thread to immediately receive
an EBADF.

   However, on Linux, this side-effect doesn't occur.  This is
intentional; see:

http://bugme.osdl.org/show_bug.cgi?id=546

   The consequence is that the child process's main thread
waits in apr_thread_join() for each worker thread and if these
are just starting a reasonably long Keep-Alive timeout period,
then the parent process soon decides that the child process
has stalled and ap_reclaim_child_processes() sends SIGTERM
and SIGKILL to the child process.

   In turn, any modules that have registered pool cleanup
functions on the pchild memory pool, as passed in the child_init
stage, won't see their cleanup functions run.  For certain
modules, e.g., mod_dbd, this has relatively severe consequences,
such as failing to cleanly disconnect from a database.

   This patch causes the main thread to signal each worker
thread before attempting apr_thread_join(); if any workers
are waiting in poll() they then receive EBADF immediately
after the signal and this allows the restart or shutdown
process to proceed as expected, with calls to all the
registered pool cleanup functions.

Chris.

=====================================================================
--- server/mpm/worker/worker.c.orig	2006-05-03 15:04:28.429547123 -0400
+++ server/mpm/worker/worker.c	2006-05-03 15:07:04.659719568 -0400
@@ -213,6 +213,19 @@
  */
 #define LISTENER_SIGNAL     SIGHUP
 
+/* The WORKER_SIGNAL signal will be sent from the main thread to the
+ * worker threads during an ungraceful restart or shutdown.
+ * This ensures that on systems (i.e., Linux) where closing the worker
+ * socket doesn't awake the worker thread when it is polling on the socket
+ * (especially in apr_wait_for_io_or_timeout() when handling
+ * Keep-Alive connections), close_worker_sockets() and join_workers()
+ * still function in timely manner and allow ungraceful shutdowns to
+ * proceed to completion.  Otherwise join_workers() doesn't return
+ * before the main process decides the child process is non-responsive
+ * and sends a SIGKILL.
+ */
+#define WORKER_SIGNAL       AP_SIG_GRACEFUL
+
 /* An array of socket descriptors in use by each thread used to
  * perform a non-graceful (forced) shutdown of the server. */
 static apr_socket_t **worker_sockets;
@@ -822,6 +835,11 @@
     ap_scoreboard_image->servers[process_slot][thread_slot].generation = ap_my_generation;
     ap_update_child_status_from_indexes(process_slot, thread_slot, SERVER_STARTING, NULL);
 
+#ifdef HAVE_PTHREAD_KILL
+    unblock_signal(WORKER_SIGNAL);
+    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
+#endif
+
     while (!workers_may_exit) {
         if (!is_idle) {
             rv = ap_queue_info_set_idle(worker_queue_info, last_ptrans);
@@ -1077,6 +1095,13 @@
 
     for (i = 0; i < ap_threads_per_child; i++) {
         if (threads[i]) { /* if we ever created this thread */
+#ifdef HAVE_PTHREAD_KILL
+            apr_os_thread_t *worker_os_thread;
+
+            apr_os_thread_get(&worker_os_thread, threads[i]);
+            pthread_kill(*worker_os_thread, WORKER_SIGNAL);
+#endif
+
             rv = apr_thread_join(&thread_rv, threads[i]);
             if (rv != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_CRIT, rv, ap_server_conf,

Re: [PATCH 5/6] hard restart on Linux #38737

Posted by Colm MacCarthaigh <co...@stdlib.net>.
On Mon, May 08, 2006 at 06:19:59AM -0400, Jeff Trawick wrote:
> >Question: what other side-effects might this have?
> 
> unpredictable if somebody/something sends AP_SIG_GRACEFUL to the child
> process; it will land on random worker thread
> 
> (it might be nice for debugging if you could make processes gracefully
> exit by sending them the signal; 

sending LISTENER_SIGNAL to a child should do that anyway :)

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net

Re: [PATCH 5/6] hard restart on Linux #38737

Posted by Jeff Trawick <tr...@gmail.com>.
On 5/7/06, Nick Kew <ni...@webthing.com> wrote:
> On Thursday 04 May 2006 19:16, Chris Darroch wrote:
>
> > +#ifdef HAVE_PTHREAD_KILL
> > +    unblock_signal(WORKER_SIGNAL);
> > +    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
> > +#endif
> > +
>
> OK, unblock a signal.  This happens after child_init, but presumably
> won't interfere with existing modules 'cos a module that attaches a
> handler for that in a child_init wouldn't have worked before, either.

AP_SIG_GRACEFUL is our signal to play with.  Modules can't change how
that is delivered and expect the server to work reliably.

>
> Question: what other side-effects might this have?

unpredictable if somebody/something sends AP_SIG_GRACEFUL to the child
process; it will land on random worker thread

(it might be nice for debugging if you could make processes gracefully
exit by sending them the signal; but we don't have such code so I
don't see how using it for this intra-child communication will break
anything)

APR generally eats signals.  The app can set a flag in their signal
handler and check later, but APR won't return EINTR except from a
couple of functions (poll and accept perhaps?).  So there is the
possibility that the signal won't prod the worker thread from wherever
it is  But that's no worse than today, and we still close the socket.

>                                                                                Anyone recollect
> why signals are commonly blocked in the child processes?

We want the main thread in the child process to see any signals and
the worker and listener threads to be oblivious to signals.

If a third-party module has its own thread and uses a signal for its
own purpose, it unblocks that signal on its thread and we're
oblivious.

Re: [PATCH 5/6] hard restart on Linux #38737

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 5/7/06, Nick Kew <ni...@webthing.com> wrote:
> On Sunday 07 May 2006 23:07, Garrett Rooney wrote:
> > On 5/7/06, Nick Kew <ni...@webthing.com> wrote:
> > > Now, what about a platform that HAS_PTHREAD_KILL, but which uses some
> > > other form of threading in its APR (isn't that at least an option on some
> > > FreeBSD versions?)  Wouldn't this break horribly when it pthread_kills a
> > > non-pthread?  Couldn't it even happen on Linux, in principle at least?
> >
> > On FreeBSD you basically need to pick one threading library, if you're
> > linked against more than one of them bad things happen, since they all
> > implement the same pthread functions.
>
> So there's no thread API that isn't pthreads?

Correct, there are user level pthreads implementations (libc_r, GNU
pth), and kernel or mixed userland/kernel implementations (libpthread,
libthread), but they all expose the same pthreads API, and you can't
mix them.

> > On Solaris, which does have
> > multiple threading implementations with different APIs, I don't think
> > it would matter, since pthreads is implemented on top of the lower
> > level solaris threads.
>
> But if an APR was built on the lower-level threads, then we might
> be pthread_kill()ing something that isn't a pthread?  Even if it's
> only a theoretical possibility, we should avoid it.

Does APR actually do this on any platform?

> If we had an APR #define for pthread vs non-pthread threading
> implementations, we should test that.  As it stands, if we take
> the patch as-is, then IMO we need to document the MPMs
> explicitly as requiring APR threads to be built on pthreads.
>
> > I suspect that's the common case, if there is
> > another threading library it's lower level and pthreads is generally
> > implemented on top of it. The only other case I can think of is if
> > you're using a user level threading library but the system has its own
> > pthreads library.
>
> Exactly.  What if someone out there has a super-optimised APR
> built on their lower-level threads, on a platform that also has pthreads?

Well, if they do then it's not in the actual APR distribution, so I
have trouble caring...  If we're on unix systems, I think it's safe to
assume that it's either pthreads, or something that interoperates well
with pthreads (i.e. solaris threads, where you can use either the
pthreads functions or the solaris threads functions and they play
reasonably nicely together).

-garrett

Re: [PATCH 5/6] hard restart on Linux #38737

Posted by Nick Kew <ni...@webthing.com>.
On Sunday 07 May 2006 23:07, Garrett Rooney wrote:
> On 5/7/06, Nick Kew <ni...@webthing.com> wrote:
> > Now, what about a platform that HAS_PTHREAD_KILL, but which uses some
> > other form of threading in its APR (isn't that at least an option on some
> > FreeBSD versions?)  Wouldn't this break horribly when it pthread_kills a
> > non-pthread?  Couldn't it even happen on Linux, in principle at least?
>
> On FreeBSD you basically need to pick one threading library, if you're
> linked against more than one of them bad things happen, since they all
> implement the same pthread functions.

So there's no thread API that isn't pthreads?

> On Solaris, which does have 
> multiple threading implementations with different APIs, I don't think
> it would matter, since pthreads is implemented on top of the lower
> level solaris threads.

But if an APR was built on the lower-level threads, then we might
be pthread_kill()ing something that isn't a pthread?  Even if it's
only a theoretical possibility, we should avoid it.

If we had an APR #define for pthread vs non-pthread threading
implementations, we should test that.  As it stands, if we take
the patch as-is, then IMO we need to document the MPMs
explicitly as requiring APR threads to be built on pthreads.

> I suspect that's the common case, if there is 
> another threading library it's lower level and pthreads is generally
> implemented on top of it. The only other case I can think of is if 
> you're using a user level threading library but the system has its own
> pthreads library.

Exactly.  What if someone out there has a super-optimised APR
built on their lower-level threads, on a platform that also has pthreads?

-- 
Nick Kew

Re: [PATCH 5/6] hard restart on Linux #38737

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 5/7/06, Nick Kew <ni...@webthing.com> wrote:

> Now, what about a platform that HAS_PTHREAD_KILL, but which uses some
> other form of threading in its APR (isn't that at least an option on some
> FreeBSD versions?)  Wouldn't this break horribly when it pthread_kills a
> non-pthread?  Couldn't it even happen on Linux, in principle at least?

On FreeBSD you basically need to pick one threading library, if you're
linked against more than one of them bad things happen, since they all
implement the same pthread functions.  On Solaris, which does have
multiple threading implementations with different APIs, I don't think
it would matter, since pthreads is implemented on top of the lower
level solaris threads.  I suspect that's the common case, if there is
another threading library it's lower level and pthreads is generally
implemented on top of it.  The only other case I can think of is if
you're using a user level threading library but the system has its own
pthreads library.

-garrett

Re: [PATCH 5/6] hard restart on Linux #38737

Posted by Nick Kew <ni...@webthing.com>.
On Thursday 04 May 2006 19:16, Chris Darroch wrote:

> +#ifdef HAVE_PTHREAD_KILL
> +    unblock_signal(WORKER_SIGNAL);
> +    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
> +#endif
> +

OK, unblock a signal.  This happens after child_init, but presumably
won't interfere with existing modules 'cos a module that attaches a
handler for that in a child_init wouldn't have worked before, either.

Question: what other side-effects might this have?  Anyone recollect
why signals are commonly blocked in the child processes?

>      while (!workers_may_exit) {
>          if (!is_idle) {
>              rv = ap_queue_info_set_idle(worker_queue_info, last_ptrans);
> @@ -1077,6 +1095,13 @@
>
>      for (i = 0; i < ap_threads_per_child; i++) {
>          if (threads[i]) { /* if we ever created this thread */
> +#ifdef HAVE_PTHREAD_KILL
> +            apr_os_thread_t *worker_os_thread;
> +
> +            apr_os_thread_get(&worker_os_thread, threads[i]);
> +            pthread_kill(*worker_os_thread, WORKER_SIGNAL);
> +#endif
> +

Works for me on Linux.

Now, what about a platform that HAS_PTHREAD_KILL, but which uses some
other form of threading in its APR (isn't that at least an option on some
FreeBSD versions?)  Wouldn't this break horribly when it pthread_kills a
non-pthread?  Couldn't it even happen on Linux, in principle at least?


-- 
Nick Kew