Posted to dev@httpd.apache.org by Jeff Trawick <tr...@attglobal.net> on 2002/05/01 15:36:05 UTC

[PATCH] get worker to wait for workers to exit even for graceless

If somebody wants to play, this is perhaps all that is necessary.  I
need to straighten out a test script problem on the machine that
regularly exhibits the segfault, then try this out there.

The caveat with this is that if a worker thread is doing
time-consuming processing (e.g., lengthy database transaction) then
the pthread_join will hang for a while and the parent will probably
nail us.
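The blocking behaviour behind this caveat can be shown on its own. This is a minimal sketch in plain POSIX threads, not httpd or APR code. `slow_worker` and `time_blocking_join` are hypothetical names, and a `nanosleep()` stands in for the "lengthy database transaction". `pthread_join()` takes no timeout argument, so the joining thread waits for the whole time the worker is busy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a worker busy with a "lengthy database
 * transaction": it just sleeps for the given number of milliseconds. */
static void *slow_worker(void *arg)
{
    long ms = *(const long *)arg;
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    return NULL;
}

/* Start one slow worker and join it. pthread_join() has no timeout
 * parameter, so the caller stays blocked for as long as the worker runs.
 * Returns the milliseconds elapsed from just before pthread_create()
 * until pthread_join() returned, or -1 on error. */
long time_blocking_join(long work_ms)
{
    pthread_t tid;
    struct timespec start, end;

    assert(work_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&tid, NULL, slow_worker, &work_ms) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)    /* blocks until slow_worker returns */
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (long)(((long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                   + (end.tv_nsec - start.tv_nsec)) / 1000000LL);
}
```

For example, `time_blocking_join(200)` cannot return less than 200, because the join does not complete before the worker's 200 ms sleep has ended.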

Index: server/mpm/worker/worker.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm/worker/worker.c,v
retrieving revision 1.121
diff -u -r1.121 worker.c
--- server/mpm/worker/worker.c	1 May 2002 07:15:39 -0000	1.121
+++ server/mpm/worker/worker.c	1 May 2002 13:34:58 -0000
@@ -1312,16 +1312,14 @@
             }
         }
 
-        if (rv == AP_GRACEFUL) {
-            /* A terminating signal was received. Now join each of the
-             * workers to clean them up.
-             *   If the worker already exited, then the join frees
-             *   their resources and returns.
-             *   If the worker hasn't exited, then this blocks until
-             *   they have (then cleans up).
-             */
-            join_workers(ts->listener, threads);
-        }
+        /* A terminating signal was received. Now join each of the
+         * workers to clean them up.
+         *   If the worker already exited, then the join frees
+         *   their resources and returns.
+         *   If the worker hasn't exited, then this blocks until
+         *   they have (then cleans up).
+         */
+        join_workers(ts->listener, threads);
     }
 
     free(threads);

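The patch's effect can be sketched in plain POSIX threads: the join loop now runs on every shutdown, not only when `rv == AP_GRACEFUL`. `join_all_workers`, `shutdown_then_free` and `struct worker_state` are hypothetical and are not the real `join_workers()` from worker.c. The per-worker state is an assumed stand-in for resources the child frees during cleanup, such as the bucket allocator mentioned later in the thread. Because every worker is joined first, none of them can still be touching that state when it is freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-worker state, standing in for something the child
 * process frees during cleanup (e.g. a bucket allocator). */
struct worker_state {
    int finished;
};

static void *worker(void *arg)
{
    struct worker_state *st = arg;
    st->finished = 1;    /* last touch of the state before exiting */
    return NULL;
}

/* Join every worker on every kind of shutdown. 'graceful' is ignored on
 * purpose, mirroring the removed "if (rv == AP_GRACEFUL)" test.
 * pthread_join() returns at once for a thread that already exited (and
 * frees its resources); otherwise it blocks until the thread exits. */
static int join_all_workers(pthread_t *threads, int n, int graceful)
{
    int joined = 0;
    (void)graceful;
    for (int i = 0; i < n; i++)
        if (pthread_join(threads[i], NULL) == 0)
            joined++;
    return joined;
}

/* Start n workers, join them all, and only then free their state.
 * Returns how many workers had finished when the state was freed
 * (n on success), or -1 on allocation failure. */
int shutdown_then_free(int n, int graceful)
{
    pthread_t *threads;
    struct worker_state *states;
    int started = 0, finished = 0;

    if (n <= 0)
        return 0;
    threads = malloc((size_t)n * sizeof *threads);
    states = calloc((size_t)n, sizeof *states);
    if (threads == NULL || states == NULL) {
        free(threads);
        free(states);
        return -1;
    }
    for (; started < n; started++)
        if (pthread_create(&threads[started], NULL, worker,
                           &states[started]) != 0)
            break;
    join_all_workers(threads, started, graceful);
    /* Safe: every started worker has exited, so none can use states[]. */
    for (int i = 0; i < started; i++)
        finished += states[i].finished;
    free(states);
    free(threads);
    return finished;
}
```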
-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Jeff Trawick <tr...@attglobal.net>.

With this patch:

  AIX

    no more segfaults cleaning up the bucket allocator when I run a
    script which consistently led to segfaults before

    in a pound-on-the-server-and-send-SIGHUP test, I don't see any
    messages in the log for errors caused by the sockets getting
    closed; at least the server restarts in a timely manner

  Linux (RH 6.1, kernel 2.2.12, glibc 2.1.2)

    unknown bad stuff is happening... with the
    pound-on-the-server-and-send-SIGHUP test, I don't see any messages
    in the log for errors caused by the sockets getting closed, but
    something hangs (trying to join a worker thread, AFAICT) and the
    processes don't go away until the parent gets upset and sends
    SIGKILL

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Jeff Trawick <tr...@attglobal.net>.

"Sander Striker" <st...@apache.org> writes:

> >> Meaning? (I hope there isn't a short timeout on the join).
> > 
> > meaning that the parent process will give up on us ever exiting and
> > will send SIGKILL
> > 
> > there is no timeout on the join
> > 
> >> I think it is perfectly acceptable to wait for the server to shut down.
> > 
> > The long-standing design is that the parent process first tries
> > sending SIGTERM to children but will give up after a while and send
> > SIGKILL if the child is hung somewhere.
> 
> How long is 'a while'?  If this is 'long enough', this will certainly
> fall under 'acceptable'.

6-7 seconds, I think

RE: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Sander Striker <st...@apache.org>.

> From: trawick@rdu88-250-035.nc.rr.com
> [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
> Sent: 01 May 2002 16:04
> "Sander Striker" <st...@apache.org> writes:
> 
>>> From: trawick@rdu88-250-035.nc.rr.com
>>> [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
>>> Sent: 01 May 2002 15:36
>> 
>>> If somebody wants to play, this is perhaps all that is necessary.  I
>>> need to straighten out a test script problem on the machine that
>>> regularly exhibits the segfault, then try this out there.
>>> 
>>> The caveat with this is that if a worker thread is doing
>>> time-consuming processing (e.g., lengthy database transaction) then
>>> the pthread_join will hang for a while and the parent will probably
>>> nail us.
>> 
>> Meaning? (I hope there isn't a short timeout on the join).
> 
> meaning that the parent process will give up on us ever exiting and
> will send SIGKILL
> 
> there is no timeout on the join
> 
>> I think it is perfectly acceptable to wait for the server to shut down.
> 
> The long-standing design is that the parent process first tries
> sending SIGTERM to children but will give up after a while and send
> SIGKILL if the child is hung somewhere.

How long is 'a while'?  If this is 'long enough', this will certainly
fall under 'acceptable'.

This just brings us back to the point where we need apr_thread_cancel
(*sigh*).  The only entity that knows whether it is in a cancellable
state is the thread itself, not the parent.  The parent can only signal
a thread to cancel and then wait.
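A sketch of the cooperative pattern Sander describes, assuming plain C11 atomics and POSIX threads. It uses no APR call (the complaint above implies APR offers no usable `apr_thread_cancel`), and it deliberately avoids POSIX `pthread_cancel()`. Each worker tests a shared flag only between units of work, so the thread itself decides where stopping is safe; the parent can only set the flag and then join. `stop_requested`, `cooperative_worker` and `cooperative_shutdown` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_WORKERS 16

/* Shutdown request: written by the parent, polled by the workers. */
static atomic_int stop_requested;

/* Hypothetical worker: does work in short steps and checks the flag
 * only between steps, i.e. at points it knows are safe to stop at. */
static void *cooperative_worker(void *arg)
{
    struct timespec step = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
    (void)arg;
    while (!atomic_load(&stop_requested))
        nanosleep(&step, NULL);          /* one unit of simulated work */
    return NULL;
}

/* Start n workers, let them run briefly, then "signal a thread to cancel
 * and wait": set the flag and join each one. Returns the number joined. */
int cooperative_shutdown(int n)
{
    pthread_t tids[MAX_WORKERS];
    struct timespec run_for = { .tv_sec = 0, .tv_nsec = 20000000L }; /* 20 ms */
    int started = 0, joined = 0;

    assert(n >= 0 && n <= MAX_WORKERS);
    atomic_store(&stop_requested, 0);
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, cooperative_worker, NULL) != 0)
            break;
    nanosleep(&run_for, NULL);
    atomic_store(&stop_requested, 1);    /* ask the workers to stop... */
    for (int i = 0; i < started; i++)    /* ...and wait for each of them */
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    return joined;
}
```

The same limitation discussed for the patch still applies: a worker stuck inside one long step (a slow database call, say) does not see the flag until that step finishes.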

Sander

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Jeff Trawick <tr...@attglobal.net>.

"Sander Striker" <st...@apache.org> writes:

> > From: trawick@rdu88-250-035.nc.rr.com
> > [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
> > Sent: 01 May 2002 15:36
> 
> > If somebody wants to play, this is perhaps all that is necessary.  I
> > need to straighten out a test script problem on the machine that
> > regularly exhibits the segfault, then try this out there.
> > 
> > The caveat with this is that if a worker thread is doing
> > time-consuming processing (e.g., lengthy database transaction) then
> > the pthread_join will hang for a while and the parent will probably
> > nail us.
> 
> Meaning? (I hope there isn't a short timeout on the join).

meaning that the parent process will give up on us ever exiting and
will send SIGKILL

there is no timeout on the join

> I think it is perfectly acceptable to wait for the server to shut down.

The long-standing design is that the parent process first tries
sending SIGTERM to children but will give up after a while and send
SIGKILL if the child is hung somewhere.
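The SIGTERM-then-SIGKILL escalation can be sketched in plain POSIX C. This is a simplified illustration, not the httpd parent's actual code. It sends a single SIGTERM, polls with `waitpid(..., WNOHANG)` for up to a caller-chosen `grace_ms`, and then sends SIGKILL. The "6-7 seconds" estimate and the repeated SIGTERMs seen in Cliff's log are not modeled. `terminate_child` and the test helper `spawn_child` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Simplified escalation: SIGTERM, poll for up to grace_ms, then SIGKILL.
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 on error. */
int terminate_child(pid_t pid, long grace_ms)
{
    struct timespec tick = { .tv_sec = 0, .tv_nsec = 10000000L }; /* 10 ms */
    int status;

    if (pid <= 0)                 /* never kill(0, ...) or kill(-1, ...) */
        return -1;
    if (kill(pid, SIGTERM) != 0)
        return -1;
    for (long waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);           /* still there: assume it is hung */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Test helper: fork a child that idles forever and either dies on SIGTERM
 * or ignores it. The disposition is set before fork() (the child inherits
 * it), so there is no race with the parent's kill(). */
pid_t spawn_child(int ignore_term)
{
    void (*old)(int) = signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    signal(SIGTERM, old);         /* restore the parent's own handling */
    return pid;
}
```

A child that ignores SIGTERM is reaped with `WTERMSIG(status) == SIGKILL` once `grace_ms` has elapsed, which mirrors the SIGKILL entries in Cliff's log.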

RE: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Sander Striker <st...@apache.org>.

> From: trawick@rdu88-250-035.nc.rr.com
> [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
> Sent: 01 May 2002 15:36

> If somebody wants to play, this is perhaps all that is necessary.  I
> need to straighten out a test script problem on the machine that
> regularly exhibits the segfault, then try this out there.
> 
> The caveat with this is that if a worker thread is doing
> time-consuming processing (e.g., lengthy database transaction) then
> the pthread_join will hang for a while and the parent will probably
> nail us.

Meaning? (I hope there isn't a short timeout on the join).

I think it is perfectly acceptable to wait for the server to shut down.
One of the reasons to do an ungraceful termination is to close the open
connections, which we do now.  If our backend needs to clean up, so be it;
we have to wait.  If you can't wait, resort to more drastic stuff like
kill -9, but be prepared to clean up the mess that it might leave behind.

Sander

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Cliff Woolley <jw...@virginia.edu>.

On Wed, 1 May 2002, Cliff Woolley wrote:

> > All I could tell is that the child's main thread is stuck in
> > thread-join.

Yep, seems like the same thing, though everything happens so fast it's
awfully hard to be sure.  I *think* this is a valid backtrace:

(gdb) bt
#0  0x403a89be in select () from /lib/libc.so.6
#1  0x40042128 in __DTOR_END__ () from /root/apache/test/lib/libapr.so.0
#2  0x08062e06 in join_workers (listener=0x8118c68, threads=0x813f078)
    at worker.c:1127
#3  0x08063213 in child_main (child_num_arg=4) at worker.c:1322
#4  0x080632fc in make_child (s=0x80b4490, slot=4) at worker.c:1376
#5  0x080636a7 in perform_idle_server_maintenance () at worker.c:1537
#6  0x0806387b in server_main_loop (remaining_children_to_start=0)
    at worker.c:1630
#7  0x08063ac8 in ap_mpm_run (_pconf=0x80b2740, plog=0x80dc7e8, s=0x80b4490)
    at worker.c:1726
#8  0x0806920e in main (argc=1, argv=0xbffff364) at main.c:632
#9  0x402f874f in __libc_start_main () from /lib/libc.so.6


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Cliff Woolley <jw...@virginia.edu>.

On 1 May 2002, Jeff Trawick wrote:

> All I could tell is that the child's main thread is stuck in
> thread-join.  I have not checked to see what the worker threads were
> doing.

I'll try to investigate further.  PS: FWIW, graceful is working
beautifully.  I just did an absolute torture test on it, and it passed
with flying colors.  :)  The only things in the error log were the
following, and I *think* all are harmless:

[Wed May 01 15:04:14 2002] [warn] long lost child came home! (pid 12136)
[Wed May 01 15:04:14 2002] [crit] the listener thread didn't exit
[Wed May 01 15:04:14 2002] [crit] the listener thread didn't exit
[Wed May 01 15:04:15 2002] [warn] long lost child came home! (pid 12193)
[Wed May 01 15:04:16 2002] [warn] long lost child came home! (pid 12222)
[Wed May 01 15:04:17 2002] [warn] long lost child came home! (pid 12251)

The test was ./ab -n 1000 -c 100 http://localhost/linux-2.5.12.tar.bz2,
which amounts to 1.5 GB of downloads <smirk>, while pounding it with
graceful after graceful after graceful.

--Cliff

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Jeff Trawick <tr...@attglobal.net>.

Cliff Woolley <jw...@virginia.edu> writes:

> It shuts down, but ouch, is it nasty:
> 
> [Wed May 01 14:52:26 2002] [warn] child process 11253 still did not exit, sending a SIGTERM

All I could tell is that the child's main thread is stuck in
thread-join.  I have not checked to see what the worker threads were
doing.

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Cliff Woolley <jw...@virginia.edu>.

On 1 May 2002, Jeff Trawick wrote:

> > In any case, +1 on the patch.  Worker won't be perfect, but at least
> > it will be better than 2.0.35.
>
> I just committed it.  I'd love to get some feedback from other Linux
> users before a roll, though.


It shuts down, but ouch, is it nasty:

[Wed May 01 14:52:26 2002] [warn] child process 11253 still did not exit, sending a SIGTERM
[Wed May 01 14:52:26 2002] [warn] child process 11254 still did not exit, sending a SIGTERM
[Wed May 01 14:52:26 2002] [warn] child process 11253 still did not exit, sending a SIGTERM
[Wed May 01 14:52:26 2002] [warn] child process 11254 still did not exit, sending a SIGTERM
[Wed May 01 14:52:28 2002] [warn] child process 11253 still did not exit, sending a SIGTERM
[Wed May 01 14:52:28 2002] [warn] child process 11254 still did not exit, sending a SIGTERM
[Wed May 01 14:52:32 2002] [error] child process 11253 still did not exit, sending a SIGKILL
[Wed May 01 14:52:32 2002] [error] child process 11254 still did not exit, sending a SIGKILL
[Wed May 01 14:52:49 2002] [notice] caught SIGTERM, shutting down

Note: as soon as I told it to stop, my big download *did* get stopped.
But the process just sat there until it got the SIGKILL, at which point it
was a zombie until the parent shut down a few seconds later.

:-/
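Why the killed child lingered as a zombie: a process that has died stays in the process table until its parent collects its status with `waitpid()`, or until the parent itself exits and the child is reparented and reaped. The sketch below is plain POSIX C with a hypothetical function name, not httpd code. It uses `waitid()` with `WNOWAIT` to wait for a SIGKILLed child to die without reaping it, checks that the pid still exists, then reaps it. It assumes Linux semantics, where `kill(pid, 0)` succeeds for a zombie.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child, SIGKILL it, and check that it remains
 * a zombie (its pid still exists) until the parent reaps it.
 * Returns 1 if so, 0 if not, -1 on error. */
int killed_child_is_zombie_until_reaped(void)
{
    siginfo_t info;
    int status, was_zombie, now_gone;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();              /* child: idle until killed */
    }
    if (kill(pid, SIGKILL) != 0)
        return -1;
    /* Block until the child has died; WNOWAIT leaves it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    was_zombie = info.si_code == CLD_KILLED && kill(pid, 0) == 0;
    /* Reaping removes the zombie; the pid no longer names a process. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    now_gone = kill(pid, 0) == -1 && errno == ESRCH;
    return was_zombie && now_gone;
}
```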

--Cliff

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Cliff Woolley <jw...@virginia.edu>.

On 1 May 2002, Jeff Trawick wrote:

> I just committed it.  I'd love to get some feedback from other Linux
> users before a roll, though.

I will test it within the hour.

--Cliff

Re: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Jeff Trawick <tr...@attglobal.net>.

"Sander Striker" <st...@apache.org> writes:

> > From: trawick@rdu88-250-035.nc.rr.com
> > [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
> > Sent: 01 May 2002 15:36
> 
> > If somebody wants to play, this is perhaps all that is necessary.  I
> > need to straighten out a test script problem on the machine that
> > regularly exhibits the segfault, then try this out there.
> > 
> > The caveat with this is that if a worker thread is doing
> > time-consuming processing (e.g., lengthy database transaction) then
> > the pthread_join will hang for a while and the parent will probably
> > nail us.
> 
> In any case, +1 on the patch.  Worker won't be perfect, but at least
> it will be better than 2.0.35.

I just committed it.  I'd love to get some feedback from other Linux
users before a roll, though.

RE: [PATCH] get worker to wait for workers to exit even for graceless

Posted by Sander Striker <st...@apache.org>.

> From: trawick@rdu88-250-035.nc.rr.com
> [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
> Sent: 01 May 2002 15:36

> If somebody wants to play, this is perhaps all that is necessary.  I
> need to straighten out a test script problem on the machine that
> regularly exhibits the segfault, then try this out there.
> 
> The caveat with this is that if a worker thread is doing
> time-consuming processing (e.g., lengthy database transaction) then
> the pthread_join will hang for a while and the parent will probably
> nail us.

In any case, +1 on the patch.  Worker won't be perfect, but at least
it will be better than 2.0.35.

Sander