You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Jeff Trawick <tr...@bellsouth.net> on 2000/07/25 22:18:30 UTC

[PATCH] bringing down the server from an MPM thread

There doesn't seem to be any existing logic in the Unix mpm(s) to
bring down the server after a fatal error that is encountered in 
a worker thread.  For the moment, I'm concerned with errors from
ap_poll() and ap_accept().

Currently, after an error from ap_poll() in a worker thread, the
worker threads will go away and then the process will exit(0).  The
parent will then restart the process.  With this change, after a fatal
error is encountered in a worker thread, a flag
(fatal_error_in_worker) is set so that the process exits with status
APEXIT_CHILDFATAL.  This triggers the parent process to bring
everything down.

Any better ideas for how to do this?

The patch is just a proof of concept.  I need to pick up the right
accept() error classification from Apache 1.3 and also get the change
to help the DEXTER flavor of mpmt.c too.

Index: modules/mpm/mpmt/mpmt.c
===================================================================
RCS file: /cvs/apache/apache-2.0/src/modules/mpm/mpmt/mpmt.c,v
retrieving revision 1.18
diff -u -r1.18 mpmt.c
--- mpmt.c	2000/07/25 17:39:16	1.18
+++ mpmt.c	2000/07/25 20:15:40
@@ -125,6 +125,7 @@
 static int min_spare_threads = 0;
 static int max_spare_threads = 0;
 static int workers_may_exit = 0;
+static int fatal_error_in_worker = 0;
 static int requests_this_child;
 static int num_listensocks = 0;
 static ap_socket_t **listensocks;
@@ -818,6 +819,7 @@
                 ap_log_error(APLOG_MARK, APLOG_ERR, ret, (const server_rec *)
                              ap_server_conf, "ap_poll: (listen)");
                 workers_may_exit = 1;
+                fatal_error_in_worker = 1;
             }
 
             if (workers_may_exit) break;
@@ -857,6 +859,18 @@
             if ((rv = ap_accept(&csd, sd, ptrans)) != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, 
                              "ap_accept");
+                csd = NULL;
+                switch(rv)
+                {
+                /* TODO: From Apache 1.3, bring in the classification of 
+                 * various accept() errors.
+                 */
+                default:
+                    ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
+                                 "ap_accept: giving up.");
+                    workers_may_exit = 1;
+                    fatal_error_in_worker = 1;
+                }
             }
             if ((rv = SAFE_ACCEPT(ap_unlock(process_accept_mutex)))
                 != APR_SUCCESS) {
@@ -868,7 +882,9 @@
 #ifndef PREFORK
             pthread_mutex_unlock(&thread_accept_mutex);
 #endif
-            process_socket(ptrans, csd, process_slot * HARD_THREAD_LIMIT + thread_slot);
+            if (csd) {
+                process_socket(ptrans, csd, process_slot * HARD_THREAD_LIMIT + thread_slot);
+            }
             requests_this_child--;
         }
         else {
@@ -1026,7 +1042,15 @@
     switch (signal_received) {
         case SIGTERM:
         case SIGINT:
-            just_die(signal_received);
+            if (fatal_error_in_worker) {
+                ap_log_error(APLOG_MARK, APLOG_ALERT | APLOG_NOERRNO, 0, 
+                             ap_server_conf,
+                             "A fatal error occured in a worker thread.");
+                exit(APEXIT_CHILDFATAL);
+            }
+            else {
+                just_die(signal_received);
+            }
             break;
         default:
             ap_log_error(APLOG_MARK, APLOG_ALERT, errno, ap_server_conf,


-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...

Re: [PATCH] bringing down the server from an MPM thread

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> His patch makes the child die with
> APCHILD_FATAL, and the parent kills off the whole server.
> 
> This is wrong.  Apache 1.3 never killed off the whole server because a
> child process had problems, at least not that I am aware of.

Ah, I missed something the first time through.  Yes, I agree
that Jeff's behavioural modification is not optimal.
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: [PATCH] bringing down the server from an MPM thread

Posted by dean gaudet <dg...@arctic.org>.
On Fri, 28 Jul 2000, Ben Laurie wrote:

> What scares me most is that a corrupt child could exit with that status
> and take out the server. Something a bit more solid would be good.

hrm.

well in 1.3 a child could drop a cookie in its scoreboard too.

-dean


Re: [PATCH] bringing down the server from an MPM thread

Posted by Ben Laurie <be...@algroup.co.uk>.
dean gaudet wrote:
> 
> On Wed, 26 Jul 2000, Ben Laurie wrote:
> 
> > dean gaudet wrote:
> > >
> > > On Tue, 25 Jul 2000 rbb@covalent.net wrote:
> > >
> > > > This is wrong.  Apache 1.3 never killed off the whole server because a
> > > > child process had problems, at least not that I am aware of.
> > >
> > > it did.
> > >
> > > see http_main.c.  that's what APEXIT_CHILDFATAL means -- the world is
> > > fucked, the child doesn't think things can continue.
> > >
> > >     if ((WIFEXITED(status)) &&
> > >         WEXITSTATUS(status) == APEXIT_CHILDFATAL) {
> > >         ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, server_conf,
> > >                         "Child %d returned a Fatal error... \n"
> > >                         "Apache is exiting!",
> > >                         pid);
> > >         exit(APEXIT_CHILDFATAL);
> > >     }
> >
> > That's scary!
> 
> i'm not sure there's an alternative though.
> 
> it could try a cleanup -- as long as it protects itself from nested fatal
> errors and just dies on subsequent fatals.

What scares me most is that a corrupt child could exit with that status
and take out the server. Something a bit more solid would be good.

> at some point you want a human to make a decision, and this would seem to
> be one of them.  it's going to be much better than putting in complex
> decision making foo which is only going to screw up more.

Agreed.

> (there was this time that VCS, veritas cluster server, decided a host was
> down and double-mounted all of its filesystems on the hot spare...
> meanwhile the host was up, but kernel NFS was starving userland so much it
> couldn't heartbeat... massive corruption resulted.  just say no to
> automated failover!  i'd rather staff a 24x7 NOC or spend more effort on
> better monitoring :)

I've been saying no to automated failover for at least two decades now,
and this is a classic example of why!

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html

Coming to ApacheCon Europe 2000? http://apachecon.com/

Re: [PATCH] bringing down the server from an MPM thread

Posted by dean gaudet <dg...@arctic.org>.
On Wed, 26 Jul 2000, Ben Laurie wrote:

> dean gaudet wrote:
> > 
> > On Tue, 25 Jul 2000 rbb@covalent.net wrote:
> > 
> > > This is wrong.  Apache 1.3 never killed off the whole server because a
> > > child process had problems, at least not that I am aware of.
> > 
> > it did.
> > 
> > see http_main.c.  that's what APEXIT_CHILDFATAL means -- the world is
> > fucked, the child doesn't think things can continue.
> > 
> >     if ((WIFEXITED(status)) &&
> >         WEXITSTATUS(status) == APEXIT_CHILDFATAL) {
> >         ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, server_conf,
> >                         "Child %d returned a Fatal error... \n"
> >                         "Apache is exiting!",
> >                         pid);
> >         exit(APEXIT_CHILDFATAL);
> >     }
> 
> That's scary!

i'm not sure there's an alternative though.

it could try a cleanup -- as long as it protects itself from nested fatal
errors and just dies on subsequent fatals.

at some point you want a human to make a decision, and this would seem to
be one of them.  it's going to be much better than putting in complex
decision making foo which is only going to screw up more.

(there was this time that VCS, veritas cluster server, decided a host was
down and double-mounted all of its filesystems on the hot spare...
meanwhile the host was up, but kernel NFS was starving userland so much it
couldn't heartbeat... massive corruption resulted.  just say no to
automated failover!  i'd rather staff a 24x7 NOC or spend more effort on
better monitoring :)

-dean


Re: [PATCH] bringing down the server from an MPM thread

Posted by Ben Laurie <be...@algroup.co.uk>.
dean gaudet wrote:
> 
> On Tue, 25 Jul 2000 rbb@covalent.net wrote:
> 
> > This is wrong.  Apache 1.3 never killed off the whole server because a
> > child process had problems, at least not that I am aware of.
> 
> it did.
> 
> see http_main.c.  that's what APEXIT_CHILDFATAL means -- the world is
> fucked, the child doesn't think things can continue.
> 
>     if ((WIFEXITED(status)) &&
>         WEXITSTATUS(status) == APEXIT_CHILDFATAL) {
>         ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, server_conf,
>                         "Child %d returned a Fatal error... \n"
>                         "Apache is exiting!",
>                         pid);
>         exit(APEXIT_CHILDFATAL);
>     }

That's scary!

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html

Coming to ApacheCon Europe 2000? http://apachecon.com/

Re: [PATCH] bringing down the server from an MPM thread

Posted by dean gaudet <dg...@arctic.org>.
On Tue, 25 Jul 2000 rbb@covalent.net wrote:

> This is wrong.  Apache 1.3 never killed off the whole server because a
> child process had problems, at least not that I am aware of.

it did.

see http_main.c.  that's what APEXIT_CHILDFATAL means -- the world is
fucked, the child doesn't think things can continue.

    if ((WIFEXITED(status)) &&
        WEXITSTATUS(status) == APEXIT_CHILDFATAL) {
        ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, server_conf,
                        "Child %d returned a Fatal error... \n"
                        "Apache is exiting!",
                        pid);
        exit(APEXIT_CHILDFATAL);
    }

-dean


Re: [PATCH] bringing down the server from an MPM thread

Posted by rb...@covalent.net.
On Tue, 25 Jul 2000, Rodent of Unusual Size wrote:

> rbb@covalent.net wrote:
> > 
> > We do not want to kill the whole server just because we hit an
> > error.  This would remove our robustness completely.
> 
> It's not just any old errors, it's FATAL errors.  If 1.3 died
> in comparable cases then I think it's appropriate for 2.0 as
> well.  The master will restart the process anyway, so the
> remark about robustness doesn't make sense.

This isn't what Jeff said.  Jeff's comment is that currently if we hit a
fatal error in 2.0 in ap_poll or ap_accept, the child process dies and the
parent restarts that child.  His patch makes the child die with
APCHILD_FATAL, and the parent kills off the whole server.

This is wrong.  Apache 1.3 never killed off the whole server because a
child process had problems, at least not that I am aware of.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: [PATCH] bringing down the server from an MPM thread

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> We do not want to kill the whole server just because we hit an
> error.  This would remove our robustness completely.

It's not just any old errors, it's FATAL errors.  If 1.3 died
in comparable cases then I think it's appropriate for 2.0 as
well.  The master will restart the process anyway, so the
remark about robustness doesn't make sense.
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

RE: Patch for winnt MPM ??

Posted by "William A. Rowe, Jr." <wr...@lnd.com>.
> From: greg@raleigh.ibm.com [mailto:greg@raleigh.ibm.com]On Behalf Of
> Greg Ames
> 
> Gregory Nicholls wrote:
> >
> >     Now bearing in mind that I've _never_ used CVS before 
> today I've prepared what
> > I _think_ is a patch for registry.c. If no one dies 
> laughing I'll try service.c as
> > well.
> >     G.
> 
> Being a fellow newbie with a mainframe background (and a fellow Greg -
> dang, there's 4 of us now) I can sympathize with the learning 
> curve.  A
> quick hint - please use the "-u" flag with cvs diff.  On Unix a lot of
> us use .cvsrc files in our home directories where we stash 
> all the flags
> so we don't have to think about it.  I'm sure cvs on Windows has the
> kind of thing.
> 
> Here's what I have in my .cvsrc, inherited from Fred S. and/or Ralf:
> 
> checkout -P
> update -P -d
> diff -u -d -b
> co -P
> rdiff -u
> cvs -q -z3

A better list then my own  [note to self... flag this message :]

I didn't recognize if you are using Win32 for cvs, if you are, this
info might help also...

Set HOME=C:\WinNT\profiles\%UserName%\unix or whatever you like,
and toss that .cvsrc file in the HOME directory.  Same with .ssh, if
you ever need to store keys for the SSH util.  Cvs will even play nice
and keep your .cvspass passwords in there.





Re: Patch for winnt MPM ??

Posted by Greg Ames <gr...@raleigh.ibm.com>.
Gregory Nicholls wrote:
>
>     Now bearing in mind that I've _never_ used CVS before today I've prepared what
> I _think_ is a patch for registry.c. If no one dies laughing I'll try service.c as
> well.
>     G.

Being a fellow newbie with a mainframe background (and a fellow Greg -
dang, there's 4 of us now) I can sympathize with the learning curve.  A
quick hint - please use the "-u" flag with cvs diff.  On Unix a lot of
us use .cvsrc files in our home directories where we stash all the flags
so we don't have to think about it.  I'm sure cvs on Windows has the
kind of thing.

Here's what I have in my .cvsrc, inherited from Fred S. and/or Ralf:

checkout -P
update -P -d
diff -u -d -b
co -P
rdiff -u
cvs -q -z3

Greg

Re: Patch for winnt MPM ??

Posted by Manoj Kasichainula <ma...@io.com>.
On Tue, Jul 25, 2000 at 05:08:06PM -0400, Gregory Nicholls wrote:
> A couple of the files in the winnt mpm (registry.c and service.c)
> are using APR_NOTFOUND to indicate missing files. I _think_ they
> should be using APR_ENOFILE.  My code base has APR_NOTFOUND listed
> as a missing socket in a poll set.

Any objection to renaming APR_NOTFOUND? Sounds too vague to me.


Patch for winnt MPM part 2

Posted by Gregory Nicholls <gn...@level8.com>.
 Well apparently I didn't make a complete idiot of myself so here's part 2 of the fix.
It's for service.c using diff -u -d -b as suggested by another Greg.

    -------------- cut here -------------------
Index: service.c
===================================================================
RCS file: /home/cvspublic/apache-2.0/src/modules/mpm/winnt/service.c,v
retrieving revision 1.19
diff -u -d -b -r1.19 service.c
--- service.c 2000/07/25 00:58:22 1.19
+++ service.c 2000/07/26 12:50:07
@@ -657,7 +657,7 @@

     /* Take the given literal name if there is no service entry */
     display_name = ap_pstrdup(p, name);
-    return APR_NOTFOUND;
+    return APR_ENOFILE;
 }



RE: Patch for winnt MPM ??

Posted by "William A. Rowe, Jr." <wr...@lnd.com>.
> From: rbb@covalent.net [mailto:rbb@covalent.net]
> Sent: Tuesday, July 25, 2000 4:08 PM
> 
> Gregory,
> 
> This looks good to me.  I'll try to commit this later today.  
> If I don't get to it until tomorrow, don't worry, I will commit it.

Looks good here as well... that would probably be my snafu.
I still need to do a once-over before release 6 (I won't have time
before release 5) but G... feel free to continue submitting patches :)

> On Tue, 25 Jul 2000, Gregory Nicholls wrote:
> 
> >  Hi,
> >     A couple of the files in the winnt mpm (registry.c and 
> service.c) are using
> > APR_NOTFOUND to indicate missing files. I _think_ they 
> should be using APR_ENOFILE.
> > My code base has APR_NOTFOUND listed as a missing socket in 
> a poll set.
> >     Now bearing in mind that I've _never_ used CVS before 
> today I've prepared what
> > I _think_ is a patch for registry.c. If no one dies 
> laughing I'll try service.c as
> > well.
> >     G.
> > 
> > 
> > --------------------------------- cut here
> > --------------------------------------------
> > Index: registry.c
> > ===================================================================
> > RCS file: 
> /home/cvspublic/apache-2.0/src/modules/mpm/winnt/registry.c,v
> > retrieving revision 1.24
> > diff -b -B -r1.24 registry.c
> > 184c184
> > <  * The return value is APR_SUCCESS, APR_ENOPATH, 
> APR_NOTFOUND, or the OS error
> > ---
> > >  * The return value is APR_SUCCESS, APR_ENOPATH, 
> APR_ENOFILE, or the OS error
> > 228c228
> > <         rv = APR_NOTFOUND;
> > ---
> > >         rv = APR_ENOFILE;
> > 268c268
> > <         rv = APR_NOTFOUND;
> > ---
> > >         rv = APR_ENOFILE;
> > 
> > 
> 
> 
> ______________________________________________________________
> _________________
> Ryan Bloom                        	rbb@apache.org
> 406 29th St.
> San Francisco, CA 94131
> --------------------------------------------------------------
> -----------------
> 

Re: Patch for winnt MPM ??

Posted by rb...@covalent.net.
Gregory,

This looks good to me.  I'll try to commit this later today.  If I don't
get to it until tomorrow, don't worry, I will commit it.

Ryan

On Tue, 25 Jul 2000, Gregory Nicholls wrote:

>  Hi,
>     A couple of the files in the winnt mpm (registry.c and service.c) are using
> APR_NOTFOUND to indicate missing files. I _think_ they should be using APR_ENOFILE.
> My code base has APR_NOTFOUND listed as a missing socket in a poll set.
>     Now bearing in mind that I've _never_ used CVS before today I've prepared what
> I _think_ is a patch for registry.c. If no one dies laughing I'll try service.c as
> well.
>     G.
> 
> 
> --------------------------------- cut here
> --------------------------------------------
> Index: registry.c
> ===================================================================
> RCS file: /home/cvspublic/apache-2.0/src/modules/mpm/winnt/registry.c,v
> retrieving revision 1.24
> diff -b -B -r1.24 registry.c
> 184c184
> <  * The return value is APR_SUCCESS, APR_ENOPATH, APR_NOTFOUND, or the OS error
> ---
> >  * The return value is APR_SUCCESS, APR_ENOPATH, APR_ENOFILE, or the OS error
> 228c228
> <         rv = APR_NOTFOUND;
> ---
> >         rv = APR_ENOFILE;
> 268c268
> <         rv = APR_NOTFOUND;
> ---
> >         rv = APR_ENOFILE;
> 
> 


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Patch for winnt MPM ??

Posted by Gregory Nicholls <gn...@level8.com>.
 Hi,
    A couple of the files in the winnt mpm (registry.c and service.c) are using
APR_NOTFOUND to indicate missing files. I _think_ they should be using APR_ENOFILE.
My code base has APR_NOTFOUND listed as a missing socket in a poll set.
    Now bearing in mind that I've _never_ used CVS before today I've prepared what
I _think_ is a patch for registry.c. If no one dies laughing I'll try service.c as
well.
    G.


--------------------------------- cut here
--------------------------------------------
Index: registry.c
===================================================================
RCS file: /home/cvspublic/apache-2.0/src/modules/mpm/winnt/registry.c,v
retrieving revision 1.24
diff -b -B -r1.24 registry.c
184c184
<  * The return value is APR_SUCCESS, APR_ENOPATH, APR_NOTFOUND, or the OS error
---
>  * The return value is APR_SUCCESS, APR_ENOPATH, APR_ENOFILE, or the OS error
228c228
<         rv = APR_NOTFOUND;
---
>         rv = APR_ENOFILE;
268c268
<         rv = APR_NOTFOUND;
---
>         rv = APR_ENOFILE;



Re: [PATCH] bringing down the server from an MPM thread

Posted by rb...@covalent.net.
We do not want to kill the whole server just because we hit an
error.  This would remove our robustness completely.

-1

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------