You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Jeff Trawick <tr...@attglobal.net> on 2001/11/15 22:09:58 UTC

[PATCH] rough fix for cgid hanging around after MPM bails

If an MPM's child process exits with APEXIT_CHILDFATAL,
ap_process_child_status() will exit, skipping any cleanups and other
shutdown logic that the MPM and/or main() usually perform.

Any comments on the following changes (other MPMs would follow coding
model of worker?

With the changes as-is, prefork doesn't clean up the pid file and
doesn't add its own shutdown log message, whereas worker does clean up
the pid file and adds its own log message saying that it got SIGTERM.

The discrepancy isn't cool; if anybody is going to have a preference
about these little details, speaking up now would be most
appreciated.

This patch is intended to resolve this STATUS item:

    * when a Unix MPM bails out due to an initialization error in the 
      detached process (e.g., mutex init failure, pthread_create failure), 
      other children (cgid, at least) are left hanging around

Thanks,

Jeff

Index: include/mpm_common.h
===================================================================
RCS file: /home/cvs/httpd-2.0/include/mpm_common.h,v
retrieving revision 1.29
diff -u -r1.29 mpm_common.h
--- include/mpm_common.h	2001/10/23 17:30:07	1.29
+++ include/mpm_common.h	2001/11/15 20:37:52
@@ -134,9 +134,10 @@
  * parent signalling it.
  * @param pid The child that has died
  * @param status The status returned from ap_wait_or_timeout
+ * @return 0 on success, APEXIT_CHILDFATAL if MPM should terminate
  */
 #ifdef AP_MPM_WANT_PROCESS_CHILD_STATUS
-void ap_process_child_status(apr_proc_t *pid, apr_exit_why_e why, int status);
+int ap_process_child_status(apr_proc_t *pid, apr_exit_why_e why, int status);
 #endif
 
 #if defined(TCP_NODELAY) && !defined(MPE) && !defined(TPF)
Index: server/mpm_common.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.70
diff -u -r1.70 mpm_common.c
--- server/mpm_common.c	2001/10/23 17:30:07	1.70
+++ server/mpm_common.c	2001/11/15 20:37:56
@@ -225,21 +225,24 @@
 #endif /* AP_MPM_WANT_WAIT_OR_TIMEOUT */
 
 #ifdef AP_MPM_WANT_PROCESS_CHILD_STATUS
-void ap_process_child_status(apr_proc_t *pid, apr_exit_why_e why, int status)
+int ap_process_child_status(apr_proc_t *pid, apr_exit_why_e why, int status)
 {
     int signum = status;
     const char *sigdesc = apr_signal_get_description(signum);
 
     /* Child died... if it died due to a fatal error,
-        * we should simply bail out.
-        */
+     * we should simply bail out.  The caller needs to
+     * check for bad rc from us and exit, running any
+     * appropriate cleanups.
+     */
     if ((APR_PROC_CHECK_EXIT(why)) &&
         (status == APEXIT_CHILDFATAL)) {
         ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, 0, ap_server_conf,
-                        "Child %ld returned a Fatal error..." APR_EOL_STR
-                        "Apache is exiting!",
-                        (long)pid->pid);
-        exit(APEXIT_CHILDFATAL);
+                     "Child %" APR_OS_PROC_T_FMT
+                     " returned a Fatal error..." APR_EOL_STR
+                     "Apache is exiting!",
+                     pid->pid);
+        return APEXIT_CHILDFATAL;
     }
 
     if (APR_PROC_CHECK_SIGNALED(why)) {
Index: server/mpm/prefork/prefork.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm/prefork/prefork.c,v
retrieving revision 1.211
diff -u -r1.211 prefork.c
--- server/mpm/prefork/prefork.c	2001/11/11 05:17:51	1.211
+++ server/mpm/prefork/prefork.c	2001/11/15 20:37:56
@@ -1126,7 +1126,10 @@
 	 * extra child
 	 */
 	if (pid.pid != -1) {
-	    ap_process_child_status(&pid, exitwhy, status);
+            if (ap_process_child_status(&pid, exitwhy, status) == APEXIT_CHILDFATAL) {
+                return 1;
+            }
+
 	    /* non-fatal death... note that it's gone in the scoreboard. */
 	    ap_sync_scoreboard_image();
 	    child_slot = find_child_by_pid(&pid);
Index: server/mpm/worker/worker.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm/worker/worker.c,v
retrieving revision 1.34
diff -u -r1.34 worker.c
--- server/mpm/worker/worker.c	2001/11/12 17:50:59	1.34
+++ server/mpm/worker/worker.c	2001/11/15 20:37:57
@@ -1195,7 +1196,10 @@
         ap_wait_or_timeout(&exitwhy, &status, &pid, pconf);
         
         if (pid.pid != -1) {
-            ap_process_child_status(&pid, exitwhy, status);
+            if (ap_process_child_status(&pid, exitwhy, status) == APEXIT_CHILDFATAL) {
+                shutdown_pending = 1;
+                return;
+            }
             /* non-fatal death... note that it's gone in the scoreboard. */
             child_slot = find_child_by_pid(&pid);
             if (child_slot >= 0) {

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: [PATCH] rough fix for cgid hanging around after MPM bails

Posted by Jeff Trawick <tr...@attglobal.net>.
Greg Ames <gr...@remulak.net> writes:

> Jeff Trawick wrote:
> > 
> > With this newer patch, the behavior w.r.t. pid file cleanup and
> > SIGTERM messages should be consistent between prefork and worker.
> > Neither pid file cleanup nor the SIGTERM message is issued.
> 
> Personally, I like the idea of pid file cleanup when bad stuff happens. 
> It's probably not done everywhere it needs to be done within worker,  It
> isn't consistant between MPMs, as you point out, and isn't consistant
> with 1.3 behavior.  But so what?  Cleaning up the pid file just seems
> like the right way to go, IMO.

I think that if you really want the pid file cleaned up, then a
cleanup should be registered (or maybe the file left open, with
the APR_DELONCLOSE flag?  dunno).  If you really want it removed, try
harder than just removing it for this path.

The absense of httpd with presence of httpd.pid means something hit
the fan, which is what this type of exit is.

but the bottom line:

However the code looks w.r.t. to httpd.pid when I commit, I'll still
buy coffee Monday if you change it :)

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: [PATCH] rough fix for cgid hanging around after MPM bails

Posted by Greg Ames <gr...@remulak.net>.
Jeff Trawick wrote:
> 
> With this newer patch, the behavior w.r.t. pid file cleanup and
> SIGTERM messages should be consistent between prefork and worker.
> Neither pid file cleanup nor the SIGTERM message is issued.

Personally, I like the idea of pid file cleanup when bad stuff happens. 
It's probably not done everywhere it needs to be done within worker,  It
isn't consistant between MPMs, as you point out, and isn't consistant
with 1.3 behavior.  But so what?  Cleaning up the pid file just seems
like the right way to go, IMO.

no opinion yet on the messages.

Greg

Re: [PATCH] rough fix for cgid hanging around after MPM bails

Posted by Jeff Trawick <tr...@attglobal.net>.
With this newer patch, the behavior w.r.t. pid file cleanup and
SIGTERM messages should be consistent between prefork and worker.
Neither pid file cleanup nor the SIGTERM message is issued.

I expect that other MPMs which call ap_process_child_status (beos,
perchild, threaded) would be modified in the same way as worker.

Index: server/mpm_common.c
===================================================================
RCS file: /cvs/apache/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.70
diff -u -r1.70 mpm_common.c
--- mpm_common.c	2001/10/23 17:30:07	1.70
+++ mpm_common.c	2001/11/16 13:45:38
@@ -225,21 +225,24 @@
 #endif /* AP_MPM_WANT_WAIT_OR_TIMEOUT */
 
 #ifdef AP_MPM_WANT_PROCESS_CHILD_STATUS
-void ap_process_child_status(apr_proc_t *pid, apr_exit_why_e why, int status)
+int ap_process_child_status(apr_proc_t *pid, apr_exit_why_e why, int status)
 {
     int signum = status;
     const char *sigdesc = apr_signal_get_description(signum);
 
     /* Child died... if it died due to a fatal error,
-        * we should simply bail out.
-        */
+     * we should simply bail out.  The caller needs to
+     * check for bad rc from us and exit, running any
+     * appropriate cleanups.
+     */
     if ((APR_PROC_CHECK_EXIT(why)) &&
         (status == APEXIT_CHILDFATAL)) {
         ap_log_error(APLOG_MARK, APLOG_ALERT|APLOG_NOERRNO, 0, ap_server_conf,
-                        "Child %ld returned a Fatal error..." APR_EOL_STR
-                        "Apache is exiting!",
-                        (long)pid->pid);
-        exit(APEXIT_CHILDFATAL);
+                     "Child %" APR_OS_PROC_T_FMT
+                     " returned a Fatal error..." APR_EOL_STR
+                     "Apache is exiting!",
+                     pid->pid);
+        return APEXIT_CHILDFATAL;
     }
 
     if (APR_PROC_CHECK_SIGNALED(why)) {
Index: server/mpm/prefork/prefork.c
===================================================================
RCS file: /cvs/apache/httpd-2.0/server/mpm/prefork/prefork.c,v
retrieving revision 1.211
diff -u -r1.211 prefork.c
--- prefork.c	2001/11/11 05:17:51	1.211
+++ prefork.c	2001/11/16 13:45:38
@@ -1126,7 +1126,10 @@
 	 * extra child
 	 */
 	if (pid.pid != -1) {
-	    ap_process_child_status(&pid, exitwhy, status);
+            if (ap_process_child_status(&pid, exitwhy, status) == APEXIT_CHILDFATAL) {
+                return 1;
+            }
+
 	    /* non-fatal death... note that it's gone in the scoreboard. */
 	    ap_sync_scoreboard_image();
 	    child_slot = find_child_by_pid(&pid);
Index: server/mpm/worker/worker.c
===================================================================
RCS file: /cvs/apache/httpd-2.0/server/mpm/worker/worker.c,v
retrieving revision 1.34
diff -u -r1.34 worker.c
--- worker.c	2001/11/12 17:50:59	1.34
+++ worker.c	2001/11/16 13:45:38
@@ -302,6 +302,7 @@
 static int volatile restart_pending;
 static int volatile is_graceful;
 ap_generation_t volatile ap_my_generation;
+static volatile int child_fatal;
 
 /*
  * ap_start_shutdown() and ap_start_restart(), below, are a first stab at
@@ -1195,7 +1197,11 @@
         ap_wait_or_timeout(&exitwhy, &status, &pid, pconf);
         
         if (pid.pid != -1) {
-            ap_process_child_status(&pid, exitwhy, status);
+            if (ap_process_child_status(&pid, exitwhy, status) == APEXIT_CHILDFATAL) {
+                shutdown_pending = 1;
+                child_fatal = 1;
+                return;
+            }
             /* non-fatal death... note that it's gone in the scoreboard. */
             child_slot = find_child_by_pid(&pid);
             if (child_slot >= 0) {
@@ -1359,9 +1365,9 @@
             ap_log_error(APLOG_MARK, APLOG_WARNING, errno, ap_server_conf, "killpg SIGTERM");
         }
         ap_reclaim_child_processes(1);		/* Start with SIGTERM */
-    
-        /* cleanup pid file on normal shutdown */
-        {
+
+        if (!child_fatal) {
+            /* cleanup pid file on normal shutdown */
             const char *pidfile = NULL;
             pidfile = ap_server_root_relative (pconf, ap_pid_fname);
             if ( pidfile != NULL && unlink(pidfile) == 0)
@@ -1369,11 +1375,10 @@
             		 ap_server_conf,
             		 "removed PID file %s (pid=%ld)",
             		 pidfile, (long)getpid());
-        }
-    
-        ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, 0, ap_server_conf,
-            "caught SIGTERM, shutting down");
     
+            ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, 0,
+                         ap_server_conf, "caught SIGTERM, shutting down");
+        }
 	return 1;
     }
 

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...