You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2003/03/28 21:30:49 UTC

Showstopper? Inheritence, CGI Exec() changes bad for cgid

It seems that the combination of Jeff T's and my inheritence
patches collude to crashing the cgid worker.

If we fail to exec(), we end up calling log_error_core, which
causes that fork()ed cgid worker to SIGPIPE and die.  Apparently
we don't have a valid fd to the server's error log at that point.

Any volunteers to help track down this regression before we roll
out this release???  This is the last unfixed regression/showstopper 
that either Sander Temme or I have identified.

Bill


Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 07:34 AM 3/30/2003, Jeff Trawick wrote:

>oh darn, I took a wild branch...  this doesn't have anything to do with cgid_child_errfn at all...  it is this code, isn't it??? (look at Sander's minimal traceback (which would be much better with a -g build and line numbers!!!))
>
>      rc = ap_os_create_privileged_process(r, procnew, argv0, argv,
>                                           (const char * const *)env,
>                                           procattr, ptrans);
>
>      if (rc != APR_SUCCESS) {
>            /* Bad things happened. Everyone should have cleaned up. */
>            ap_log_rerror(APLOG_MARK, APLOG_ERR|APLOG_TOCLIENT, rc, r,
>                          "couldn't create child process: %d: %s", rc,
>                          apr_filename_of_pathname(r->filename));
>
>when I force rc != APR_SUCCESS on Linux, I get a segfault...  the problem is that ap_log_rerror() references r->headers_in, which wasn't replicated in the daemon...  the attached patch works around that in the same manner as cgid_child_errfn()

Right... and we had a third I cleaned up :-(

>no theory here for why I get segfault and you get SIGPIPE from writing to a closed descriptor; it certainly is an indication that we are talking about different problems, but I've been up and down the cgid_child_errfn() path and can't make it fail

No doubt this is a strange overload condition.  The point is that we were
being pretty ungraceful about it.  I think all the ap_log_rerror foobars were
the biggest issue.  Hope these clear it all up.

>also, shouldn't the APLOG_TOCLIENT flag get zapped since it doesn't apply to ap_log_error()?

Fixed.  I agree.

See the last commit to httpd-2.1 (1.149)... if I see anyone else agree 
I'll go ahead and commit to -2.0 and we can call this closed.

Thanks for they eyeballs, Jeff!

Bill 


Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by Jeff Trawick <tr...@attglobal.net>.
William A. Rowe, Jr. wrote:
> At 08:07 PM 3/29/2003, Jeff Trawick wrote:
> 
> 
>>Maybe there is a Solaris patch for this :)  It seems really sucky for exec to have a failure point after close-for-exec files are closed and after signal handling is reset.
> 
> 
> No... look at what we do in apr_proc_child_create ... we are the ones closing
> all of the handles that we need in invoking ap_log_error()!  Can't blame the OS
> for this one :-)

oh, in apr_proc_create()->apr_pool_cleanup_for_exec() I guess

yeah, I can see the normal error log file descriptor getting cleaned up 
by apr_pool_cleanup_for_exec()

but something I forgot about: mod_cgid sets r->server->error_log 
descriptor to STDERR after accepting a new CGI request:

     while (!daemon_should_exit) {
         int errfileno = STDERR_FILENO;
         ...
         apr_os_file_put(&r->server->error_log, &errfileno, 0, r->pool);
         apr_os_file_put(&inout, &sd2, 0, r->pool);

and when I hit cgid_child_errfn() due to an exec failure, stderr is 
where it will try to write

(gdb) p *r->server->error_log
$6 = {pool = 0x813fba0, filedes = 2, fname = 0x0, flags = 2048, eof_hit = 0,
   is_pipe = 0, timeout = -1, buffered = 0, blocking = BLK_UNKNOWN,
   ungetchar = -1, buffer = 0x0, bufpos = 0, dataRead = 0, direction = 0,
   filePtr = 0, thlock = 0x0}

oh darn, I took a wild branch...  this doesn't have anything to do with 
cgid_child_errfn at all...  it is this code, isn't it??? (look at 
Sander's minimal traceback (which would be much better with a -g build 
and line numbers!!!))

       rc = ap_os_create_privileged_process(r, procnew, argv0, argv,
                                            (const char * const *)env,
                                            procattr, ptrans);

       if (rc != APR_SUCCESS) {
             /* Bad things happened. Everyone should have cleaned up. */
             ap_log_rerror(APLOG_MARK, APLOG_ERR|APLOG_TOCLIENT, rc, r,
                           "couldn't create child process: %d: %s", rc,
                           apr_filename_of_pathname(r->filename));

when I force rc != APR_SUCCESS on Linux, I get a segfault...  the 
problem is that ap_log_rerror() references r->headers_in, which wasn't 
replicated in the daemon...  the attached patch works around that in the 
same manner as cgid_child_errfn()

no theory here for why I get segfault and you get SIGPIPE from writing 
to a closed descriptor; it certainly is an indication that we are 
talking about different problems, but I've been up and down the 
cgid_child_errfn() path and can't make it fail

alternatively, this code could just do ap_log_error(..., main_server, ...)

also, shouldn't the APLOG_TOCLIENT flag get zapped since it doesn't 
apply to ap_log_error()?

recap:
I find different code that is problematic
I can make that different code fail, but with different symptom from you
please try the patch to see if your symptom changes or goes away

Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 08:07 PM 3/29/2003, Jeff Trawick wrote:

>Maybe there is a Solaris patch for this :)  It seems really sucky for exec to have a failure point after close-for-exec files are closed and after signal handling is reset.

No... look at what we do in apr_proc_child_create ... we are the ones closing
all of the handles that we need in invoking ap_log_error()!  Can't blame the OS
for this one :-)

>If we really want that log file descriptor closed when the CGI script runs, then the child errfn can't use it and must write to stderr, since I believe that that is the only reporting mechanism guaranteed to be available regardless of where the failure to fork+tweak+exec occurred.

We really do want all the 'extra' handles closed.  The 'one' error logging handle
might or might not be closed based on our choice, but we certainly don't need
a whole horde of handles lying around :-)  I agree that the stderr logging, to get
2.0.45 out the door, is the way to go while we reconsider all these implications.

Bill 


Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by Jeff Trawick <tr...@attglobal.net>.
William A. Rowe, Jr. wrote:
> Yes - this is with suexec

so the exec of suexec fails...  that shouldn't make any difference 
(infamous last words) whether we run suexec or printenv/perl or whatever...

I'm guessing that in your scenario where the failure of exec() is due to 
heavy load, the file descriptors marked as close-on-exec have been 
closed prior to the point where the failure occurred, whereas in my 
scenario where I forced a hard exec() failure the kernel hadn't closed 
the file descriptors yet.

(The same issue arises in some other modules if this is really the case 
(exec failed *after* closing close-on-exec file descriptors) :( )

As for SIGPIPE, handling of it should only get set back to SIG_IGN when 
the new program is exec-ed (forked child about to run the CGI script (or 
suexec in your case) keeps the SIGPIPE settings of the cgid daemon process).

Maybe there is a Solaris patch for this :)  It seems really sucky for 
exec to have a failure point after close-for-exec files are closed and 
after signal handling is reset.

If we really want that log file descriptor closed when the CGI script 
runs, then the child errfn can't use it and must write to stderr, since 
I believe that that is the only reporting mechanism guaranteed to be 
available regardless of where the failure to fork+tweak+exec occurred.


Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Yes - this is with suexec - quite possibly adding mayhem.  But the problem
encountered is a regression, the old cgid would fail under load for a few requests,
and then resume it's happy existence.  Now cgid is dying absolutely with a
SIGPIPE failure (which should have been trapped according to the parent's
signal mask???).  Again, this is a very heavy load scenario.

Sander Temme attached GDB to the CGId process and prodded it until 
it crashed here:

(gdb) continue
Continuing.
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Program received signal SIGPIPE, Broken pipe.
0x3c7a0 in log_error_core ()
(gdb) bt
#0  0x3c7a0 in log_error_core ()
#1  0x3cb0c in ap_log_error ()
#2  0xfe7b3228 in cgid_server ()
   from /local0/sctemme/inst/ersN14/apache/modules/standard/mod_cgid.so
#3  0xfe7b35e4 in cgid_init ()
   from /local0/sctemme/inst/ersN14/apache/modules/standard/mod_cgid.so
#4  0x3badc in ap_run_post_config ()
#5  0x3e10c in main ()

Without load you might see no side effects.  Sander was using specweb99
but you might also try ab(using) a valid printenv script, with 40 concurrent 
connections or so.  You might even throw in a  sleep(2) into that script just 
to increase the chances of overload.

Remember that we've already run the child_cleanups, so all uninherited
files are closed upfront.  If you want to leave one dangling file hanging
around, you remember this patch;

  --- log.c     15 Feb 2003 01:21:31 -0000      1.127.2.2
  +++ log.c     20 Mar 2003 21:56:06 -0000      1.127.2.3
  @@ -320,8 +320,6 @@
                            ap_server_argv0, fname);
               return DONE;
           }
  -
  -        apr_file_inherit_set(s->error_log);
       }
   
       return OK;

So if we call apr_file_inherit_set(r->server->error_log) in the cgid daemon
to ensure we can log as you've attempted to do, and immediately call
apr_file_inherit_unset(r->server->error_log) when finished with that one
request, this could solve the problem.

But I still suspect it would be better to either print the error exception
to something predictable like stderr, or create a 500 response in the
forked cgid exec process itself to stdout, with perhaps an apache-private
header (e.g. x-httpd-logerror: foo) that will be shuttled to the log file and
removed before sending the response to the client.

Bill

At 09:25 PM 3/28/2003, Jeff Trawick wrote:
>Jeff Trawick wrote:
>
>>Let me try it with your latest tag on Solaris 8 and see if my results are any different.
>
>Solaris 8, your _RC2 tag, worker+mod_cgid or prefork+mod_cgid:
>
>[Fri Mar 28 21:43:07 2003] [debug] prefork.c(1039): AcceptMutex: posixsem (default: posixsem)
>[Fri Mar 28 21:43:10 2003] [error] (13)Permission denied: exec of 'printenv' failed
>[Fri Mar 28 21:43:10 2003] [error] [client 127.0.0.1] Premature end of script headers: printenv
>
>[Fri Mar 28 22:04:55 2003] [debug] worker.c(1736): AcceptMutex: posixsem (default: posixsem)
>[Fri Mar 28 22:04:58 2003] [error] (13)Permission denied: exec of 'printenv' failed
>[Fri Mar 28 22:04:58 2003] [error] [client 127.0.0.1] Premature end of script headers: printenv
>
>Here's the set of modules I have:
>
>Compiled in modules:
>  core.c
>  mod_access.c
>  mod_auth.c
>  mod_file_cache.c
>  mod_charset_lite.c
>  mod_case_filter.c
>  mod_case_filter_in.c
>  mod_ext_filter.c
>  mod_include.c
>  mod_log_config.c
>  mod_env.c
>  mod_mime_magic.c
>  mod_expires.c
>  mod_headers.c
>  mod_usertrack.c
>  mod_unique_id.c
>  mod_setenvif.c
>  worker.c
>  http_core.c
>  mod_mime.c
>  mod_dav.c
>  mod_status.c
>  mod_autoindex.c
>  mod_asis.c
>  mod_info.c
>  mod_cgid.c
>  mod_dav_fs.c
>  mod_vhost_alias.c
>  mod_negotiation.c
>  mod_dir.c
>  mod_imap.c
>  mod_actions.c
>  mod_userdir.c
>  mod_alias.c
>  mod_rewrite.c
>  mod_so.c
>
>nothing loaded dynamically
>
>Maybe you want to use truss to see which file descriptor leads to SIGPIPE, and then back up to see approx. where it was closed?
>
>Maybe not catching SIGPIPE in this short path is a good thing after all.
>



Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by Jeff Trawick <tr...@attglobal.net>.
Jeff Trawick wrote:

> Let me try it with your latest tag on Solaris 8 and see if my results 
> are any different.

Solaris 8, your _RC2 tag, worker+mod_cgid or prefork+mod_cgid:

[Fri Mar 28 21:43:07 2003] [debug] prefork.c(1039): AcceptMutex: 
posixsem (default: posixsem)
[Fri Mar 28 21:43:10 2003] [error] (13)Permission denied: exec of 
'printenv' failed
[Fri Mar 28 21:43:10 2003] [error] [client 127.0.0.1] Premature end of 
script headers: printenv

[Fri Mar 28 22:04:55 2003] [debug] worker.c(1736): AcceptMutex: posixsem 
(default: posixsem)
[Fri Mar 28 22:04:58 2003] [error] (13)Permission denied: exec of 
'printenv' failed
[Fri Mar 28 22:04:58 2003] [error] [client 127.0.0.1] Premature end of 
script headers: printenv

Here's the set of modules I have:

Compiled in modules:
   core.c
   mod_access.c
   mod_auth.c
   mod_file_cache.c
   mod_charset_lite.c
   mod_case_filter.c
   mod_case_filter_in.c
   mod_ext_filter.c
   mod_include.c
   mod_log_config.c
   mod_env.c
   mod_mime_magic.c
   mod_expires.c
   mod_headers.c
   mod_usertrack.c
   mod_unique_id.c
   mod_setenvif.c
   worker.c
   http_core.c
   mod_mime.c
   mod_dav.c
   mod_status.c
   mod_autoindex.c
   mod_asis.c
   mod_info.c
   mod_cgid.c
   mod_dav_fs.c
   mod_vhost_alias.c
   mod_negotiation.c
   mod_dir.c
   mod_imap.c
   mod_actions.c
   mod_userdir.c
   mod_alias.c
   mod_rewrite.c
   mod_so.c

nothing loaded dynamically

Maybe you want to use truss to see which file descriptor leads to 
SIGPIPE, and then back up to see approx. where it was closed?

Maybe not catching SIGPIPE in this short path is a good thing after all.



Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by Jeff Trawick <tr...@attglobal.net>.
Jeff Trawick wrote:

> Weird.  Here is what I get with prefork+modcgi on Linux, built a couple 
> of days ago:

silly me, I meant to say prefork+mod_cgid...


Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by Jeff Trawick <tr...@attglobal.net>.

William A. Rowe, Jr. wrote:
> It seems that the combination of Jeff T's and my inheritence
> patches collude to crashing the cgid worker.
> 
> If we fail to exec(), we end up calling log_error_core, which
> causes that fork()ed cgid worker to SIGPIPE and die.  Apparently
> we don't have a valid fd to the server's error log at that point.

Weird.  Here is what I get with prefork+modcgi on Linux, built a couple 
of days ago:

[error] (13)Permission denied: exec of 'printenv' failed

(i.e., exec failed, reported to the log with no problem)

Are you using suexec perhaps?

Are you using a vhost-specific error log?  That is a long-standing 
problem with mod_cgid: There isn't logic to reconstruct enough of the 
right server_rec to log to the right place.  (not sure that the symptoms 
match, but it can't hurt to ask)

Let me try it with your latest tag on Solaris 8 and see if my results 
are any different.



Re: Showstopper? Inheritence, CGI Exec() changes bad for cgid

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
This patch seems to be one issue;

--- httpd-2.0/modules/generators/mod_cgid.c	2003/02/03 17:31:39	1.145.2.2
+++ httpd-2.0/modules/generators/mod_cgid.c	2003/02/27 12:33:08	1.145.2.3
+static void cgid_child_errfn(apr_pool_t *pool, apr_status_t err,
+                             const char *description)
+{
+    request_rec *r;
+    void *vr;
+
+    apr_pool_userdata_get(&vr, ERRFN_USERDATA_KEY, pool);
+    r = vr;
+
+    /* sure we got r, but don't call ap_log_rerror() because we don't
+     * have r->headers_in and possibly other storage referenced by
+     * ap_log_rerror()
+     */
+    ap_log_error(APLOG_MARK, APLOG_ERR, err, r->server, "%s", description);
+}

We can't call ap_log_error after we've invoked the child_cleanups 
because all those oddball log files are already closed.

I see a couple quick choices... inherit the "REAL" log handle as fd3
to every cgi process and print to that fd.  Print such a log message
to stderr/fd2.  Or toggle the log handle as FD_CLOEXEC but remove
it's cleanup function.

Finally, we see fatal SIGPIPE's on Solaris... it looks like we need 
to do something like;


Index: modules/generators/mod_cgid.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/generators/mod_cgid.c,v
retrieving revision 1.145.2.3
diff -u -r1.145.2.3 mod_cgid.c
--- modules/generators/mod_cgid.c	27 Feb 2003 12:33:08 -0000	1.145.2.3
+++ modules/generators/mod_cgid.c	28 Mar 2003 22:07:12 -0000
@@ -600,5 +600,13 @@
 
     apr_pool_create(&ptrans, pcgi); 
 
+    /* The child process doesn't do anything for AP_SIG_GRACEFUL.  
+     * and family.  Instead, SIGHUP is used for signalling restart.
+     */
     apr_signal(SIGCHLD, SIG_IGN); 
+    apr_signal(AP_SIG_GRACEFUL, SIG_IGN);
+    apr_signal(SIGTERM, SIG_IGN);
+#ifdef SIGPIPE
+    apr_signal(SIGPIPE, SIG_IGN);
+#endif
     apr_signal(SIGHUP, daemon_signal_handler);

to cover some of the other signals we should be respecting/ignoring
in the cgid worker process.

Oh, and this code;

        case APR_OC_REASON_LOST:
            /* it would be better to restart just the cgid child
             * process but for now we'll gracefully restart the entire 
             * server by sending AP_SIG_GRACEFUL to ourself, the httpd 
             * parent process
             */
            kill(getpid(), AP_SIG_GRACEFUL);

is broken when the child triggers it (wrong pid!!!)

Perhaps, why didn't we generate one cgid per worker process?  It would
make alot more sense if one child could begin it's own graceful shutdown
and let the parent replace it when its requests (that weren't CGI) are all
satisfied.

Bill