You are viewing a plain text version of this content. The canonical link for it is here.
Posted to bugs@httpd.apache.org by bu...@apache.org on 2006/03/15 21:31:42 UTC

DO NOT REPLY [Bug 38989] New: - restart + piped logs stalls httpd for 24 minutes (Solaris)

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38989>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38989

           Summary: restart + piped logs stalls httpd for 24 minutes
                    (Solaris)
           Product: Apache httpd-1.3
           Version: HEAD
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: normal
          Priority: P2
         Component: core
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: rainer.jung@kippdata.de


When using piped logs "apachectl restart" stalls apache on Solaris for 1431 seconds.

This report is quite lengthy, because I tried to fully analyze and I will
explain, why that happens e.g. on Solaris, but not on Linux.

Tested on Solaris/Sparc 8, 9 and 10 with Apache 1.3.27 and 1.3.34. Also tested
with our own build and with the standard Solaris 10 build. It doesn't matter,
which piped logger is being used. Problem found with rotatelogs, but /bin/cat
shows the same problem.

>From the code analysis the bug must be quite old.

1) Piped Logs and children

Piped logs for apache 1.3 spawn there children by fork/exec to /bin/sh -c
"piped_logger". Under Solaris this results in two processes per pipe. The first
one is a child of the main httpd process and is /bin/sh, the second one is a
child of /bin/sh and is a piped_logger process.

Under Linux there will be only one child per piped_logger, the /bin/sh-process
does not exist. I assume, /bin/sh has a very different implementation for
handling the "-c" flag for Solaris and for Linux.

2) Signal masks of the logging children

All logging children have SIGHUP ignored. This is true for the two process case
(Solaris) and the one process case (Linux). I assum this comes from the fact,
that the three methods in http_log.c which spawn logging child all set SIGHUP to
SIG_IGN before forking.

All logging children have SIGUSR1 set to default. This again is true for Solaris
and Linux.

Now concerning SIGTERM: Under Solaris the /bin/sh child of httpd has a signal
handler for SIGTERM. psig shows:

TERM    caught  0x16c64         0

The second looging child (the piped_logger itself) has SIGTERM set to default.

Under Linux the only logging child, the piped_logger has SIGTERM set to default.

3) What happens when restart is done

apachectl sends a SIGHUP to the httpd master process. That one itself sends
SIGHUP to the full process group. On Solaris and on Linux all logging children
ignore SIGHUP, so no effect on the logging children. The httpd processes which
are children of the master terminate during SIGHUP.

Now in reclaim_child_processes(0) near the end there is a loop which calls
waitpid for all direct logging children. The call returns with 0 and then the
maintenance method is caled, which sends a TERM signal to the direct logging child.

Under Solaris this means, that a TERM signal is being send to the /bin/sh
children. These are temselves inside waitid, which gets interrupted and starts
again. The signal is not passed along to their child (the piped_loger itself).

Under Linux the logging child is the piped_logger itself. It receives the
SIGTERM and terminates.

Since the /bin/sh (and the piped_logger) did not terminate for Solaris,
reclaim_child_processes(0) takes another pass through the tries loop and
increases the wait time before callng another maintenance/sending another kill
in each pass by a factor of 4. We get waits for

...
13479:          timeout: 1.048576000 sec
13479:          timeout: 4.194304000 sec
13479:          timeout: 16.777216000 sec
13479:          timeout: 67.108864000 sec
13479:          timeout: 268.435456000 sec
13479:          timeout: 1073.741824000 sec

The last line is for the 9th iteration. Then the waittime gets too big and as a
result, the remaining 3 iterations wait with a timeout of 0 (minor bug).

So after about 1431 seconds httpd gives up. It then closes several file
descriptors and at that moment the piped_loggers terminate and as a result the
/bin/sh also.

4) Why is graceful working?

graceful uses SIGUSR1 which is not masked by any of the logging children. So
when sending the signal to the process group, all logging children terminate.
httpd never has to use the TERM signal like in restart.

5) Why is apache2 working

apache2 does not use /bin/sh to spawn the logging children, so even under
Solaris the piped logger is a direct child of httpd, so receives SIGTERM and
terminates.

6) What can be done?

I've no especially good solution. Possible are:

- not increasing the timeout that often/to such huge values
- sending SIGTERM not only to the children, but instead to the process group
- closing the file descriptors earlier, so that the piped loggers have a chance
to terminate without the signal

I hope you made it to the end. Any Solutions or ideas for discussion?

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 38989] - restart + piped logs stalls httpd for 24 minutes (Solaris)

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38989>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38989


rpluem@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.3.34                      |HEAD




------- Additional Comments From rpluem@apache.org  2006-03-16 12:01 -------
Setting back to HEAD. I did not notice that the product had been set to "Apache
httpd-1.3".

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 38989] - restart + piped logs stalls httpd for 24 minutes (Solaris)

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38989>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38989





------- Additional Comments From rainer.jung@kippdata.de  2006-03-17 09:29 -------
I found a workaround, which is sufficient for us: I changed the SHELL_PATH in
httpd.h from "/bin/sh" to "/usr/xpg4/bin/sh". That's a XPG4 compliant shell
which exists on Solaris since Solaris 2.5, so about 10 years.

The shell is similar to ksh (see "man ksh") and does *not* fork when called with
"-c".

As a consequence Solaris then behaves like Linux, only one process is being
forked from the httpd for each piped logger and the signal handling is OK.

Unfortunately I could not figure out how to pass a
'CFLAGS=-DSHELL_PATH="/usr/xpg4/bin/sh"' successfully to configure, because when
I don't escape the '"', they get stripped of, and when I escape them, then
mod_ssl configure complains :(

I finally just hacked httpd.h before running make.

It would be nice to at least document the behaviour:

- restart in combination with piped logs is broken on any platform,
  where "/bin/sh -c" forks (e.g. Solaris)
- if the platform has another shell, that does not fork, like
  /usr/xpg4/bin/sh or /bin/ksh, then defining SHELL_PATH to
  this shell will help.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 38989] - restart + piped logs stalls httpd for 24 minutes (Solaris)

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38989>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38989


rpluem@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|HEAD                        |1.3.34




-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org