
Posted to dev@httpd.apache.org by Stefan Fritsch <sf...@sfritsch.de> on 2008/01/04 14:42:05 UTC

PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

Hi,

this bug can be quite annoying because of the resources used by the hung
processes. It happens e.g. under Linux when epoll is used.

The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
has been in Debian unstable/Ubuntu hardy for several weeks and there have
not been any complaints.

It would be nice if you could look at it and commit it to svn.

Thanks,
Stefan

RE: No error message for startup errors with the Apache service.

Posted by Ashwani Kumar Sharma <As...@mindtree.com>.

OK James.



Thanks and Regards,
Ashwani Sharma
Mob: +91+9916454843
Off: +91-80-26265053


-----Original Message-----
From: James Park (pencil_ethics) [mailto:pencilethics.list@gmail.com] 
Sent: Sunday, January 20, 2008 7:06 PM
To: dev@httpd.apache.org
Subject: Re: No error message for startup errors with the Apache service.

Ashwani,

This is the wrong mailing list for your question.
Please send your question to users@httpd.apache.org.
This mailing list is for discussion related to httpd development.
Support questions belong in the users@ mailing list.

- James Park


DISCLAIMER:
This message (including attachment if any) is confidential and may be privileged. If you have received this message by mistake please notify the sender by return e-mail and delete this message from your system. Any unauthorized use or dissemination of this message in whole or in part is strictly prohibited.
E-mail may contain viruses. Before opening attachments please check them for viruses and defects. While MindTree Consulting Limited (MindTree) has put in place checks to minimize the risks, MindTree will not be responsible for any viruses or defects or any forwarded attachments emanating either from within MindTree or outside.
Please note that e-mails are susceptible to change and MindTree shall not be liable for any improper, untimely or incomplete transmission.
MindTree reserves the right to monitor and review the content of all messages sent to or from MindTree e-mail address. Messages sent to or from this e-mail address may be stored on the MindTree e-mail system or else where.

Re: No error message for startup errors with the Apache service.

Posted by "James Park (pencil_ethics)" <pe...@gmail.com>.

Ashwani,

This is the wrong mailing list for your question.
Please send your question to users@httpd.apache.org.
This mailing list is for discussion related to httpd development.
Support questions belong in the users@ mailing list.

- James Park

No error message for startup errors with the Apache service.

Posted by Ashwani Kumar Sharma <As...@mindtree.com>.

Hi Folks,

When I accidentally start the Apache service on Windows on a port number that
is already in use, why don't I see any log file being created by Apache?

If I start the Apache executable httpd.exe from the command prompt, I see the
port number error message flashed on the console. Why is this error message
not shown when Apache runs as a service?



Thanks and Regards,
Ashwani Sharma
Mob: +91+9916454843
Off: +91-80-26265053





Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

Posted by Martin Kraemer <ma...@apache.org>.

On Fri, Jan 04, 2008 at 02:42:05PM +0100, Stefan Fritsch wrote:
> Hi,
> 
> this bug can be quite annoying because of the resources used by the hung
> processes. It happens e.g. under Linux when epoll is used.
> 
> The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
> has been in Debian unstable/Ubuntu hardy for several weeks and there have
> not been any complaints.
> 
> It would be nice if you could look at it and commit it to svn.

I can confirm that there are problems with the restart at least on
FreeBSD-4.x/prefork.

On FreeBSD-4.x/prefork I see this after a graceful restart:
--snip--
$ apachectl status

                      Apache Server Status for localhost

   Server Version: Apache/2.3.0-dev (Unix) mod_ssl/2.3.0-dev
          OpenSSL/0.9.7d-p1 DAV/2

   Server Built: Jan 16 2008 04:19:11
[..]
   CPU Usage: u4.45313 s4.3125 cu0 cs0 - .00454% CPU load
   .0265 requests/sec - 9 B/second - 372 B/request
   10 requests currently being processed, 7 idle workers

GGGGGG_G__GG____W...............................................
................................................................
[...]
--snip--

After another graceful restart, I see
GGGGGGGGGGGGGGGWG____...........................................
and the 'G' processes are stuck at state 'G'.

With the patch applied, I no longer see any of the hanging
"gracefully stuck" processes.

So, from my side, I'd +1 the patch (although I understand the intention
of the code, I have not "brain-traced" all code paths, so this is not
a final "code +1" but just an "appears to fix the problem +1").

Anyone else?

   Martin
-- 
<Ma...@Fujitsu-Siemens.com>        |     Fujitsu Siemens
http://www.fujitsu-siemens.com/imprint.html | 81730  Munich,  Germany

Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

Posted by Jeff Trawick <tr...@gmail.com>.

On Tue, Feb 5, 2008 at 7:53 AM, Joe Orton <jo...@redhat.com> wrote:

> On Fri, Feb 01, 2008 at 10:41:39AM +0100, Stefan Fritsch wrote:
> > Joe Orton wrote:
> > > I mentioned in the bug that the signal handler could cause undefined
> > > behaviour, but I'm not sure now whether that is true.  On Linux I can
> > > reproduce some cases where this will happen, which are all due to
> > > well-defined behaviour:
> > >
> > > 1) with some (default on Linux) accept mutex types,
> > > apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> > > waiting for the mutex do "hang" until the mutex is released.  Fixing
> > > this would need some APR work, new interfaces, blah
> >
> > This is not a problem. On graceful-stop or reload the processes will get
> > the lock one by one and die (or hang somewhere else). I have never seen a
> > left over process hanging in this function.
>
> Well, normally all children will be woken up and take the accept mutex
> because of the dummy connections.  But if you have one child blocked
> because of issue (3) - whilst holding the accept mutex - all the other
> children will also be blocked.  If the EINTR could be processed at MPM
> level, this wouldn't happen.  So I think it is a problem, though you
> could argue that solving (3) also sort of solves (1).
>
> > > I can also reproduce a third case, but I'm not sure about the cause:
> > >
> > > 3) apr_pollset_poll() is blocking despite the fact that the listening
> > > fds are supposedly already closed before entering the syscall.
> >
> > This is the main problem in my experience.
> ...
> > On Linux with epoll, the hanging processes just block in
> > apr_pollset_poll(), so checking the return value won't do any good.
> >
> > Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
> > open, while epoll() does not have something similar. In epoll.c, a comment
> > says "APR_POLLNVAL is not handled by epoll". Or should epoll return
> > EPOLLHUP in this case?
>
> I did some more research on this: the case is covered in the epoll(7)
> man page - fds are removed from any containing epoll sets on closure.
> So it is well-defined behaviour, and the "hang" is expected; when all
> the listeners are closed, the poll set becomes empty, so the
> apr_pollset_poll() call will sleep forever, or until interrupted by
> signal!
>
> select() and poll() will indeed return POLLNVAL for the closed-fds case,
> and prefork needs to check for that.
>
> From some brief googling, FreeBSD kqueue appears to have the same
> guarantee.  This PR has some investigation of what happens with Solaris
> ports: http://issues.apache.org/bugzilla/show_bug.cgi?id=42580
>
> For the graceful-stop case, it would be simple enough to just signal any
> dozy children again to wake them up in the wait-for-exit loop, but
> graceful-restart doesn't have that opportunity, so I'm not sure about a
> general solution.  Reducing the poll timeout to some non-infinite time
> would work.


This holds up to some very light graceful-restart testing on OpenSolaris
(the same light testing that triggered a hang):

Index: server/mpm/prefork/prefork.c
===================================================================
--- server/mpm/prefork/prefork.c    (revision 731724)
+++ server/mpm/prefork/prefork.c    (working copy)
@@ -540,10 +540,12 @@
                 apr_int32_t numdesc;
                 const apr_pollfd_t *pdesc;

-                /* timeout == -1 == wait forever */
-                status = apr_pollset_poll(pollset, -1, &numdesc, &pdesc);
+                /* timeout == 10 seconds to avoid a hang at graceful restart/stop
+                 * caused by the closing of sockets by the signal handler
+                 */
+                status = apr_pollset_poll(pollset, apr_time_from_sec(10), &numdesc, &pdesc);
                 if (status != APR_SUCCESS) {
-                    if (APR_STATUS_IS_EINTR(status)) {
+                    if (APR_STATUS_IS_TIMEUP(status) || APR_STATUS_IS_EINTR(status)) {
                         if (one_process && shutdown_pending) {
                             return;
                         }
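
The idea behind the patch, replacing an infinite poll timeout with a bounded
one so that the child regularly gets a chance to re-check its exit flag, can
be sketched with plain POSIX poll(). This is only an illustration and not the
prefork code: wait_for_connection(), die_now_flag and the choice of timeout
are hypothetical names and values made up for this example.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>

/* Hypothetical stand-in for the MPM's "please exit" flag; a real server
 * would set it from a signal handler. */
static volatile sig_atomic_t die_now_flag = 0;

/* Wait until fd is readable or the exit flag is set.
 * Returns 1 if fd is readable, 0 if the caller should exit,
 * -1 on a poll() failure or if fd is closed or in an error state.
 * timeout_ms bounds every poll() call, so the flag is re-checked
 * regularly even if neither an event nor a signal ever arrives. */
int wait_for_connection(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;

    for (;;) {
        int rc;

        if (die_now_flag) {
            return 0;
        }
        pfd.revents = 0;
        rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0 || (rc < 0 && errno == EINTR)) {
            continue;           /* timeout or signal: re-check the flag */
        }
        if (rc < 0) {
            return -1;          /* unexpected poll() failure */
        }
        /* Check the returned event type instead of assuming POLLIN. */
        return (pfd.revents & POLLIN) ? 1 : -1;
    }
}
```

The trade-off is a periodic wakeup in every idle child; a longer timeout
means fewer wakeups but a longer worst-case delay before a child notices
that it should exit.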

Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

Posted by Joe Orton <jo...@redhat.com>.

On Fri, Feb 01, 2008 at 10:41:39AM +0100, Stefan Fritsch wrote:
> Joe Orton wrote:
> > I mentioned in the bug that the signal handler could cause undefined
> > behaviour, but I'm not sure now whether that is true.  On Linux I can
> > reproduce some cases where this will happen, which are all due to
> > well-defined behaviour:
> >
> > 1) with some (default on Linux) accept mutex types,
> > apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> > waiting for the mutex do "hang" until the mutex is released.  Fixing
> > this would need some APR work, new interfaces, blah
> 
> This is not a problem. On graceful-stop or reload the processes will get
> the lock one by one and die (or hang somewhere else). I have never seen a
> left over process hanging in this function.

Well, normally all children will be woken up and take the accept mutex 
because of the dummy connections.  But if you have one child blocked 
because of issue (3) - whilst holding the accept mutex - all the other 
children will also be blocked.  If the EINTR could be processed at MPM 
level, this wouldn't happen.  So I think it is a problem, though you 
could argue that solving (3) also sort of solves (1).

> > I can also reproduce a third case, but I'm not sure about the cause:
> >
> > 3) apr_pollset_poll() is blocking despite the fact that the listening
> > fds are supposedly already closed before entering the syscall.
> 
> This is the main problem in my experience.
...
> On Linux with epoll, the hanging processes just block in
> apr_pollset_poll(), so checking the return value won't do any good.
> 
> Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
> open, while epoll() does not have something similar. In epoll.c, a comment
> says "APR_POLLNVAL is not handled by epoll". Or should epoll return
> EPOLLHUP in this case?

I did some more research on this: the case is covered in the epoll(7) 
man page - fds are removed from any containing epoll sets on closure.  
So it is well-defined behaviour, and the "hang" is expected; when all 
the listeners are closed, the poll set becomes empty, so the 
apr_pollset_poll() call will sleep forever, or until interrupted by 
signal!

select() and poll() will indeed return POLLNVAL for the closed-fds case, 
and prefork needs to check for that.
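
The epoll behaviour described above can be observed outside httpd with a
small Linux-only sketch. It uses the raw epoll calls rather than APR, and the
helper name epoll_wait_on_pipe() is made up for this example. A pipe stands
in for a listening socket, on the assumption that closing the last
descriptor for it is treated the same way.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe with an epoll instance, make it
 * readable, optionally close it, then wait.
 * Returns the epoll_wait() result: the number of ready events,
 * 0 on timeout, or -1 on error. */
int epoll_wait_on_pipe(int close_before_wait, int timeout_ms)
{
    int fds[2];
    int epfd;
    int n = -1;
    struct epoll_event ev;
    struct epoll_event ready;

    if (pipe(fds) != 0) {
        return -1;
    }
    epfd = epoll_create1(0);
    if (epfd >= 0) {
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;
        ev.data.fd = fds[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0
            && write(fds[1], "x", 1) == 1) {
            if (close_before_wait) {
                /* Closing the only descriptor for the read end also
                 * removes it from the epoll set. */
                close(fds[0]);
                fds[0] = -1;
            }
            n = epoll_wait(epfd, &ready, 1, timeout_ms);
        }
        close(epfd);
    }
    if (fds[0] >= 0) {
        close(fds[0]);
    }
    close(fds[1]);
    return n;
}
```

With the descriptor left open, the call reports one ready event. With the
descriptor closed first, it should return 0 once the timeout expires even
though data is pending, and with a timeout of -1 it would block, which
matches the hang discussed here.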

From some brief googling, FreeBSD kqueue appears to have the same 
guarantee.  This PR has some investigation of what happens with Solaris 
ports: http://issues.apache.org/bugzilla/show_bug.cgi?id=42580

For the graceful-stop case, it would be simple enough to just signal any 
dozy children again to wake them up in the wait-for-exit loop, but 
graceful-restart doesn't have that opportunity, so I'm not sure about a 
general solution.  Reducing the poll timeout to some non-infinite time 
would work.

joe

Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

Posted by Stefan Fritsch <sf...@sfritsch.de>.

Joe Orton wrote:
> I mentioned in the bug that the signal handler could cause undefined
> behaviour, but I'm not sure now whether that is true.  On Linux I can
> reproduce some cases where this will happen, which are all due to
> well-defined behaviour:
>
> 1) with some (default on Linux) accept mutex types,
> apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> waiting for the mutex do "hang" until the mutex is released.  Fixing
> this would need some APR work, new interfaces, blah

This is not a problem. On graceful-stop or reload the processes will get
the lock one by one and die (or hang somewhere else). I have never seen a
left over process hanging in this function.

> 2) prefork's apr_pollset_poll() loop-on-EINTR loop was not checking
> die_now; the child holding the mutex will not die immediately if poll
> fails with EINTR, and will hence appear to "hang" until a new connection
> is received.  Fixed by http://svn.apache.org/viewvc?rev=613260&view=rev

IMHO this is the same as 3), as apr_pollset_poll() will be called again
but with all fds already closed.

> I can also reproduce a third case, but I'm not sure about the cause:
>
> 3) apr_pollset_poll() is blocking despite the fact that the listening
> fds are supposedly already closed before entering the syscall.

This is the main problem in my experience.

> I vaguely recall some issue with epoll being mentioned before in the
> context of graceful stop, but I can't find a reference.  Colm?
>
> A very tempting explanation for (3) would be the fact that prefork only
> polls for POLLIN events, not POLLHUP or POLLERR, or indeed that it does
> not check that the returned event really is a POLLIN event; POSIX says
> on poll:
>
> " ... poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
>  revents if the condition is true, even if the application did not set
>  the corresponding bit in events."
>

I also had problems under solaris 9 where processes blocked in 
lr->accept_func() if the fd had been closed in the meantime. 
Unfortunately, I cannot reproduce it now even with an unpatched 2.2.6 and
I don't remember which configuration I used. But this could be related to
the returned event not being POLLIN.

> and there's even a comment in the prefork poll code to the effect that
> maybe checking the returned event type would be a good idea.  But from a
> brief play around here, fixing the poll code to DTRT doesn't help.  I
> think more investigation is needed to understand exactly what is going
> on here.
>
> (Also, just to note; I can reproduce (3) even with my patch to dup2
> against the listener fds.)

On Linux with epoll, the hanging processes just block in
apr_pollset_poll(), so checking the return value won't do any good.

Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
open, while epoll() does not have something similar. In epoll.c, a comment
says "APR_POLLNVAL is not handled by epoll". Or should epoll return
EPOLLHUP in this case?

Stefan

Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

Posted by Joe Orton <jo...@redhat.com>.

On Fri, Jan 04, 2008 at 02:42:05PM +0100, Stefan Fritsch wrote:
> this bug can be quite annoying because of the resources used by the hung
> processes. It happens e.g. under Linux when epoll is used.
> 
> The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
> has been in Debian unstable/Ubuntu hardy for several weeks and there have
> not been any complaints.

I've been looking into this in more detail; excuse the length of this 
mail.  The symptom in question is described as "children hang after 
graceful restart/stop in 2.2.x".

I mentioned in the bug that the signal handler could cause undefined 
behaviour, but I'm not sure now whether that is true.  On Linux I can 
reproduce some cases where this will happen, which are all due to 
well-defined behaviour:

1) with some (default on Linux) accept mutex types, 
apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked 
waiting for the mutex do "hang" until the mutex is released.  Fixing 
this would need some APR work, new interfaces, blah

2) prefork's apr_pollset_poll() loop-on-EINTR loop was not checking 
die_now; the child holding the mutex will not die immediately if poll 
fails with EINTR, and will hence appear to "hang" until a new connection 
is received.  Fixed by http://svn.apache.org/viewvc?rev=613260&view=rev

I can also reproduce a third case, but I'm not sure about the cause:

3) apr_pollset_poll() is blocking despite the fact that the listening 
fds are supposedly already closed before entering the syscall.
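
Case (2) above, a retry-on-EINTR loop that never looks at the exit flag, can
be illustrated with plain POSIX calls. This is a sketch of the general
pattern, not the code from the referenced revision; die_now here is a local
stand-in, and install_graceful_handler() and poll_until_ready_or_dying() are
hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>

/* Local stand-in for the child's exit flag. */
static volatile sig_atomic_t die_now = 0;

static void on_graceful(int signo)
{
    (void)signo;
    die_now = 1;        /* only async-signal-safe work in the handler */
}

/* Install the handler without SA_RESTART so that a blocking poll()
 * fails with EINTR when the signal arrives.  Returns 0 on success. */
int install_graceful_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_graceful;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Block in poll() with an infinite timeout.
 * Returns 1 if an event was reported for fd, 0 if the exit flag was
 * seen after an interrupted poll(), -1 on any other poll() failure. */
int poll_until_ready_or_dying(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;

    for (;;) {
        int rc;

        pfd.revents = 0;
        rc = poll(&pfd, 1, -1);
        if (rc > 0) {
            return 1;
        }
        if (rc < 0 && errno != EINTR) {
            return -1;
        }
        /* Interrupted by a signal: without this check the loop would
         * simply call poll() again and appear to hang. */
        if (die_now) {
            return 0;
        }
    }
}
```

A signal that arrives just before poll() is entered is still missed by this
pattern, since the flag is only examined after an interrupted call.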

I vaguely recall some issue with epoll being mentioned before in the 
context of graceful stop, but I can't find a reference.  Colm?

A very tempting explanation for (3) would be the fact that prefork only 
polls for POLLIN events, not POLLHUP or POLLERR, or indeed that it does 
not check that the returned event really is a POLLIN event; POSIX says 
on poll:

" ... poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
 revents if the condition is true, even if the application did not set
 the corresponding bit in events."
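
The quoted POSIX rule is easy to check in isolation. The following sketch
polls a descriptor for POLLIN only and returns whatever poll() reports in
revents; the helper names are made up for this example and none of this is
httpd or APR code.

```c
#include <poll.h>
#include <unistd.h>

/* Poll fd for POLLIN only and return the revents mask,
 * or -1 if poll() itself fails. */
int revents_for_fd(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) < 0) {
        return -1;
    }
    return pfd.revents;
}

/* Returns 1 if poll() flags an already-closed descriptor with POLLNVAL
 * although only POLLIN was requested, 0 if it does not, -1 on error. */
int closed_fd_reports_pollnval(void)
{
    int fds[2];
    int revents;

    if (pipe(fds) != 0) {
        return -1;
    }
    close(fds[0]);                      /* fds[0] is now an invalid fd */
    revents = revents_for_fd(fds[0], 0);
    close(fds[1]);
    if (revents < 0) {
        return -1;
    }
    return (revents & POLLNVAL) ? 1 : 0;
}
```

A caller that treats any returned event as POLLIN would go on to call
accept() on a closed descriptor here, which is why checking the returned
event type matters for the poll() and select() back ends.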

and there's even a comment in the prefork poll code to the effect that 
maybe checking the returned event type would be a good idea.  But from a 
brief play around here, fixing the poll code to DTRT doesn't help.  I 
think more investigation is needed to understand exactly what is going 
on here.

(Also, just to note; I can reproduce (3) even with my patch to dup2 
against the listener fds.)

joe