Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2001/07/12 20:18:16 UTC

nasty bug in Apache for Windows (1.3 & 2.)

This is definitely in 2.0 and I believe it is in 1.3 as well.

If the Apache child process segfaults while any other process it started is running (CGI,
rotatelogs, etc.), there is a chance that the server will stop serving pages, even after a complete
shutdown and restart of Apache. The problem is twofold...

1. When an Apache (for Windows) child process segfaults, any processes it started are stranded and
not cleaned up. The Win32 API does not provide any facilities to tell the system to kill off any
child processes when the parent dies abnormally.  This is the least serious part of the problem.

2. This is the nasty part... Due to a bug in the Windows part of Apache, child processes are
inheriting open socket descriptors. When the Apache child process segfaults, its child processes
have copies of the open socket descriptors which can prevent the new Apache process from accepting
connections.  This could explain some long-standing bug reports in the bugdb.

Solutions...
Sockets are created as inheritable by default. We need to use DuplicateHandle to create
noninheritable handles of the listeners.  This is a bit trickier than it first appears and I spent
the better part of this AM getting this to work. There are some funky race conditions between
CreateProcess() (to create the Apache child process) and WSADuplicateSocket() that will, if not
handled properly, undo any effort to make the listeners noninheritable.
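
As an illustration of the inheritance flag itself (not the Apache fix), here is a minimal Python sketch. It assumes Python 3.4 or later, where socket.set_inheritable() toggles the flag; on Windows, Python does this through SetHandleInformation rather than the DuplicateHandle approach described above. The helper name make_noninheritable_listener is hypothetical.

```python
import socket


def make_noninheritable_listener(host="127.0.0.1", port=0):
    """Open a listening TCP socket and mark it non-inheritable.

    Hypothetical helper for illustration only; it is not Apache code.
    port=0 asks the OS to pick any free port.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(5)
    # Python 3.4+ already creates sockets non-inheritable (PEP 446);
    # setting the flag explicitly makes the intent visible.
    listener.set_inheritable(False)
    return listener


if __name__ == "__main__":
    sock = make_noninheritable_listener()
    print("inheritable:", sock.get_inheritable())
    sock.close()
```

With the flag cleared, a process spawned later should not receive a copy of the listener, so a stranded CGI or rotatelogs process would have nothing to hold the port open with.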

I have no thoughts on how to cleanly solve problem 1.  Would be nice if there were some system calls
to bind the two processes together in a parent/child relationship.

Workarounds:
Reboot :-(  or, if you are familiar enough with the processes Apache starts on your system, shut down
Apache, then search and destroy the leftover processes (rotatelogs, CGI, etc.) that Apache should
have cleaned up. If you do a netstat -an and still find a listener on your webserver port, you
missed something.
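
The netstat check can also be scripted. The following is a rough Python sketch, assuming the server listens on the loopback address; the function name listener_present is hypothetical. It only tests whether a TCP connection to the port succeeds, which is a weaker check than reading the netstat listing.

```python
import socket


def listener_present(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    A successful connect means some process still holds a listening
    socket on that port, even if nothing ever calls accept() on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 80 is only an example; use the port your server listens on.
    print("listener still present:", listener_present(80))
```

Run it after Apache has been shut down. A True result suggests a leftover process is still holding the listener.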

Bill


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by Bill Stoddard <bi...@wstoddard.com>.
> > > > The problem is that the processes started by the child are inheriting the sockets from the child. I
> > > > wasn't clear about that.
> > > >
> > > > The parent needs to manage the listen sockets to enable graceful restarts to work.  Having the
> > > > parent own the listeners allows us to not destroy the listen queue (and anything on it) across a
> > > > graceful restart.  The code to prevent inheriting the socket is quite simple, it was just behaving
> > > > strangely (see below).
> > >
> > > I have a patch on my computer that closes the sockets when children create
> > > child processes.  I haven't committed it because I haven't fully tested it
> > > yet.  I'll try to finish it up and commit it tonight.  This patch should
> > > fix a big part of this part of the problem.
> > >
> >
> > I suspect your patch is specific to Unix.  I already have the fix for the Windows MPM.
>
> It shouldn't be.  There is a bug in the Bug DB that says that child
> processes' children are not closing the socket.  This is happening on ALL
> platforms.  The solution should be to add the correct cleanups so that the
> socket is closed whenever the child calls apr_create_process.
>

The child process under Windows can inherit socket descriptors, but it has no way of knowing the
value of the inherited descriptors. (remember, Windows does not fork).  On Windows, you have to
explicitly tell the child (via some sort of IPC) about the values of the inherited descriptors.

So this problem can be fixed in two ways on Windows...
Soln 1 (the wrong solution): Allow the child to inherit the sockets, then have the parent
communicate the values to the child (via a pipe for instance) so the child can then close the
inherited sockets.

Soln 2 (the right way):  Use DuplicateHandle to set the listeners noninheritable in the parent right
after they are opened.  This is the fix I checked in this AM.

Note, I use WSADuplicateSocket and a pipe to -explicitly- send the listeners to the child rather
than have the child inherit the sockets.  This is the preferred way to share sockets among processes
using Winsock2.  WSADuplicateSocket is more reliable (and flexible) than implicitly sharing sockets
via inheritance.
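
For readers who want to experiment with explicit socket sharing, here is a hedged Python sketch. It assumes CPython on Windows, where socket.share() wraps WSADuplicateSocket and socket.fromshare() rebuilds the socket in the receiving process. The helper names export_listener and import_listener are hypothetical, and this is not the code that was checked in.

```python
import os
import socket


def export_listener(listener, target_pid):
    """Serialize a listening socket for another process (Windows only).

    Returns the bytes produced by socket.share(), or None on platforms
    where the method does not exist.
    """
    if not hasattr(listener, "share"):
        return None
    return listener.share(target_pid)


def import_listener(blob):
    """Rebuild the shared socket in the receiving process (Windows only)."""
    return socket.fromshare(blob)


if __name__ == "__main__":
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    # Sharing with our own pid keeps the demo in a single process.
    blob = export_listener(srv, os.getpid())
    if blob is None:
        print("socket.share() is not available on this platform")
    else:
        dup = import_listener(blob)
        print("same address:", dup.getsockname() == srv.getsockname())
        dup.close()
    srv.close()
```

In a real parent/child setup the bytes would be written to a pipe, as described above, and the child would call the import side after reading them.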

Bill


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by rb...@covalent.net.
> > > The problem is that the processes started by the child are inheriting the sockets from the child. I
> > > wasn't clear about that.
> > >
> > > The parent needs to manage the listen sockets to enable graceful restarts to work.  Having the
> > > parent own the listeners allows us to not destroy the listen queue (and anything on it) across a
> > > graceful restart.  The code to prevent inheriting the socket is quite simple, it was just behaving
> > > strangely (see below).
> >
> > I have a patch on my computer that closes the sockets when children create
> > child processes.  I haven't committed it because I haven't fully tested it
> > yet.  I'll try to finish it up and commit it tonight.  This patch should
> > fix a big part of this part of the problem.
> >
>
> I suspect your patch is specific to Unix.  I already have the fix for the Windows MPM.

It shouldn't be.  There is a bug in the Bug DB that says that child
processes' children are not closing the socket.  This is happening on ALL
platforms.  The solution should be to add the correct cleanups so that the
socket is closed whenever the child calls apr_create_process.

Ryan

_____________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
-----------------------------------------------------------------------------


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by Bill Stoddard <bi...@wstoddard.com>.
> On Thu, 12 Jul 2001, Bill Stoddard wrote:
>
> > >
> > > > 2. This is the nasty part... Due to a bug in the Windows part of Apache, child processes are
> > > > inheriting open socket descriptors. When the Apache child process segfaults, its child processes
> > > > have copies of the open socket descriptors which can prevent the new Apache process from accepting
> > > > connections.  This could explain some long-standing bug reports in the bugdb.
> > >
> > > Why, specifically, do we have the parent keep the sockets open?  Can we simply open the parent socket
> > > (to test that it is available, and try it exclusively, since we don't do that correctly now anyways),
> > > then close it, and let the child threads open their own (non-inheritable) sockets, themselves?
> > > Does this really cost us that much?
> > >
> >
> > The problem is that the processes started by the child are inheriting the sockets from the child. I
> > wasn't clear about that.
> >
> > The parent needs to manage the listen sockets to enable graceful restarts to work.  Having the
> > parent own the listeners allows us to not destroy the listen queue (and anything on it) across a
> > graceful restart.  The code to prevent inheriting the socket is quite simple, it was just behaving
> > strangely (see below).
>
> I have a patch on my computer that closes the sockets when children create
> child processes.  I haven't committed it because I haven't fully tested it
> yet.  I'll try to finish it up and commit it tonight.  This patch should
> fix a big part of this part of the problem.
>

I suspect your patch is specific to Unix.  I already have the fix for the Windows MPM.

Bill


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: <rb...@covalent.net>
Sent: Thursday, July 12, 2001 10:17 PM


> On Thu, 12 Jul 2001, Bill Stoddard wrote:
> > The problem is that the processes started by the child are inheriting the sockets from the child. I
> > wasn't clear about that.
> >
> > The parent needs to manage the listen sockets to enable graceful restarts to work.  Having the
> > parent own the listeners allows us to not destroy the listen queue (and anything on it) across a
> > graceful restart.  The code to prevent inheriting the socket is quite simple, it was just behaving
> > strangely (see below).
> 
> I have a patch on my computer that closes the sockets when children create
> child processes.  I haven't committed it because I haven't fully tested it
> yet.  I'll try to finish it up and commit it tonight.  This patch should
> fix a big part of this part of the problem.

Only for fork()ed children ;-)


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by rb...@covalent.net.
On Thu, 12 Jul 2001, Bill Stoddard wrote:

> >
> > > 2. This is the nasty part... Due to a bug in the Windows part of Apache, child processes are
> > > inheriting open socket descriptors. When the Apache child process segfaults, its child processes
> > > have copies of the open socket descriptors which can prevent the new Apache process from accepting
> > > connections.  This could explain some long-standing bug reports in the bugdb.
> >
> > Why, specifically, do we have the parent keep the sockets open?  Can we simply open the parent socket
> > (to test that it is available, and try it exclusively, since we don't do that correctly now anyways),
> > then close it, and let the child threads open their own (non-inheritable) sockets, themselves?
> > Does this really cost us that much?
> >
>
> The problem is that the processes started by the child are inheriting the sockets from the child. I
> wasn't clear about that.
>
> The parent needs to manage the listen sockets to enable graceful restarts to work.  Having the
> parent own the listeners allows us to not destroy the listen queue (and anything on it) across a
> graceful restart.  The code to prevent inheriting the socket is quite simple, it was just behaving
> strangely (see below).

I have a patch on my computer that closes the sockets when children create
child processes.  I haven't committed it because I haven't fully tested it
yet.  I'll try to finish it up and commit it tonight.  This patch should
fix a big part of this part of the problem.

Ryan

_____________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
-----------------------------------------------------------------------------


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by Bill Stoddard <bi...@wstoddard.com>.
>
> > 2. This is the nasty part... Due to a bug in the Windows part of Apache, child processes are
> > inheriting open socket descriptors. When the Apache child process segfaults, its child processes
> > have copies of the open socket descriptors which can prevent the new Apache process from accepting
> > connections.  This could explain some long-standing bug reports in the bugdb.
>
> Why, specifically, do we have the parent keep the sockets open?  Can we simply open the parent socket
> (to test that it is available, and try it exclusively, since we don't do that correctly now anyways),
> then close it, and let the child threads open their own (non-inheritable) sockets, themselves?
> Does this really cost us that much?
>

The problem is that the processes started by the child are inheriting the sockets from the child. I
wasn't clear about that.

The parent needs to manage the listen sockets to enable graceful restarts to work.  Having the
parent own the listeners allows us to not destroy the listen queue (and anything on it) across a
graceful restart.  The code to prevent inheriting the socket is quite simple, it was just behaving
strangely (see below).

> > Solutions...
> > Sockets are created as inheritable by default. We need to use DuplicateHandle to create
> > noninheritable handles of the listeners.  This is a bit trickier than it first appears and I spent
> > the better part of this AM getting this to work. There are some funky race conditions between
> > CreateProcess() (to create the Apache child process) and WSADuplicateSocket() that will, if not
> > handled properly, undo any effort to make the listeners noninheritable.
>
> Not sure how there is a race here... they are still opened (in the parent) inheritable, simply dup as
> a non-inheritable and close the inherited socket in the child, no?

Heh, the behaviour is quite nonintuitive.  I set the listeners noninheritable in the parent process.
Then call CreateProcess. If the created process initializes -after- the call to WSADuplicateSocket
in the parent process, the sockets are inherited anyway.  This looks like a bug in the Windows API.
If I ensure the child is fully initialized before calling WSADuplicateSocket (by inserting a Sleep
or WaitForInputIdle after the call to CreateProcess), the sockets are not inherited.
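
The ordering constraint can be sketched in Python as an explicit readiness handshake. This swaps in a different technique from the Sleep or WaitForInputIdle calls mentioned above: the child announces that it has started, and only then does the parent go on to the handoff step. Whether such a handshake sidesteps the Windows behaviour described here is an assumption, not something this sketch verifies. The names CHILD_CODE and spawn_and_wait_until_ready are hypothetical.

```python
import subprocess
import sys

# Child program: report readiness on stdout, then block until stdin closes.
CHILD_CODE = (
    "import sys\n"
    "print('ready', flush=True)\n"
    "sys.stdin.read()\n"
)


def spawn_and_wait_until_ready():
    """Start a child and block until it reports that it has initialized."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    line = child.stdout.readline().strip()
    if line != "ready":
        child.kill()
        child.wait()
        raise RuntimeError("child did not signal readiness")
    return child


if __name__ == "__main__":
    child = spawn_and_wait_until_ready()
    # Only at this point would the parent perform the socket handoff.
    child.stdin.close()  # lets the child's stdin.read() return
    print("exit code:", child.wait())
    child.stdout.close()
```

An explicit handshake avoids guessing a Sleep duration, at the cost of requiring the child to cooperate.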

>
> > I have no thoughts on how to cleanly solve problem 1.  Would be nice if there were some system calls
> > to bind the two processes together in a parent/child relationship.
>
> Oh, yeah... took them till NT 5 to figure that out themselves :-/

Not even sure NT 5 fixes it right. Nothing I've read about the process group support implied that a
segfaulting parent would take out a child. Hope to commit the code to handle the socket inheritance
tomorrow (having trouble with getting WaitForInputIdle to link. The prototype is not being picked up
out of winuser.h).

Bill


Re: nasty bug in Apache for Windows (1.3 & 2.)

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "Bill Stoddard" <bi...@wstoddard.com>
Sent: Thursday, July 12, 2001 1:18 PM


> This is definitely in 2.0 and I believe it is in 1.3 as well.
> 
> If the Apache child process segfaults while any other process it started is running (CGI,
> rotatelogs, etc.), there is a chance that the server will stop serving pages, even after a complete
> shutdown and restart of Apache. The problem is twofold...
> 
> 1. When an Apache (for Windows) child process segfaults, any processes it started are stranded and
> not cleaned up. The Win32 API does not provide any facilities to tell the system to kill off any
> child processes when the parent dies abnormally.  This is the least serious part of the problem.

True ... win32 folks have expected that for some time.  It deserves to be fixed, but isn't as critical.

> 2. This is the nasty part... Due to a bug in the Windows part of Apache, child processes are
> inheriting open socket descriptors. When the Apache child process segfaults, its child processes
> have copies of the open socket descriptors which can prevent the new Apache process from accepting
> connections.  This could explain some long-standing bug reports in the bugdb.

Why, specifically, do we have the parent keep the sockets open?  Can we simply open the parent socket
(to test that it is available, and try it exclusively, since we don't do that correctly now anyways),
then close it, and let the child threads open their own (non-inheritable) sockets, themselves?
Does this really cost us that much?

> Solutions...
> Sockets are created as inheritable by default. We need to use DuplicateHandle to create
> noninheritable handles of the listeners.  This is a bit trickier than it first appears and I spent
> the better part of this AM getting this to work. There are some funky race conditions between
> CreateProcess() (to create the Apache child process) and WSADuplicateSocket() that will, if not
> handled properly, undo any effort to make the listeners noninheritable.

Not sure how there is a race here... they are still opened (in the parent) inheritable, simply dup as
a non-inheritable and close the inherited socket in the child, no?

> I have no thoughts on how to cleanly solve problem 1.  Would be nice if there were some system calls
> to bind the two processes together in a parent/child relationship.

Oh, yeah... took them till NT 5 to figure that out themselves :-/

> Workarounds:
> Reboot :-(  or, if you are familiar enough with the processes Apache starts on your system, shut down
> Apache, then search and destroy the leftover processes (rotatelogs, CGI, etc.) that Apache should
> have cleaned up. If you do a netstat -an and still find a listener on your webserver port, you
> missed something.
 
Thanks for all the details, look forward to seeing the code