
Posted to dev@httpd.apache.org by Jeff Trawick <tr...@bellsouth.net> on 2001/06/01 20:42:55 UTC

Re: idle server processes not going away

<rb...@covalent.net> writes:

> As for the problem that Jeff is describing, this is a problem with trying
> to make the old Apache 1.3 code fit the MPM model without paying enough
> attention to the flow of the code.  Take a look at the 1.3 code, it uses
> longjmp to make sure it is always executing the correct code.  The 2.0
> code tries to use return codes from APR functions.  But, the basic code
> still looks like the 1.3 code.  A few months ago, child processes were
> morphing into parent processes when we tried to kill them off.  To fix
> this, I modified some of the code to exit at the right time.  I believe
> about a month later, Paul Reder made a similar change.  IMNSHO, this can
> be solved only by actually taking the time to trace through the prefork
> MPM, and figure out what is happening, and fixing the bugs.  BTW, the
> threaded and perchild MPMs are incredibly similar to the prefork MPM in
> this respect.

I'm not sure what you mean by "fixing the bugs."

As I see it, prefork wants to use signals as the way to tell child
processes to go away.  Doing much in the signal handler is
problematic.  Instead we need to 1) longjmp() from the signal handler to
a safe spot and exit from there or 2) we make sure we wake up when the
signal handler returns.  Do you have a third suggestion?

I'll post a patch in the next few minutes for solution 2.  It seems to
work fine.
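
A minimal sketch of what option 2 could look like in POSIX C. It assumes the handler does nothing except set a sig_atomic_t flag, and that the graceful-stop signal stays blocked except while the child is asleep in pselect(). This is not the patch mentioned above, and install_graceful_handler() and wait_readable_or_signal() are hypothetical names. pselect() installs the wait mask atomically. That closes the window in which a signal could arrive between the flag check and a plain select() and leave the child asleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Written only by the handler; sig_atomic_t makes that write safe. */
static volatile sig_atomic_t die_now = 0;

static void graceful_handler(int signo)
{
    (void)signo;
    die_now = 1;            /* do nothing else inside the handler */
}

/* Hypothetical helper: install the handler without SA_RESTART, so that
 * blocking calls such as read() or accept() also return -1/EINTR
 * instead of being silently restarted. */
int install_graceful_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = graceful_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;        /* deliberately no SA_RESTART */
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: wait until fd is readable or the handler has run.
 * The caller keeps the signal blocked; wait_mask is the mask to use while
 * sleeping (with the signal unblocked).  pselect() swaps masks atomically,
 * so a signal cannot slip in between the die_now check and the sleep.
 * Returns 1 = exit requested, 0 = fd readable, -1 = unexpected error. */
int wait_readable_or_signal(int fd, const sigset_t *wait_mask)
{
    for (;;) {
        if (die_now)
            return 1;
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        int rc = pselect(fd + 1, &rfds, NULL, NULL, NULL, wait_mask);
        if (rc > 0)
            return 0;
        if (rc < 0 && errno != EINTR)
            return -1;
        /* EINTR: the handler ran; loop and re-check the flag. */
    }
}
```

In a child, the caller would block the signal once with sigprocmask(SIG_BLOCK, ...), keep the previous mask, pass it as wait_mask, and on a return of 1 finish its cleanup and exit.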

-- 
Jeff Trawick | trawickj@bellsouth.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: idle server processes not going away

Posted by rb...@covalent.net.

On Fri, 1 Jun 2001, Greg Stein wrote:

> On Fri, Jun 01, 2001 at 04:08:25PM -0700, rbb@covalent.net wrote:
> > On Fri, 1 Jun 2001, Greg Stein wrote:
> >...
> > > Hmm. Maybe it would still have a P-o-D but not select/block on it. When it
> > > gets woken up with OOB data, *then* it would do a non-blocking read on the
> > > pipe. If something is there, then it dies.
> >
> > That would work.  In reality if we do this, then we don't even need to use
> > OOB for the socket.  If we connect and close immediately, that wakes the
> > child up and it then checks the P-o-D.  This is no different than a client
> > connecting and never sending anything.
>
> ooh! Even better.
>
> I'd think we would read the P-o-D on re-entry to the listen loop. That
> allows a child to handle/complete an incoming request and then die (rather
> than accept a request, then throw it out because it saw "die").
>
> In your scenario, the "handle request" completes awfully quickly :-), then
> the P-o-D is read, and bye-bye.

Doesn't really matter when you do the check.  I agree 100% that we can't
just throw out the request, but we can check right after we are woken up,
and just set a flag.  Then, after we serve the request, we either go back
to sleep or die.  The only real difference is when we check the pipe, so
it is 6 of one, half a dozen of the other.
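
The ordering described here can be shown with a toy model: check the pipe right after waking, remember the answer, and act on it only after any accepted request has been served. This is a sketch only. The scripted struct wakeup array stands in for real select()/accept() wake-ups and pipe reads, and run_child() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* One wake-up of an idle child.  Toy model: real code would learn these
 * from accept() and a non-blocking read of the pipe of death. */
struct wakeup {
    bool pod_byte;     /* the parent wrote to the pipe of death */
    bool has_request;  /* a real client connection was accepted */
};

/* Run the child over a scripted sequence of wake-ups and return how
 * many requests it served before exiting. */
size_t run_child(const struct wakeup *events, size_t n)
{
    size_t served = 0;
    for (size_t i = 0; i < n; i++) {
        /* Check the P-o-D as soon as we are woken, but only set a flag. */
        bool should_die = events[i].pod_byte;

        /* Never throw away a request that was already accepted. */
        if (events[i].has_request)
            served++;                 /* stands in for serving it */

        /* Only now: die, or go back to sleep. */
        if (should_die)
            break;
    }
    return served;
}
```

Whether the flag is read before or after the request is served makes no difference to the client, which is the "6 of one, half a dozen of the other" point.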

> > I think this is a MUCH better implementation than we currently have.  This
> > also works for all MPMs, which means that any MPM can use S_L_U_A.
>
> Yup. We could probably have a few standard functions for opening, reading,
> and handling the P-o-D. The MPMs could then just do:
>
>     void *pod_ctx;
>
>     pod_ctx = ap_mpm_pod_open(...);
>
>     ...
>
>     if (ap_mpm_pod_check(pod_ctx))
>         worker_should_exit();
>
>     ...
>
>     ap_mpm_pod_close(pod_ctx);
>
>
> Although... actually, that last call just appears in worker_should_exit().

yep.  I may go ahead and implement this this weekend, because I really
like this.

> Not sure what sort of error returns and stuff would go into pod_check. I'm
> thinking none: it tells the worker to continue or to exit (i.e. a simple
> boolean return flag). Internally, it handles any errors and makes a call on
> what to tell the worker.

I think I agree completely, but I want to think about it a bit more.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: idle server processes not going away

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Jun 01, 2001 at 04:08:25PM -0700, rbb@covalent.net wrote:
> On Fri, 1 Jun 2001, Greg Stein wrote:
>...
> > Hmm. Maybe it would still have a P-o-D but not select/block on it. When it
> > gets woken up with OOB data, *then* it would do a non-blocking read on the
> > pipe. If something is there, then it dies.
> 
> That would work.  In reality if we do this, then we don't even need to use
> OOB for the socket.  If we connect and close immediately, that wakes the
> child up and it then checks the P-o-D.  This is no different than a client
> connecting and never sending anything.

ooh! Even better.

I'd think we would read the P-o-D on re-entry to the listen loop. That
allows a child to handle/complete an incoming request and then die (rather
than accept a request, then throw it out because it saw "die").

In your scenario, the "handle request" completes awfully quickly :-), then
the P-o-D is read, and bye-bye.

> > An external client can get the thing to wake up, but they could do that
> > anyway by simply connecting to the port. So we're no worse off.
> >
> > Now the question is: do all TCP stacks support OOB sending/receiving? I know
> > that the implementations are definitely different from one place to another
> > when it comes to OOB.
> 
> See above, it doesn't matter.  :-)

You bet :-)

> I think this is a MUCH better implementation than we currently have.  This
> also works for all MPMs, which means that any MPM can use S_L_U_A.

Yup. We could probably have a few standard functions for opening, reading,
and handling the P-o-D. The MPMs could then just do:

    void *pod_ctx;

    pod_ctx = ap_mpm_pod_open(...);

    ...

    if (ap_mpm_pod_check(pod_ctx))
        worker_should_exit();

    ...

    ap_mpm_pod_close(pod_ctx);

Although... actually, that last call just appears in worker_should_exit().

Not sure what sort of error returns and stuff would go into pod_check. I'm
thinking none: it tells the worker to continue or to exit (i.e. a simple
boolean return flag). Internally, it handles any errors and makes a call on
what to tell the worker.
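
A minimal sketch of such an open/check/close API. It assumes a plain pipe whose read end is non-blocking, with the parent writing one byte per child it wants to exit. The names pod_open(), pod_signal(), pod_check() and pod_close() are hypothetical stand-ins modelled on the ap_mpm_pod_* calls above, not an existing Apache API. As suggested, pod_check() returns only a boolean and decides internally what errors mean.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical pipe-of-death context (not Apache's API). */
struct pod {
    int rfd;   /* children read from this end (non-blocking) */
    int wfd;   /* the parent writes one byte per child it wants gone */
};

struct pod *pod_open(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return NULL;
    int flags = fcntl(fds[0], F_GETFL);
    struct pod *p = NULL;
    if (flags != -1 && fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != -1)
        p = malloc(sizeof *p);
    if (p == NULL) {               /* fcntl or malloc failed */
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    p->rfd = fds[0];
    p->wfd = fds[1];
    return p;
}

/* Parent side: ask exactly one child to exit.  Returns 0 on success. */
int pod_signal(struct pod *p)
{
    char c = '!';
    return write(p->wfd, &c, 1) == 1 ? 0 : -1;
}

/* Child side: true means "exit", false means "keep working".  A byte read
 * means die; EAGAIN means the pipe is empty.  EOF and any other error are
 * also treated as "keep working", a policy choice for this sketch. */
bool pod_check(struct pod *p)
{
    char c;
    ssize_t n;
    do {
        n = read(p->rfd, &c, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1;
}

void pod_close(struct pod *p)
{
    if (p == NULL)
        return;
    close(p->rfd);
    close(p->wfd);
    free(p);
}
```

In an MPM the parent would call pod_open() before forking, so every child inherits both ends. Each byte written by pod_signal() is consumed by exactly one child's pod_check(), so writing N bytes asks N children to exit.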

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: idle server processes not going away

Posted by rb...@covalent.net.

On Fri, 1 Jun 2001, Greg Stein wrote:

> On Fri, Jun 01, 2001 at 02:13:55PM -0700, rbb@covalent.net wrote:
> >...
> > I agree, but the problem now is how to solve
> > SINGLE_LISTEN_UNSERIALIZED_ACCEPT, and still respect graceful stop
> > requests for the child processes.  I still think that the best way to do
> > this, is to send OOB data to the child process through the same port that
> > the child has always listened to.  That allows the child to be woken up
> > out of the select call, and we can still use the signals for graceless
> > stopping of child processes.  Of course, this leaves us open to DOS, if
> > done poorly.
>
> Nah... that isn't a DOS. The OOB just wakes the kid up. Then it should check
> a flag (dunno how; just talking thru it).
>
> If the OOB is the "die" flag, then you're right: total DOS and it wouldn't
> be workable.

ahhh....   yep, that would solve the DOS problem.

> Hmm. Maybe it would still have a P-o-D but not select/block on it. When it
> gets woken up with OOB data, *then* it would do a non-blocking read on the
> pipe. If something is there, then it dies.

That would work.  In reality if we do this, then we don't even need to use
OOB for the socket.  If we connect and close immediately, that wakes the
child up and it then checks the P-o-D.  This is no different than a client
connecting and never sending anything.
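
The connect-and-close wake-up needs no OOB support at all. Below is a minimal sketch using POSIX sockets. It assumes an IPv4 listener, and wake_listener() and wait_for_wakeup() are hypothetical helpers. The parent's connect() completes against the listen backlog even though nobody has called accept() yet. That is what makes the listening socket readable for a child blocked in select().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Parent side (hypothetical helper): wake a child sleeping in select() on
 * the listener by connecting and closing immediately, just like a client
 * that connects and never sends anything.  Returns connect()'s result. */
int wake_listener(const struct sockaddr_in *addr)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    int rc = connect(s, (const struct sockaddr *)addr, sizeof *addr);
    close(s);
    return rc;
}

/* Child side (hypothetical helper): wait up to timeout_ms for the listener
 * to become readable.  Returns 1 if woken, 0 on timeout, -1 on error. */
int wait_for_wakeup(int listen_fd, int timeout_ms)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    return select(listen_fd + 1, &rfds, NULL, NULL, &tv);
}
```

A child woken this way would then run the non-blocking pipe-of-death check. If the pipe is empty, the dummy connection looks like any client that connects and sends nothing.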

> An external client can get the thing to wake up, but they could do that
> anyway by simply connecting to the port. So we're no worse off.
>
> Now the question is: do all TCP stacks support OOB sending/receiving? I know
> that the implementations are definitely different from one place to another
> when it comes to OOB.

See above, it doesn't matter.  :-)

I think this is a MUCH better implementation than we currently have.  This
also works for all MPMs, which means that any MPM can use S_L_U_A.

++1!

Ryan

Re: idle server processes not going away

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Jun 01, 2001 at 02:13:55PM -0700, rbb@covalent.net wrote:
>...
> I agree, but the problem now is how to solve
> SINGLE_LISTEN_UNSERIALIZED_ACCEPT, and still respect graceful stop
> requests for the child processes.  I still think that the best way to do
> this, is to send OOB data to the child process through the same port that
> the child has always listened to.  That allows the child to be woken up
> out of the select call, and we can still use the signals for graceless
> stopping of child processes.  Of course, this leaves us open to DOS, if
> done poorly.

Nah... that isn't a DOS. The OOB just wakes the kid up. Then it should check
a flag (dunno how; just talking thru it).

If the OOB is the "die" flag, then you're right: total DOS and it wouldn't
be workable.

Hmm. Maybe it would still have a P-o-D but not select/block on it. When it
gets woken up with OOB data, *then* it would do a non-blocking read on the
pipe. If something is there, then it dies.

An external client can get the thing to wake up, but they could do that
anyway by simply connecting to the port. So we're no worse off.

Now the question is: do all TCP stacks support OOB sending/receiving? I know
that the implementations are definitely different from one place to another
when it comes to OOB.

Cheers,
-g

Re: idle server processes not going away

Posted by rb...@covalent.net.

> > > 2) we make sure we wake up when the
> > > signal handler returns.  Do you have a third suggestion?
> > >
> > > I'll post a patch in the next few minutes for solution 2.  It seems to
> > > work fine.
> >
> > I think the only way to really solve this, is to look at how the longjmp()
> > was used in 1.3 to ensure we died correctly.  I would also suggest looking
> > at how threaded dies.  If the real problem is the signal handling, then I
> > would suggest that Dean was correct about signals and daemons, and we
> > should just remove all signals and use the pipe_of_death for all Unix
> > MPMs.
>
> Dean has always been right about signals. They are really poor ways to
> communicate with code.
>
> The pipe_of_death is really the way to go.

I agree, but the problem now is how to solve
SINGLE_LISTEN_UNSERIALIZED_ACCEPT, and still respect graceful stop
requests for the child processes.  I still think that the best way to do
this, is to send OOB data to the child process through the same port that
the child has always listened to.  That allows the child to be woken up
out of the select call, and we can still use the signals for graceless
stopping of child processes.  Of course, this leaves us open to DOS, if
done poorly.

Ryan

Re: idle server processes not going away

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Jun 01, 2001 at 12:18:50PM -0700, rbb@covalent.net wrote:
>...
> > As I see it, prefork wants to use signals as the way to tell child
> > processes to go away.  Doing much in the signal handler is
> > problematic.  Instead we need to 1) longjmp() from the signal handler to
> > a safe spot and exit from there or

You cannot longjmp() out of a signal handler. That is sure to cause
problems. I don't think you'd run for long if you did that.

> > 2) we make sure we wake up when the
> > signal handler returns.  Do you have a third suggestion?
> >
> > I'll post a patch in the next few minutes for solution 2.  It seems to
> > work fine.
> 
> I think the only way to really solve this, is to look at how the longjmp()
> was used in 1.3 to ensure we died correctly.  I would also suggest looking
> at how threaded dies.  If the real problem is the signal handling, then I
> would suggest that Dean was correct about signals and daemons, and we
> should just remove all signals and use the pipe_of_death for all Unix
> MPMs.

Dean has always been right about signals. They are really poor ways to
communicate with code.

The pipe_of_death is really the way to go.

Cheers,
-g

Re: idle server processes not going away

Posted by rb...@covalent.net.

On 1 Jun 2001, Jeff Trawick wrote:

> <rb...@covalent.net> writes:
>
> > I think the only way to really solve this, is to look at how the longjmp()
> > was used in 1.3 to ensure we died correctly.  I would also suggest looking
> > at how threaded dies.  If the real problem is the signal handling, then I
> > would suggest that Dean was correct about signals and daemons, and we
> > should just remove all signals and use the pipe_of_death for all Unix
> > MPMs.
>
> I understand how the pipe of death works...  Having to look at it all
> the time is a performance issue :)

It isn't a performance issue if it is done the way that Greg and I have
been talking about.  It is only an issue if we do blocking reads.

> > Assuming that locks are interruptible is not a good idea IMHO.
>
> unclear what you mean...  all supported platforms have interruptible
> locks...

Your message specifically says that pthread locks are not interruptible.
If we are using pthread cross-process locks, then they aren't
interruptible.

> do you mean "I don't want to kill a child process that is waiting for
> a mutex?"

I mean, I don't like the whole signal mechanism that we have setup.  I
like the model Greg and I have outlined.

Ryan

Re: idle server processes not going away

Posted by Jeff Trawick <tr...@bellsouth.net>.

<rb...@covalent.net> writes:

> I think the only way to really solve this, is to look at how the longjmp()
> was used in 1.3 to ensure we died correctly.  I would also suggest looking
> at how threaded dies.  If the real problem is the signal handling, then I
> would suggest that Dean was correct about signals and daemons, and we
> should just remove all signals and use the pipe_of_death for all Unix
> MPMs.

I understand how the pipe of death works...  Having to look at it all
the time is a performance issue :)

> Assuming that locks are interruptible is not a good idea IMHO.

unclear what you mean...  all supported platforms have interruptible
locks...

do you mean "I don't want to kill a child process that is waiting for
a mutex?"

Re: idle server processes not going away

Posted by rb...@covalent.net.

> > As for the problem that Jeff is describing, this is a problem with trying
> > to make the old Apache 1.3 code fit the MPM model without paying enough
> > attention to the flow of the code.  Take a look at the 1.3 code, it uses
> > longjmp to make sure it is always executing the correct code.  The 2.0
> > code tries to use return codes from APR functions.  But, the basic code
> > still looks like the 1.3 code.  A few months ago, child processes were
> > morphing into parent processes when we tried to kill them off.  To fix
> > this, I modified some of the code to exit at the right time.  I believe
> > about a month later, Paul Reder made a similar change.  IMNSHO, this can
> > be solved only by actually taking the time to trace through the prefork
> > MPM, and figure out what is happening, and fixing the bugs.  BTW, the
> > threaded and perchild MPMs are incredibly similar to the prefork MPM in
> > this respect.
>
> I'm not sure what you mean by "fixing the bugs."
>
> As I see it, prefork wants to use signals as the way to tell child
> processes to go away.  Doing much in the signal handler is
> problematic.  Instead we need to 1) longjmp() from the signal handler to
> a safe spot and exit from there or 2) we make sure we wake up when the
> signal handler returns.  Do you have a third suggestion?
>
> I'll post a patch in the next few minutes for solution 2.  It seems to
> work fine.

I think the only way to really solve this, is to look at how the longjmp()
was used in 1.3 to ensure we died correctly.  I would also suggest looking
at how threaded dies.  If the real problem is the signal handling, then I
would suggest that Dean was correct about signals and daemons, and we
should just remove all signals and use the pipe_of_death for all Unix
MPMs.

Assuming that locks are interruptible is not a good idea IMHO.

Ryan
