You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2003/01/28 18:21:53 UTC

Altogether Broken OtherChild logic

I belive I've deciphered the "RotateLogs doesn't work for access logs
on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.

First, rbb's reorg of the WinNT pipe logic (apr/file_io/win32/pipe.c rev 1.46) 
causes server/log.c ap_open_piped_log() to create an async (nonblocking) 
pipe.  This was very cool for internal pipes.  However, console apps that are
unprepared for Overlapped IO (read: all of them) will die given an Overlapped 
pipe handle.  Even APR doesn't assume (and shouldn't) that a pipe handle
passed as stdin/stdout is an Async handle.  However, Async handles are
the only way to do timeouts.

Unix ap_file_pipe_create() always creates the pipes blocking.  In fixing the
bug, by creating blocking handles (that can't be timed out) for Win32, I also
discovered our nonblocking pipe logic was reversed for parent/child handles 
of the child stdin process. I've just committed all those fixes, so this aspect 
was patched as well.  If you look at the new apr_create_nt_pipe() API, it 
might be worth renaming and exporting it as apr_file_pipe_create_ex().
It's similar to apr_file_pipe_create() with the added blocking mode argument
(APR_FULL_BLOCK, APR_FULL_NONBLOCK, APR_READ_BLOCK and
APR_WRITE_BLOCK).  APR_PARENT/CHILD_BLOCK only makes sense
to a function like apr_procattr_io_set.  In fact, for safety, that creation call
should never create NONBLOCK child pipes, but that's another discussion.

So part one was fixed, now we create the access log's rotatelogs process
with a good stdin stream.  It survives to log an entry, but only until we call 
apr_proc_other_child_check within the WinNT MPM's *maintenence* loop.

It seems that apr_proc_other_child_check is only being used during teardown
of the Unix children.  It also seems that the 'correct' flag passed to the maint
function for your registered otherchild is APR_OC_REASON_RESTART when
the process is still running.

Now mod_log_config kills the child process on APR_OC_REASON_RESTART.
Because Win32 doesn't call the function in maintenance like unix, but in a one 
second health-check loop, we are blowing away our access logger process.

Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
to use within the health check.  But it seems all of these apr_proc_other_child
functions are really misdocumented within APR.  Would someone step up and
spell out exactly what they are *supposed* to be doing within unix, and then we
can discuss how to make them portable to Win32?  It seems we have too much
bubblegum and bailing wire holding them together, and the fixes that I made to
do *exactly* what Unix was doing has killed the WinNT mpm.

Bill



Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 09:11 AM 1/31/2003, Bill Stoddard wrote:

>>As for your question about polling, if we cycle every second we waste
>>cpu - if we sample every few seconds we lose more log entries etc.
>>If we receive alerts when the otherchild processes die we can react
>>immediately without the extra loops.

>In principle I agree but I am not sure the extra complexity of your proposed solution is worthwhile for implementing reliable piped logs.  I really hate complex solutions to simple problems.  Complexity makes the code more difficult to debug and maintain and raises the entry barrier for new folks interested in joining the project.   I often hear the argument for a complex solution in favor of a simpler solution because the complex solution "might be useful for other applications" or is "more extensible", etc.  This is a good line of argument and is quite often true, but not always.  It -is- possible to over engineer (biggie size :-) software.  I'll happily review whatever you come up with, so party on dude.

I agree 100% - complexity increases bugs - barrier for entry (new users trying
to grok the schema) etc, etc.

Let's review those issues we can't dispell:

  * Both Win32 and Unix will register other_child processes.

  * Unix can discover any dying process from the 'process tree' - those
    created by this process.  Win32's handle on this issue is really poor
    (all puns intended.)  So Unix today and in the future will use the
    old apr_proc_other_child_read() (renamed to _died) to 'check' off
    that process.  It can return APR_SUCCESS if we have a 'hit' that
    the process is an other child, or APR_xxx status when we miss
    and other_child doesn't know about the given PID.

For this purpose, I'm proposing apr_proc_other_child_alert_one(),
passing the apr_proc_t, reason and status.  So we deprecate the
poorly named _read() flavor with:

/** @deprecated */
apr_status_t apr_proc_other_child_read(apr_proc_t *pid, int status)
{
    return apr_proc_other_child_alert_one(pid, APR_OC_REASON_DEATH, status);
}

Now we can send one of many alerts (reason) to the given OC maintainance
function :-)


  * Unix calls apr_proc_other_child_check() to warn everyone that we
    are shutting down, and we need to maintain the behavior (but not
    the very misleading function name.)  

  * Win32 needs a function to check the health of the processes without
    signaling them that we are restarting.

SO... I'm proposing apr_proc_other_child_refresh_all which will go through
the entire list, and send the desired reason code to all running processes'
maintenance functions.  So we deprecate the _check() fn with:

/** @deprecated */
void apr_proc_other_child_check(void)
{
    apr_proc_other_child_refresh_all(APR_OC_REASON_RESTART)
}

And we add a new reason to the maint callback for the health check
and then call apr_proc_other_child_refresh_all(APR_OC_REASON_CHECK)
for Win32's MPM.

Sound like a possible plan?

Bill  


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 09:11 AM 1/31/2003, Bill Stoddard wrote:

>>As for your question about polling, if we cycle every second we waste
>>cpu - if we sample every few seconds we lose more log entries etc.
>>If we receive alerts when the otherchild processes die we can react
>>immediately without the extra loops.

>In principle I agree but I am not sure the extra complexity of your proposed solution is worthwhile for implementing reliable piped logs.  I really hate complex solutions to simple problems.  Complexity makes the code more difficult to debug and maintain and raises the entry barrier for new folks interested in joining the project.   I often hear the argument for a complex solution in favor of a simpler solution because the complex solution "might be useful for other applications" or is "more extensible", etc.  This is a good line of argument and is quite often true, but not always.  It -is- possible to over engineer (biggie size :-) software.  I'll happily review whatever you come up with, so party on dude.

I agree 100% - complexity increases bugs - barrier for entry (new users trying
to grok the schema) etc, etc.

Let's review those issues we can't dispell:

  * Both Win32 and Unix will register other_child processes.

  * Unix can discover any dying process from the 'process tree' - those
    created by this process.  Win32's handle on this issue is really poor
    (all puns intended.)  So Unix today and in the future will use the
    old apr_proc_other_child_read() (renamed to _died) to 'check' off
    that process.  It can return APR_SUCCESS if we have a 'hit' that
    the process is an other child, or APR_xxx status when we miss
    and other_child doesn't know about the given PID.

For this purpose, I'm proposing apr_proc_other_child_alert_one(),
passing the apr_proc_t, reason and status.  So we deprecate the
poorly named _read() flavor with:

/** @deprecated */
apr_status_t apr_proc_other_child_read(apr_proc_t *pid, int status)
{
    return apr_proc_other_child_alert_one(pid, APR_OC_REASON_DEATH, status);
}

Now we can send one of many alerts (reason) to the given OC maintainance
function :-)


  * Unix calls apr_proc_other_child_check() to warn everyone that we
    are shutting down, and we need to maintain the behavior (but not
    the very misleading function name.)  

  * Win32 needs a function to check the health of the processes without
    signaling them that we are restarting.

SO... I'm proposing apr_proc_other_child_refresh_all which will go through
the entire list, and send the desired reason code to all running processes'
maintenance functions.  So we deprecate the _check() fn with:

/** @deprecated */
void apr_proc_other_child_check(void)
{
    apr_proc_other_child_refresh_all(APR_OC_REASON_RESTART)
}

And we add a new reason to the maint callback for the health check
and then call apr_proc_other_child_refresh_all(APR_OC_REASON_CHECK)
for Win32's MPM.

Sound like a possible plan?

Bill  


Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
So I 've actually spent some time looking at this...

>apr_proc_other_child_check() on Unix came first, afaict.
>
Right you are.

>Now we're left with ery simple problem.  OC works on Unix today, and 
>it's broken on Win32.  Unix's logic is well exercised by a larger group,
>WinNT's by a much smaller minority.  I trust their logic, but the names
>don't match up.
>
>I'm just going to try to wire it all back together such that Unix (working) 
>stays working, and Win32 is fixed for 2.0.45 within the next few days.
>
>Yes - it's bogus they called a fn apr_proc_other_child_check() as 
>*THE* restart signal.  But renaming that fn now is probably safer.
>
I agree.

>As for your question about polling, if we cycle every second we waste
>cpu - if we sample every few seconds we lose more log entries etc.
>If we receive alerts when the otherchild processes die we can react
>immediately without the extra loops.
>
>Bill
>
In principle I agree but I am not sure the extra complexity of your 
proposed solution is worthwhile for implementing reliable piped logs.  I 
really hate complex solutions to simple problems.  Complexity makes the 
code more difficult to debug and maintain and raises the entry barrier 
for new folks interested in joining the project.   I often hear the 
argument for a complex solution in favor of a simpler solution because 
the complex solution "might be useful for other applications" or is 
"more extensible", etc.  This is a good line of argument and is quite 
often true, but not always.  It -is- possible to over engineer (biggie 
size :-) software.  I'll happily review whatever you come up with, so 
party on dude.

Bill


Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
So I 've actually spent some time looking at this...

>apr_proc_other_child_check() on Unix came first, afaict.
>
Right you are.

>Now we're left with ery simple problem.  OC works on Unix today, and 
>it's broken on Win32.  Unix's logic is well exercised by a larger group,
>WinNT's by a much smaller minority.  I trust their logic, but the names
>don't match up.
>
>I'm just going to try to wire it all back together such that Unix (working) 
>stays working, and Win32 is fixed for 2.0.45 within the next few days.
>
>Yes - it's bogus they called a fn apr_proc_other_child_check() as 
>*THE* restart signal.  But renaming that fn now is probably safer.
>
I agree.

>As for your question about polling, if we cycle every second we waste
>cpu - if we sample every few seconds we lose more log entries etc.
>If we receive alerts when the otherchild processes die we can react
>immediately without the extra loops.
>
>Bill
>
In principle I agree but I am not sure the extra complexity of your 
proposed solution is worthwhile for implementing reliable piped logs.  I 
really hate complex solutions to simple problems.  Complexity makes the 
code more difficult to debug and maintain and raises the entry barrier 
for new folks interested in joining the project.   I often hear the 
argument for a complex solution in favor of a simpler solution because 
the complex solution "might be useful for other applications" or is 
"more extensible", etc.  This is a good line of argument and is quite 
often true, but not always.  It -is- possible to over engineer (biggie 
size :-) software.  I'll happily review whatever you come up with, so 
party on dude.

Bill


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 04:47 PM 1/30/2003, Bill Stoddard wrote:

>>No, that is working fine.  It is whacking it because I modified the code in 
>>_check() to do exactly the same thing on Win32 as it does on Unix. 

>Humm... perhaps you got the cart before the horse...  if i recall correctly, I think I created the _check function specificly to handle reliable piped logs on Windows.  Someone (perhaps me I don't recall) dropped some code into the Unix branch to do the same thing. My point... I think the unix side was broken and you made a mistake by assuming the unix side was correct and modeling the windoes code after it.

And I very much agree of the 'correctness' of your original code [ :-) ]
But nice try...

Revision 1.24 / <otherchild.c?rev=1.24&content-type=text/vnd.htm>(view) - <otherchild.c?annotate=1.htm>annotate - <otherchild.c?r1=1.htm>[select for diffs] , Thu May 17 12:20:01 2001 UTC (20 months, 2 weeks ago) by stoddard 
Branch: <otherchild.htm>MAIN 
CVS Tags: <otherchild.htm>APACHE_2_0_25, <otherchild.htm>APACHE_2_0_24, <otherchild.htm>APACHE_2_0_23, <otherchild.htm>APACHE_2_0_22, <otherchild.htm>APACHE_2_0_21, <otherchild.htm>APACHE_2_0_20, <otherchild.htm>APACHE_2_0_19, <otherchild.htm>APACHE_2_0_18 
Changes since 1.23: +39 -1 lines 
Diff to <otherchild.c.diff?r1=1.23&r2=1.htm>previous 1.23 (<otherchild.c.diff?r1=1.23&r2=1.htm>colored) 
Add APR_HAS_OTHER_CHILD support to Win32

apr_proc_other_child_check() on Unix came first, afaict.

Now we're left with ery simple problem.  OC works on Unix today, and 
it's broken on Win32.  Unix's logic is well exercised by a larger group,
WinNT's by a much smaller minority.  I trust their logic, but the names
don't match up.

I'm just going to try to wire it all back together such that Unix (working) 
stays working, and Win32 is fixed for 2.0.45 within the next few days.

Yes - it's bogus they called a fn apr_proc_other_child_check() as 
*THE* restart signal.  But renaming that fn now is probably safer.

As for your question about polling, if we cycle every second we waste
cpu - if we sample every few seconds we lose more log entries etc.
If we receive alerts when the otherchild processes die we can react
immediately without the extra loops.

Bill



Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 04:47 PM 1/30/2003, Bill Stoddard wrote:

>>No, that is working fine.  It is whacking it because I modified the code in 
>>_check() to do exactly the same thing on Win32 as it does on Unix. 

>Humm... perhaps you got the cart before the horse...  if i recall correctly, I think I created the _check function specificly to handle reliable piped logs on Windows.  Someone (perhaps me I don't recall) dropped some code into the Unix branch to do the same thing. My point... I think the unix side was broken and you made a mistake by assuming the unix side was correct and modeling the windoes code after it.

And I very much agree of the 'correctness' of your original code [ :-) ]
But nice try...

Revision 1.24 / <otherchild.c?rev=1.24&content-type=text/vnd.htm>(view) - <otherchild.c?annotate=1.htm>annotate - <otherchild.c?r1=1.htm>[select for diffs] , Thu May 17 12:20:01 2001 UTC (20 months, 2 weeks ago) by stoddard 
Branch: <otherchild.htm>MAIN 
CVS Tags: <otherchild.htm>APACHE_2_0_25, <otherchild.htm>APACHE_2_0_24, <otherchild.htm>APACHE_2_0_23, <otherchild.htm>APACHE_2_0_22, <otherchild.htm>APACHE_2_0_21, <otherchild.htm>APACHE_2_0_20, <otherchild.htm>APACHE_2_0_19, <otherchild.htm>APACHE_2_0_18 
Changes since 1.23: +39 -1 lines 
Diff to <otherchild.c.diff?r1=1.23&r2=1.htm>previous 1.23 (<otherchild.c.diff?r1=1.23&r2=1.htm>colored) 
Add APR_HAS_OTHER_CHILD support to Win32

apr_proc_other_child_check() on Unix came first, afaict.

Now we're left with ery simple problem.  OC works on Unix today, and 
it's broken on Win32.  Unix's logic is well exercised by a larger group,
WinNT's by a much smaller minority.  I trust their logic, but the names
don't match up.

I'm just going to try to wire it all back together such that Unix (working) 
stays working, and Win32 is fixed for 2.0.45 within the next few days.

Yes - it's bogus they called a fn apr_proc_other_child_check() as 
*THE* restart signal.  But renaming that fn now is probably safer.

As for your question about polling, if we cycle every second we waste
cpu - if we sample every few seconds we lose more log entries etc.
If we receive alerts when the otherchild processes die we can react
immediately without the extra loops.

Bill



Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
>
>
>  
>
>  
>
>>I am guessing that the windows MPM is whacking the piped logger because 
>>ocr->proc->hproc is somehow hosed.
>>    
>>
>
>No, that is working fine.  It is whacking it because I modified the code in _check()
>to do exactly the same thing on Win32 as it does on Unix. 
>
Humm... perhaps you got the cart before the horse...  if i recall 
correctly, I think I created the _check function specificly to handle 
reliable piped logs on Windows.  Someone (perhaps me I don't recall) 
dropped some code into the Unix branch to do the same thing. My point... 
I think the unix side was broken and you made a mistake by assuming the 
unix side was correct and modeling the windoes code after it.


Bill


Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
>
>
>  
>
>  
>
>>I am guessing that the windows MPM is whacking the piped logger because 
>>ocr->proc->hproc is somehow hosed.
>>    
>>
>
>No, that is working fine.  It is whacking it because I modified the code in _check()
>to do exactly the same thing on Win32 as it does on Unix. 
>
Humm... perhaps you got the cart before the horse...  if i recall 
correctly, I think I created the _check function specificly to handle 
reliable piped logs on Windows.  Someone (perhaps me I don't recall) 
dropped some code into the Unix branch to do the same thing. My point... 
I think the unix side was broken and you made a mistake by assuming the 
unix side was correct and modeling the windoes code after it.


Bill


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 01:30 PM 1/30/2003, Bill Stoddard wrote:
>William A. Rowe, Jr. wrote:
>
>>I belive I've deciphered the "RotateLogs doesn't work for access logs
>>on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.
>>
>>Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
>>to use within the health check.  But it seems all of these apr_proc_other_child
>>functions are really misdocumented within APR.  Would someone step up and
>>spell out exactly what they are *supposed* to be doing within unix, and then we
>>can discuss how to make them portable to Win32?  It seems we have too much
>>bubblegum and bailing wire holding them together, and the fixes that I made to
>>do *exactly* what Unix was doing has killed the WinNT mpm.

>What we want to do in the winnt MPM maintenance loop is peridically check 
>to see of the process is still alive. It it is dead, it needs to be restarted (ie, reliable 
>piped logs). apr_proc_other_child_read was used in the Unix side of the house 
>to do a periodic read of the pipe to an OC.  On Unix, you can tell if the process 
>on the other end is still alive by doing a read on the pipe. Cannot do that with pipes 
>on Windows, so we use apr_proc_other_child_check instead.  

We never check the either end of OC pipes on Unix today in Apache 2.0.
Because we 'hang on' to the pipe and send that same pipe to the next 
generation of child, we catch the death of the child instead.

However, the dying child is caught by apr_proc_wait_all_procs().  This is
bubbled to ap_wait_or_timeout which bubbles the offending PID to the
run_mpm main loop.

The apr_proc_wait_all_procs() is simply a apr_proc_wait(-1) flavor of the same.
Win32 could do the very same thing.  However, I'm considering whether or not
to drive this; will WaitForMultipleObjects (suffering the usual issues of 64 wait 
events) or if it might make more sense to setup RegisterWaitForSingleObject 
on each child process handle.  This may give us most similar results to Unix.

Bug number one, in my mind, was the abuse of the function that was named
apr_proc_other_child_read().  That fn must be called apr_proc_other_child_died().
Only the name of that function changes.

The second abused function was...

>I think we need one function (call it what you will but apr_proc_other_child_check 
>seems okay to me) that checks the status of an OC and performs an action 
>(specified on the call to apr_proc_other_child_check) based on the status.  

The *existing* apr_proc_other_child_check doesn't do at all what you describe on
Unix.  We only call it from the mpm and it's only as we are shutting down or
restarting.  *That* function is misnamed, I'm thinking apr_proc_other_child_restart().
This function would continue to be called as the MPM recycles.

The *new* function you describe (it didn't exist today) would have been nicely named
_check() but that would be a namespace clash.  So I'm thinking _refresh() or some
such that would check the health of the other children and invoke their callbacks only
if something bad has happened.  This might even be on a periodic basis on Unix, if
we discover that some platforms aren't reliably reporting apr_proc_other_child_died()
due to apr_proc_wait_all_procs() missing some signals.

>I am guessing that the windows MPM is whacking the piped logger because 
>ocr->proc->hproc is somehow hosed.

No, that is working fine.  It is whacking it because I modified the code in _check()
to do exactly the same thing on Win32 as it does on Unix.  (Yes, on unix we whack
the children from mod_log_config each restart or shutdown.  Good?  Maybe not but
until this behaves correctly it remains a useful test case for me.)  So now Win32
would whack the child once per second based on WinNT MPM's calls to _check().

And the code in _check() - al la _restart(), the code in _read() al la _died(), and the 
new code in a _refresh() must behave the same between Unix and Win32, or nobody 
will ever successfully code portable modules.

So, renaming _check() to restart() and adding the _refresh() we will safely be able
to wander between the two platforms and not run into this again.  Sure, _check()
sounded like a good place to put the logic you added, but it was entirely different
in purpose from the Unix code.  That's unacceptable.  I fault the docs, again, for
not providing sufficient details.

Sure, the MPM details can be different, but they should be *converging* around
common functions, not divirging wildly.  If my theory on RegisterWaitForSingleObject
to signal apr_proc_wait_all_procs() actually works, we might just see that convergence.

I started this thread not to ask what Win32 was doing, but what the essential design
from Unix was.  Once I understand that, I stand a real chance to get Win32 identical
or so similar you don't notice the discrepancies.  This may involve an extra function,
then again it might not.  

But Unix sure didn't agree with the documented apr_thread_proc.h documentation, 
and that's what really drove me nuts these few days as I continue to try untangling
the problem.

Bill



Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 12:37 PM 1/30/2003, Bill Stoddard wrote:
>This stuff kinda worked on Windows in the past.  I need to dig some but I seem to recall that it was basically impossible to do the exact same thing in Windows as you do in Unix. The other_child_read in Unix will not (and cannot) work the same way under Unix.

It's fine if they diverge when necessary.  It's not fair to overload functions
with two different meanings for two platforms :-)  Let me play with the
RegisterWait*() API in Win32 and see if we can't register that other child
handle to invoke the registered callback upon death.  We might even end
up -cleaner- than the Unix implementation, without the need for extra
polling in the MPM :-)

Bill 


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 12:37 PM 1/30/2003, Bill Stoddard wrote:
>This stuff kinda worked on Windows in the past.  I need to dig some but I seem to recall that it was basically impossible to do the exact same thing in Windows as you do in Unix. The other_child_read in Unix will not (and cannot) work the same way under Unix.

It's fine if they diverge when necessary.  It's not fair to overload functions
with two different meanings for two platforms :-)  Let me play with the
RegisterWait*() API in Win32 and see if we can't register that other child
handle to invoke the registered callback upon death.  We might even end
up -cleaner- than the Unix implementation, without the need for extra
polling in the MPM :-)

Bill 


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 01:30 PM 1/30/2003, Bill Stoddard wrote:
>William A. Rowe, Jr. wrote:
>
>>I belive I've deciphered the "RotateLogs doesn't work for access logs
>>on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.
>>
>>Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
>>to use within the health check.  But it seems all of these apr_proc_other_child
>>functions are really misdocumented within APR.  Would someone step up and
>>spell out exactly what they are *supposed* to be doing within unix, and then we
>>can discuss how to make them portable to Win32?  It seems we have too much
>>bubblegum and bailing wire holding them together, and the fixes that I made to
>>do *exactly* what Unix was doing has killed the WinNT mpm.

>What we want to do in the winnt MPM maintenance loop is peridically check 
>to see of the process is still alive. It it is dead, it needs to be restarted (ie, reliable 
>piped logs). apr_proc_other_child_read was used in the Unix side of the house 
>to do a periodic read of the pipe to an OC.  On Unix, you can tell if the process 
>on the other end is still alive by doing a read on the pipe. Cannot do that with pipes 
>on Windows, so we use apr_proc_other_child_check instead.  

We never check the either end of OC pipes on Unix today in Apache 2.0.
Because we 'hang on' to the pipe and send that same pipe to the next 
generation of child, we catch the death of the child instead.

However, the dying child is caught by apr_proc_wait_all_procs().  This is
bubbled to ap_wait_or_timeout which bubbles the offending PID to the
run_mpm main loop.

The apr_proc_wait_all_procs() is simply a apr_proc_wait(-1) flavor of the same.
Win32 could do the very same thing.  However, I'm considering whether or not
to drive this; will WaitForMultipleObjects (suffering the usual issues of 64 wait 
events) or if it might make more sense to setup RegisterWaitForSingleObject 
on each child process handle.  This may give us most similar results to Unix.

Bug number one, in my mind, was the abuse of the function that was named
apr_proc_other_child_read().  That fn must be called apr_proc_other_child_died().
Only the name of that function changes.

The second abused function was...

>I think we need one function (call it what you will but apr_proc_other_child_check 
>seems okay to me) that checks the status of an OC and performs an action 
>(specified on the call to apr_proc_other_child_check) based on the status.  

The *existing* apr_proc_other_child_check doesn't do at all what you describe on
Unix.  We only call it from the mpm and it's only as we are shutting down or
restarting.  *That* function is misnamed, I'm thinking apr_proc_other_child_restart().
This function would continue to be called as the MPM recycles.

The *new* function you describe (it didn't exist today) would have been nicely named
_check() but that would be a namespace clash.  So I'm thinking _refresh() or some
such that would check the health of the other children and invoke their callbacks only
if something bad has happened.  This might even be on a periodic basis on Unix, if
we discover that some platforms aren't reliably reporting apr_proc_other_child_died()
due to apr_proc_wait_all_procs() missing some signals.

>I am guessing that the windows MPM is whacking the piped logger because 
>ocr->proc->hproc is somehow hosed.

No, that is working fine.  It is whacking it because I modified the code in _check()
to do exactly the same thing on Win32 as it does on Unix.  (Yes, on unix we whack
the children from mod_log_config each restart or shutdown.  Good?  Maybe not but
until this behaves correctly it remains a useful test case for me.)  So now Win32
would whack the child once per second based on WinNT MPM's calls to _check().

And the code in _check() - al la _restart(), the code in _read() al la _died(), and the 
new code in a _refresh() must behave the same between Unix and Win32, or nobody 
will ever successfully code portable modules.

So, renaming _check() to restart() and adding the _refresh() we will safely be able
to wander between the two platforms and not run into this again.  Sure, _check()
sounded like a good place to put the logic you added, but it was entirely different
in purpose from the Unix code.  That's unacceptable.  I fault the docs, again, for
not providing sufficient details.

Sure, the MPM details can be different, but they should be *converging* around
common functions, not divirging wildly.  If my theory on RegisterWaitForSingleObject
to signal apr_proc_wait_all_procs() actually works, we might just see that convergence.

I started this thread not to ask what Win32 was doing, but what the essential design
from Unix was.  Once I understand that, I stand a real chance to get Win32 identical
or so similar you don't notice the discrepancies.  This may involve an extra function,
then again it might not.  

But Unix sure didn't agree with the documented apr_thread_proc.h documentation, 
and that's what really drove me nuts these few days as I continue to try untangling
the problem.

Bill



Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
William A. Rowe, Jr. wrote:

>I belive I've deciphered the "RotateLogs doesn't work for access logs
>on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.
>
>First, rbb's reorg of the WinNT pipe logic (apr/file_io/win32/pipe.c rev 1.46) 
>causes server/log.c ap_open_piped_log() to create an async (nonblocking) 
>pipe.  This was very cool for internal pipes.  However, console apps that are
>unprepared for Overlapped IO (read: all of them) will die given an Overlapped 
>pipe handle.  Even APR doesn't assume (and shouldn't) that a pipe handle
>passed as stdin/stdout is an Async handle.  However, Async handles are
>the only way to do timeouts.
>
>Unix ap_file_pipe_create() always creates the pipes blocking.  In fixing the
>bug, by creating blocking handles (that can't be timed out) for Win32, I also
>discovered our nonblocking pipe logic was reversed for parent/child handles 
>of the child stdin process. I've just committed all those fixes, so this aspect 
>was patched as well.  If you look at the new apr_create_nt_pipe() API, it 
>might be worth renaming and exporting it as apr_file_pipe_create_ex().
>It's similar to apr_file_pipe_create() with the added blocking mode argument
>(APR_FULL_BLOCK, APR_FULL_NONBLOCK, APR_READ_BLOCK and
>APR_WRITE_BLOCK).  APR_PARENT/CHILD_BLOCK only makes sense
>to a function like apr_procattr_io_set.  In fact, for safety, that creation call
>should never create NONBLOCK child pipes, but that's another discussion.
>
>So part one was fixed, now we create the access log's rotatelogs process
>with a good stdin stream.  It survives to log an entry, but only until we call 
>apr_proc_other_child_check within the WinNT MPM's *maintenence* loop.
>
>It seems that apr_proc_other_child_check is only being used during teardown
>of the Unix children.  It also seems that the 'correct' flag passed to the maint
>function for your registered otherchild is APR_OC_REASON_RESTART when
>the process is still running.
>
>Now mod_log_config kills the child process on APR_OC_REASON_RESTART.
>Because Win32 doesn't call the function in maintenance like unix, but in a one 
>second health-check loop, we are blowing away our access logger process.
>
>Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
>to use within the health check.  But it seems all of these apr_proc_other_child
>functions are really misdocumented within APR.  Would someone step up and
>spell out exactly what they are *supposed* to be doing within unix, and then we
>can discuss how to make them portable to Win32?  It seems we have too much
>bubblegum and bailing wire holding them together, and the fixes that I made to
>do *exactly* what Unix was doing has killed the WinNT mpm.
>
>Bill
>
This stuff kinda worked on Windows in the past.  I need to dig some but 
I seem to recall that it was basically impossible to do the exact same 
thing in Windows as you do in Unix. The other_child_read in Unix will 
not (and cannot) work the same way under Unix.

Bill

>  
>


Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
William A. Rowe, Jr. wrote:

>At 02:03 PM 1/30/2003, Jeff Trawick wrote:
>  
>
>>wrowe wrote:
>>
>>    
>>
>>>>Finally, it looks like apr_proc_other_child_read is the function we
>>>>*really* wanted
>>>>to use within the health check.  But it seems all of these
>>>>apr_proc_other_child
>>>>functions are really misdocumented within APR.  Would someone step up and
>>>>spell out exactly what they are *supposed* to be doing within unix,
>>>>and then we
>>>>can discuss how to make them portable to Win32?
>>>>        
>>>>
>>apr_proc_other_child_read() is called when the monitor (e.g., MPM) knows that a process has died; it passes the apr_proc_t describing the dead process to apr_proc_other_child_read(), and if that dead process was a registered "other child", then the maintenance function for that other child is called with APR_OC_REASON_DEATH
>>    
>>
>
>Good, we agree on what this function is doing.
>
>  
>
>>sounds pretty hokey to me; maybe apr_proc_wait() could call the maintenance function automatically if a newly-deceased process was registered as "other child"
>>    
>>
>
>I'm simply thinking of renaming it apr_proc_other_child_died() and documenting
>it correctly.  Step two is determining how I can then identify and report that
>case in the WinNT MPM, or automagically as a callback.
>  
>
What's wrong with MPMs polling OCs?  Simple, efficient, effective and 
has served httpd 1.3 well.  Let's not biggie size. I -hate- biggie size 
when a simpler solution is just as effective.

Bill



Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
William A. Rowe, Jr. wrote:

>At 02:03 PM 1/30/2003, Jeff Trawick wrote:
>  
>
>>wrowe wrote:
>>
>>    
>>
>>>>Finally, it looks like apr_proc_other_child_read is the function we
>>>>*really* wanted
>>>>to use within the health check.  But it seems all of these
>>>>apr_proc_other_child
>>>>functions are really misdocumented within APR.  Would someone step up and
>>>>spell out exactly what they are *supposed* to be doing within unix,
>>>>and then we
>>>>can discuss how to make them portable to Win32?
>>>>        
>>>>
>>apr_proc_other_child_read() is called when the monitor (e.g., MPM) knows that a process has died; it passes the apr_proc_t describing the dead process to apr_proc_other_child_read(), and if that dead process was a registered "other child", then the maintenance function for that other child is called with APR_OC_REASON_DEATH
>>    
>>
>
>Good, we agree on what this function is doing.
>
>  
>
>>sounds pretty hokey to me; maybe apr_proc_wait() could call the maintenance function automatically if a newly-deceased process was registered as "other child"
>>    
>>
>
>I'm simply thinking of renaming it apr_proc_other_child_died() and documenting
>it correctly.  Step two is determining how I can then identify and report that
>case in the WinNT MPM, or automagically as a callback.
>  
>
What's wrong with MPMs polling OCs?  Simple, efficient, effective and 
has served httpd 1.3 well.  Let's not biggie size. I -hate- biggie size 
when a simpler solution is just as effective.

Bill



Re: Altogether Broken OtherChild logic

Posted by Jeff Trawick <tr...@attglobal.net>.
William A. Rowe, Jr. wrote:

> I'm simply thinking of renaming it apr_proc_other_child_died() and 
> documenting
> it correctly.  Step two is determining how I can then identify and 
> report that
> case in the WinNT MPM, or automagically as a callback.


How about rename it to

apr_proc_other_child_some_random_process_died_please_see_if_you_give_a_shit()

That's more accurate.  The caller doesn't know if it was registered as 
"other child" or not.

I'd rather have apr_proc_other_child_check() take an optional apr_proc_t 
(or have flavor called apr_proc_other_child_check_one()) and have it see 
if it cares about that process and if so see if it is still alive. 
Seems more useful.  Yeah, an extra rare syscall for some Apache MPMs but 
I feel that the function as it is just doesn't have enough generic 
usefulness to keep around.

Or just have apr_proc_wait() take care of calling any necessary other 
child maintenance function itself (if it finds out some process died and 
it was registered as other child).



Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@apache.org>.
At 02:03 PM 1/30/2003, Jeff Trawick wrote:
>wrowe wrote:
>
>>> Finally, it looks like apr_proc_other_child_read is the function we
>>> *really* wanted
>>> to use within the health check.  But it seems all of these
>>> apr_proc_other_child
>>> functions are really misdocumented within APR.  Would someone step up and
>>> spell out exactly what they are *supposed* to be doing within unix,
>>> and then we
>>> can discuss how to make them portable to Win32?
>
>
>apr_proc_other_child_read() is called when the monitor (e.g., MPM) knows that a process has died; it passes the apr_proc_t describing the dead process to apr_proc_other_child_read(), and if that dead process was a registered "other child", then the maintenance function for that other child is called with APR_OC_REASON_DEATH

Good, we agree on what this function is doing.

>sounds pretty hokey to me; maybe apr_proc_wait() could call the maintenance function automatically if a newly-deceased process was registered as "other child"

I'm simply thinking of renaming it apr_proc_other_child_died() and documenting
it correctly.  Step two is determining how I can then identify and report that
case in the WinNT MPM, or automagically as a callback.

The struggle is aprizing the discrepancies between winNT and the other 
mpm_run loops, given that Win32 doesn't have signals.  But I'm working 
on that, too.

Bill


Re: Altogether Broken OtherChild logic

Posted by Jeff Trawick <tr...@attglobal.net>.
wrowe wrote:

> > Finally, it looks like apr_proc_other_child_read is the function we
> > *really* wanted
> > to use within the health check.  But it seems all of these
> > apr_proc_other_child
> > functions are really misdocumented within APR.  Would someone step 
> up and
> > spell out exactly what they are *supposed* to be doing within unix,
> > and then we
> > can discuss how to make them portable to Win32?


apr_proc_other_child_read() is called when the monitor (e.g., MPM) knows 
that a process has died; it passes the apr_proc_t describing the dead 
process to apr_proc_other_child_read(), and if that dead process was a 
registered "other child", then the maintenance function for that other 
child is called with APR_OC_REASON_DEATH

sounds pretty hokey to me; maybe apr_proc_wait() could call the 
maintenance function automatically if a newly-deceased process was 
registered as "other child"


Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
William A. Rowe, Jr. wrote:

>I belive I've deciphered the "RotateLogs doesn't work for access logs
>on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.
>
>First, rbb's reorg of the WinNT pipe logic (apr/file_io/win32/pipe.c rev 1.46) 
>causes server/log.c ap_open_piped_log() to create an async (nonblocking) 
>pipe.  This was very cool for internal pipes.  However, console apps that are
>unprepared for Overlapped IO (read: all of them) will die given an Overlapped 
>pipe handle.  Even APR doesn't assume (and shouldn't) that a pipe handle
>passed as stdin/stdout is an Async handle.  However, Async handles are
>the only way to do timeouts.
>
>Unix ap_file_pipe_create() always creates the pipes blocking.  In fixing the
>bug, by creating blocking handles (that can't be timed out) for Win32, I also
>discovered our nonblocking pipe logic was reversed for parent/child handles 
>of the child stdin process. I've just committed all those fixes, so this aspect 
>was patched as well.  If you look at the new apr_create_nt_pipe() API, it 
>might be worth renaming and exporting it as apr_file_pipe_create_ex().
>It's similar to apr_file_pipe_create() with the added blocking mode argument
>(APR_FULL_BLOCK, APR_FULL_NONBLOCK, APR_READ_BLOCK and
>APR_WRITE_BLOCK).  APR_PARENT/CHILD_BLOCK only makes sense
>to a function like apr_procattr_io_set.  In fact, for safety, that creation call
>should never create NONBLOCK child pipes, but that's another discussion.
>
>So part one was fixed, now we create the access log's rotatelogs process
>with a good stdin stream.  It survives to log an entry, but only until we call 
>apr_proc_other_child_check within the WinNT MPM's *maintenence* loop.
>
>It seems that apr_proc_other_child_check is only being used during teardown
>of the Unix children.  It also seems that the 'correct' flag passed to the maint
>function for your registered otherchild is APR_OC_REASON_RESTART when
>the process is still running.
>
>Now mod_log_config kills the child process on APR_OC_REASON_RESTART.
>Because Win32 doesn't call the function in maintenance like unix, but in a one 
>second health-check loop, we are blowing away our access logger process.
>
>Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
>to use within the health check.  But it seems all of these apr_proc_other_child
>functions are really misdocumented within APR.  Would someone step up and
>spell out exactly what they are *supposed* to be doing within unix, and then we
>can discuss how to make them portable to Win32?  It seems we have too much
>bubblegum and bailing wire holding them together, and the fixes that I made to
>do *exactly* what Unix was doing has killed the WinNT mpm.
>
>Bill
>
What we want to do in the winnt MPM maintenance loop is peridically 
check to see of the process is still alive. It it is dead, it needs to 
be restarted (ie, reliable piped logs). apr_proc_other_child_read was 
used in the Unix side of the house to do a periodic read of the pipe to 
an OC.  On Unix, you can tell if the process on the other end is still 
alive by doing a read on the pipe. Cannot do that with pipes on Windows, 
so we use apr_proc_other_child_check instead.  I think we need one 
function (call it what you will but apr_proc_other_child_check seems 
okay to me) that checks the status of an OC and performs an action 
(specified on the call to apr_proc_other_child_check) based on the 
status.  I am guessing that the windows MPM is whacking the piped logger 
because ocr->proc->hproc is somehow hosed.

Bill



Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
William A. Rowe, Jr. wrote:

>I belive I've deciphered the "RotateLogs doesn't work for access logs
>on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.
>
>First, rbb's reorg of the WinNT pipe logic (apr/file_io/win32/pipe.c rev 1.46) 
>causes server/log.c ap_open_piped_log() to create an async (nonblocking) 
>pipe.  This was very cool for internal pipes.  However, console apps that are
>unprepared for Overlapped IO (read: all of them) will die given an Overlapped 
>pipe handle.  Even APR doesn't assume (and shouldn't) that a pipe handle
>passed as stdin/stdout is an Async handle.  However, Async handles are
>the only way to do timeouts.
>
>Unix ap_file_pipe_create() always creates the pipes blocking.  In fixing the
>bug, by creating blocking handles (that can't be timed out) for Win32, I also
>discovered our nonblocking pipe logic was reversed for parent/child handles 
>of the child stdin process. I've just committed all those fixes, so this aspect 
>was patched as well.  If you look at the new apr_create_nt_pipe() API, it 
>might be worth renaming and exporting it as apr_file_pipe_create_ex().
>It's similar to apr_file_pipe_create() with the added blocking mode argument
>(APR_FULL_BLOCK, APR_FULL_NONBLOCK, APR_READ_BLOCK and
>APR_WRITE_BLOCK).  APR_PARENT/CHILD_BLOCK only makes sense
>to a function like apr_procattr_io_set.  In fact, for safety, that creation call
>should never create NONBLOCK child pipes, but that's another discussion.
>
>So part one was fixed, now we create the access log's rotatelogs process
>with a good stdin stream.  It survives to log an entry, but only until we call 
>apr_proc_other_child_check within the WinNT MPM's *maintenence* loop.
>
>It seems that apr_proc_other_child_check is only being used during teardown
>of the Unix children.  It also seems that the 'correct' flag passed to the maint
>function for your registered otherchild is APR_OC_REASON_RESTART when
>the process is still running.
>
>Now mod_log_config kills the child process on APR_OC_REASON_RESTART.
>Because Win32 doesn't call the function in maintenance like unix, but in a one 
>second health-check loop, we are blowing away our access logger process.
>
>Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
>to use within the health check.  But it seems all of these apr_proc_other_child
>functions are really misdocumented within APR.  Would someone step up and
>spell out exactly what they are *supposed* to be doing within unix, and then we
>can discuss how to make them portable to Win32?  It seems we have too much
>bubblegum and bailing wire holding them together, and the fixes that I made to
>do *exactly* what Unix was doing has killed the WinNT mpm.
>
>Bill
>
What we want to do in the winnt MPM maintenance loop is peridically 
check to see of the process is still alive. It it is dead, it needs to 
be restarted (ie, reliable piped logs). apr_proc_other_child_read was 
used in the Unix side of the house to do a periodic read of the pipe to 
an OC.  On Unix, you can tell if the process on the other end is still 
alive by doing a read on the pipe. Cannot do that with pipes on Windows, 
so we use apr_proc_other_child_check instead.  I think we need one 
function (call it what you will but apr_proc_other_child_check seems 
okay to me) that checks the status of an OC and performs an action 
(specified on the call to apr_proc_other_child_check) based on the 
status.  I am guessing that the windows MPM is whacking the piped logger 
because ocr->proc->hproc is somehow hosed.

Bill



Re: Altogether Broken OtherChild logic

Posted by Bill Stoddard <bi...@wstoddard.com>.
William A. Rowe, Jr. wrote:

>I belive I've deciphered the "RotateLogs doesn't work for access logs
>on Windows" Apache 2.0.44 bug.  It's actually many bugs in conformance.
>
>First, rbb's reorg of the WinNT pipe logic (apr/file_io/win32/pipe.c rev 1.46) 
>causes server/log.c ap_open_piped_log() to create an async (nonblocking) 
>pipe.  This was very cool for internal pipes.  However, console apps that are
>unprepared for Overlapped IO (read: all of them) will die given an Overlapped 
>pipe handle.  Even APR doesn't assume (and shouldn't) that a pipe handle
>passed as stdin/stdout is an Async handle.  However, Async handles are
>the only way to do timeouts.
>
>Unix ap_file_pipe_create() always creates the pipes blocking.  In fixing the
>bug, by creating blocking handles (that can't be timed out) for Win32, I also
>discovered our nonblocking pipe logic was reversed for parent/child handles 
>of the child stdin process. I've just committed all those fixes, so this aspect 
>was patched as well.  If you look at the new apr_create_nt_pipe() API, it 
>might be worth renaming and exporting it as apr_file_pipe_create_ex().
>It's similar to apr_file_pipe_create() with the added blocking mode argument
>(APR_FULL_BLOCK, APR_FULL_NONBLOCK, APR_READ_BLOCK and
>APR_WRITE_BLOCK).  APR_PARENT/CHILD_BLOCK only makes sense
>to a function like apr_procattr_io_set.  In fact, for safety, that creation call
>should never create NONBLOCK child pipes, but that's another discussion.
>
>So part one was fixed, now we create the access log's rotatelogs process
>with a good stdin stream.  It survives to log an entry, but only until we call 
>apr_proc_other_child_check within the WinNT MPM's *maintenence* loop.
>
>It seems that apr_proc_other_child_check is only being used during teardown
>of the Unix children.  It also seems that the 'correct' flag passed to the maint
>function for your registered otherchild is APR_OC_REASON_RESTART when
>the process is still running.
>
>Now mod_log_config kills the child process on APR_OC_REASON_RESTART.
>Because Win32 doesn't call the function in maintenance like unix, but in a one 
>second health-check loop, we are blowing away our access logger process.
>
>Finally, it looks like apr_proc_other_child_read is the function we *really* wanted
>to use within the health check.  But it seems all of these apr_proc_other_child
>functions are really misdocumented within APR.  Would someone step up and
>spell out exactly what they are *supposed* to be doing within unix, and then we
>can discuss how to make them portable to Win32?  It seems we have too much
>bubblegum and bailing wire holding them together, and the fixes that I made to
>do *exactly* what Unix was doing has killed the WinNT mpm.
>
>Bill
>
This stuff kinda worked on Windows in the past.  I need to dig some but 
I seem to recall that it was basically impossible to do the exact same 
thing in Windows as you do in Unix. The other_child_read in Unix will 
not (and cannot) work the same way under Unix.

Bill

>  
>


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
A perfect example; how does one go from this;

/**
 * Check on the specified process.  If it is gone, call the maintenance 
 * function.
 * @param pid The process to check.
 * @param status The status to pass to the maintenance function.
 */
APR_DECLARE(apr_status_t) apr_proc_other_child_read(apr_proc_t *pid, int status);


to this?

APR_DECLARE(apr_status_t) apr_proc_other_child_read(apr_proc_t *pid, int status)
{
    apr_other_child_rec_t *ocr, *nocr;

    for (ocr = other_children; ocr; ocr = nocr) {
        nocr = ocr->next;
        if (ocr->proc->pid != pid->pid)
            continue;

        ocr->proc = NULL;
        (*ocr->maintenance) (APR_OC_REASON_DEATH, ocr->data, status);
        return 0;
    }
    return APR_CHILD_NOTDONE;
}


Re: Altogether Broken OtherChild logic

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Another quick example; APR_OC_REASON_UNWRITABLE is entirely
unimplemented according to my quick grep of the entire code base.

Bill