You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2002/09/25 04:06:04 UTC

CGI bucket needed

Just ran into an interesting bug, and I've got a proposal for a way to solve
it, too. (no code tho :-)

If a CGI writes to stderr [more than the pipe's buffer has room for], then
it will block on that write. Meanwhile, when Apache goes to deliver the CGI
output to the network, it will *block* on a read from the CGI's output.

See the deadlock yet? :-)

The CGI can't generate output because it needs the write-to-stderr to
complete. Apache can't drain stderr until the read-from-stdout completes. In
fact, Apache won't even drain stderr until the CGI is *done* (it must empty
the PIPE bucket passed into the output filters).

Eventually, the deadlock resolves itself when the read from the PIPE bucket
times out.

[ this read behavior occurs in the C-L filter ]

[ NOTE: it appears this behavior is a regression from Apache 1.3. In 1.3, we
  just hook stderr into the error log. In 2.0, we manually read lines, then
  log them (with timestamps) ]


I believe the solution is to create a new CGI bucket type. The read()
function would read from stdout, similar to a normal PIPE bucket (e.g.
create a new HEAP bucket with the results). However, the bucket *also* holds
the stderr pipe from the CGI script. When you do a bucket read(), it
actually blocks on both pipes. If data comes in from stderr, then it drains
it and sends that to the error log. Data that comes in from stdout is
handled normally.

This system allows you to keep stderr drained, yet still provide for
standard PIPE style operation on the stdout pipe.

Thoughts?

Anybody adventurous enough to code it? :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: CGI bucket needed

Posted by Aaron Bannert <aa...@clove.org>.

On Tue, Sep 24, 2002 at 07:06:04PM -0700, Greg Stein wrote:
> Just ran into an interesting bug, and I've got a proposal for a way to solve
> it, too. (no code tho :-)
> 
> If a CGI writes to stderr [more than the pipe's buffer has room for], then
> it will block on that write. Meanwhile, when Apache goes to deliver the CGI
> output to the network, it will *block* on a read from the CGI's output.
> 
> See the deadlock yet? :-)
> 
> The CGI can't generate output because it needs the write-to-stderr to
> complete. Apache can't drain stderr until the read-from-stdout completes. In
> fact, Apache won't even drain stderr until the CGI is *done* (it must empty
> the PIPE bucket passed into the output filters).
> 
> Eventually, the deadlock resolves itself when the read from the PIPE bucket
> times out.
> 
> [ this read behavior occurs in the C-L filter ]
> 
> [ NOTE: it appears this behavior is a regression from Apache 1.3. In 1.3, we
>   just hook stderr into the error log. In 2.0, we manually read lines, then
>   log them (with timestamps) ]
> 
> 
> I believe the solution is to create a new CGI bucket type. The read()
> function would read from stdout, similar to a normal PIPE bucket (e.g.
> create a new HEAP bucket with the results). However, the bucket *also* holds
> the stderr pipe from the CGI script. When you do a bucket read(), it
> actually blocks on both pipes. If data comes in from stderr, then it drains
> it and sends that to the error log. Data that comes in from stdout is
> handled normally.

Yuck. I think if we had the ability to multiplex brigades then we'd
have a more elegant approach. There are other similiar possible
deadlocks with CGI that would all be solved with a general purpose
apr_brigade_poll().

-aaron

Re: CGI bucket needed

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Sep 24, 2002 at 10:54:50PM -0400, rbb@apache.org wrote:
> On Tue, 24 Sep 2002, Greg Stein wrote:
> 
> > Just ran into an interesting bug, and I've got a proposal for a way to solve
> > it, too. (no code tho :-)
> > 
> > If a CGI writes to stderr [more than the pipe's buffer has room for], then
> > it will block on that write. Meanwhile, when Apache goes to deliver the CGI
> > output to the network, it will *block* on a read from the CGI's output.
> > 
> > See the deadlock yet? :-)
> > 
> > The CGI can't generate output because it needs the write-to-stderr to
> > complete. Apache can't drain stderr until the read-from-stdout completes. In
> > fact, Apache won't even drain stderr until the CGI is *done* (it must empty
> > the PIPE bucket passed into the output filters).
> > 
> > Eventually, the deadlock resolves itself when the read from the PIPE bucket
> > times out.
> > 
> > [ this read behavior occurs in the C-L filter ]
> > 
> > [ NOTE: it appears this behavior is a regression from Apache 1.3. In 1.3, we
> >   just hook stderr into the error log. In 2.0, we manually read lines, then
> >   log them (with timestamps) ]
> 
> Is there a reason we don't go back to what 1.3 did?  That would seem to be
> the easiest way to solve this problem.  I am pretty sure that the reason
> this was changed originally, was that the first version of apr_proc_create
> couldn't do what 1.3 did.  Although we should double check on that.

I've gotta say that I like the current behavior where each line of the error
log is timestamped. In the old code, the CGI program could spam anything
there, in any format.

As far as I can tell, the 1.3 CGI system relied on Apache's stderr pointing
to the error log. We would need to do that, and let CGI's share the same
stderr, or maybe somehow get the fd that we use for the error log and hook
that into the CGI's stderr.

Note that hooking stderr like that also obviates log levels and syslog
logging. It eliminates the timestamp,s annotation of the client IP and
other, similar bits.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: CGI bucket needed

Posted by rb...@apache.org.

On Tue, 24 Sep 2002, Greg Stein wrote:

> Just ran into an interesting bug, and I've got a proposal for a way to solve
> it, too. (no code tho :-)
> 
> If a CGI writes to stderr [more than the pipe's buffer has room for], then
> it will block on that write. Meanwhile, when Apache goes to deliver the CGI
> output to the network, it will *block* on a read from the CGI's output.
> 
> See the deadlock yet? :-)
> 
> The CGI can't generate output because it needs the write-to-stderr to
> complete. Apache can't drain stderr until the read-from-stdout completes. In
> fact, Apache won't even drain stderr until the CGI is *done* (it must empty
> the PIPE bucket passed into the output filters).
> 
> Eventually, the deadlock resolves itself when the read from the PIPE bucket
> times out.
> 
> [ this read behavior occurs in the C-L filter ]
> 
> [ NOTE: it appears this behavior is a regression from Apache 1.3. In 1.3, we
>   just hook stderr into the error log. In 2.0, we manually read lines, then
>   log them (with timestamps) ]

Is there a reason we don't go back to what 1.3 did?  That would seem to be
the easiest way to solve this problem.  I am pretty sure that the reason
this was changed originally, was that the first version of apr_proc_create
couldn't do what 1.3 did.  Although we should double check on that.

Ryan

Re: CGI bucket needed

Posted by Brian Pane <br...@cnet.com>.

On Tue, 2002-09-24 at 21:43, William A. Rowe, Jr. wrote:

> The thought is, if we call pass_brigade or read_brigade in a 
> non-blocking mode, and nothing can be processed, we get back
> a bucket containing a pollset of some sort (either multiple handles
> assembled by all the intervening filters, or multiple pollfd items
> each in their own bucket, each appended by the read.)
> 
> It would then be possible to assemble a pollset for CGI processing,
> dealing with all of the intervening filters.

Where would you poll the pollset?  If the poll is in the
handler, there's still a risk of deadlock when the handler
passes data to the output filter stack (specifically, there's
a risk of deadlock in the C-L filter).  But if the poll is
in the filters, it will be difficult to get the right code
invoked when a pollfd is signaled (because the necessary
code might be in a completely different filter).

Brian

Re: CGI bucket needed

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 09:06 PM 9/24/2002, Greg Stein wrote:

>I believe the solution is to create a new CGI bucket type. The read()
>function would read from stdout, similar to a normal PIPE bucket (e.g.
>create a new HEAP bucket with the results). However, the bucket *also* holds
>the stderr pipe from the CGI script. When you do a bucket read(), it
>actually blocks on both pipes. If data comes in from stderr, then it drains
>it and sends that to the error log. Data that comes in from stdout is
>handled normally.

I believe a better answer is to finally allow polling on a filter chain.
This isn't as trivial (perhaps) but it's certainly more useful.

Consider a POST request to a mod_ext_filter parsed CGI document...

The input chain is...

 socket read from client
  \-- pipe write to ext_filter_in stdin
       \-- pipe read from ext_filter_in stdout
            \-- pipe write to cgi stdin

The output chain is...

 pipe read from cgi stdout
  \-- pipe write to ext_filter stdin
       \-- pipe read to ext_filter stdout
            \-- socket write to client

All the while we have...

 pipe read from cgi stderr
  \-- file write to errlog

Lost track of what we are blocking on yet?  You certainly can,
take the ext_filter'ed cgi stdout.  Is there data waiting on the
cgi stdout, or are we blocked on pipe read from the ext_filter's
stdout?  If it's the later, is it a long-running ext_filter taking several
seconds to generate output, or does it need more data pumped to
it's stdin in order to formulate more output?

In the nasty example above, we truly have up to ten different items
to poll at the same time.  Pretty harry.

The thought is, if we call pass_brigade or read_brigade in a 
non-blocking mode, and nothing can be processed, we get back
a bucket containing a pollset of some sort (either multiple handles
assembled by all the intervening filters, or multiple pollfd items
each in their own bucket, each appended by the read.)

It would then be possible to assemble a pollset for CGI processing,
dealing with all of the intervening filters.

I think your idea is an interesting alternative, but I doubt it really
handles the breadth of these cases, so it would be an awful lot
of effort that ultimately would be undone.

Bill