You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/06/08 01:38:49 UTC

byterange filter + apr_brigade_partition

Hey...

I'm finally getting around to applying the six-month-old patch to change
the prototype for apr_brigade_partition() to this:

APU_DECLARE(apr_status_t) apr_brigade_partition(apr_bucket_brigade *b,
                                                apr_off_t point,
                                                apr_bucket **after_point)

This is so that the caller can have more information when a failure
occurs, and was at the request of Greg Stein when the function first came
into existence.  The patch just slipped through the cracks and was
forgotten since then.

Anyway, I'll have to patch the byterange filter in Apache to account for
the change, and I'm looking for input on what the correct action is when a
failure actually occurs.  Right now, Apache doesn't test for a failure to
partition (which would have been indicated by a NULL return value) at all.
So in the code below, what would be the right thing to stick in the "XXX:
handle error" spot?

Thanks,
Cliff



Index: http_protocol.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/http/http_protocol.c,v
retrieving revision 1.324
diff -u -d -r1.324 http_protocol.c
--- http_protocol.c	2001/06/06 19:30:49	1.324
+++ http_protocol.c	2001/06/07 23:31:39
@@ -2317,6 +2317,7 @@
            (rv = parse_byterange(current, clength, &range_start, &range_end))) {
         apr_bucket *e2;
         apr_bucket *ec;
+        apr_status_t rvstart, rvend;

         if (rv == -1) {
             continue;
@@ -2339,8 +2340,11 @@
             APR_BRIGADE_INSERT_TAIL(bsend, e);
         }

-        e = apr_brigade_partition(bb, range_start);
-        e2 = apr_brigade_partition(bb, range_end + 1);
+        rvstart = apr_brigade_partition(bb, range_start, &e);
+        rvend   = apr_brigade_partition(bb, range_end + 1, &e2);
+        if ((rvstart != APR_SUCCESS) || (rvend != APR_SUCCESS)) {
+            /* XXX: handle error */
+        }

         ec = e;
         do {


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: byterange filter + apr_brigade_partition

Posted by Cliff Woolley <cl...@yahoo.com>.

On Fri, 8 Jun 2001, Greg Stein wrote:

> > read on the (former) PIPE/SOCKET buckets was going to fail, it would have
> > failed earlier in the call to apr_brigade_length().  Right?
>
> Good point. (and let's also clarify to ourselves that we're just talking
> about the bytrange filter's use of partition; other partition users will
> still need to be wary of error return values)

Absolutely.  No doubt about it.

> However... (and you knew that was coming :-)

;-]

> Not if the PIPE has a known length, but simply hasn't read the data yet
> (dunno if PIPE can support this; if not, then just say there is another
> bucket that does).

They do not... neither to SOCKET buckets.  The general rule has always
been that any bucket that is of pre-defined length will be either a
"shared" bucket or a "simple" bucket, meaning that they can be split
without being read by simply adjusting the start/length values.  The only
buckets that do not support this are buckets of indeterminate length (ie,
length == -1).  But I guess that could change later (don't currently see
how, but let's just say that somebody figures out a need for such a
beast).  In that case, you're right, this would be necessary.  So for
future-proofness, we'll just say that it could happen.  Fair enough.

> [ note: apr_brigade_partition() does not properly handle *failure* in a
>   split operation; it assumes anything but ENOTIMPL is success ]

The old version did, but the patched version doesn't:

-            if (apr_bucket_split(e, point) != APR_ENOTIMPL)
-                return APR_BUCKET_NEXT(e);
+            if ((rv = apr_bucket_split(e, point)) != APR_ENOTIMPL) {
+                *after_point = APR_BUCKET_NEXT(e);
+                return rv;
+            }

While success and non-ENOTIMPL failure are handled by the same case, the
actual rv is returned.  So if that's APR_EOF or something, it will get
passed back.  *after_point is set anyway as a side effect, but that
doesn't matter.  Good?

> In fact, I am going to be changing mod_dav's interface to the back-end
> repository to use bucket brigades(*). Let's say that a database backend
> inserts a bucket that will read a BLOB out of a database. It knows the
> length is N bytes, but it certainly doesn't want to read all that into
> memory right away. So when it *does* read, it could fail (e.g. the database
> connection timed out).

Good example!

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: byterange filter + apr_brigade_partition

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Jun 08, 2001 at 03:10:07PM -0400, Cliff Woolley wrote:
> On Fri, 8 Jun 2001, Greg Stein wrote:
> 
> > > catastrophic happens between the calls to apr_brigade_length() and
> > > apr_brigade_partition().  That's why it worked fine before to just ignore
> > > the possibility of a NULL return from apr_brigade_partition()... it just
> > > shouldn't ever happen.
> >
> > It can definitely happen.
> >
> > Consider a brigade that contains a PIPE bucket from a CGI. When we go to
> > partition it, we read from the PIPE. Are you *sure* that read will never
> > return an error on us? :-)
> 
> But there won't be a PIPE bucket in the input brigade by the time it gets
> to the apr_brigade_partition() calls, because apr_brigade_length() will
> have already seen and read from any such buckets, morphing them to HEAP
> buckets.  All apr_brigade_partition() will ever see is the "plain"
> buckets, ie ones that you don't need to read from to operate on.  If the
> read on the (former) PIPE/SOCKET buckets was going to fail, it would have
> failed earlier in the call to apr_brigade_length().  Right?

Good point. (and let's also clarify to ourselves that we're just talking
about the bytrange filter's use of partition; other partition users will
still need to be wary of error return values)

However... (and you knew that was coming :-)

Not if the PIPE has a known length, but simply hasn't read the data yet
(dunno if PIPE can support this; if not, then just say there is another
bucket that does). Let's say you have some kind of protocol via pipe or a
socket, where the length is known, but we haven't read the data (into a heap
bucket yet). You insert a bucket with length == <whatever>. Thus, the
length() call works without a read() call.

Now the partition comes along and figures out that a split needs to occur
right in the middle of this PIPE' bucket. We need to read up to the split
point, and leave the after-split data sitting in the pipe/socket. Anyways...
that read could fail, meaning the split() would fail.

[ note: apr_brigade_partition() does not properly handle *failure* in a
  split operation; it assumes anything but ENOTIMPL is success ]

The end result is a partition() call that returns an error.

> > > If we do want to explicitly handle errors here, I think the way to do it
> > > is to move the calls to apr_brigade_partition() above the block that adds
> > > the range specifiers to the output stream and simply "continue;" if one
> > > of the calls fails for some wacky reason.
> >
> > Sure. We should also log an error because it means that something has broke
> > down during a bucket read. A failing CGI, for example.
> 
> I'll buy this... if you can convince me that these calls to
> apr_brigade_partition() will ever have a reason to read from any bucket.

See above :-)

In fact, I am going to be changing mod_dav's interface to the back-end
repository to use bucket brigades(*). Let's say that a database backend
inserts a bucket that will read a BLOB out of a database. It knows the
length is N bytes, but it certainly doesn't want to read all that into
memory right away. So when it *does* read, it could fail (e.g. the database
connection timed out).

[ strictly speaking, you're right: partition() won't *read* in these
  scenarios, but the split() can easily fail. ]

Cheers,
-g

(*) I already use this for DeltaV REPORT processing. I pass an ap_filter_t*
into the back-end, which then does some ap_fputstrs() to write into the
filter stack. I want to have repositories deliver their GET response into a
filter. Also, PROPFIND result generation can go into a filter rather than
accumulating in memory (ugh!) and then delivering it all. Should have a much
better working set this way, and reduce latency on the PROPFIND.

-- 
Greg Stein, http://www.lyra.org/

Re: byterange filter + apr_brigade_partition

Posted by Cliff Woolley <cl...@yahoo.com>.

On Fri, 8 Jun 2001, Greg Stein wrote:

> > catastrophic happens between the calls to apr_brigade_length() and
> > apr_brigade_partition().  That's why it worked fine before to just ignore
> > the possibility of a NULL return from apr_brigade_partition()... it just
> > shouldn't ever happen.
>
> It can definitely happen.
>
> Consider a brigade that contains a PIPE bucket from a CGI. When we go to
> partition it, we read from the PIPE. Are you *sure* that read will never
> return an error on us? :-)

But there won't be a PIPE bucket in the input brigade by the time it gets
to the apr_brigade_partition() calls, because apr_brigade_length() will
have already seen and read from any such buckets, morphing them to HEAP
buckets.  All apr_brigade_partition() will ever see is the "plain"
buckets, ie ones that you don't need to read from to operate on.  If the
read on the (former) PIPE/SOCKET buckets was going to fail, it would have
failed earlier in the call to apr_brigade_length().  Right?

> > If we do want to explicitly handle errors here, I think the way to do it
> > is to move the calls to apr_brigade_partition() above the block that adds
> > the range specifiers to the output stream and simply "continue;" if one
> > of the calls fails for some wacky reason.
>
> Sure. We should also log an error because it means that something has broke
> down during a bucket read. A failing CGI, for example.

I'll buy this... if you can convince me that these calls to
apr_brigade_partition() will ever have a reason to read from any bucket.
=-)  Still, I guess it couldn't hurt to have a "something really, really
bad happened" message get logged...

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: byterange filter + apr_brigade_partition

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Jun 08, 2001 at 12:48:40PM -0400, Cliff Woolley wrote:
>...
> range-end where necessary.  So in theory, these particular calls to
> apr_brigade_partition() should never fail, unless something really
> catastrophic happens between the calls to apr_brigade_length() and
> apr_brigade_partition().  That's why it worked fine before to just ignore
> the possibility of a NULL return from apr_brigade_partition()... it just
> shouldn't ever happen.

It can definitely happen.

Consider a brigade that contains a PIPE bucket from a CGI. When we go to
partition it, we read from the PIPE. Are you *sure* that read will never
return an error on us? :-)

> If we do want to explicitly handle errors here, I think the way to do it
> is to move the calls to apr_brigade_partition() above the block that adds
> the range specifiers to the output stream and simply "continue;" if one
> of the calls fails for some wacky reason.

Sure. We should also log an error because it means that something has broke
down during a bucket read. A failing CGI, for example.

> I'm tempted to just keep on ignoring the [slim] possibility of an error in
> this case and to simply add a comment explaining these assumptions.
> Maybe an AP_DEBUG_ASSERT() on the return values to add future-proofness?

Definitely not an AP_DEBUG_ASSERT. Then we would never see anything out in
the real world. It will just mysteriously die.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: byterange filter + apr_brigade_partition

Posted by Cliff Woolley <cl...@yahoo.com>.

On Fri, 8 Jun 2001, Cliff Woolley wrote:

> There are special protocol rules for byteranges, though...  You're
> supposed to satisfy as many of the requested ranges as you can, IIRC...
> only if NONE of them are satisfiable is it really an error.  I think.  Is
> that right?  I guess I have to go read the damned RFC.  =-)

According to the RFC, you just ignore byteranges you can't satisfy [as
long as there's at least one that you can].  But a range-end past the end
of the entity does not imply non-satisfiability... you just adjust the end
to be length-1.

What I didn't realize is that all of this is already handled by the filter
beforehand, because it calls apr_brigade_length() to find out the absolute
length and parse_byteranges() handles the adjustments to range-start and
range-end where necessary.  So in theory, these particular calls to
apr_brigade_partition() should never fail, unless something really
catastrophic happens between the calls to apr_brigade_length() and
apr_brigade_partition().  That's why it worked fine before to just ignore
the possibility of a NULL return from apr_brigade_partition()... it just
shouldn't ever happen.

If we do want to explicitly handle errors here, I think the way to do it
is to move the calls to apr_brigade_partition() above the block that adds
the range specifiers to the output stream and simply "continue;" if one
of the calls fails for some wacky reason.

I'm tempted to just keep on ignoring the [slim] possibility of an error in
this case and to simply add a comment explaining these assumptions.
Maybe an AP_DEBUG_ASSERT() on the return values to add future-proofness?

Thoughts?

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: byterange filter + apr_brigade_partition

Posted by Cliff Woolley <cl...@yahoo.com>.

On Fri, 8 Jun 2001, Greg Stein wrote:

> Hah! Not even. I've been thinking about our filters quite a bit recently,
> and I was going to go and dig up this patch. *snicker*

I've got the apr-util end ready to commit... I'm just holding off until I
figure out what to do with Apache.  Should be today sometime.

> > So in the code below, what would be the right thing to stick in the "XXX:
> > handle error" spot?
>
> Dunno. Some kind of magic error bucket? Recovering from errors during filter
> processing is totally undefined. If you've sent data to the client, then
> there isn't much that you *can* do except to:
>
> *) log an error
> *) close the connection ("unexpectedly" for the client)

There are special protocol rules for byteranges, though...  You're
supposed to satisfy as many of the requested ranges as you can, IIRC...
only if NONE of them are satisfiable is it really an error.  I think.  Is
that right?  I guess I have to go read the damned RFC.  =-)

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: byterange filter + apr_brigade_partition

Posted by rb...@covalent.net.

> >From there... I remembered the partition status return code patch.
>
> > Anyway, I'll have to patch the byterange filter in Apache to account for
> > the change, and I'm looking for input on what the correct action is when a
> > failure actually occurs.  Right now, Apache doesn't test for a failure to
> > partition (which would have been indicated by a NULL return value) at all.
> > So in the code below, what would be the right thing to stick in the "XXX:
> > handle error" spot?
>
> Dunno. Some kind of magic error bucket? Recovering from errors during filter
> processing is totally undefined. If you've sent data to the client, then
> there isn't much that you *can* do except to:

Recovering from errors during filter processing is completely defined.  It
is VERY easy.

1)  If you have never passed data to a lower filter, then you can pass an
eror bucket.

2)  If you have passed data to a lower filter, then you log an error, and
return to the higher filter.  This means that as far as the client is
concerned, the connection closed prematurly, but that is basically what
would happen in 1.3 if there was a fatal error during request processing.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: byterange filter + apr_brigade_partition

Posted by Greg Stein <gs...@lyra.org>.

On Thu, Jun 07, 2001 at 07:38:49PM -0400, Cliff Woolley wrote:
> 
> Hey...
> 
> I'm finally getting around to applying the six-month-old patch to change
> the prototype for apr_brigade_partition() to this:
> 
> APU_DECLARE(apr_status_t) apr_brigade_partition(apr_bucket_brigade *b,
>                                                 apr_off_t point,
>                                                 apr_bucket **after_point)
> 
> This is so that the caller can have more information when a failure
> occurs, and was at the request of Greg Stein when the function first came
> into existence.

Right. The reason is because the partition function can call read() on a
bucket. That can definitely fail, so we need to return that to the caller.

> The patch just slipped through the cracks and was
> forgotten since then.

Hah! Not even. I've been thinking about our filters quite a bit recently,
and I was going to go and dig up this patch. *snicker*

Part of what led me to thinking about the patch is that we have a number of
new "write" functions related to filters/brigades that return a byte count.
That means they completely toss status information. And the byte count is
never useful. The caller can always determine the byte count (they passed
the dumb arguments!), and the "bytes written" only means what was written to
the next filter, not what goes onto the nextwork or anything else
meaningful. And our definition of filters is "write everything passed". You
can't do a partial write to a filter stack. The byte count return value is
legacy from the ap_r* functions.

>From there... I remembered the partition status return code patch.

> Anyway, I'll have to patch the byterange filter in Apache to account for
> the change, and I'm looking for input on what the correct action is when a
> failure actually occurs.  Right now, Apache doesn't test for a failure to
> partition (which would have been indicated by a NULL return value) at all.
> So in the code below, what would be the right thing to stick in the "XXX:
> handle error" spot?

Dunno. Some kind of magic error bucket? Recovering from errors during filter
processing is totally undefined. If you've sent data to the client, then
there isn't much that you *can* do except to:

*) log an error
*) close the connection ("unexpectedly" for the client)

I am not sure of the best way to effect this change. I don't think that we
have a mechanism to shut down a connection/request. What would be really
neat is to toss the filter stack and insert an ERROR filter that simply
returns APR_EDISCONNECTED (or somesuch) whenever it is called. Hopefully,
somebody will see that error and stop generating content :-)

Of course, tossing the filter stack while *in it* is problematic. The
upstream filter still has "f" in their stack frame, and f->next is still
useful (the pool is still around, after all). To some extent,
ap_pass_brigade() would need to check the connection status to assist with
rapid shutdown.
(guys doing f->next->frec->filter_func.out_func(...) are screwed, but they
 deserve it.)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/