Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@kiwi.ICS.UCI.EDU> on 2000/06/30 08:30:51 UTC

Re: what are the issues? (was: Re: Patch review: ...)

>Point is: the char* callback does exactly what an ioblock/bucket callback
>would do on its own when it must examine each byte.
>
>So, I will state again: the char* callback is not a problem. If you
>disagree, then please explain further.

There is a significant flaw in that argument.  char * doesn't do what we
want when a filter does not have to examine each byte.  That is the problem.

It doesn't make any sense to have two filter interfaces when you can
accomplish the same with one and a simple parameter conversion function.
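
To make that concrete, the conversion function could be as thin as the
sketch below.  Every name in it is invented for illustration; nothing
here exists yet.

    /* All types and helpers in this sketch are invented. */
    typedef struct bucket bucket_t;
    typedef struct filter filter_t;
    typedef int (*char_cb_fn)(filter_t *, const char *, long);

    struct filter {
        filter_t *next;
        void *ctx;              /* here: the wrapped char* callback */
    };

    /* Invented helper: flatten one bucket's contents into memory. */
    extern int bucket_flatten(bucket_t *b, const char **data, long *len);

    /* The conversion function: lets a char*-style filter hang on a
     * bucket-based chain, so only one interface is needed. */
    static int char_filter_adapter(filter_t *f, bucket_t *b)
    {
        const char *data;
        long len;

        bucket_flatten(b, &data, &len);
        return ((char_cb_fn)f->ctx)(f, data, len);
    }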

>1) there is nothing in my framework for lists of buckets
>2) presume there is a new put_buckets() API for sending a list of buckets
>3) put_buckets() would iterate over the buckets, map them into a char*, and
>   call the callback for each one.

That would not be a solution. The purpose of passing a list of buckets around
is to linearize the call stack for the frequent case of filtered content
splitting one large bucket into separate buckets with filtered results
interspersed in between.  The effect is that a filter chain can frequently
process an entire message in one pass down the chain, which enables the
stream end to send the entire response in one go, which also allows it
to do interesting things like provide a content length by summing the
data length of all the buckets' data, and set a last-modified time
by picking the most recent time from a set of static file buckets.
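
For example, both of those header values fall out of a single walk of
the list.  In sketch form, with an invented bucket layout (next, type,
length, and a cached mtime on file buckets):

    /* One pass: sum data lengths for Content-Length, keep the newest
     * mtime among file buckets for Last-Modified.  Sketch only. */
    static void sum_metadata(const bucket_t *list,
                             long *total_len, time_t *last_mod)
    {
        const bucket_t *b;

        *total_len = 0;
        *last_mod = 0;
        for (b = list; b != NULL; b = b->next) {
            *total_len += b->length;
            if (b->type == BUCKET_FILE && b->mtime > *last_mod)
                *last_mod = b->mtime;      /* cached stat value */
        }
    }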

I think it would help if we stopped using artificial examples.  Let's
try something simple:

       socket <-- http <-- add_footer <-- add_header <-- send_file

send_file calls its filter with an ap_file_t bucket and End-of-Stream (EOS)
in the bucket list.  add_header sets a flag, prepends another ap_file_t
bucket to the list and sends the list to its filter.  add_footer looks
at the list, finds the EOS, inserts another ap_file_t bucket in
front of the EOS, and sends the list on to its filter.  http walks through
the list picking up the (cached) stat values, notes the EOS and, seeing
that its own headers_sent flag is false, sets the cumulative metadata
and sends the header fields, followed by three calls to the kernel to
send out the three files using whatever mechanism is most efficient.
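
In sketch form, add_footer above is nothing more than a list splice
(again with invented names):

    /* Hypothetical add_footer: splice one file bucket in front of the
     * EOS marker and pass the list on; everything else is untouched. */
    static int add_footer(filter_t *f, bucket_t *list)
    {
        bucket_t *b, *prev = NULL;

        for (b = list; b != NULL; prev = b, b = b->next) {
            if (b->type == BUCKET_EOS) {
                bucket_t *footer = make_file_bucket("footer.html");
                footer->next = b;
                if (prev != NULL)
                    prev->next = footer;
                else
                    list = footer;
                break;
            }
        }
        return pass_buckets(f->next, list);   /* invented helper */
    }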

The point here isn't that this is the only way to implement filters.
The point is that no other interface can implement them as efficiently.
Not even close.  Yes, there are cases where string filters are just as
efficient as any other design, but there is no case in which they are
more efficient than bucket brigades.  The reason is that being able
to process a list of strings in one call more than offsets the extra
cost of list processing, regardless of the filter type, and allows
for additional features that have benefits for http processing.
Like, for example, being able to determine the entire set of resources
that make up the source of this dynamic resource without teaching every
filter about WebDAV.
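
That last point is the same one-pass walk yet again: if file buckets
carry their pathnames, the stream end can record the source set without
any filter knowing about it.  Sketch, with invented names:

    /* Collect the pathname of every file bucket in the list; this is
     * the "set of resources" behind the dynamic response. */
    for (b = list; b != NULL; b = b->next)
        if (b->type == BUCKET_FILE)
            record_source(r, b->filename);    /* invented helper */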

In reference to some other messages, it isn't necessary for us to wait
for a content length -- HTTP/1.1 chunked does work just fine.  However,
that doesn't mean it isn't preferable to use content-length whenever
possible, since not all clients are HTTP/1.1 and most browsers are able
to present a better progress-bar if they have the content-length in
advance of the data.

....Roy

Re: what are the issues? (was: Re: Patch review: ...)

Posted by Ben Hyde <bh...@pobox.com>.
Roy's example of the need to sum up metadata (content length, mod time)
is one aspect of why I think, as he says, an ADT is a good thing in the
filter design.

I wish we were doing more design on this thing.  I wish I had more time.

There are three ADTs I perceive in the design space (rough C shapes for
them follow the list):
 1. The filter elements (they need a place to store their microtasking state).
 2. The buckets (we need a chunky spread, and a place to sum up metadata).
 3. Read/Write heads for the individual filter elements.
    (we need a vprintf, et al., and we need an error protocol).
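
Roughly, and with every name invented (these are shapes, not proposals):

    #include <stdarg.h>   /* va_list */
    #include <time.h>     /* time_t */

    /* 1. A filter element, with a slot for its microtasking state. */
    typedef struct filter filter_t;
    typedef struct bucket bucket_t;

    struct filter {
        filter_t *next;                      /* rest of the chain */
        int (*callback)(filter_t *, bucket_t *);
        void *state;                         /* microtasking state */
    };

    /* 2. A bucket: a chunk of content, plus room to sum up metadata. */
    struct bucket {
        bucket_t *next;
        int type;          /* string, file, EOS, ... */
        long length;       /* toward a Content-Length total */
        time_t mtime;      /* toward a Last-Modified maximum */
    };

    /* 3. A read/write head for a filter element: where vprintf et al.
     * and the error protocol would live. */
    typedef struct {
        filter_t *dest;
        int (*vprintf_fn)(filter_t *, const char *, va_list);
        int error;         /* error-protocol state */
    } rw_head_t;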

The holy grail is to find a way to get a bicycle that can evolve into an
earth mover.  What makes that hard is that the bicycle needs to start
out with certain organs.  These appear to me to include:

All of this heat is probably for the best.  All that Apache does is
I/O so this is important.  Boy is it emotionally draining.

 - ben

"Roy T. Fielding" <fi...@kiwi.ICS.UCI.EDU> writes:
> That would not be a solution. The purpose of passing a list of buckets around
> is to linearize the call stack for the frequent case of filtered content
> splitting one large bucket into separate buckets with filtered results
> interspersed in between.  The effect is that a filter chain can frequently
> process an entire message in one pass down the chain, which enables the
> stream end to send the entire response in one go, which also allows it
> to do interesting things like provide a content length by summing the
> data length of all the buckets' data, and set a last-modified time
> by picking the most recent time from a set of static file buckets.
> 
> I think it would help if we stopped using artificial examples.  Let's
> try something simple:
> 
>        socket <-- http <-- add_footer <-- add_header <-- send_file
> 
> send_file calls its filter with an ap_file_t bucket and End-of-Stream (EOS)
> in the bucket list.  add_header sets a flag, prepends another ap_file_t
> bucket to the list and sends the list to its filter.  add_footer looks
> at the list, finds the EOS, inserts another ap_file_t bucket in
> front of the EOS, and sends the list on to its filter.  http walks through
> the list picking up the (cached) stat values, notes the EOS and, seeing
> that its own headers_sent flag is false, sets the cumulative metadata
> and sends the header fields, followed by three calls to the kernel to
> send out the three files using whatever mechanism is most efficient.

char* handler (was: what are the issues?)

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 29, 2000 at 11:30:51PM -0700, Roy T. Fielding wrote:
> >Point is: the char* callback does exactly what an ioblock/bucket callback
> >would do on its own when it must examine each byte.
> >
> >So, I will state again: the char* callback is not a problem. If you
> >disagree, then please explain further.
> 
> There is a significant flaw in that argument.  char * doesn't do what we
> want when a filter does not have to examine each byte.  That is the problem.
> 
> It doesn't make any sense to have two filter interfaces when you can
> accomplish the same with one and a simple parameter conversion function.

The char* handler is not suited for all filters. Granted. I've maintained
that it is simply a convenience for those filters that don't want to munge
through the bucket interface.

Consider the case where you need to crawl through all the bytes of the
content delivered into your filter (gzip, recoding, SSI, PHP, etc). Now
let's take the bucket-based interface from my patch:

  my_callback(filter, bucket)
  {
      ... how to process each character in the bucket? ...
  }

The processing gets a bit hairy, and it would be contained in all of the
each-char-walking filters. I believe Jim Winstead said something like:

    p = bucket->get_data()

Unfortunately, that isn't quite enough. If the bucket represents a file,
then you can't simply get a pointer to it (don't want to read it all into
memory, and maybe mmap is not present on the platform). This implies that
you will have a read-loop in your callback:

    read_context = prep_read_context(bucket);
    while ((p = bucket->get_more_data(read_context)) != NULL) {

        ... process p ...
    }

Now, we can't just keep reading chunks of data from that file endlessly
(into the heap), so we need some kind of buffer management:

    if (ctx->read_buf == NULL)
        ctx->read_buf = ap_palloc(some_pool, READ_BUF_SIZE);
    read_context = prep_read_context(bucket, ctx->read_buf);
    ...

All right... Maybe that is "okay" for people to do in each filter, and it
works for the file case.

Uh oh... what happens when somebody calls ap_rprintf()? If we attempt to
delay the actual formatting as late as possible (so that BUFF can do it
directly into the network buffer), then we pass around a fmt/va_list pair.
Our issue here is to reduce that to characters for scanning. We can use
ap_vsnprintf() to drop it into ctx->read_buf, but what happens on overflow?
Now we need to allocate a big enough block from somewhere, format into it
using ap_pvsprintf(), and then toss it out. The only "toss" that we have
right now is destroying a pool.
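
Spelled out, every such filter ends up carrying something like this
(sketch only; note too that a portable version would have to copy the
va_list before using it a second time):

    /* Format into the fixed buffer; on (possible) truncation, fall
     * back to a pool block -- which can only be "tossed" by
     * destroying the whole pool. */
    len = ap_vsnprintf(ctx->read_buf, READ_BUF_SIZE, fmt, args);
    if (len >= READ_BUF_SIZE - 1) {
        p = ap_pvsprintf(some_pool, fmt, args);  /* lives until pool dies */
    }
    else {
        p = ctx->read_buf;
    }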

My intent was to simplify the job for filters. When they don't want to do
this work, they use a char* handler and get plain old bytes. Simple, clean,
and easy.
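
That is, the filter author sees nothing but bytes.  Rough shape only,
not necessarily the exact signature from the patch, and pass_chars is
invented:

    /* A char* handler: the framework has already reduced buckets,
     * files, and printf calls to plain bytes. */
    static int my_char_handler(filter_t *filter,
                               const char *buf, long len)
    {
        long i;

        for (i = 0; i < len; ++i) {
            /* ... examine buf[i]: gzip, recode, SSI, PHP, ... */
        }
        return pass_chars(filter->next, buf, len);
    }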

When filters can be smarter and work with files or bytes, or whatever, then
they can use the bucket-based callback. Should we encourage everybody to use
the bucket interface? Probably. Does this encouragement change the patch
that is submitted? I don't think so.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

buckets vs strings (was: what are the issues?)

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 29, 2000 at 11:30:51PM -0700, Roy T. Fielding wrote:
>...
> The point here isn't that this is the only way to implement filters.
> The point is that no other interface can implement them as efficiently.
> Not even close.  Yes, there are cases where string filters are just as
> efficient as any other design, but there is no case in which they are
> more efficient than bucket brigades.

Um. I think it is important to clarify that string-filters are not the only
option in my patch (dunno if you knew this). I *do* have a bucket interface,
and it can process things in the fashion you describe.

I do not have lists of buckets, but that is merely adding a "next" pointer.

The intent is to provide a small, tight, reviewable framework that allows
for growth into completeness. Want a list of buckets? A "next" pointer
is easy. Want an "End of Stream" bucket? Another bucket type. Want a bucket
that carries its own data to allow for arbitrary lifetimes? Another bucket
type (coded that example last night).
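
Each of those really is that small.  Roughly, with illustrative names
(not lifted from the patch):

    /* Growing the bucket: a "next" pointer makes it a list, and each
     * new behavior is just another type. */
    typedef enum {
        BUCKET_STRING,
        BUCKET_FILE,
        BUCKET_EOS,        /* the "End of Stream" marker */
        BUCKET_HEAP        /* carries its own data: arbitrary lifetime */
    } bucket_type_t;

    typedef struct bucket {
        struct bucket *next;     /* the whole cost of "lists" */
        bucket_type_t type;
        const void *data;        /* meaning depends on type */
        long length;
    } bucket_t;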

Starting small and lean helps us to understand what is going on. The patch
that I posted for committing is a simple 850 lines. A continuing sequence of
development easily brings us to the position that you are looking for, Roy.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: what are the issues? (was: Re: Patch review: ...)

Posted by Rasmus Lerdorf <ra...@apache.org>.
> In reference to some other messages, it isn't necessary for us to wait
> for a content length -- HTTP/1.1 chunked does work just fine.  However,
> that doesn't mean it isn't preferable to use content-length whenever
> possible, since not all clients are HTTP/1.1 and most browsers are able
> to present a better progress-bar if they have the content-length in
> advance of the data.

Isn't the more important reason to have a Content-Length header that
browsers won't do keep-alive without one?  I never really understood that
limitation, though.

-Rasmus