Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@kiwi.ICS.UCI.EDU> on 2000/07/23 06:35:12 UTC

More detailed review of Ryan's filtering patch

Well, okay, the subject lies -- I tried to read through Ryan's stuff
and I just keep getting lost in the details.  Where is the forest?
I'm going to have to write my own version just to make sense of it.
But I have to do some board stuff first, so here's just a summary.

The concepts seem to be there, but a lot of the names are wrong and
some of the assumptions about bucket processing won't work in general.

A good way to think of a filter stream is just to think of it as a
file -- we want to be able to do everything to the stream that we might want
to do to an output file.  We might want to have multiple output streams,
just like forking output to multiple files (think response caching).
This is generally done with a tee-style filter that is dropped on the
stream like any other filter.
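A tee-style filter of the kind described above can be sketched as follows. This is a minimal illustration, not the actual patch's API; the `tee` type, `tee_write`, and the fixed-size side buffer are all invented for the example, with the cache array standing in for a second output stream such as a response cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tee filter: sits in the stream like any other filter,
 * forks each chunk to a side sink, then passes it downstream unchanged. */
typedef struct tee {
    char cache[256];            /* stands in for the second output stream */
    size_t cached;
    void (*downstream)(const char *buf, size_t len);
} tee;

static void tee_write(tee *t, const char *buf, size_t len)
{
    memcpy(t->cache + t->cached, buf, len);   /* copy to the side stream */
    t->cached += len;
    t->downstream(buf, len);                  /* pass along unchanged */
}

/* A trivial downstream sink that records what it received. */
static char seen[256];
static size_t seen_len;
static void sink(const char *buf, size_t len)
{
    memcpy(seen + seen_len, buf, len);
    seen_len += len;
}
```

The point of the sketch is that forking the stream is just another filter dropped into the chain; neither side of the tee needs to know the other exists.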

Likewise, a good way to think of a bucket is as an input file.  It is best
to just pass the bucket around by the handle until we need to mess with the
contents.  When we do need to look at the contents, we can't just convert
them to a string, since the contents may be too big.  Converting to a string
must therefore be a sequence of fixed buffer reads until the bucket is
"empty."  Also, it isn't nice to lose the metadata associated with such
buckets just because they are converted -- we need to translate the metadata,
often in the form of a stat structure, into a metadata bucket and pass that
along as well.

But at the same time, we don't want to do any of this "read" stuff if the
bucket is already in the form of a ptr+len string or if the bucket is the
EOS, so there will be at least three code paths for any filtering reader.
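The three code paths can be sketched roughly like this. The types and names below (`bucket`, `consume`, the color enum) are invented for illustration and are not the patch's `ap_bucket_t`; the idea is only to show the shape: an EOS short-circuit, a ptr+len fast path with no copy loop, and a generic path of fixed-buffer reads until the bucket is "empty."

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bucket -- illustrative only, not the real ap_bucket_t. */
typedef enum { BUCKET_HEAP, BUCKET_FILE, BUCKET_EOS } bucket_color;

typedef struct bucket {
    bucket_color color;
    const char *data;      /* backing bytes */
    size_t len;
    size_t off;            /* how much has been consumed so far */
} bucket;

enum { CHUNK = 8 };        /* deliberately tiny fixed read buffer */

/* One fixed-size read; returns bytes produced, 0 once the bucket is empty. */
static size_t bucket_read(bucket *b, char *buf, size_t bufsize)
{
    size_t n = b->len - b->off;
    if (n > bufsize)
        n = bufsize;
    memcpy(buf, b->data + b->off, n);
    b->off += n;
    return n;
}

/* A filtering reader with the three paths: EOS, ptr+len, generic read. */
static size_t consume(bucket *b, char *out, size_t outsize)
{
    size_t total = 0, n;
    char chunk[CHUNK];

    switch (b->color) {
    case BUCKET_EOS:
        return 0;                        /* end of stream: nothing to read */
    case BUCKET_HEAP:
        /* Contents already a ptr+len string: one copy, no loop. */
        n = b->len < outsize ? b->len : outsize;
        memcpy(out, b->data, n);
        return n;
    default:
        /* Generic path: fixed buffer reads until the bucket is "empty". */
        while ((n = bucket_read(b, chunk, sizeof chunk)) > 0) {
            if (total + n > outsize)
                n = outsize - total;
            memcpy(out + total, chunk, n);
            total += n;
        }
        return total;
    }
}
```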

Note that in the process of being read, a bucket may need to change
its color.  That is, a reference to a file bucket may need to replace
itself with a reference to an mmap bucket.  Ain't that fun?  Now consider
how that affects the ap_bucket_t type.  It either needs to be parameterized
below the level of the color, or we have to pass the address of the
bucket pointer on every read instead of just the pointer.
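The second option -- passing the address of the bucket pointer -- might look like the sketch below. All names here are invented; the point is only that because the read can replace the bucket itself (file becomes mmap), the caller must hand over `bkt **`, not `bkt *`, or its handle would be left dangling on the old color.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-color bucket for the morphing example. */
typedef enum { B_FILE, B_MMAP } color_t;

typedef struct bkt {
    color_t color;
    const char *data;
    size_t len;
} bkt;

/* Reading a file bucket morphs it: the caller's pointer is updated to a
 * new mmap-style bucket, which is why the address of the pointer is passed. */
static void bucket_read_morph(bkt **bp, const char **str, size_t *len)
{
    bkt *b = *bp;
    if (b->color == B_FILE) {
        bkt *m = malloc(sizeof *m);
        m->color = B_MMAP;              /* pretend we mmap'd the file here */
        m->data = b->data;
        m->len = b->len;
        free(b);
        *bp = m;                        /* caller's handle now sees the mmap */
    }
    *str = (*bp)->data;
    *len = (*bp)->len;
}
```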

The filter hooks are simply the wrong interface -- there is nothing more
to say about that.  They don't gain anything over the far less complicated
manually-sorted list of filters, and end up losing the efficiency advantage
of allocating some structures from the stack instead of the heap.  We need
a hook mechanism for registry of named filters, which can then be selected
by name and dropped on the output stream, but using hooks for connecting
the actual stream won't work because streams are not always linear.
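A registry of named filters, as opposed to hook-based connection, might be as simple as the sketch below. The table, `register_filter`, and `lookup_filter` are invented for illustration (the eventual Apache API took a different shape); the point is that registration only maps names to functions, and the actual stream is assembled separately by selecting filters by name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical named-filter registry: register under a name, select by
 * name later, then drop the selected filter onto the output stream. */
typedef int (*filter_func)(const char *buf, int len);

struct named_filter {
    const char *name;
    filter_func func;
};

static struct named_filter registry[16];
static int nfilters;

static void register_filter(const char *name, filter_func f)
{
    registry[nfilters].name = name;
    registry[nfilters].func = f;
    nfilters++;
}

static filter_func lookup_filter(const char *name)
{
    for (int i = 0; i < nfilters; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].func;
    return NULL;   /* unknown filter name */
}

/* A trivial example filter: just reports the byte count it was handed. */
static int count_filter(const char *buf, int len)
{
    (void)buf;
    return len;
}
```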

....Roy

Re: More detailed review of Ryan's filtering patch

Posted by rb...@covalent.net.
On Sat, 22 Jul 2000, Roy T. Fielding wrote:

> Well, okay, the subject lies -- I tried to read through Ryan's stuff
> and I just keep getting lost in the details.  Where is the forest?
> I'm going to have to write my own version just to make sense of it.
> But I have to do some board stuff first, so here's just a summary.

I'll try to comment on more of it tomorrow.  I think it is easy to
follow, but of course, I wrote it.  :-)

> The concepts seem to be there, but a lot of the names are wrong and
> some of the assumptions about bucket processing won't work in general.

I tend not to worry about naming too much, because most people disagree
with the names that make the most sense to me.

> Likewise, a good way to think of a bucket is as an input file.  It is best
> to just pass the bucket around by the handle until we need to mess with the
> contents.  When we do need to look at the contents, we can't just convert
> them to a string, since the contents may be too big.  Converting to a string
> must therefore be a sequence of fixed buffer reads until the bucket is
> "empty."  

This would be easy to add at any time, but I'm not sure I agree.  If we
are dealing with a large file, then we really should be mmap'ing it and
returning a char * to the front of the mmap.  Then, of course, we have to
deal with the case of a machine without mmap.  Hmmm.....   Maybe having
the reads look like APR's read routines would make the most sense.
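An APR-style read, in this context, would mean a status return plus an in/out length, in the spirit of apr_file_read(file, buf, &nbytes).  The sketch below is an invented illustration of that calling convention applied to a bucket, not APR code and not the patch's API; the bucket type and status codes are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented status codes and bucket type, for illustration only. */
enum { B_OK = 0, B_EOF = 1 };

typedef struct {
    const char *data;
    size_t len, off;
} sbucket;

/* APR-read-style convention: on entry *nbytes is the buffer size,
 * on return it is the number of bytes actually produced. */
static int sbucket_read(sbucket *b, char *buf, size_t *nbytes)
{
    size_t n = b->len - b->off;
    if (n == 0) {
        *nbytes = 0;
        return B_EOF;
    }
    if (n > *nbytes)
        n = *nbytes;
    memcpy(buf, b->data + b->off, n);
    b->off += n;
    *nbytes = n;
    return B_OK;
}
```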

> Also, it isn't nice to lose the metadata associated with such
> buckets just because they are converted -- we need to translate the metadata,
> often in the form of a stat structure, into a metadata bucket and pass that
> along as well.

So, that requires creating a stat function pointer, and adding it to the
bucket type.  Shouldn't be that hard to do.

> But at the same time, we don't want to do any of this "read" stuff if the
> bucket is already in the form of a ptr+len string or if the bucket is the
> EOS, so there will be at least three code paths for any filtering reader.

I don't think the filters want to have any concept of what they are
reading from.  Filters will want to call b->read() and know that they got
the data back.  Later they will want to call b->write(), and know that the
data is in the right spot in the bucket.

> Note that in the process of being read, a bucket may need to change
> its color.  That is, a reference to a file bucket may need to replace
> itself with a reference to an mmap bucket.  Ain't that fun?  Now consider
> how that affects the ap_bucket_t type.  It either needs to be parameterized
> below the level of the color, or we have to pass the address of the
> bucket pointer on every read instead of just the pointer.

Why can't the function just change where the function pointers point as
well as the color?  I actually coded this stuff first to have the function
pointers in the bucket_color_t, but that got unwieldy VERY quickly.  Every
function that we add to the bucket then needs to have a wrapper that
doesn't actually do anything, other than figure out which bucket type it
is, and then find the function pointers in that bucket.
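The pointers-in-the-bucket approach can be sketched like this. Everything here is invented for illustration: filters call b->read(b, ...) without caring about the color, and morphing the bucket is nothing more than repointing the function pointer in place, so no per-call dispatch wrapper is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bucket carrying its own read function pointer. */
struct vbkt;
typedef size_t (*read_fn)(struct vbkt *b, char *buf, size_t n);

typedef struct vbkt {
    read_fn read;          /* filters call b->read(b, ...) and don't care */
    const char *data;
    size_t len, off;
} vbkt;

static size_t mmap_read(vbkt *b, char *buf, size_t n)
{
    size_t avail = b->len - b->off;
    if (avail > n)
        avail = n;
    memcpy(buf, b->data + b->off, avail);
    b->off += avail;
    return avail;
}

/* First read of a "file" bucket morphs it in place: swap the read
 * pointer (pretend we mmap'd the file) and delegate. */
static size_t file_read(vbkt *b, char *buf, size_t n)
{
    b->read = mmap_read;   /* the color change is just a repoint */
    return mmap_read(b, buf, n);
}
```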

> The filter hooks are simply the wrong interface -- there is nothing more
> to say about that.  They don't gain anything over the far less complicated
> manually-sorted list of filters, and end up losing the efficiency advantage
> of allocating some structures from the stack instead of the heap.  We need
> a hook mechanism for registry of named filters, which can then be selected
> by name and dropped on the output stream, but using hooks for connecting
> the actual stream won't work because streams are not always linear.

Fine.  The filters are also easily replaceable.  The API is going to need
to be somewhat close to what I was using there.

The bottom line is that we are stagnating right now.  I think we have two
options.  Commit filtering of some kind next week, or leave filtering out
of 2.0.  We haven't had an alpha since before this filtering stuff
started, and we aren't moving forward right now.  There was talk at some
point of getting a release out at AC Europe.  It really doesn't look like
that is going to happen.

Roy, if you rewrite all of the filtering stuff to look like what you
want, how long will it take?

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------