You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@kiwi.ICS.UCI.EDU> on 2000/07/23 03:53:05 UTC

More detailed review of Greg's filtering patch

Greg explained some things that I didn't understand about the filters patch
while we were in Monterey.  For example, since we aren't sharing bucket
structures, it is possible to allocate those on the stack and
not worry about freeing them -- any blocking point on the stream
will just save the bucket data and not the structure.  However, we still
need a way to differentiate between statically allocated data and
dynamically allocated data, since we don't want heap stuff copied when
it doesn't need to be.

Just thought of one issue.  If we are passing down a large chain
of buckets and some bottom part is going to set them aside, is it
best to copy just the contents (effectively merging the buckets)
or set aside a copy of the bucket structure as well?  I guess that
would depend on the type of bucket.

Other comments on greg.patch:

flush_filters is named wrong -- ending the stream is different from flushing
buffered content from the end of the stream.  This really should be
ap_end_content, or simply ap_rclose.  We may also want a flush call,
but that should only impact filters that have set aside processed data
for the purpose of output buffering -- unprocessed data should not be
affected.

I don't like the DECL_FILTER_HEAD macro for a variety of reasons.
The style reason is simply that it is better to leave the type visible
for code readability, which could be accomplished by 

#define AP_INIT_FILTER_HEAD(r)  { NULL, (r), NULL, 0, (r)->filters }

     ap_filter fc = AP_INIT_FILTER_HEAD(r, filter);

but I'm not fond of that either.  Basically, I don't understand why
we need to allocate a filter structure instead of just pointing to
the r->filter.  Instead of

 API_EXPORT(int) ap_rputc(int c, request_rec *r)
 {
     DECL_FILTER_HEAD(r, filter);
 
     if (r->connection->aborted)
         return EOF;

     ap_fc_putc(&filter, c);
 
     SET_BYTES_SENT(r);
     return 1;
 }

I would prefer

 API_EXPORT(int) ap_rputc(int c, request_rec *r)
 {
     ap_bucket_t bucket = ap_bucket_ptrlen_init(&c, 1);

     if (r->connection->aborted)
         return EOF;

     ap_fc_write(r->filter, b);
 
     SET_BYTES_SENT(r);
     return 1;
 }

In order for this to work in Greg's scheme, r->filter would have to be
primed with the BUFF connection.  ap_bucket_ptrlen_init would still be
a macro function.

On a side note, we should rethink SET_BYTES_SENT.  And shouldn't ap_rputc
be writing a const unsigned char?

BUFF_filter_callback confuses me, but that may be just because I don't
think of function pointers as callbacks.  Having a NULL next filter
imply BUFF_filter_callback may hide programming errors.  This should
just be the first filter placed on the filter list.

There are too many ap_fc_putblah functions -- we should be doing the
conversion to bucket brigades in the ap_r* functions.  All of the filter
writes should be returning a status value.  A given filter may need
to know about the request_rec (for the actual filtering logic), but
the generic filter processing should not have access to r.

Eventually, sub requests should be replaced by a URI bucket and
APACHE_XLATE should just be a content filter.

Regarding ap_filter.h:

Buckets are separate from filters -- they should not be in the same file.
Likewise, placing filters on the output chain is separable from filter
processing.  We haven't talked about input filters yet, but we should
keep the concept in mind when choosing symbol names.

I don't think we need AP_BUCKET_STRINGS.  In general, we should avoid
null-terminated strings except at the ap_r* level.

After talking to Greg, I now understand the desire for AP_BUCKET_PRINTF.

The bucket structure should separate itself from the data it carries:

struct ap_bucket_t {
    ap_bucket_type type;
    ap_bucket_t *next;          /* next bucket in list */
    ap_bucket_t *prev;          /* previous bucket in list */

    void *data;                 /* structure for bucket-specific data */
    ap_ssize_t datalen;         /* shortcut for buffer processing */
};

so that each specific bucket type has its own structure.  That is assuming
we don't want to be able to register bucket types, which is needed to
separate the bucket implementation from the Apache-specific request types.
In order for bucket registration to work, the bucket itself needs to
contain function pointers for type conversion (bucket_to_nstr) and
ap_bucket_type would be a struct of function pointers.

Regarding filters.c:

A global pool and registration functions simply won't work.  In a
multithreaded process model, we will have multiple simultaneous streams
of output filters, and there should be one pool for each stream that
is a subpool of the connection pool.  Each filter should inherit the
pool from the next filter on the chain.

Summary:

I understand this design much better after having talked to Greg
about the properties of allocating bucket structures off the stack.
I would still change a lot of this, but the only thing I am -1 about
is the filter registration and global pool.  The rest are design
differences.

....Roy

Re: More detailed review of Greg's filtering patch

Posted by Greg Stein <gs...@lyra.org>.
A couple quick comments/clarifications. I'm a bit rushed :-(

On Sat, Jul 22, 2000 at 06:53:05PM -0700, Roy T. Fielding wrote:
>...
> flush_filters is named wrong -- ending the stream is different from flushing
> buffered content from the end of the stream.  This really should be
> ap_end_content, or simply ap_rclose.

No problem. In the registration patch that Ryan put together (and I'm
helping to tweak), this is now called end_output_stream(). It is still
private since the only callers are other ap_* public functions in that file.

>...
> #define AP_INIT_FILTER_HEAD(r)  { NULL, (r), NULL, 0, (r)->filters }
> 
>      ap_filter fc = AP_INIT_FILTER_HEAD(r, filter);

This can be done. No problem with me.

> but I'm not fond of that either.  Basically, I don't understand why
> we need to allocate a filter structure instead of just pointing to
> the r->filter.  Instead of
> 
>  API_EXPORT(int) ap_rputc(int c, request_rec *r)
>  {
>      DECL_FILTER_HEAD(r, filter);
>  
>      if (r->connection->aborted)
>          return EOF;
> 
>      ap_fc_putc(&filter, c);
>  
>      SET_BYTES_SENT(r);
>      return 1;
>  }
> 
> I would prefer
> 
>  API_EXPORT(int) ap_rputc(int c, request_rec *r)
>  {
>      ap_bucket_t bucket = ap_bucket_ptrlen_init(&c, 1);

nit: &c may not point to the character :-)

>      if (r->connection->aborted)
>          return EOF;
> 
>      ap_fc_write(r->filter, b);
>  
>      SET_BYTES_SENT(r);
>      return 1;
>  }
> 
> In order for this to work in Greg's scheme, r->filter would have to be
> primed with the BUFF connection.  ap_bucket_ptrlen_init would still be
> a macro function.

ap_fc_*() pass the content to the *next* filter. If you pass r->filter to
those guys, then you will end up skipping the first filter.

Obviously, this portion needs commenting :-) ... I create a "dummy" filter
that I can pass to ap_fc_*. The ->next field is set to r->filters so that
you inject the content into the proper filter.

The dummy filter record is the cleanest way to reduce code duplication in
the "inject" portion of the patches. (without the dummy, you end up with
code dup between ap_r* and ap_fc_*)

>...
> BUFF_filter_callback confuses me, but that may be just because I don't
> think of function pointers as callbacks.  Having a NULL next filter
> imply BUFF_filter_callback may hide programming errors.  This should
> just be the first filter placed on the filter list.

Yes. It should be the first filter added. As I described elsewhere, I wanted
to minimize the patch size and the possible impact. I created
BUFF_filter_callback to be equivalent to what *would* be the first filter.
But I didn't want to include its addition to the chain with the patches
necessary to create the basic filtering system. A second round would have
all the necessary bits/tweaks to move BUFF_filter_callback to wherever it
needed to go and to get it inserted properly, on the proper requests, in the
right order, etc.

"minimize perturbation" was my rule here :-)

>...
> I don't think we need AP_BUCKET_STRINGS.  In general, we should avoid
> null-terminated strings except at the ap_r* level.

AP_BUCKET_STRINGS was created so that we could defer copying the strings off
the stack until we finally determined it was necessary to make that copy.

[ the problem originates from ap_rvputs(r, s1, s2, s3, NULL) ]

>...
> Regarding filters.c:
> 
> A global pool and registration functions simply won't work.  In a
> multithreaded process model, we will have multiple simultaneous streams
> of output filters, and there should be one pool for each stream that
> is a subpool of the connection pool.  Each filter should inherit the
> pool from the next filter on the chain.
> 
> Summary:
> 
> I understand this design much better after having talked to Greg
> about the properties of allocating bucket structures off the stack.
> I would still change a lot of this, but the only thing I am -1 about
> is the filter registration and global pool.  The rest are design
> differences.

These last two paragraphs:

That global pool is for holding information *about* the filters. Meta info
if you will. The name -> filter info mapping.

When you add a filter to a request output chain, it does not use the global
pool -- it uses the request pool.

Note: the reason why we have the darn global pool is because a pool is not
passed to a module's register_hooks function. I figured on solving that
if/when necessary. Since it looks like Ryan is probably going to commit the
registration later this week, then yes: it will be nice to pass the pool
through to the register_hooks slot.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: More detailed review of Greg's filtering patch

Posted by rb...@covalent.net.
On Sat, 22 Jul 2000, Roy T. Fielding wrote:

> Greg explained some things that I didn't understand about the filters patch
> while we were in Monterey.  For example, since we aren't sharing bucket
> structures, it is possible to allocate those on the stack and
> not worry about freeing them -- any blocking point on the stream
> will just save the bucket data and not the structure.  However, we still
> need a way to differentiate between statically allocated data and
> dynamically allocated data, since we don't want heap stuff copied when
> it doesn't need to be.

The other problem with this, is that it keeps us from ever sharing the
buckets.  I do envision a time when we want to keep a list of free buckets
that can then be re-used.

> There are too many ap_fc_putblah functions -- we should be doing the
> conversion to bucket brigades in the ap_r* functions.  All of the filter
> writes should be returning a status value.  A given filter may need
> to know about the request_rec (for the actual filtering logic), but
> the generic filter processing should not have access to r.

While I agree that the conversion to bucket brigades can be done in the
ap_r* functions.  I expect that MANY performance improvements will be
possible by skipping the ap_r* functions completely.  I believe modules
written for 2.0 should use the bucket functions directly, and only legacy
modules should continue to use the ap_r* functions.

Basically, this will encourage module writers to buffer their data at the
top of the filter stack, and there will thus be less to buffer at the
bottom.  Let me give a quick example:

mod_autoindex.  If mod_autoindex used the bucket functions directly, we
could probably make one call down the filter stack.  Because it is
currenlty using the ap_r* functions, we make about five per line.

> Eventually, sub requests should be replaced by a URI bucket and
> APACHE_XLATE should just be a content filter.

YES!

> The bucket structure should separate itself from the data it carries:
> 
> struct ap_bucket_t {
>     ap_bucket_type type;
>     ap_bucket_t *next;          /* next bucket in list */
>     ap_bucket_t *prev;          /* previous bucket in list */
> 
>     void *data;                 /* structure for bucket-specific data */
>     ap_ssize_t datalen;         /* shortcut for buffer processing */
> };
> 
> so that each specific bucket type has its own structure.  That is assuming
> we don't want to be able to register bucket types, which is needed to
> separate the bucket implementation from the Apache-specific request types.
> In order for bucket registration to work, the bucket itself needs to
> contain function pointers for type conversion (bucket_to_nstr) and
> ap_bucket_type would be a struct of function pointers.

This is exactly what the code I have submitted does (with the help of
Cliff Wooley)

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------