Posted to dev@httpd.apache.org by rb...@covalent.net on 2000/07/13 00:59:18 UTC

Filtering issues.

I have spent today working on my filtering patch.  I will have a
patch shortly that will work with current content generators.  I have also
found a massive issue with current modules in the process.

My test module has been mod_autoindex.  The goal has been to avoid BUFF
for everything except for the headers (not worth it to re-write the
headers stuff right now).  I have had code working today, where
mod_autoindex would create bucket brigades, and send them down the chain,
and the core would write them out.  I had this at 10:00 this morning.

I realized immediately that this was sub-optimal, because the core filter
wasn't holding up enough data to make it worth our while to write to the
network.

My next step was to have the core save data off to the side.  After
working through a bunch of small bugs, I was getting corrupted data.  This
is where I was until two minutes ago.  The problem is that mod_autoindex
uses an automatic variable to output data.  This means that my patch was
saving a reference to that data off to the side, and the data was being
over-written by mod_autoindex, because I wasn't copying it.  In short,
mod_autoindex wasn't written to take zero-copy into account.

This is fixable, but only by using read/write buckets (which imply a
memcopy) with the ap_r* functions.  Most content generators will need to
be re-written to take full advantage of the optimizations available to
filters, but they will work with filters regardless.

This problem will crop up sooner or later in the patch submitted by
Greg.  It isn't noticeable now, because BUFF is being used at the bottom of
the stack, which is doing the memcopy invisibly.

Expect a patch by the end of the day today or early tomorrow
morning.  I have to modify the ap_bucket_[v]printf functions to use rwmem
buckets (five minutes work), and then test a bit more (twenty minutes
work).

Ryan


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: More detailed review of Greg's filtering patch

Posted by Greg Stein <gs...@lyra.org>.
A couple quick comments/clarifications. I'm a bit rushed :-(

On Sat, Jul 22, 2000 at 06:53:05PM -0700, Roy T. Fielding wrote:
>...
> flush_filters is named wrong -- ending the stream is different from flushing
> buffered content from the end of the stream.  This really should be
> ap_end_content, or simply ap_rclose.

No problem. In the registration patch that Ryan put together (and I'm
helping to tweak), this is now called end_output_stream(). It is still
private since the only callers are other ap_* public functions in that file.

>...
> #define AP_INIT_FILTER_HEAD(r)  { NULL, (r), NULL, 0, (r)->filters }
> 
>      ap_filter fc = AP_INIT_FILTER_HEAD(r, filter);

This can be done. No problem with me.

> but I'm not fond of that either.  Basically, I don't understand why
> we need to allocate a filter structure instead of just pointing to
> the r->filter.  Instead of
> 
>  API_EXPORT(int) ap_rputc(int c, request_rec *r)
>  {
>      DECL_FILTER_HEAD(r, filter);
>  
>      if (r->connection->aborted)
>          return EOF;
> 
>      ap_fc_putc(&filter, c);
>  
>      SET_BYTES_SENT(r);
>      return 1;
>  }
> 
> I would prefer
> 
>  API_EXPORT(int) ap_rputc(int c, request_rec *r)
>  {
>      ap_bucket_t bucket = ap_bucket_ptrlen_init(&c, 1);

nit: since c is an int, &c may not point to the character itself; on a
big-endian machine the first byte of the int is the high-order byte, not
the character :-)

>      if (r->connection->aborted)
>          return EOF;
> 
>      ap_fc_write(r->filter, b);
>  
>      SET_BYTES_SENT(r);
>      return 1;
>  }
> 
> In order for this to work in Greg's scheme, r->filter would have to be
> primed with the BUFF connection.  ap_bucket_ptrlen_init would still be
> a macro function.

ap_fc_*() pass the content to the *next* filter. If you pass r->filter to
those guys, then you will end up skipping the first filter.

Obviously, this portion needs commenting :-) ... I create a "dummy" filter
that I can pass to ap_fc_*. The ->next field is set to r->filters so that
you inject the content into the proper filter.

The dummy filter record is the cleanest way to reduce code duplication in
the "inject" portion of the patches. (without the dummy, you end up with
code dup between ap_r* and ap_fc_*)
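Rough sketch of that dummy-filter idea, with invented names (demo_filter_t,
demo_fc_write, demo_rputs); the real structures and ap_fc_* signatures in
the patch differ.  The point is only that the dummy record's ->next is
r->filters, so writing "through" the dummy injects the content into the
first real filter without duplicating the injection code:

#include <stddef.h>
#include <string.h>

typedef struct demo_filter_t demo_filter_t;
struct demo_filter_t {
    void (*callback)(demo_filter_t *self, const char *buf, size_t len);
    void *ctx;
    demo_filter_t *next;
};

typedef struct {
    demo_filter_t *filters;   /* head of the request's output filter chain */
} demo_request_t;

/* fc-level call: hand the content to the *next* filter in the chain. */
static void demo_fc_write(demo_filter_t *f, const char *buf, size_t len)
{
    if (f->next != NULL)
        f->next->callback(f->next, buf, len);
}

/* ap_r*-style entry point: build a dummy head on the stack whose ->next is
 * the real chain, then reuse demo_fc_write() to do the injection. */
static void demo_rputs(const char *s, demo_request_t *r)
{
    demo_filter_t dummy = { NULL, NULL, r->filters };
    demo_fc_write(&dummy, s, strlen(s));
}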

>...
> BUFF_filter_callback confuses me, but that may be just because I don't
> think of function pointers as callbacks.  Having a NULL next filter
> imply BUFF_filter_callback may hide programming errors.  This should
> just be the first filter placed on the filter list.

Yes. It should be the first filter added. As I described elsewhere, I wanted
to minimize the patch size and the possible impact. I created
BUFF_filter_callback to be equivalent to what *would* be the first filter.
But I didn't want to include its addition to the chain with the patches
necessary to create the basic filtering system. A second round would have
all the necessary bits/tweaks to move BUFF_filter_callback to wherever it
needed to go and to get it inserted properly, on the proper requests, in the
right order, etc.

"minimize perturbation" was my rule here :-)

>...
> I don't think we need AP_BUCKET_STRINGS.  In general, we should avoid
> null-terminated strings except at the ap_r* level.

AP_BUCKET_STRINGS was created so that we could defer copying the strings off
the stack until we finally determined it was necessary to make that copy.

[ the problem originates from ap_rvputs(r, s1, s2, s3, NULL) ]
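A hedged sketch of that deferral (invented demo_strings_* names, not the
actual bucket code): the varargs call only records the caller's pointers,
ap_rvputs-style, and the copy off the stack happens only if a filter later
has to set the data aside:

#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_STRINGS 8

typedef struct {
    const char *strs[DEMO_MAX_STRINGS];  /* borrowed pointers, maybe stack */
    int         nstrs;
    char       *copy;                    /* owned storage, NULL until needed */
} demo_strings_bucket;

/* Record the pointers, ap_rvputs-style, without copying anything. */
static void demo_strings_init(demo_strings_bucket *b, ...)
{
    va_list ap;
    const char *s;

    b->nstrs = 0;
    b->copy = NULL;
    va_start(ap, b);
    while ((s = va_arg(ap, const char *)) != NULL && b->nstrs < DEMO_MAX_STRINGS)
        b->strs[b->nstrs++] = s;
    va_end(ap);
}

/* Only now, when a filter really must hold the data past the caller's
 * lifetime, are the strings concatenated into owned memory.
 * (malloc error checking omitted for brevity.) */
static void demo_strings_set_aside(demo_strings_bucket *b)
{
    size_t total = 0;
    int i;

    for (i = 0; i < b->nstrs; i++)
        total += strlen(b->strs[i]);
    b->copy = malloc(total + 1);
    b->copy[0] = '\0';
    for (i = 0; i < b->nstrs; i++)
        strcat(b->copy, b->strs[i]);
}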

>...
> Regarding filters.c:
> 
> A global pool and registration functions simply won't work.  In a
> multithreaded process model, we will have multiple simultaneous streams
> of output filters, and there should be one pool for each stream that
> is a subpool of the connection pool.  Each filter should inherit the
> pool from the next filter on the chain.
> 
> Summary:
> 
> I understand this design much better after having talked to Greg
> about the properties of allocating bucket structures off the stack.
> I would still change a lot of this, but the only thing I am -1 about
> is the filter registration and global pool.  The rest are design
> differences.

These last two paragraphs:

That global pool is for holding information *about* the filters. Meta info
if you will. The name -> filter info mapping.

When you add a filter to a request output chain, it does not use the global
pool -- it uses the request pool.

Note: the reason we have the darn global pool is that a pool is not passed
to a module's register_hooks function. I figured on solving that
if/when necessary. Since it looks like Ryan is probably going to commit the
registration later this week, then yes: it will be nice to pass the pool
through to the register_hooks slot.
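Purely illustrative sketch of where that could head (all demo_* names are
invented, and the real registration API may look nothing like this): once a
pool, or something scoped to one, is handed to the register_hooks slot, the
name -> filter-info map can live in that pool instead of a file-scope
global:

#include <stdlib.h>

/* Stand-ins for the real pool machinery; allocation is just malloc here. */
typedef struct { int unused; } demo_pool_t;
static void *demo_palloc(demo_pool_t *p, size_t size) { (void)p; return malloc(size); }

typedef struct demo_filter_info demo_filter_info;
struct demo_filter_info {
    const char *name;
    demo_filter_info *next;
};

/* The name -> filter-info registry hangs off whatever pool the caller
 * provides, rather than a file-scope global pool. */
typedef struct {
    demo_pool_t *pool;
    demo_filter_info *head;
} demo_filter_registry;

static void demo_register_filter(demo_filter_registry *reg, const char *name)
{
    demo_filter_info *info = demo_palloc(reg->pool, sizeof(*info));
    info->name = name;
    info->next = reg->head;
    reg->head = info;
}

/* A module's register_hooks slot, now handed the pool-backed registry
 * explicitly instead of reaching for a global. */
static void demo_module_register_hooks(demo_filter_registry *reg)
{
    demo_register_filter(reg, "DEMO-GZIP");
}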

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Filtering issues.

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Jul 12, 2000 at 06:39:52PM -0700, Roy T. Fielding wrote:
> >My filter design considers the 95% case: no memcpy() is needed *unless*
> >somebody decides to "set aside" the output.
> 
> You can do that with buckets as well -- just create a bucket type that
> has to be replicated when set aside.

Exactly! That is how my patch/design approaches the problem. I'm trying to
say that the code that Ryan has submitted takes the opposite approach: all
buckets must be self-determining so they can be set aside.

> >Conversely: you designed for the 5% solution and force the other 95% to
> >conform to that solution. You must memcpy() data into arbitrary-lifetime
> >buffers.
> 
> That isn't an aspect of the design.

As I understand Ryan's patch: yes, it is.

> >You say "most content generators will need to be re-written to take full
> >advantage of the optimizations available to filters, ..."  Well, that is an
> >aspect of your design. My filter design does not require *any* content
> >generators to be rewritten.
> 
> Your design will never be optimal, period.  That's why we can't start
> with it as a basis, even if it works.

You have not explained what is sub-optimal about my patch. Please do so. How
can I get it improved to "your specifications" unless you help here?

> Please, I am tired of this type of discussion.

And I'm getting a bit tired of your "bring me a rock" messages. "Roy, here
is a patch." "Nope. That's not it." "Roy, here is another." "Nope, that
doesn't work either."

You keep saying that there is something wrong with the two patches, but you
haven't explained *what*. How can we fix anything?

> There are valid bits of
> stuff in all the files committed, and the only way to get to an optimal
> solution is to stop talking about "my" patch and start filling out the
> holes in the implementations of the design.

> If there are problems with the code, in any of the files, then commit
> a comment to that effect directly within the code or within a design
> overview doc in the same directory.

I don't believe in the approach/design that Ryan has embodied in his
implementation. I've explained why. I don't have any particular motivation
to try to correct the stuff that he has written, when I believe there is
*already* a perfectly viable alternative prepared and ready to go (for a
week and a half now). And as far as I'm concerned, these are still suggested
patches and they have names attached; I'll continue to refer to them by
name. See ap_filter.h for the design embodied in my patches. If it is
missing some information that you'd like to see, then please ask.

And I haven't heard anything but hand-waving about why the bucket design and
lifetime issues in my patch are "bad" or "wrong" or "never be optimal." I
just don't believe it at this time. Stop talking about it and show us the
"right" code or describe the "right" design that enables filtering in
today's Apache, without changes being required from content generators.
Whatever the solution is, it should be able to drop right in. If you have
different requirements, then let's hear them.

> >I am quite interested in watching the development of the buckets now that you
> >placed them into source control. There does not seem to be a coherent design
> >that is managing the development. A lot of poking, prodding, and bug fixing
> >is occurring to get the stuff to work. I find that very sad when there is an
> >existing filtering patch that doesn't have these bugs, works very well, does
> >not impose changes upon the content generators, etc.
> 
> And doesn't satisfy our needs.

WHY? Don't just sit there and say "no" without explanation. That is
counter-productive.

> >The solution that you're putting together feels very complex and very
> >fragile.
> 
> I agree, but that is because it doesn't fit the design yet.  That's my
> problem to fix and it isn't something I can do without access to all of
> the source files.

What design? And why is it only your problem?

If you're participating, then help us out here. It's very annoying to get
pronouncements from on high. Summarize your requirements, and then match
those up against the two patch sets. I'd be interested to know what you're
looking for, and how the patch sets fail those expectations. As it stands,
we are simply bringing rocks to the table.

> >I chuckled when I saw the addition of the "rmem" buckets and that you simply
> >stored the pointer into the bucket. Your bucket design requires buckets'
> >lifetimes to be self-determining, so a simple reference would never work.
> >You will always need to copy, so those rmem buckets simply do not work.
> 
> Fix it, or make a note and someone else will fix it.

I already fixed it -- by virtue of the problem not being present in my
patches. Why should I spend time creating/fixing/patching/documenting
duplicate, broken code? Please explain why.

> >Finally, a meta issue: it is rather upsetting that vetoed code is residing
> >in APR. People are submitting patches to get the darn thing to compile, to
> >fix bugs, and add functions. I feel the process that is happening around
> >this bucket stuff is not respecting the Apache Development Guidelines.
> 
> Voting is used to determine what gets distributed as Apache, not what
> resides in the repository.  The repository is a development tool.

As it stands, Ryan has added it to Apache. That is the problem. I couldn't
care less that it sits there.

> The guidelines are designed to help us make progress on decisions.
> When two developers are deadlocked in consideration of an issue, the
> right way to resolve the deadlock is via a bake-off, where both designs
> are implemented and we all test to see which works best.  Then we can
> vote on the implementation and not on the design.

Fine. What is your suggested process? I've got an implementation that is
ready for review. Has been for a while. What process are you suggesting that
will magically get it reviewed? Or do we first sit around until all the
implementations arrive? Does this stuff never go into Apache because nobody
cares to do the reviews you're looking for?

Grumbly,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Filtering issues.

Posted by Dirk-Willem van Gulik <di...@covalent.net>.

On Wed, 12 Jul 2000, Greg Stein wrote:

> On Wed, Jul 12, 2000 at 03:59:18PM -0700, rbb@covalent.net wrote:
> >...
> > This is fixable, but only by using read/write buckets (which imply a
> > memcopy) with the ap_r* functions.  Most content generators will need to
> > be re-written to take full advantage of the optimizations available to
> > filters, but they will work with filters regardless.
> > 
> > This problem will crop up sooner or later in the patch submitted by
> > Greg.
> 
> No. It won't.
> 
> My filter design considers the 95% case: no memcpy() is needed *unless*
> somebody decides to "set aside" the output.
> 
> Conversely: you designed for the 5% solution and force the other 95% to
> conform to that solution. You must memcpy() data into arbitrary-lifetime
> buffers.

Actually, one of the points of 2.0 is that we can stop worrying about
this. This was _THE_ number one premise for the 2.0 work. So this is
a non-issue IMHO, unless we suddenly see 2.0 as a 1.3 on steroids.

Secondly, if you look at what some of the legacy modules do (pass pointers
to static buffers which they later modify), it should be clear that in
'legacy' mode you will _always_ have to do a memcpy().

Thirdly, if most of those modules are ever going to run in a threaded 2.0
environment, they will have to learn better habits anyway. So someone
will probably update them sooner or later (and then we get much
cleaner binary pluggability for 2.0 in any case :-).


Dw


Does Borland compiler work on Apache?

Posted by jlwpc1 <jl...@mail.earthlink.net>.
Inprise/Borland Will Radically Simplify Apache(TM) Web Application
Development

http://www.newsalert.com/bin/story?StoryId=CowQBWbWbu0zuvte0nW

JLW




Re: Filtering issues.

Posted by rb...@covalent.net.
> > This problem will crop up sooner or later in the patch submitted by
> > Greg.
> 
> No. It won't.
> 
> My filter design considers the 95% case: no memcpy() is needed *unless*
> somebody decides to "set aside" the output.

I don't want this to turn into my patch vs your patch again.  This issue
affects all filtering patches.  The reason is relatively simple.  If we
remove BUFF from the code (which seems to be the current goal), we need to
have some way to buffer the data.  It makes sense to me to use the bucket
brigades to buffer the data, and then use iovecs to write them out.  As
long as we do any buffering, we either need to copy the data or re-write
the existing modules.
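For illustration, here is a rough sketch (invented demo_brigade and
demo_core_write names; the real brigade code will differ) of that kind of
buffering: the core-level filter queues chunks into an iovec array and
flushes with writev() once enough has accumulated, copying each chunk
because, with unmodified generators, the caller's buffer may be reused
before the flush happens:

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define DEMO_MAX_IOVEC   16
#define DEMO_FLUSH_BYTES 8192   /* arbitrary threshold for this sketch */

typedef struct {
    struct iovec vec[DEMO_MAX_IOVEC];
    int          nvec;
    size_t       bytes;
    int          fd;            /* the client socket */
} demo_brigade;

/* Write everything queued so far with one writev(), then release the
 * copies that were made when the chunks were queued. */
static int demo_flush(demo_brigade *b)
{
    ssize_t rv = 0;
    int i;

    if (b->nvec > 0)
        rv = writev(b->fd, b->vec, b->nvec);
    for (i = 0; i < b->nvec; i++)
        free(b->vec[i].iov_base);
    b->nvec = 0;
    b->bytes = 0;
    return rv < 0 ? -1 : 0;
}

/* Queue one chunk of generator output, copying it so the generator is free
 * to reuse its buffer, and flush once enough data has accumulated. */
static int demo_core_write(demo_brigade *b, const char *buf, size_t len)
{
    char *copy = malloc(len);

    if (copy == NULL)
        return -1;
    memcpy(copy, buf, len);
    b->vec[b->nvec].iov_base = copy;
    b->vec[b->nvec].iov_len  = len;
    b->nvec++;
    b->bytes += len;

    if (b->nvec == DEMO_MAX_IOVEC || b->bytes >= DEMO_FLUSH_BYTES)
        return demo_flush(b);
    return 0;
}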

The two existing patches both memcopy the data.  The only difference is
where.  The bucket brigades copy it at the top of the filter stack.  The
ap_bucket_t patch (best name I could come up with  :-) uses BUFF to
memcopy the data at the bottom of the stack.

The only way to get around this is to re-write the existing modules.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: Filtering issues.

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Jul 12, 2000 at 03:59:18PM -0700, rbb@covalent.net wrote:
>...
> This is fixable, but only by using read/write buckets (which imply a
> memcopy) with the ap_r* functions.  Most content generators will need to
> be re-written to take full advantage of the optimizations available to
> filters, but they will work with filters regardless.
> 
> This problem will crop up sooner or later in the patch submitted by
> Greg.

No. It won't.

My filter design considers the 95% case: no memcpy() is needed *unless*
somebody decides to "set aside" the output.

Conversely: you designed for the 5% solution and force the other 95% to
conform to that solution. You must memcpy() data into arbitrary-lifetime
buffers.

You say "most content generators will need to be re-written to take full
advantage of the optimizations available to filters, ..."  Well, that is an
aspect of your design. My filter design does not require *any* content
generators to be rewritten.



I am quite interested in watching the development of the buckets now that you
placed them into source control. There does not seem to be a coherent design
that is managing the development. A lot of poking, prodding, and bug fixing
is occurring to get the stuff to work. I find that very sad when there is an
existing filtering patch that doesn't have these bugs, works very well, does
not impose changes upon the content generators, etc.

The solution that you're putting together feels very complex and very
fragile.

I chuckled when I saw the addition of the "rmem" buckets and that you simply
stored the pointer into the bucket. Your bucket design requires buckets'
lifetimes to be self-determining, so a simple reference would never work.
You will always need to copy, so those rmem buckets simply do not work.


Finally, a meta issue: it is rather upsetting that vetoed code is residing
in APR. People are submitting patches to get the darn thing to compile, to
fix bugs, and add functions. I feel the process that is happening around
this bucket stuff is not respecting the Apache Development Guidelines.


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/