Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/07/10 13:26:56 UTC

filtering patches

I've just checked in an update to STATUS noting a veto on the two patches
that Ryan has posted. I'm not going to go through a serious, in-depth review
of the patches here; instead, I'll just hit on the high points:

*) all content generators (Apache's and third party's) must be rewritten to
   take advantage of this filtering mechanism

*) these patches do not use BUFF, but attempt to go straight to the network.
   this loses the buffering, the chunking transfer coding, and the character
   set translation which BUFF provides.

*) the use of "rwmem" is unclear; it appears to have a fixed size buffer and
   requires that all generated content be copied into a holding buffer. if a
   content-generator creates 100Mb of content, can it do this piecemeal
   (create/destroy bucket brigades for pieces of the content) or will it get
   spooled to disk (per a comment in ap_rwmem_write)

*) minor issues:
   - the use of malloc() is scary; I do not understand why a pool is not
     used (scoped to the request, the connection, or a cache)
   - bugs, such as "newbuf = malloc(sizeof(newbuf))". the server doesn't
     even operate if these patches are applied (notwithstanding the BUFF
     avoidance mentioned above)
   - much more complex API (at least a couple dozen new functions)


------------------------------

Conversely:

*) the patches that I've posted require no changes on the part of today's
   content generators, nor will any be required in the future.

*) The patches continue to work with BUFF, which has been (effectively)
   isolated into a filter; a future pass can turn it into a real filter, can
   shift translation out (e.g. the mod_xlate that I posted), and can shift
   chunking out (thus leaving it to simply buffer content for the network).

*) the output mechanism is the same as today: ap_rwrite() and friends.
   Filters use ap_fc_write() and friends -- analogues to ap_r*().

*) no known bugs. the server operates quite fine, and I've tested the
   patches with several actual filters, performing different types of
   filtering.


------------------------------

I think it is time that we get some additional commentary and review of the
patch sets from the other project members. I know that my patch is "final"
for this pass; Ryan will probably resubmit with some bug fixes.

The STATUS file has been updated with Message-ID values for the location of
the different patches, and the mod_xlate demo that I posted. Of course, it
also has the current record of voting for the three patch sets.

Please take some time to review these patch sets. Thanx!

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: filtering patches

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Jul 11, 2000 at 08:32:16AM -0400, Rodent of Unusual Size wrote:
> Greg Stein wrote:
> > 
> > Hmm... I guess this is also associated with the "modules must be
> > rewritten to use output filtering" issue. Either we mandate that
> > all content generators are rewritten so that we can drop BUFF
> > (in favor of the filters),
> 
> -1
> 
> > or we include both BUFF and the new BUFF-replacement filters in
> > the final code (eek!)
> 
> -1
> 
> > [ or provide a way for today's modules to use output filtering.
> 
> +1

Ken: then please take a closer look at my current patch. It works with all
of today's modules, CGIs, and the current Apache structure/design.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: filtering patches

Posted by rb...@covalent.net.
As I said in a response to Greg at some point yesterday, the current patch
requires that modules be re-written.  This is not a feature of the design,
it is a feature of the patch.

If we want current content generators to be able to use filtering, we just
have to re-write ap_r* to create a bucket and call ap_pass_bucket.  My
original plan had been to modify ap_r* to take a bucket_brigade as an
argument.  This would allow a module that wanted to create all of the data
and pass it down the stack at once to do so.

Think about mod_autoindex.  Currently, it makes numerous little calls to
ap_rputs.  This is quite honestly a stupid design with buckets.  My plan
had been to allow mod_autoindex to create a bucket_brigade with the whole
page in it and pass it down the filter stack.

If people don't like this idea, I'll re-write ap_r* to keep the same
signature and just pass data down the stack as soon as it gets it.
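
For illustration, a minimal sketch of what that same-signature rewrite might look like; every bucket and brigade name here (ap_brigade_create, ap_bucket_create_rmem, ap_brigade_append, ap_pass_brigade, r->filters) is a hypothetical stand-in, not the actual API of either patch:

    /* Hypothetical sketch: ap_rputs keeps its signature but feeds the
     * filter stack instead of BUFF.  All bucket/brigade names are
     * illustrative stand-ins, not the actual API of either patch.
     */
    API_EXPORT(int) ap_rputs(const char *str, request_rec *r)
    {
        int len = strlen(str);
        ap_bucket_brigade *bb = ap_brigade_create(r->pool);
        ap_bucket *b = ap_bucket_create_rmem(str, len);  /* read-only: no copy */

        ap_brigade_append(bb, b);
        if (ap_pass_brigade(r->filters, bb) != APR_SUCCESS)
            return -1;
        return len;   /* preserve the old ap_r* return convention */
    }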

This will come in a second patch though, because I am trying to keep the
current patch small.

Ryan


On Tue, 11 Jul 2000, Rodent of Unusual Size wrote:
> Greg Stein wrote:
> > 
> > Hmm... I guess this is also associated with the "modules must be
> > rewritten to use output filtering" issue. Either we mandate that
> > all content generators are rewritten so that we can drop BUFF
> > (in favor of the filters),
> 
> -1
> 
> > or we include both BUFF and the new BUFF-replacement filters in
> > the final code (eek!)
> 
> -1
> 
> > [ or provide a way for today's modules to use output filtering.
> 
> +1
> -- 
> #ken    P-)}
> 
> Ken Coar                    <http://Golux.Com/coar/>
> Apache Software Foundation  <http://www.apache.org/>
> "Apache Server for Dummies" <http://Apache-Server.Com/>
> "Apache Server Unleashed"   <http://ApacheUnleashed.Com/>
> 


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Greg Stein wrote:
> 
> Hmm... I guess this is also associated with the "modules must be
> rewritten to use output filtering" issue. Either we mandate that
> all content generators are rewritten so that we can drop BUFF
> (in favor of the filters),

-1

> or we include both BUFF and the new BUFF-replacement filters in
> the final code (eek!)

-1

> [ or provide a way for today's modules to use output filtering.

+1
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: filtering patches

Posted by rb...@covalent.net.
On Mon, 10 Jul 2000, Greg Stein wrote:
> On Mon, Jul 10, 2000 at 03:56:15PM -0700, rbb@covalent.net wrote:
> >...
> > If your module doesn't use the filtering stuff, we still use BUFF.  If the
> > first two modules written do character set translation and chunked
> > encoding, then we are done.  Since the only module that currently uses the
> > filtering is the core, I don't think this is an issue.
> 
> Well, the core is the thing sending out a lot of content today. I think it
> should probably have chunking/xlate/buffering :-)
> 
> Hmm... I guess this is also associated with the "modules must be rewritten to
> use output filtering" issue. Either we mandate that all content generators
> are rewritten so that we can drop BUFF (in favor of the filters), or we
> include both BUFF and the new BUFF-replacement filters in the final code
> (eek!)
> 
> [ or provide a way for today's modules to use output filtering. My patch
>   does this, but I don't see how it would be done in your patch right now.
>   It assumes all buckets (brigades) can be set aside; I'm not sure how that
>   set-aside requirement would interact with ap_rwrite and friends. or maybe
>   the requirement doesn't exist? ]

I actually could provide a way for today's modules to use output
filters.  It's actually incredibly simple.  Just make ap_r* create a
bucket and then call ap_pass_brigade.  This would take all of twenty or
thirty minutes.  I didn't do it, because I don't think it's the right
solution.

> > You are asking for a complete patch though.  This is a design and a
> > beginning to the implementation.  Yes, I could go off on my own and write
> > this code and just drop it on the list one day.  If this is what you want,
> > let me know.  I'll go away for a few weeks.
> 
> No... sorry that I gave that impression. My ideal is that your patch
> continues to use BUFF for the output rather than trying to replace it at
> this point. In other words: torch some of the code from your patch :-). That
> will allow your patches to start working a bit better right out of the gate.

No, that's the wrong solution.  People fix things when they are broken,
not when they are sub-optimal.  We commit the current patch, and in the
next two weeks, somebody will fix it.  How can I guarantee that?  Because
if nobody else steps up and volunteers, then I will.  This is important to
me, so I am not going to just drop it.  The patch is relatively small
right now.  Yes, we break some things.  So what.  This is an alpha.  We
can break whatever we want, as long as it is obvious how to fix it
correctly.  The obvious fix for these two problems is to write a filter
that performs the required actions.  Those are small, simple filters to
write.  I don't see that as an impediment to this patch.

> In a future pass, then we can start to replace BUFF. But those patches
> will be focused on that task, rather than a combined "add filtering,
> avoid BUFF" patch.

The filters avoid BUFF as a part of the design.  Yes, they could use BUFF
at the bottom, but that would be a step backwards.  If you would like to
do this on your own tree, just make core_filter use bwritev instead of
ap_flush_to_iol.  Even if I removed that section, the day after I commit
this patch I would put it back and have core_filter call ap_flush_to_iol
again, because ap_flush_to_iol is the right solution.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jul 10, 2000 at 03:56:15PM -0700, rbb@covalent.net wrote:
>...
> If your module doesn't use the filtering stuff, we still use BUFF.  If the
> first two modules written do character set translation and chunked
> encoding, then we are done.  Since the only module that currently uses the
> filtering is the core, I don't think this is an issue.

Well, the core is the thing sending out a lot of content today. I think it
should probably have chunking/xlate/buffering :-)

Hmm... I guess this is also associated with the "modules must be rewritten to
use output filtering" issue. Either we mandate that all content generators
are rewritten so that we can drop BUFF (in favor of the filters), or we
include both BUFF and the new BUFF-replacement filters in the final code
(eek!)

[ or provide a way for today's modules to use output filtering. My patch
  does this, but I don't see how it would be done in your patch right now.
  It assumes all buckets (brigades) can be set aside; I'm not sure how that
  set-aside requirement would interact with ap_rwrite and friends. or maybe
  the requirement doesn't exist? ]

> If you would like,
> I will hack together two modules that use the filtering to make these work
> as a part of this patch.

Certainly not :-)  below...

> You are asking for a complete patch though.  This is a design and a
> beginning to the implementation.  Yes, I could go off on my own and write
> this code and just drop it on the list one day.  If this is what you want,
> let me know.  I'll go away for a few weeks.

No... sorry that I gave that impression. My ideal is that your patch
continues to use BUFF for the output rather than trying to replace it at
this point. In other words: torch some of the code from your patch :-). That
will allow your patches to start working a bit better right out of the gate.

In a future pass, then we can start to replace BUFF. But those patches will
be focused on that task, rather than a combined "add filtering, avoid BUFF"
patch.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: filtering patches

Posted by rb...@covalent.net.
On Mon, 10 Jul 2000, Greg Stein wrote:

> On Mon, Jul 10, 2000 at 04:30:19PM +0100, Tony Finch wrote:
> > Greg Stein <gs...@lyra.org> wrote:
> > >*) these patches do not use BUFF, but attempt to go straight to the network.
> > >   this loses the buffering, the chunking transfer coding, and the character
> > >   set translation which BUFF provides.
> > 
> > I think that is the whole point, since all that functionality should be
> > re-implemented as filters in order to get rid of the horror that is BUFF.
> 
> Agreed, but I'd rather not see it broken (then rely on future changes to
> make it work again). It would be great to continue to use BUFF for now, then
> a future pass can start to transition the functionality from BUFF up to the
> filtering layer(s).

If your module doesn't use the filtering stuff, we still use BUFF.  If the
first two modules written do character set translation and chunked
encoding, then we are done.  Since the only module that currently uses the
filtering is the core, I don't think this is an issue.  If you would like,
I will hack together two modules that use the filtering to make these work
as a part of this patch.

You are asking for a complete patch though.  This is a design and a
beginning to the implementation.  Yes, I could go off on my own and write
this code and just drop it on the list one day.  If this is what you want,
let me know.  I'll go away for a few weeks.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jul 10, 2000 at 04:30:19PM +0100, Tony Finch wrote:
> Greg Stein <gs...@lyra.org> wrote:
> >*) these patches do not use BUFF, but attempt to go straight to the network.
> >   this loses the buffering, the chunking transfer coding, and the character
> >   set translation which BUFF provides.
> 
> I think that is the whole point, since all that functionality should be
> re-implemented as filters in order to get rid of the horror that is BUFF.

Agreed, but I'd rather not see it broken (then rely on future changes to
make it work again). It would be great to continue to use BUFF for now, then
a future pass can start to transition the functionality from BUFF up to the
filtering layer(s).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: ap_hash_t and userdata (was: Re: filtering patches)

Posted by Tony Finch <do...@dotat.at>.
gstein@lyra.org wrote:
>
>hmm. Should ap_table_t be reimplemented as cover/utility functions on
>ap_hash_t?

See STATUS:

    * Replace tables with a proper opaque ADT that has pluggable
      implementations (including something like the existing data type,
      plus hash tables for speed, with options for more later).
	Status: fanf is working on this.

Tony.
-- 
f.a.n.finch    fanf@covalent.net    dot@dotat.at
471 scraping the roe from the slit-open belly of comedy

ap_hash_t and userdata (was: Re: filtering patches)

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jul 10, 2000 at 02:01:21PM -0400, Rodent of Unusual Size wrote:
>...
> If we had tables
> with binary values (Dirk?), dealing with this would be simple.
> All a filter has to do is hang some sort of pointer to the pool
> off the request_rec.  That could be done now, in a messy kludgy
> fashion.

"tables with binary values" can be handled by Tony's new ap_hash_t APIs (see
apr_hash.h). It is a true hash table, too, unlike ap_table_t which is
implemented as a linear scan.

"hanging pointers" can be handled with ap_set_userdata(data, key, cleanup,
r->pool).
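
As a sketch, a filter could stash private state that way; my_filter_state, the key string, and ap_get_userdata as the companion lookup call are illustrative assumptions:

    /* Sketch: filter-private state hung off the request pool via the
     * userdata API.  The struct, key, and helper names are illustrative.
     */
    typedef struct {
        char partial[8];   /* e.g. bytes of a split multibyte sequence */
        int  len;
    } my_filter_state;

    static ap_status_t my_state_cleanup(void *data)
    {
        return APR_SUCCESS;   /* pool memory needs no explicit free */
    }

    static my_filter_state *my_get_state(request_rec *r)
    {
        my_filter_state *st = NULL;

        ap_get_userdata((void **) &st, "my-filter-state", r->pool);
        if (st == NULL) {
            st = ap_pcalloc(r->pool, sizeof(*st));
            ap_set_userdata(st, "my-filter-state", my_state_cleanup, r->pool);
        }
        return st;
    }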

I'm using the hash tables in the DAV code to map URIs to unique integers.

hmm. Should ap_table_t be reimplemented as cover/utility functions on
ap_hash_t?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: filtering patches

Posted by rb...@covalent.net.
On Mon, 10 Jul 2000, Greg Stein wrote:
> On Mon, Jul 10, 2000 at 12:41:04PM -0700, rbb@covalent.net wrote:
> >...
> > If we use malloc/free we don't have the penalty when a filter caches the
> > data, because we can write it to the disk, and free that memory.  Using
> > pools, we can write it to the disk, and destroy the pool, but that doesn't
> > actually free the memory, it just returns it to the free list.  Yes, we
> > can re-use that memory, but then we're back to using palloc to emulate
> > malloc/free.
> 
> One of the issues here is that you are proactively compensating for filters
> that need to set data aside, and making the rest of the filters operate
> within that design. I believe the 95% case won't be setting data aside,
> especially not *all* or most of the data. Typically, they might set aside
> partial translation characters, partial tags/directives, etc.
> 
> It is also interesting to note that this "proactive set aside" design means
> that you memcpy() all data into the malloc'd buffers. I would rather see the
> 5% case (that needs to do the set-aside) do the memcpy() at the set-aside
> time. This would remove the memcpy() for the 95% case.

No, it doesn't mean that everything gets copied.  Yes, the current patch
does do this copy.  I had to choose between implementing rwmem_buckets and
read_only_buckets, and I chose rwmem.  Read-only buckets have no
memcpy.  The code is easier to write as it turns out, so I could
implement those buckets tomorrow if that would make you happy.

The fact is that the bucket brigades design allows the modules that want
to set data aside to do so cleanly without a memcpy.  It also
allows those that don't to just pass data down the stack.

Not all of it has been implemented, but the core of it is all there.

> In my filtering patch, the string values in the ap_rputs() go all the way to
> the network buffering code without a copy. It will get memcpy'd into BUFF's
> buffer, sure, but that is a necessary part of network/socket optimization.

In the final bucket brigades, there is no copy at all.  Just because the
copy is there now doesn't mean it will always be there.  The comments
mention the read-only buffers.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Greg Stein <gs...@lyra.org>.
[ only touching on a couple issues here, rather than the full stream... ]

On Mon, Jul 10, 2000 at 12:41:04PM -0700, rbb@covalent.net wrote:
>...
> If we use malloc/free we don't have the penalty when a filter caches the
> data, because we can write it to the disk, and free that memory.  Using
> pools, we can write it to the disk, and destroy the pool, but that doesn't
> actually free the memory, it just returns it to the free list.  Yes, we
> can re-use that memory, but then we're back to using palloc to emulate
> malloc/free.

One of the issues here is that you are proactively compensating for filters
that need to set data aside, and making the rest of the filters operate
within that design. I believe the 95% case won't be setting data aside,
especially not *all* or most of the data. Typically, they might set aside
partial translation characters, partial tags/directives, etc.

It is also interesting to note that this "proactive set aside" design means
that you memcpy() all data into the malloc'd buffers. I would rather see the
5% case (that needs to do the set-aside) do the memcpy() at the set-aside
time. This would remove the memcpy() for the 95% case.
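
As a sketch of that alternative (the filter signature, xlate_ctx, incomplete_tail_len, and ap_fc_write's argument order are all assumptions here), the copy would be confined to the leftover tail:

    /* Sketch: pass complete data straight through, and memcpy() only the
     * trailing partial token that must survive until the next call.
     * Names and signatures are illustrative, not either patch's API.
     */
    static int xlate_filter(ap_filter_t *f, const char *buf, ap_size_t len)
    {
        xlate_ctx *ctx = f->ctx;
        ap_size_t tail = incomplete_tail_len(buf, len);   /* usually zero */

        ap_fc_write(f, buf, len - tail);   /* the 95% case: no copy at all */

        memcpy(ctx->saved, buf + (len - tail), tail);   /* the 5% case */
        ctx->saved_len = tail;
        return OK;
    }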

Here is a little block of code from mod_dav.c:

    if (first->propresult.xmlns == NULL) {
        ap_rputs("<D:response>", r);
    }
    else {
        ap_rputs("<D:response", r);
        for (t = first->propresult.xmlns; t; t = t->next) {
            ap_rputs(t->text, r);
        }
        ap_rputc('>', r);
    }

In my filtering patch, the string values in the ap_rputs() go all the way to
the network buffering code without a copy. It will get memcpy'd into BUFF's
buffer, sure, but that is a necessary part of network/socket optimization.

> It doesn't make any sense to emulate malloc/free with
> create_sub_pool/destroy_pool.  We incur allocation penalties in either
> case, and we use more memory in the pool case.

Pools can allocate small memory chunks faster than malloc(). They are about
the same for large chunks, but the pool could be faster -- the pool can yank
memory from a free list, or fall back to a malloc() if no blocks are
available. A C runtime might have an optimized malloc() doing much the same
thing (i.e. avoid calling sbrk()), but I don't think we can rely on
optimized malloc libraries across all of our platforms (and our pools solve
that issue).
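
As a sketch (using the 1.3-style spellings ap_make_sub_pool, ap_palloc, and ap_destroy_pool, since the exact names vary between trees):

    /* Sketch: scope allocations to a sub-pool and recycle them when the
     * work is done.  Destroying the sub-pool returns its blocks to the
     * pool free list rather than to the operating system.
     */
    static void do_some_work(request_rec *r)
    {
        pool *sub = ap_make_sub_pool(r->pool);
        char *buf = ap_palloc(sub, 8192);   /* fast: carved from a pool block */

        /* ... fill buf and hand it off ... */

        ap_destroy_pool(sub);   /* memory recycled, not freed */
    }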

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: filtering patches

Posted by rb...@covalent.net.
On Mon, 10 Jul 2000, Rodent of Unusual Size wrote:
> rbb@covalent.net wrote:
> > 
> > No, it can't.  The pool cannot be destroyed before returning from the
> > function.  We don't know that the rest of the filters have finished
> > with the data yet.
> 
> Isn't the outward-bound data supposed to be stored in a bucket?
> I think we're talking about apples and oranges; I'm talking about
> filter-private memory, and you appear to be talking about the
> memory used to transmit the data into and out of the filter.
> 
> Why can't each bucket have its own pool?  When the bucket gets
> dumped to the wire, the pool gets destroyed.  If there's some
> semi-pathological module along the way, like mod_gzip, that
> wants to get all of the buckets before passing any along,
> well, we'll have a lot of memory consumption anyway.

Each bucket could have its own pool.  Then, we are talking about
implementing malloc using palloc.  Basically, each bucket will have to
create a sub-pool off of the request_rec's pool, and it will be used to
allocate memory for that bucket and that bucket only.  Then, when we are
done with the bucket, we destroy the pool.

This is the same thing as using malloc/free with register/kill cleanup,
except that we incur the penalty of creating and destroying
sub-pools.  There is no reason to use pools, because they have to be used
just like malloc/free.  We trade the penalty of two system calls for the
penalty of creating/destroying pools.

We also have more pools lying around, taking up unnecessary memory.

Plus, since the memory doesn't actually get freed, our pools will grow
bigger than necessary.

If we use malloc/free we don't have the penalty when a filter caches the
data, because we can write it to the disk, and free that memory.  Using
pools, we can write it to the disk, and destroy the pool, but that doesn't
actually free the memory, it just returns it to the free list.  Yes, we
can re-use that memory, but then we're back to using palloc to emulate
malloc/free.

It doesn't make any sense to emulate malloc/free with
create_sub_pool/destroy_pool.  We incur allocation penalties in either
case, and we use more memory in the pool case.

> > > You have in no way convinced me of any portion of your sweeping
> > > "pools will not work for filters" statement.
> > 
> > How about now?
> 
> Now I think we're talking about the same thing -- but I'm
> still not convinced.

Getting warmer?

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> No, it can't.  The pool cannot be destroyed before returning from the
> function.  We don't know that the rest of the filters have finished
> with the data yet.

Isn't the outward-bound data supposed to be stored in a bucket?
I think we're talking about apples and oranges; I'm talking about
filter-private memory, and you appear to be talking about the
memory used to transmit the data into and out of the filter.

Why can't each bucket have its own pool?  When the bucket gets
dumped to the wire, the pool gets destroyed.  If there's some
semi-pathological module along the way, like mod_gzip, that
wants to get all of the buckets before passing any along,
well, we'll have a lot of memory consumption anyway.

> > You have in no way convinced me of any portion of your sweeping
> > "pools will not work for filters" statement.
> 
> How about now?

Now I think we're talking about the same thing -- but I'm
still not convinced.
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: filtering patches

Posted by rb...@covalent.net.
On Mon, 10 Jul 2000, Rodent of Unusual Size wrote:
> rbb@covalent.net wrote:
> > 
> > They can create sub-pools easily, because they have the request_rec,
> > but the data shouldn't be allocated out of a pool.  Any data
> > allocated out of a pool sticks around forever.  Unless pools are
> > modified to free memory, or we do some funky sub-pool management,
> > pools can't be used for filters.
> 
> Nonsense.  If a filter is being treated as a black box, and doesn't
> know about any external processing state, it can easily create a
> subpool when called, and destroy it before returning.

No, it can't.  The pool cannot be destroyed before returning from the
function.  We don't know that the rest of the filters have finished with
the data yet.  Take the case of mod_gzip, which wants to cache all the
data so it can modify the content-length.

default_handler sends a bunch of data.
mod_include creates a sub-pool, allocates a bunch of data and sends it on.
mod_gzip caches that data and returns
mod_include destroys the pool on the way back.

Whoops, mod_gzip now has invalid data.  That is, unless each module
creates a sub-pool and allocates and copies the data, which is something
we are trying to avoid here.

> > created before.  The problem is that there are two ways to associate
> > data with pools.  Either each filter has a pool, at which point all
> > memory has to be allocated for each filter, and the data has to be
> > copied.  Or, the data has its own pool that is passed around with
> > the data.
> 
> You're trying to solve the problem of statefulness in an environment
> you just said was stateless.  "They do not know about where they
> are in the request processing."  Or did you mean that they don't
> know where they are in the output chain?

Sorry, they don't know where they are in the output chain.  Filters do
know where they are in the request processing, because only data that is
being sent to the network triggers filters.

> In any case, your statements above are flawed.  You seem to be making
> tons of assumptions about how filters will use memory, and they
> all seem to be worst-case.

Yes, my assumptions are worst case, because that's the case we have to
design for.  If we design for the best case, then the first person to come
along and do something complex brings down the server.  It is easy to code
this for a worst case design, and get code that works great for smart
filters, and well for stupid ones.  Why would we choose the design that
works great for smart modules and falls apart for stupid ones?

> > Not knowing which pool is the parent of the data's pool though, no
> > pool can be destroyed (to free the memory), because you don't know
> > which sub-pools will also be destroyed, thus freeing memory the other
> > filters weren't done with yet.
> 
> Rubbish.  Any subpool would be created by the filter, and it
> would know there wouldn't be any descendants.  If we had tables
> with binary values (Dirk?), dealing with this would be simple.
> All a filter has to do is hang some sort of pointer to the pool
> off the request_rec.  That could be done now, in a messy kludgy
> fashion.

So basically, what you are saying is: use palloc to emulate
malloc?  That's what this comes down to.  Yes, if we allocate memory out
of the pool, ensure that no sub-pools are created, and keep the pool
with the data, then we could destroy the pool when the data has been
sent.  The only way this differs from using malloc/free is that we incur
the overhead of creating the sub-pool.  You want the safety that comes
with pools?  Use malloc/free, register a cleanup, and kill the cleanup
as a part of the bucket destruction function.  This accomplishes the same
thing, but without any of the overhead of sub-pool creation or
destruction.
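
A sketch of that pattern, using the 1.3-style cleanup signatures (the helper names are assumptions):

    /* Sketch: malloc with a registered cleanup for safety; the bucket's
     * destroy path kills the cleanup and frees the memory for real.
     */
    static void free_bucket_data(void *data)
    {
        free(data);
    }

    static void *bucket_data_alloc(request_rec *r, size_t len)
    {
        void *data = malloc(len);

        /* if the request aborts early, the cleanup still frees the block */
        ap_register_cleanup(r->pool, data, free_bucket_data, ap_null_cleanup);
        return data;
    }

    static void bucket_data_destroy(request_rec *r, void *data)
    {
        ap_kill_cleanup(r->pool, data, free_bucket_data);
        free(data);   /* unlike a pool, this really releases the memory */
    }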

> You have in no way convinced me of any portion of your sweeping
> "pools will not work for filters" statement.

How about now?

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> They can create sub-pools easily, because they have the request_rec,
> but the data shouldn't be allocated out of a pool.  Any data
> allocated out of a pool sticks around forever.  Unless pools are
> modified to free memory, or we do some funky sub-pool management,
> pools can't be used for filters.

Nonsense.  If a filter is being treated as a black box, and doesn't
know about any external processing state, it can easily create a
subpool when called, and destroy it before returning.

> The problem with pools isn't that a filter can't find a sub-pool it
> created before.  The problem is that there are two ways to associate
> data with pools.  Either each filter has a pool, at which point all
> memory has to be allocated for each filter, and the data has to be
> copied.  Or, the data has its own pool that is passed around with
> the data.

You're trying to solve the problem of statefulness in an environment
you just said was stateless.  "They do not know about where they
are in the request processing."  Or did you mean that they don't
know where they are in the output chain?

In any case, your statements above are flawed.  You seem to be making
tons of assumptions about how filters will use memory, and they
all seem to be worst-case.

> Not knowing which pool is the parent of the data's pool though, no
> pool can be destroyed (to free the memory), because you don't know
> which sub-pools will also be destroyed, thus freeing memory the other
> filters weren't done with yet.

Rubbish.  Any subpool would be created by the filter, and it
would know there wouldn't be any descendants.  If we had tables
with binary values (Dirk?), dealing with this would be simple.
All a filter has to do is hang some sort of pointer to the pool
off the request_rec.  That could be done now, in a messy kludgy
fashion.

You have in no way convinced me of any portion of your sweeping
"pools will not work for filters" statement.
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: filtering patches

Posted by rb...@covalent.net.
Filters know about the current chunk of data and any chunk they have set
aside for themselves.  They do not know about where they are in the
request processing, or what other filters are installed.  Filters are
self-contained, just like all modules have always been.

They can create sub-pools easily, because they have the request_rec, but
the data shouldn't be allocated out of a pool.  Any data allocated out of
a pool sticks around forever.  Unless pools are modified to free memory,
or we do some funky sub-pool management, pools can't be used for filters.

What I find amazing is that what is being asked for now is my very first
ioblock patch, except as a recursive algorithm.  I could have had that in
a day and a half, two months ago.

The problem with pools isn't that a filter can't find a sub-pool it
created before.  The problem is that there are two ways to associate data
with pools.  Either each filter has a pool, at which point all memory has
to be allocated for each filter, and the data has to be copied.  Or, the
data has its own pool that is passed around with the data.  Not knowing
which pool is the parent of the data's pool though, no pool can be
destroyed (to free the memory), because you don't know which sub-pools
will also be destroyed, thus freeing memory the other filters weren't done
with yet.

Ryan

On Mon, 10 Jul 2000, Rodent of Unusual Size wrote:
> rbb@covalent.net wrote:
> > 
> > Pools will not work for filters.  The reason is simple.  They
> > don't free memory.  If I try to write 100MB and I allocate it
> > from a pool, that 100MB is allocated for the length of the request.
> > It can't shrink until I am done with the request.  Pools will work
> > if we play games with sub-pools, but that requires that no modules
> > destroy the request->pool or any of its sub-pools.  There is only one
> > way to free a bucket: through the _destroy function.  If it
> > is not destroyed, we still have the data.  Using pools is going to
> > end up meaning that no filters or generators will destroy pools
> > for fear of ruining somebody else's data later in the chain.
> 
> This and something else you said makes me think your entire design
> is predicated on filters only being called to operate on a chunk of
> data, and having no knowledge of the progress of a request.
> Essentially, some Supreme Function calls the filter and says "Here,
> do your stuff with this" and just doesn't call it any more when there's
> no more data.  So each filter knows only about the current chunk
> of data.
> 
> Is that correct?  Because if it is, I consider it lame.  Filters
> should be able to create and destroy sub-pools of r->pool whenever
> they like, and be able to find them easily from call to call.  (Or
> chunk to chunk.)


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------



Re: filtering patches

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> Pools will not work for filters.  The reason is simple.  They
> don't free memory.  If I try to write 100MB and I allocate it
> from a pool, that 100MB is allocated for the length of the request.
> It can't shrink until I am done with the request.  Pools will work
> if we play games with sub-pools, but that requires that no modules
> destroy the request->pool or any of its sub-pools.  There is only one
> way to free a bucket: through the _destroy function.  If it
> is not destroyed, we still have the data.  Using pools is going to
> end up meaning that no filters or generators will destroy pools
> for fear of ruining somebody else's data later in the chain.

This and something else you said makes me think your entire design
is predicated on filters only being called to operate on a chunk of
data, and having no knowledge of the progress of a request.
Essentially, some Supreme Function calls the filter and says "Here,
do your stuff with this" and just doesn't call it any more when there's
no more data.  So each filter knows only about the current chunk
of data.

Is that correct?  Because if it is, I consider it lame.  Filters
should be able to create and destroy sub-pools of r->pool whenever
they like, and be able to find them easily from call to call.  (Or
chunk to chunk.)
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: filtering patches

Posted by rb...@covalent.net.
I am not going to argue this too much.

Yes, the APIs change.  In order to do filtering correctly they have to
change.  Filters need more control over the data than just sending out
streams of data.  If we start with your patch, we will end up with this
patch at some point.  I say that because your patch will need all of the
new bucket types, as well as all of the functions to combine the bucket
types.  Then, we will need ways to split a single bucket into multiple
buckets.  We are doing more complicated things than just writing
data.  The API is going to get more complex.
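
For illustration only, a split could look something like this minimal sketch; the struct and function here are hypothetical, not the patch's API:

    /* Hypothetical sketch: split one bucket into two at a byte offset,
     * with both halves sharing the original storage (no data copy).
     */
    typedef struct hyp_bucket {
        const char *data;
        size_t len;
        struct hyp_bucket *next;
    } hyp_bucket;

    static int hyp_bucket_split(hyp_bucket *b, size_t point)
    {
        hyp_bucket *tail;

        if (point > b->len)
            return -1;
        if ((tail = malloc(sizeof(*tail))) == NULL)
            return -1;
        tail->data = b->data + point;   /* second half shares the storage */
        tail->len  = b->len - point;
        tail->next = b->next;
        b->len  = point;
        b->next = tail;
        return 0;
    }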

As far as there being bugs, I admit that completely.  I am working to fix
the biggest bugs.

Pools will not work for filters.  The reason is simple.  They don't free
memory.  If I try to write 100MB and I allocate it from a pool, that 100MB
is allocated for the length of the request.  It can't shrink until I am
done with the request.  Pools will work if we play games with sub-pools,
but that requires that no modules destroy the request->pool or any of its
sub-pools.  There is only one way to free a bucket: through the _destroy
function.  If it is not destroyed, we still have the data.  Using pools is
going to end up meaning that no filters or generators will destroy pools
for fear of ruining somebody else's data later in the chain.

I will be fixing things, but the direction of both patches is clear.  Both
patches have bugs, but those can be fixed once one patch is
committed.  It's the direction that is important at this point, not the
actual patch.

Ryan

On Mon, 10 Jul 2000, Greg Stein wrote:

> I've just checked in an update to STATUS noting a veto on the two patches
> that Ryan has posted. I'm not going to go through a serious, in-depth review
> of the patches here; instead, I'll just hit on the high points:
> 
> *) all content generators (Apache's and third party's) must be rewritten to
>    take advantage of this filtering mechanism
> 
> *) these patches do not use BUFF, but attempt to go straight to the network.
>    this loses the buffering, the chunking transfer coding, and the character
>    set translation which BUFF provides.
> 
> *) the use of "rwmem" is unclear; it appears to have a fixed size buffer and
>    requires that all generated content be copied into a holding buffer. if a
>    content-generator creates 100Mb of content, can it do this piecemeal
>    (create/destroy bucket brigades for pieces of the content) or will it get
>    spooled to disk (per a comment in ap_rwmem_write)
> 
> *) minor issues:
>    - the use of malloc() is scary; I do not understand why a pool is not
>      used (scoped to the request, the connection, or a cache)
>    - bugs, such as "newbuf = malloc(sizeof(newbuf))". the server doesn't
>      even operate if these patches are applied (notwithstanding the BUFF
>      avoidance mentioned above)
>    - much more complex API (at least a couple dozen new functions)
> 
> 
> ------------------------------
> 
> Conversely:
> 
> *) the patches that I've posted require no changes on the part of today's
>    content generators, nor will any be required in the future.
> 
> *) The patches continue to work with BUFF, which has been (effectively)
>    isolated into a filter; a future pass can turn it into a real filter, can
>    shift translation out (e.g. the mod_xlate that I posted), and can shift
>    chunking out (thus leaving it to simply buffer content for the network).
> 
> *) the output mechanism is the same as today: ap_rwrite() and friends.
>    Filters use ap_fc_write() and friends -- analogues to ap_r*().
> 
> *) no known bugs. the server operates quite fine, and I've tested the
>    patches with several actual filters, performing different types of
>    filtering.
> 
> 
> ------------------------------
> 
> I think it is time that we get some additional commentary and review of the
> patch sets from the other project members. I know that my patch is "final"
> for this pass; Ryan will probably resubmit with some bug fixes.
> 
> The STATUS file has been updated with Message-ID values for the location of
> the different patches, and the mod_xlate demo that I posted. Of course, it
> also has the current record of voting for the three patch sets.
> 
> Please take some time to review these patch sets. Thanx!
> 
> Cheers,
> -g
> 
> -- 
> Greg Stein, http://www.lyra.org/
> 


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: filtering patches

Posted by Tony Finch <do...@dotat.at>.
Greg Stein <gs...@lyra.org> wrote:
>
>*) these patches do not use BUFF, but attempt to go straight to the network.
>   this loses the buffering, the chunking transfer coding, and the character
>   set translation which BUFF provides.

I think that is the whole point, since all that functionality should be
re-implemented as filters in order to get rid of the horror that is BUFF.

Tony.
-- 
f.a.n.finch    fanf@covalent.net    dot@dotat.at
444 horse-whip strap lather

Re: filtering patches

Posted by rb...@covalent.net.
Since all of the issues listed below have been addressed in e-mail, and
they are all functions of the patch, not of the design, can we get the
veto removed?

Each of these issues can and will be dealt with in future patches.  I am
trying to keep the current patch small and clean to make it easy to
review.

I could post a second patch later today that allows all content generators
to work with the current patch if that would help to remove the veto.

Ryan

On Mon, 10 Jul 2000, Greg Stein wrote:
> I've just checked in an update to STATUS noting a veto on the two patches
> that Ryan has posted. I'm not going to go through a serious, in-depth review
> of the patches here; instead, I'll just hit on the high points:
> 
> *) all content generators (Apache's and third party's) must be rewritten to
>    take advantage of this filtering mechanism
> 
> *) these patches do not use BUFF, but attempt to go straight to the network.
>    this loses the buffering, the chunking transfer coding, and the character
>    set translation which BUFF provides.
> 
> *) the use of "rwmem" is unclear; it appears to have a fixed size buffer and
>    requires that all generated content be copied into a holding buffer. if a
>    content-generator creates 100Mb of content, can it do this piecemeal
>    (create/destroy bucket brigades for pieces of the content) or will it get
>    spooled to disk (per a comment in ap_rwmem_write)
> 
> *) minor issues:
>    - the use of malloc() is scary; I do not understand why a pool is not
>      used (scoped to the request, the connection, or a cache)
>    - bugs, such as "newbuf = malloc(sizeof(newbuf))". the server doesn't
>      even operate if these patches are applied (notwithstanding the BUFF
>      avoidance mentioned above)
>    - much more complex API (at least a couple dozen new functions)

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------