Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/06/28 03:17:01 UTC

summary: issues for my filtering patch

I have posted my patch to STATUS for review. At the moment, Ryan has vetoed
it and provided his technical issues for it. I'm going to summarize those
and my responses. I believe those issues do not exist, and that the veto
should be rescinded.

The patch still requires two/three +1 votes (and no more vetos, of course).
I'm not sure of two/three -- does my vote count? :-)

I'm also not sure what happens if not enough people take the time to review
it and those extra +1 votes do not appear. Is that a pocket veto?

Note: below is just a summary. For detail, see my other note. The intent
here is to summarize what has been offered to date, and summarize why those
issues do not hold. Additionally, my hope is to clarify the purported issues
so that they may be addressed clearly and concisely. Those long emails tend
to hide things.

On to Ryan's issues:

 1) Ryan suggests that some items have been left out.

    => I suggested that he detail which items he is referring to. Presuming
       these are the same things that we've talked about before (e.g.
       delaying sending http headers), then I've explained how these items
       can be deferred for future rounds of patches.

 2) the design requires the framework/filters to have unbounded heap usage

    => untrue. demonstrated otherwise.

 3) Roy's buckets are needed -- this solution doesn't go far enough.

    => the framework allows for extended/additional bucket types and does
       not preclude them. therefore, these are deferred to a future patch
       set. and the term "need" was overstated.

 4) the char* handler is bad for various reasons

    => I showed that each of these "reasons" is an issue for filters in
       general, not a characteristic of the char* handler. regardless, the
       "reasons" do not preclude the patch, the design, or filters in
       general.

       [ This was a rather extensive section in Ryan's response. I did
	 respond to all of his issues, but they might not be reflected in
	 this summary email. Please look there if you have concerns in this
	 area. ]

 5) my framework precludes mod_auth_digest from computing a digest

    => false. it requires extra work (or a punt) if the response is
       non-chunked, but these are issues for mod_auth_digest rather than
       a problem inherent in the framework. mod_auth_digest is rather
       unique in this regard due to its need to see the whole entity-body,
       rather than being able to operate in a streaming fashion like most
       other filters.

 6) content generators should change how they output data
 
    => possibly, but that is a separate change for the future. my patch does
       not preclude the types of changes that Ryan suggests

 7) having both bucket and char* callbacks introduces bugs

    => absurdly false. the proffered example was way out of line.

 8) need to track pools and their lifetimes

    => false. demonstrated otherwise

 9) the framework demands extra memory and memcpys

    => false. demonstrated otherwise

10) the framework precludes certain optimizations

    => possibly (as with any code), but there has not been a credible
       example of an optimization that is important/relevant and can't be
       done later within this framework

11) Content-Length needs to be sent when filters are present

    => detailed to Roy, and in my response to Ryan, that this can be handled
       in a future patch set.

       Note: Content-Length *remains* in the non-filter case. Only when we
       start taking advantage of this new feature will it disappear. But
       there are examples of how C-L can be put back in for certain types
       of filters and responses.

12) subrequests won't work (because of output filter chain issues?)

    => false. demonstrated empirically with my mod_gargle. also demonstrated
       that other subrequest filter chain policies can be specified.

That is it.
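For the curious, the char*-versus-bucket distinction that runs through
items 4 and 7 can be sketched roughly like this. All names below are made
up for illustration; they are not the identifiers from the patch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the two callback styles under debate. */

typedef struct bucket {
    const char *data;        /* the content this bucket carries */
    size_t      len;         /* number of bytes at data */
    struct bucket *next;     /* buckets travel as a linked list */
} bucket_t;

/* Style 1: a "char*" filter is handed raw bytes. */
typedef void (*char_filter_fn)(const char *buf, size_t len);

/* Style 2: a "bucket" filter is handed the list and walks it. */
typedef void (*bucket_filter_fn)(bucket_t *list);

/* The framework can bridge the two styles: to call a char* filter,
 * it flattens the bucket list into a contiguous buffer. */
static size_t flatten(bucket_t *list, char *out, size_t max)
{
    size_t used = 0;
    for (; list != NULL; list = list->next) {
        if (used + list->len > max)
            break;          /* out of room; a real core would do better */
        memcpy(out + used, list->data, list->len);
        used += list->len;
    }
    return used;
}
```

The point of contention is not whether such a bridge can exist, but
whether offering the char* style at all encourages filters to discard the
memory-type information the bucket carries.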

AFAICT, there are no surviving issues from Ryan's response to my issue
recap. If any do survive, then please bring them up so that I can track them and
respond to them.

I would also encourage others to go ahead and review the patch. It is ready
for inclusion, and I believe that it builds a great framework for output
filtering. There have been many suggestions for future optimizations and
improvements. Many have been great, but are not required at this point in
the process -- they can all be performed after this framework is in place.
As we all know, step-wise patches are much better than mother-patches. I
would like to continue with that tradition/process for the filtering
mechanisms.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: summary: issues for my filtering patch

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Ben Hyde wrote:
> 
> Three votes in addition to yours.

Actually, I think it's only two additional votes needed.  A total
of three +1s, possibly but not necessarily including the submitter.
(There have been times when the submitter was only +0 on the patch,
and it therefore needed three +1s from others.)
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: summary: issues for my filtering patch

Posted by Ben Hyde <bh...@pobox.com>.
Greg Stein <gs...@lyra.org> writes:
> The patch still requires two/three +1 votes (and no more vetos, of course).
> I'm not sure of two/three -- does my vote count? :-)

Three votes in addition to yours.

> I'm also not sure what happens if not enough people take the time to review
> it and those extra +1 votes do not appear. Is that a pocket veto?

Congratulations - I think you may have named that syndrome.

I'd prefer to see a little cooling-off period here, for three reasons.
Ryan stated he wanted to step back - we should respect that.  Some of
the ambivalence about filtering in general may evaporate as it
becomes clear something might actually appear in the code.  Deciding
if one wants to swallow/embrace this approach requires a little
thinking and hands-on involvement.

Thanks to all for keeping the machinery grinding, even if it
occasionally makes horrible noises as it works.  I see a lot of
progress made, and a lot more common understanding in hand.  I can't
tell yet if this is a sufficient/good/great seed for filtering
to grow upon going forward.

 - ben

Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> Ideally.  But......  I think more filters will actually accumulate data
> than just pass it through.

I hope you're wrong. :)

> Well, adding a new bucket type should be trivially simple yes.  I believe
> the filters will need to know something about what kind of bucket they are
> creating, but not what kind they are reading.  For example, if a module
> wants to create a new bucket, it needs to know if that bucket is going to
> store a file or a char * or an mmap'ed file.

Obviously a filter needs to know about bucket types it is creating.
I just wonder if other filters ever need to know anything about
the bucket types they receive other than "give me the data in a
format I can use".

There might be some way for some filter to avoid having to load
a bucket containing a file descriptor because it knows how to do
what it needs to do using the file descriptor, for example. I just
can't think of a practical example of such a use off the top of
my head.
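One hypothetical example along these lines: a filter that only needs byte
counts (say, to compute a Content-Length) could handle memory buckets
directly and pass file buckets through untouched, using the file's known
size instead of reading its contents. All names here are invented for
illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Invented types: a bucket is either in-memory data or a file whose
 * size the core already knows. */
enum bucket_type { BUCKET_MEM, BUCKET_FD };

typedef struct {
    enum bucket_type type;
    int         fd;        /* valid when type == BUCKET_FD  */
    size_t      file_len;  /* size on disk, known to the core */
    const char *mem;       /* valid when type == BUCKET_MEM */
    size_t      mem_len;
} bucket_t;

/* A length-counting filter never needs to load the file's contents
 * just to measure them; the descriptor can flow on to the network
 * end untouched. */
static size_t bucket_length(const bucket_t *b)
{
    return b->type == BUCKET_FD ? b->file_len : b->mem_len;
}
```

This is exactly the sort of case where keeping the file descriptor intact
(rather than flattening it into bytes) preserves an optimization for the
layers below.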

Jim

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> > A bucket doesn't grow.  You have lists of buckets, so if you need more
> > memory, you just add another bucket to the end.  So, it was a bit of a
> > misnomer to say the bucket can only get so big, in reality, it is the list
> > of buckets that can only provide so much memory.  The concept is the same
> > though.
> 
> Got it. Ideally, of course, the buckets are getting poured out onto
> the network before they ever accumulate like that. It's only the
> "bad" filters that need to see everything before sending anything
> along.

Ideally.  But......  I think more filters will actually accumulate data
than just pass it through.

> Even so, if your list of buckets has data from a variety of sources
> (some pointers to read-only memory allocated by some library, some
> to read-write memory, perhaps a file pointer or two), do you really
> want to write it all out to a file? How do you determine what is
> too much memory?

Well, no, but we don't need to.  If the data is already in a file on the
system, then the bucket knows that, and we just reference the file instead
of actually re-writing the data.  For example, if the data is already 
a file pointer, then instead of writing the data out to a file, we
use that in our bucket list:

So, if the original bucket list is:

    file pointer1 -->  read/write memory -->  file pointer2

The new bucket list becomes:

    file pointer1 -->  file pointer3 -->  file pointer2

As far as how much memory is too much, I really don't know.  Roy said
8K.  I would think this would need to be a compile time option, so that we
can experiment.  :-)
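The spill-to-disk idea above can be sketched as follows. This is a
hypothetical illustration, not code from any patch; the 8K threshold is
Roy's suggested number, made a compile-time constant here:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SPILL_THRESHOLD 8192   /* Roy's 8K; would be configurable */

enum bucket_type { BUCKET_MEM, BUCKET_FILE };

typedef struct bucket {
    enum bucket_type type;
    const char *mem;     /* valid when type == BUCKET_MEM  */
    size_t      len;
    FILE       *fp;      /* valid when type == BUCKET_FILE */
    struct bucket *next;
} bucket_t;

/* Once the in-memory portion of the list passes the threshold, write
 * memory buckets out to a temp file and swap their type, as in the
 * file1 --> file3 --> file2 example above. */
static void spill_if_needed(bucket_t *list)
{
    size_t mem_total = 0;
    for (bucket_t *b = list; b != NULL; b = b->next) {
        if (b->type != BUCKET_MEM)
            continue;
        mem_total += b->len;
        if (mem_total > SPILL_THRESHOLD) {
            FILE *fp = tmpfile();
            if (fp == NULL)
                return;            /* on error, keep it in memory */
            fwrite(b->mem, 1, b->len, fp);
            b->type = BUCKET_FILE;
            b->fp = fp;
            mem_total -= b->len;   /* that data no longer counts */
        }
    }
}
```

Buckets that already reference files are never rewritten; only the
read/write memory in between gets spilled.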

> > My impression after reviewing the patch is that Greg's patch has two
> > different filters.  One uses a char * to pass data.
> 
> My impression is that the char* version is just for people who want
> to be able to say "just give me raw char* data (that I can write
> to?) so I don't have to call bucket->get_data()".

That's not my impression from the code.  Where the data gets written is
wherever the module wants, because it is using ap_l(write|put|...), so the
module could allocate more space for the data, or it could use what it was
passed (not if it wants to modify the data, though).  The buckets are only
created in one of two instances:

1)  A module allocates the bucket, and it is passed down to the next
filter using ap_lsend_bucket (the function name may not be right).

2)  A module implements a bucket filter, and a char * was passed from the
previous filter.  In this case the core allocates a bucket for the filter.

This means that each module MUST copy the data if it wants to modify
it.  Or if it wants to save it off to the side.  At least that is how I am
reading the code.

> I believe that in Greg's patch, the data being passed between layers
> is still a bucket, it's just that some layers ask to get the data
> converted to something simpler before they touch it.

Again, that's not how I read it.  In fact, I read it just the
opposite.  The data is always a char *, but we'll wrap it up for you if
you really want us to.  These may seem similar, but they really aren't.

If the data is always in a bucket, then that implies that we know
something about the kind of memory used to store it.  We can't just lose
that information because we feel like it.  The information about the
memory that we store in the bucket is vitally important.

> Ideally, adding another type is as simple as allocating an ap_bucket_t
> and setting a few function pointers, right? You certainly don't want
> filters to have to know anything about the different bucket types
> (unless they choose to know, I guess).

Well, adding a new bucket type should be trivially simple yes.  I believe
the filters will need to know something about what kind of bucket they are
creating, but not what kind they are reading.  For example, if a module
wants to create a new bucket, it needs to know if that bucket is going to
store a file or a char * or an mmap'ed file.

Once a filter has been passed a bucket, then all buckets should
essentially look the same to the filter.

As far as implementing a new bucket type goes, I expect it will be a
matter of creating an ap_create_bucket_foo function that sets up the
correct pointers.  It should be a five or ten line function, so it would
be trivial, yes.
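A constructor of the kind Ryan describes might look something like the
sketch below. These names (ap_bucket_t, ap_bucket_create_mem) are
hypothetical, and only the read/write-memory type is shown:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A bucket carries its behavior as function pointers, so adding a new
 * type is just a new small constructor that fills them in. */
typedef struct ap_bucket {
    /* read returns a pointer to the data and reports its length */
    const char *(*read)(struct ap_bucket *b, size_t *len);
    void (*destroy)(struct ap_bucket *b);
    void  *priv;       /* type-specific state */
    size_t priv_len;
} ap_bucket_t;

static const char *mem_read(ap_bucket_t *b, size_t *len)
{
    *len = b->priv_len;
    return b->priv;
}

static void mem_destroy(ap_bucket_t *b)
{
    free(b->priv);     /* each type knows how to free its resources */
    free(b);
}

/* The whole "new type" boilerplate: allocate, copy, set pointers. */
static ap_bucket_t *ap_bucket_create_mem(const char *data, size_t len)
{
    ap_bucket_t *b = malloc(sizeof(*b));
    b->priv = malloc(len);
    memcpy(b->priv, data, len);
    b->priv_len = len;
    b->read = mem_read;
    b->destroy = mem_destroy;
    return b;
}
```

A file-backed or mmap-backed type would supply its own read/destroy pair
(the destroy hook is where something like mysql_result_free would go),
while every filter downstream sees the same ap_bucket_t interface.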

> Cool. I'm certainly not trying to pressure anyone to get something
> done *right now*. I thought if I asked enough dumb questions I
> wouldn't be the only one enlightened by the answers. :)

You are helping to enlighten everyone, and you are helping me to sharpen
my thoughts on this topic.  The more questions you ask, the clearer the
patch becomes in my mind.  I hope that in answering your questions I have
also helped to explain where my problems are with the current patch.  If
not, please keep asking.

> > Yes, I agree.  However starting down the wrong path can often make it
> > harder to find the right one.  This patch is IMO the wrong path.  I have
> > asked for proof that I am wrong and this morning I detailed exactly
> > what I need to see to be convinced.  The proof should be relatively easy
> > to create if I am wrong, and then I will step aside and let the patch go
> > in.
> 
> Also reasonable. I just wanted to make sure we knew we weren't
> looking for the perfect solution before we took *any* steps forward.

I do not want perfection, I just want the correct design/patch.  I expect
that my next patch will offer read/write memory buckets only.  This will
without a doubt not be a perfect patch.  It will however be in my mind the
correct design.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------



Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> 
> > > Basically, the core will provide buckets, which can only get so big.  If the
> > > buckets get too big, the buckets themselves flush out to the disk, and
> > > shrink.
> > 
> > This is probably a dumb question, but how does a bucket grow?
> 
> A bucket doesn't grow.  You have lists of buckets, so if you need more
> memory, you just add another bucket to the end.  So, it was a bit of a
> misnomer to say the bucket can only get so big, in reality, it is the list
> of buckets that can only provide so much memory.  The concept is the same
> though.

Got it. Ideally, of course, the buckets are getting poured out onto
the network before they ever accumulate like that. It's only the
"bad" filters that need to see everything before sending anything
along.

Even so, if your list of buckets has data from a variety of sources
(some pointers to read-only memory allocated by some library, some
to read-write memory, perhaps a file pointer or two), do you really
want to write it all out to a file? How do you determine what is
too much memory?

> > But maybe Greg's patch is further away from the ADT-style implementation
> > of buckets than I'm assuming.
> 
> My impression after reviewing the patch is that Greg's patch has two
> different filters.  One uses a char * to pass data.

My impression is that the char* version is just for people who want
to be able to say "just give me raw char* data (that I can write
to?) so I don't have to call bucket->get_data()".

I believe that in Greg's patch, the data being passed between layers
is still a bucket, it's just that some layers ask to get the data
converted to something simpler before they touch it.

Maybe it is a premature optimization.

>  This means there is
> no ADT at all.  The other uses an ADT, but the ADT is not very advanced
> IMHO.  The ability to implement a filter without using ANY ADT is IMHO a
> bad idea.  This is where Greg and I disagree.  I think the filters need to
> require the use of the ADT, and that while the initial ADT may be minimal,
> like just read/write memory, it needs to be obvious how to add more types
> to the back end.  When I say obvious, it would be enough to have a comment
> in the code that says "To add another type to the ADT do the following..."

Ideally, adding another type is as simple as allocating an ap_bucket_t
and setting a few function pointers, right? You certainly don't want
filters to have to know anything about the different bucket types
(unless they choose to know, I guess).

> If you look at Dean's ap_buf.c file, he has gone 90% of the way towards
> implementing the read/write memory bucket.  I have actually finished that
> code, so it compiles.  I am slowly working through Greg's patch and my
> previous patches to bring them together into one patch that uses
> buckets.  This patch will take a while to create, because I am not
> focusing on it.  I have stepped away from filtering until at least
> Thursday so that when I step back towards it, I have a little different
> perspective.

Cool. I'm certainly not trying to pressure anyone to get something
done *right now*. I thought if I asked enough dumb questions I
wouldn't be the only one enlightened by the answers. :)

> But, putting a patch in that doesn't use ADT's for filtering doesn't let
> people dig in.  It gives people something to play with that will not look
> like the final result IMO.

Fair enough.

> > You can only imagine so many scenarios and draw so much on a white
> > board to solve them. Sometimes it is tremendously useful to actually
> > start down the path to encounter the real roadblocks.
> 
> Yes, I agree.  However starting down the wrong path can often make it
> harder to find the right one.  This patch is IMO the wrong path.  I have
> asked for proof that I am wrong and this morning I detailed exactly
> what I need to see to be convinced.  The proof should be relatively easy
> to create if I am wrong, and then I will step aside and let the patch go
> in.

Also reasonable. I just wanted to make sure we knew we weren't
looking for the perfect solution before we took *any* steps forward.

Jim

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> > Basically, the core will provide buckets, which can only get so big.  If the
> > buckets get too big, the buckets themselves flush out to the disk, and
> > shrink.
> 
> This is probably a dumb question, but how does a bucket grow?

A bucket doesn't grow.  You have lists of buckets, so if you need more
memory, you just add another bucket to the end.  So, it was a bit of a
misnomer to say the bucket can only get so big, in reality, it is the list
of buckets that can only provide so much memory.  The concept is the same
though.

> > Basically, what I was getting at was that if the first filter that needs a
> > copy of the data makes a copy and marks it as modify-able memory, then no
> > other filter in the stack needs to make a copy, because it can just modify
> > it in place.
> 
> That makes perfect sense. It also seems like that should be relatively
> trivial to add. (Buckets just need to implement a "give me the
> pointer to a modifiable version of the contents" method, and
> depending how the bucket was allocated, the right magic can happen.

Yes.

> Just like a bucket may need to register a cleanup function for
> itself that knows how to free any resources it is using, like
> calling mysql_result_free.)

Yes.  If you look at Dean's original code, each bucket has a free function
associated with it.

> But maybe Greg's patch is further away from the ADT-style implementation
> of buckets than I'm assuming.

My impression after reviewing the patch is that Greg's patch has two
different filters.  One uses a char * to pass data.  This means there is
no ADT at all.  The other uses an ADT, but the ADT is not very advanced
IMHO.  The ability to implement a filter without using ANY ADT is IMHO a
bad idea.  This is where Greg and I disagree.  I think the filters need to
require the use of the ADT, and that while the initial ADT may be minimal,
like just read/write memory, it needs to be obvious how to add more types
to the back end.  When I say obvious, it would be enough to have a comment
in the code that says "To add another type to the ADT do the following..."

If you look at Dean's ap_buf.c file, he has gone 90% of the way towards
implementing the read/write memory bucket.  I have actually finished that
code, so it compiles.  I am slowly working through Greg's patch and my
previous patches to bring them together into one patch that uses
buckets.  This patch will take a while to create, because I am not
focusing on it.  I have stepped away from filtering until at least
Thursday so that when I step back towards it, I have a little different
perspective.

> I'd rather have every filter implement its own buffering than not
> have filtering. But I don't think that anyone is saying that
> providing a way for that buffering to happen in a standard way is
> not highly desirable (and even a prerequisite for releasing 2.0 if
> filtering is a part of 2.0), just that it doesn't have to be a
> prerequisite for the filtering work to start going into alpha and
> beta releases so people will really dig into it.

But, putting a patch in that doesn't use ADT's for filtering doesn't let
people dig in.  It gives people something to play with that will not look
like the final result IMO.

> You can only imagine so many scenarios and draw so much on a white
> board to solve them. Sometimes it is tremendously useful to actually
> start down the path to encounter the real roadblocks.

Yes, I agree.  However starting down the wrong path can often make it
harder to find the right one.  This patch is IMO the wrong path.  I have
asked for proof that I am wrong and this morning I detailed exactly
what I need to see to be convinced.  The proof should be relatively easy
to create if I am wrong, and then I will step aside and let the patch go
in.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> 
> > > 2)  The core imposes some sane restrictions on how much of the request is
> > > in memory by controlling the buffering.
> > > 
> > > Greg is proposing #1, I am proposing #2 (after much prompting and head
> > > beating from Roy).  My feeling, after having looked at this problem for a
> > > VERY long time, is that trying to just fit option 2 in after the current
> > > patch is applied is almost impossible.
> > 
> > What are the restrictions imposed by the core in this scheme?
> 
> Basically, the core will provide buckets, which can only get so big.  If the
> buckets get too big, the buckets themselves flush out to the disk, and
> shrink.

This is probably a dumb question, but how does a bucket grow?

> Allowing people to hang themselves, and not providing a way to prevent
> them from hanging themselves are two VERY different things.  Yes, if
> somebody wants to write a filter that takes down the server, then they
> will.  If we don't provide a sane bucket scheme, then ANYBODY who writes a
> filter has a very good chance of taking down the server.

Agreed. I just don't want to see anyone slip into the mentality
"this must be perfect in all situations" before the patch gets
committed and people really dig into using it.

> Defining the life cycle of the bucket to end when the bucket is sent to
> the network doesn't have anything to do with single or zero copy.  The
> problem is that in the current patch, no information about the memory is
> passed with the memory.  This means that each filter must assume the
> memory cannot be touched.

Got it. This was the piece I was missing.

> Basically, what I was getting at was that if the first filter that needs a
> copy of the data makes a copy and marks it as modify-able memory, then no
> other filter in the stack needs to make a copy, because it can just modify
> it in place.

That makes perfect sense. It also seems like that should be relatively
trivial to add. (Buckets just need to implement a "give me the
pointer to a modifiable version of the contents" method, and
depending how the bucket was allocated, the right magic can happen.
Just like a bucket may need to register a cleanup function for
itself that knows how to free any resources it is using, like
calling mysql_result_free.)

But maybe Greg's patch is further away from the ADT-style implementation
of buckets than I'm assuming.

> PHP can add its own buffering in, there is nothing to stop that and there
> is no reason to.  What we want to keep from happening, is to make EVERY
> SINGLE filter implement their own buffering.

I'd rather have every filter implement its own buffering than not
have filtering. But I don't think that anyone is saying that
providing a way for that buffering to happen in a standard way is
not highly desirable (and even a prerequisite for releasing 2.0 if
filtering is a part of 2.0), just that it doesn't have to be a
prerequisite for the filtering work to start going into alpha and
beta releases so people will really dig into it.

You can only imagine so many scenarios and draw so much on a white
board to solve them. Sometimes it is tremendously useful to actually
start down the path to encounter the real roadblocks.

Jim

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> > 2)  The core imposes some sane restrictions on how much of the request is
> > in memory by controlling the buffering.
> > 
> > Greg is proposing #1, I am proposing #2 (after much prompting and head
> > beating from Roy).  My feeling, after having looked at this problem for a
> > VERY long time, is that trying to just fit option 2 in after the current
> > patch is applied is almost impossible.
> 
> What are the restrictions imposed by the core in this scheme?

Basically, the core will provide buckets, which can only get so big.  If the
buckets get too big, the buckets themselves flush out to the disk, and
shrink.

> > Using option #1 is not a good idea IMNSHO, because it allows stupid
> > filters to take down the server.
> 
> I don't see how you're ever going to prevent that. People always
> find ways to do stupid things. :)

Allowing people to hang themselves, and not providing a way to prevent
them from hanging themselves are two VERY different things.  Yes, if
somebody wants to write a filter that takes down the server, then they
will.  If we don't provide a sane bucket scheme, then ANYBODY who writes a
filter has a very good chance of taking down the server.

> > By having a top level filter do the buffering it requires, we can actually
> > get single or zero copy in.
> 
> It sounds to me like that is resolved just by defining the memory
> life-cycle of a "bucket" to always end once it is written to the
> network.

Defining the life cycle of the bucket to end when the bucket is sent to
the network doesn't have anything to do with single or zero copy.  The
problem is that in the current patch, no information about the memory is
passed with the memory.  This means that each filter must assume the
memory cannot be touched.

By passing information about the memory around with the memory, filters
can determine whether they can modify the data in place, or if they have
to copy it over in order to modify it.

In the current patch, any module that wants to modify the data must
allocate more memory (whether that be on the stack or heap, it must be
allocated).  This is because you just don't know where the memory came
from, or what somebody else is doing to it.
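A minimal sketch of passing that information around with the memory,
assuming a hypothetical writable flag carried on each bucket (none of
these names are from either patch):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;
    size_t len;
    int    writable;   /* may a filter scribble on data in place? */
} bucket_t;

/* An uppercasing filter: it copies only when it has to.  Once one
 * filter has made a writable copy, every later filter in the stack
 * can modify in place. */
static bucket_t uppercase_filter(bucket_t in)
{
    bucket_t out = in;
    if (!in.writable) {
        out.data = malloc(in.len);        /* first (and only) copy */
        memcpy(out.data, in.data, in.len);
        out.writable = 1;
    }
    for (size_t i = 0; i < out.len; i++)
        out.data[i] = (char)toupper((unsigned char)out.data[i]);
    return out;
}
```

Without the flag, the filter would have to assume the worst and copy
every time, which is the extra-memcpy cost being argued about above.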

> If some intervening layer needs to make a copy (to store for later
> use, or to move it to an area it can change), I don't see how you
> can avoid that.

What you can avoid is the _need_ to move it to an area it can
change.  Whether the data is modify-able is information that can be passed
around with the memory.  Take a look at Roy's bucket brigades post, and
Dean's original implementation of ap_buf.[ch].  When I really started
reading those two things, everything seemed to come together for me.

> I'm a little confused as to what you mean by "top level filter".
> Is that closest to the network, or some distance away?

I consider the filters to be a stack.  The top-level filter is the first
one the data passes through.  The bottom filter is the one that writes out
to the network.

Basically, what I was getting at was that if the first filter that needs a
copy of the data makes a copy and marks it as modify-able memory, then no
other filter in the stack needs to make a copy, because it can just modify
it in place.

> > Again, it is how the buffering is done that is important.  Asking each
> > module to implement their own filtering is asking for trouble.
> 
> Some modules (like PHP) are probably going to implement their own
> buffering (which is what I assume you meant to say) regardless of
> what the core does or provides. It doesn't seem like a good reason
> to hold back filtering to me.

PHP can add its own buffering in; there is nothing to stop that, and there
is no reason to.  What we want to keep from happening is making EVERY
SINGLE filter implement its own buffering.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> 2)  The core imposes some sane restrictions on how much of the request is
> in memory by controlling the buffering.
> 
> Greg is proposing #1, I am proposing #2 (after much prompting and head
> beating from Roy).  My feeling, after having looked at this problem for a
> VERY long time, is that trying to just fit option 2 in after the current
> patch is applied is almost impossible.

What are the restrictions imposed by the core in this scheme?

> Using option #1 is not a good idea IMNSHO, because it allows stupid
> filters to take down the server.

I don't see how you're ever going to prevent that. People always
find ways to do stupid things. :)

> > that says "hey, here's the data to send, but don't send it until
> > I tell you its okay, because I'm going to be adding a header". But
> > all you've gained in that case is the ability for the later layers
> > to start processing the data instead of having to wait for the
> > layer that thinks it needs to see all of its input before it can
> > send on its output. (Which it will presumably do by sending all
> > the accumulated data at once, so each later layer gets called
> > once and doesn't have to do any buffering.)
> 
> By having a top level filter do the buffering it requires, we can actually
> get single or zero copy in.

It sounds to me like that is resolved just by defining the memory
life-cycle of a "bucket" to always end once it is written to the
network.

If some intervening layer needs to make a copy (to store for later
use, or to move it to an area it can change), I don't see how you
can avoid that.

I'm a little confused as to what you mean by "top level filter".
Is that closest to the network, or some distance away?

> > But it also just seems like an optimization. Like I said, PHP
> > didn't have any sort of output buffering until the latest version,
> > and although there was some grousing from users, they generally
> > accepted the limitation of not being able to add headers after
> > non-header data had been sent.
> 
> Again, it is how the buffering is done that is important.  Asking each
> module to implement their own filtering is asking for trouble.

Some modules (like PHP) are probably going to implement their own
buffering (which is what I assume you meant to say) regardless of
what the core does or provides. It doesn't seem like a good reason
to hold back filtering to me.

Jim

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> > ANY module that wants or needs to modify a header MUST cache the entire
> > request until it knows that it is done modifying headers.  This means that
> > I believe mod_auth_digest will need to cache the entire request, because
> > it needs to modify a header to have the md5sum.  Any module that needs to
> > modify the content length needs to cache the entire request.
> 
> Of course, this is only true assuming you can't really issue HTTP/1.1
> responses (with footers and chunked encoding).
>
> And if a filter happens to know up-front what its effect on
> content-length will be (like it will be inserting a header with a
> fixed size), it doesn't have to cache the request. (And if it simply
> decides it would rather just filter out the content-length so it
> doesn't have to cache the response, that seems perfectly valid.)
>
> I actually think (and hope) that the cases where a filter needs to
> act as a bottleneck on the response are few and far between.
> There's almost a danger in making it too easy to become a bottleneck.

It is easy enough to come up with examples of where a module isn't
going to become a bottleneck.  The fact is, any time the entire request can
be sent as one string, no module will become a bottleneck.  The challenge
here is to make it easy for modules to become the bottleneck.  Never
thought I would say that.  :-)

The danger is in making it too hard for modules to become a
bottleneck, because then the server doesn't respond correctly.  We need to
achieve some sort of balance.  This is where the real challenge lies.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> ANY module that wants or needs to modify a header MUST cache the entire
> request until it knows that it is done modifying headers.  This means that
> I believe mod_auth_digest will need to cache the entire request, because
> it needs to modify a header to have the md5sum.  Any module that needs to
> modify the content length needs to cache the entire request.

Of course, this is only true assuming you can't really issue HTTP/1.1
responses (with footers and chunked encoding).

And if a filter happens to know up-front what its effect on
content-length will be (like it will be inserting a header with a
fixed size), it doesn't have to cache the request. (And if it simply
decides it would rather just filter out the content-length so it
doesn't have to cache the response, that seems perfectly valid.)

I actually think (and hope) that the cases where a filter needs to
act as a bottleneck on the response are few and far between.
There's almost a danger in making it too easy to become a bottleneck.

Jim

Re: challenge requirements

Posted by Greg Stein <gs...@lyra.org>.
Okay... I haven't got to this completely. I went out to dinner and had many
drinks :-) ... The "set this aside" is easy; I've been spending most of the
time figuring out the proper strategy for spilling a too-large request off
to a file.

On Wed, Jun 28, 2000 at 03:04:27PM -0700, rbb@covalent.net wrote:
>...
> > IOW, is it fair to state that the current patch can go in *without* this
> > particular module/support structure? That simply showing that your challenge
> > is possible, will validate that the current patch can support it?
> 
> Maybe.  If you design ten new structures that are pretty complex, no.  If
> you design two or three obvious structures to support this, then of course
> this doesn't need to be rolled into the current patch.  Basically what I
> am saying is that if supporting this means changing the current patch
> drastically, then the current patch isn't the right patch.  Make
> sense?  do you agree?

Yes, yes.

I have defined two new structures and a bucket type at this point. There
will be a few new functions, but I haven't moved past the data structures
yet (cuz of booze :-) and designing the spill strategy).

I might add a third structure for the spill mechanism (rather than sticking
with two, and overloading the meaning of one of those).

>...
> > This is just a quick read through. I may come back when I dig in.
> 
> Please come back as much as you need to.  Fair warning.  I have guests
> tonight.  I am stopping work after they have toured the city.  Once I stop
> for the day, I'm unlikely to check e-mail until tomorrow morning.  I'll
> answer your questions as soon as I get them though.

hehe... it ends up that I didn't get to this today/tonite either :-)

> > > 2)  No memory can be allocated out of r->pool.  If it is allocated out of
> > > r->pool, then the memory must be allocated for the length of the
> > > request.  Sub-pools are okay, but I will challenge them, and I will want a
> > very good piece of pseudo-code to explain why the sub-pool won't require
> > > copying.
> > 
> > Agreed.
> > 
> > Note that r->pool and/or a subpool may be used to make a one-time allocation
> > of a fixed-size value (i.e. not proportional to the response size). This
> > should be okay. Agree?
> 
> I think so, but exactly what the buffer is used for may invalidate
> that.  I know I am being unclear.  I am sorry about that.  Tell you
> what, assume I agree with this.  If it turns out the memory allocation is
"bad", then I will be required to give both great docs and incredible
pseudo-code or real code to prove it.  Sound fair?

Yup. I'm not worried about allocs at this point. I'm pondering on a clean
spill strategy and structure. Specifically, if a file descriptor comes down
the pipe, do you copy its contents into your spill file, or do you retain
the file as-is? The latter is obviously most efficient, so then what do you
do when you have a sequence of MEM-FD-MEM-FD-MEM-FD-MEM? How do you get each
of those mem chunks into the spill file, yet retain the interleaved files?
:-) ... I'll solve it cleanly, but it isn't a straight-forward problem :-)
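
A hedged sketch of one possible answer (invented structures, with the actual write(2) calls elided): keep a single ordered chunk list; memory chunks are appended to one shared spill file and replaced by (offset, length) records, while file chunks are retained untouched, so the MEM-FD-MEM-FD interleaving survives.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Conceptual only -- names invented, real I/O elided. */
enum chunk_type { CHUNK_SPILLED, CHUNK_FILE };

typedef struct chunk {
    enum chunk_type type;
    long offset;       /* CHUNK_SPILLED: offset into the spill file */
    size_t len;
    int fd;            /* CHUNK_FILE: the retained descriptor */
    struct chunk *next;
} chunk;

typedef struct spill_list {
    chunk *head, *tail;
    long spill_size;   /* bytes written to the spill file so far */
} spill_list;

chunk *chunk_append(spill_list *sl, chunk *c)
{
    c->next = NULL;
    if (sl->tail) sl->tail->next = c; else sl->head = c;
    sl->tail = c;
    return c;
}

/* A memory chunk is written at the current end of the spill file
 * and remembered as an (offset, length) record. */
chunk *spill_mem(spill_list *sl, size_t len)
{
    chunk *c = calloc(1, sizeof(*c));
    c->type = CHUNK_SPILLED;
    c->offset = sl->spill_size;
    c->len = len;
    sl->spill_size += len;    /* the actual write(2) is elided */
    return chunk_append(sl, c);
}

/* A file chunk is kept as-is; nothing is copied into the spill file. */
chunk *keep_fd(spill_list *sl, int fd, size_t len)
{
    chunk *c = calloc(1, sizeof(*c));
    c->type = CHUNK_FILE;
    c->fd = fd;
    c->len = len;
    return chunk_append(sl, c);
}
```

Replaying the response later is then a walk of the list, reading spilled ranges from the one spill file and sending retained descriptors directly.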

>...
> > > > If I provide the structures and detailed pseudo-code, will that be
> > > > sufficient? I can do this today.
> > > 
> > > The more actual compile-able code there is, the better off we are.  One of
> > > the arguments you have been making is that your patch can do 99% of this
> > > today, as is.
> > 
> > Um. Then I was unclear. If I used the word "can", then I meant it in the
> > future tense (i.e. "it can do that [with changes]"). Let's rephrase as "my
> > patch could do this with straight-forward extensions."
> > 
> > Your "challenge" is in response to that statement :-). Fair enough.
> 
> My challenge is that if the current patch requires extensive changes to
> make it do what we are talking about, the patch isn't "the right
solution".  If the changes are straightforward and don't require months
> to make, then I think that is fine.

It won't be months. Should happen tomorrow. I've got a lunch, but I'll
complete the code then.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: challenge requirements (was: summary: issues for my filtering patch)

Posted by rb...@covalent.net.
> > I would much prefer an actual module that actually compiles and caches the
> > response.  If it doesn't actually change the headers that are sent, but
> > does make the ap_set_header call, then I will be fine with that.
> 
> I'll see what I can do. If it starts to add too much to the current patch,
> then I'll come back. Again: I'd like to keep the original patch slim for
> wider reviewability, your requested module as a future change, and the
> support for that module another future change.

I agree, keep the current patch as simple as possible.  

> IOW, is it fair to state that the current patch can go in *without* this
> particular module/support structure? That simply showing that your challenge
> is possible, will validate that the current patch can support it?

Maybe.  If you design ten new structures that are pretty complex, no.  If
you design two or three obvious structures to support this, then of course
this doesn't need to be rolled into the current patch.  Basically what I
am saying is that if supporting this means changing the current patch
drastically, then the current patch isn't the right patch.  Make
sense?  do you agree?

> > The bucket brigade scheme does not force ap_rwrite(), ap_rputs to allocate
> > any memory, but for the time being, let's ignore that.
> 
> Any scheme does since the char* passed to ap_rwrite() has an unknown
> lifetime. It *must* be copied *if* you want to change its lifetime.

Ahhhh.  But, the bucket brigades don't always change an object's
lifetime.  When dealing with a char *, ap_rwrite/ap_rputs/... will either
allocate memory or not, based on which type of bucket is being created.  I
should have specified that.

> The bucket brigade can say "here is a string, it will die when you return."
> Your request here is to *also* allow the framework to say "here is a string,
> keep it if you'd like... it will survive until you say otherwise."
> 
> Agreed?

Essentially.  The way I see the bucket brigades, they say "Here's a
string, it will die when you no longer need it, don't worry about
how."  The framework then makes that happen.  I see how this is done, but
trying to explain it without code would be hard.  I'll be starting the new
patch sometime later this week/early next week.  I am not coding filtering
for a few days.  :-)
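
One plausible way a framework could honor that "it dies when you no longer need it" contract is reference counting (an assumption on my part, not necessarily what Roy has in mind):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: each filter that keeps the string takes a
 * reference; the storage is freed when the last reference drops,
 * so no filter has to worry about *how* the string dies. */
typedef struct rc_str {
    char *data;
    size_t len;
    int refs;
} rc_str;

rc_str *rc_str_new(const char *src, size_t len)
{
    rc_str *s = malloc(sizeof(*s));
    s->data = malloc(len);
    memcpy(s->data, src, len);
    s->len = len;
    s->refs = 1;
    return s;
}

void rc_str_ref(rc_str *s)
{
    s->refs++;
}

/* Returns 1 when the storage was actually freed. */
int rc_str_unref(rc_str *s)
{
    if (--s->refs > 0)
        return 0;
    free(s->data);
    free(s);
    return 1;
}
```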

BTW, the bucket brigades that I am talking about here are my
interpretation of what Roy is talking about.  I may not see the same thing
he does.  I think we all know communicating through e-mail isn't always
the clearest thing in the world.

> This is just a quick read through. I may come back when I dig in.

Please come back as much as you need to.  Fair warning.  I have guests
tonight.  I am stopping work after they have toured the city.  Once I stop
for the day, I'm unlikely to check e-mail until tomorrow morning.  I'll
answer your questions as soon as I get them though.

> > 2)  No memory can be allocated out of r->pool.  If it is allocated out of
> > r->pool, then the memory must be allocated for the length of the
> > request.  Sub-pools are okay, but I will challenge them, and I will want a
> very good piece of pseudo-code to explain why the sub-pool won't require
> > copying.
> 
> Agreed.
> 
> Note that r->pool and/or a subpool may be used to make a one-time allocation
> of a fixed-size value (i.e. not proportional to the response size). This
> should be okay. Agree?

I think so, but exactly what the buffer is used for may invalidate
that.  I know I am being unclear.  I am sorry about that.  Tell you
what, assume I agree with this.  If it turns out the memory allocation is
"bad", then I will be required to give both great docs and incredible
pseudo-code or real code to prove it.  Sound fair?

> > 3)  It is assumed that the filter is the first filter in a chain of
> > filters.  ANY of the subsequent filters may also need to modify 
> > headers.  ANY of the subsequent filters may be either bucket or char *
> > type filters.
> 
> Agreed. I had assumed that.
> 
> Note that the subsequent filters are somewhat moot since this first filter
> is going to be holding back all the content until it is ready. When it
> finally releases it, then the next guys can do what they will.

Yes, BUT the next guy may want to actually modify the data.  If the next
filter wants to modify the data, it shouldn't have to copy the data to a
new buffer, and it should only allocate more memory if it wants to change
the size of the data or the current memory can't be modified.  Subsequent
filters are important to the design IMO, but they can be ignored most of
the time.

> > 4)  It would be nice although not required if this module was written
> > twice, once as a bucket type and once as a char *.  Because the two
> filters have different purposes, this may not be feasible.  I can understand
> > this.
> 
> The char* does not support this. You would not use a char* style of filter
> if you need to hold back the entire response.

OKAY.  I had thought so, but I thought I would check to be sure.  This is
why it was "gee that would be nice" not "you must do this to remove my
veto".  :-)

> > 5)  The filter doesn't have to actually modify anything, so the module
> > itself cannot allocate any memory to hold strings, except for structures
> > used to store the strings.
> 
> See (2). Some fixed amount of memory may be allocated.

Fine.

> > > If I provide the structures and detailed pseudo-code, will that be
> > > sufficient? I can do this today.
> > 
> > The more actual compile-able code there is, the better off we are.  One of
> > the arguments you have been making is that your patch can do 99% of this
> > today, as is.
> 
> Um. Then I was unclear. If I used the word "can", then I meant it in the
> future tense (i.e. "it can do that [with changes]"). Let's rephrase as "my
> patch could do this with straight-forward extensions."
> 
> Your "challenge" is in response to that statement :-). Fair enough.

My challenge is that if the current patch requires extensive changes to
make it do what we are talking about, the patch isn't "the right
solution".  If the changes are straightforward and don't require months
to make, then I think that is fine.

> > In the end, my conditions may not be explained in enough detail.  I have
> > tried to think of everything, but I probably missed something.
> 
> That's all right. I understand the problem space ("hold the entire response,
> modify a header, release the response; zero copying occurs within the
> filter"). I'll write code to solve *that* problem. If that doesn't agree
> with you, then we can iterate.

GREAT!  Wait a second Greg, we are making progress again.  :-)

> Absolutely fair. Thanx.

Ain't it great when things just work.  :-)

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


challenge requirements (was: summary: issues for my filtering patch)

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Jun 28, 2000 at 02:11:11PM -0700, rbb@covalent.net wrote:
>...
> I would much prefer an actual module that actually compiles and caches the
> response.  If it doesn't actually change the headers that are sent, but
> does make the ap_set_header call, then I will be fine with that.

I'll see what I can do. If it starts to add too much to the current patch,
then I'll come back. Again: I'd like to keep the original patch slim for
wider reviewability, your requested module as a future change, and the
support for that module another future change.

IOW, is it fair to state that the current patch can go in *without* this
particular module/support structure? That simply showing that your challenge
is possible, will validate that the current patch can support it?

> The bucket brigade scheme does not force ap_rwrite(), ap_rputs to allocate
> any memory, but for the time being, let's ignore that.

Any scheme does since the char* passed to ap_rwrite() has an unknown
lifetime. It *must* be copied *if* you want to change its lifetime.

The bucket brigade can say "here is a string, it will die when you return."
Your request here is to *also* allow the framework to say "here is a string,
keep it if you'd like... it will survive until you say otherwise."

Agreed?
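
Those two contracts might be sketched like this (names invented for illustration; not the patch's actual API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A TRANSIENT string dies when the caller returns, so a filter that
 * wants to keep it must set it aside; a GIVEN string already belongs
 * to the framework and can be kept with no copy at all. */
enum str_life { STR_TRANSIENT, STR_GIVEN };

typedef struct str_bucket {
    const char *data;
    size_t len;
    enum str_life life;
} str_bucket;

/* Copies only when the lifetime contract forces it.  Returns 1 if a
 * copy was made, 0 if the bucket could be kept as-is. */
int set_aside(str_bucket *b)
{
    if (b->life == STR_GIVEN)
        return 0;                /* zero-copy: just keep it */
    char *copy = malloc(b->len);
    memcpy(copy, b->data, b->len);
    b->data = copy;
    b->life = STR_GIVEN;         /* now it survives the return */
    return 1;
}
```

The copy in the transient case is exactly the unavoidable one discussed above; the second contract is what makes the zero-copy path possible.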

>...
> The criteria:

This is just a quick read through. I may come back when I dig in.

> 1)  The entire response MUST be held by the module at one time.  I don't
> care how many calls the module makes to actually write out the data, but
> the whole response must be accessible by the filter at one time.  This
> must hold true whether the request is 500 bytes or 500 MB.  (I think it
> goes without saying that if your module tries to keep 500 MB in memory at
> one time, the module doesn't meet my requirements.  :-)

No problem.

> 2)  No memory can be allocated out of r->pool.  If it is allocated out of
> r->pool, then the memory must be allocated for the length of the
> request.  Sub-pools are okay, but I will challenge them, and I will want a
> very good piece of pseudo-code to explain why the sub-pool won't require
> copying.

Agreed.

Note that r->pool and/or a subpool may be used to make a one-time allocation
of a fixed-size value (i.e. not proportional to the response size). This
should be okay. Agree?

> 3)  It is assumed that the filter is the first filter in a chain of
> filters.  ANY of the subsequent filters may also need to modify
> headers.  ANY of the subsequent filters may be either bucket or char *
> type filters.

Agreed. I had assumed that.

Note that the subsequent filters are somewhat moot since this first filter
is going to be holding back all the content until it is ready. When it
finally releases it, then the next guys can do what they will.

> 4)  It would be nice although not required if this module was written
> twice, once as a bucket type and once as a char *.  Because the two
> filters have different purposes, this may not be feasible.  I can understand
> this.

The char* does not support this. You would not use a char* style of filter
if you need to hold back the entire response.

> 5)  The filter doesn't have to actually modify anything, so the module
> itself cannot allocate any memory to hold strings, except for structures
> used to store the strings.

See (2). Some fixed amount of memory may be allocated.

> 6)  The module cannot copy ANY data that it does not
> modify.  Basically, what I am trying to protect against here, is having
> ap_rwrite copy the data into a buffer, and then the module copies it off
> to the side so that ap_rwrite can use the same buffer for the next
> chunk.  It is relatively easy to prove that this design requires that each
> module copy all of the data.

Agreed. No worries: the module won't copy data.

> > If I provide the structures and detailed pseudo-code, will that be
> > sufficient? I can do this today.
> 
> The more actual compile-able code there is, the better off we are.  One of
> the arguments you have been making is that your patch can do 99% of this
> today, as is.

Um. Then I was unclear. If I used the word "can", then I meant it in the
future tense (i.e. "it can do that [with changes]"). Let's rephrase as "my
patch could do this with straight-forward extensions."

Your "challenge" is in response to that statement :-). Fair enough.

> I can understand that the headers that are actually sent to
> the client will not be modified.  I can understand that it will not be
> optimized.  If there are structures that are required but not implemented
> yet, the more detail there is the better.

No problem.

> In the end, my conditions may not be explained in enough detail.  I have
> tried to think of everything, but I probably missed something.

That's all right. I understand the problem space ("hold the entire response,
modify a header, release the response; zero copying occurs within the
filter"). I'll write code to solve *that* problem. If that doesn't agree
with you, then we can iterate.

> I will of course try to be fair when reading through the module, and I
> will try not to make up new requirements, but it is VERY likely that I
> will apply the current requirements in strange ways.  If I do, I will
> endeavour to explain why I have done so.  If new requirements are brought
> up during the review, I will explain exactly why they are issues.

Fair enough.

> I believe I see how to write a module that meets all of these requirements
> with the bucket brigades design.

I do, too. I also see how to do it with my current patch. :-)

> This will just remove my veto.  I will not give this patch a +1, I will
> give it a +0, and step out of the way.  I think that is fair.  Greg, this
> is the best that I can do.

Absolutely fair. Thanx.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> I can definitely demonstrate this.
> 
> It will be easier to outline the structures and the pseudo-code than to
> truly fold it into the patch. Why? Simply because it also requires the
> delay-header stuff, which I would like to save for a second pass (e.g. check
> in patch one, then take your delayed-header stuff and check that in for a
> second pass).
> 
> Second: since ap_rwrite(), ap_rputs(), etc, do not allow for "passing
> memory" along, these will need to copy data regardless. The API that a
> content-generator uses for allowing the zero-copy set-aside behavior would
> be new. Is that okay?

I would much prefer an actual module that actually compiles and caches the
response.  If it doesn't actually change the headers that are sent, but
does make the ap_set_header call, then I will be fine with that.

The bucket brigade scheme does not force ap_rwrite(), ap_rputs to allocate
any memory, but for the time being, let's ignore that.

I will be VERY specific with what this module must do to meet my
criteria.  NOTE, these are MY criteria, that _I_ need to remove my
veto.  If they don't make sense, ask.  I will try my best to explain why
they are important to me.  If they still don't make sense, we can continue
to discuss it.

The criteria:

1)  The entire response MUST be held by the module at one time.  I don't
care how many calls the module makes to actually write out the data, but
the whole response must be accessible by the filter at one time.  This
must hold true whether the request is 500 bytes or 500 MB.  (I think it
goes without saying that if your module tries to keep 500 MB in memory at
one time, the module doesn't meet my requirements.  :-)

2)  No memory can be allocated out of r->pool.  If it is allocated out of
r->pool, then the memory must be allocated for the length of the
request.  Sub-pools are okay, but I will challenge them, and I will want a
very good piece of pseudo-code to explain why the sub-pool won't require
copying.

3)  It is assumed that the filter is the first filter in a chain of
filters.  ANY of the subsequent filters may also need to modify
headers.  ANY of the subsequent filters may be either bucket or char *
type filters.

4)  It would be nice although not required if this module was written
twice, once as a bucket type and once as a char *.  Because the two
filters have different purposes, this may not be feasible.  I can understand
this.

5)  The filter doesn't have to actually modify anything, so the module
itself cannot allocate any memory to hold strings, except for structures
used to store the strings.

6)  The module cannot copy ANY data that it does not
modify.  Basically, what I am trying to protect against here, is having
ap_rwrite copy the data into a buffer, and then the module copies it off
to the side so that ap_rwrite can use the same buffer for the next
chunk.  It is relatively easy to prove that this design requires that each
module copy all of the data.
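
Under these criteria, the skeleton of such a module might look like this (a rough, hypothetical sketch; spilling the 500 MB case to disk is elided, and the chunks are assumed to be already set aside by the framework):

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Sketch of a filter meeting criteria (1), (5) and (6): it holds the
 * entire response by linking set-aside chunks into a list, allocating
 * memory only for the list nodes, never for the data itself. */
typedef struct held {
    const void *data;   /* not copied: assumed already set aside */
    size_t len;
    struct held *next;
} held;

typedef struct hold_ctx {
    held *head, *tail;
    size_t total;
} hold_ctx;

void hold_chunk(hold_ctx *ctx, const void *data, size_t len)
{
    held *h = malloc(sizeof(*h));   /* structure only, per criterion 5 */
    h->data = data;                 /* pointer kept, data not copied */
    h->len = len;
    h->next = NULL;
    if (ctx->tail) ctx->tail->next = h; else ctx->head = h;
    ctx->tail = h;
    ctx->total += len;
}

/* At end-of-request the filter would make its ap_set_header() call
 * here, then pass the held list down the chain unchanged. */
size_t release_all(hold_ctx *ctx)
{
    return ctx->total;
}
```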

> If I provide the structures and detailed pseudo-code, will that be
> sufficient? I can do this today.

The more actual compile-able code there is, the better off we are.  One of
the arguments you have been making is that your patch can do 99% of this
today, as is.  I can understand that the headers that are actually sent to
the client will not be modified.  I can understand that it will not be
optimized.  If there are structures that are required but not implemented
yet, the more detail there is the better.

In the end, my conditions may not be explained in enough detail.  I have
tried to think of everything, but I probably missed something.

I will of course try to be fair when reading through the module, and I
will try not to make up new requirements, but it is VERY likely that I
will apply the current requirements in strange ways.  If I do, I will
endeavour to explain why I have done so.  If new requirements are brought
up during the review, I will explain exactly why they are issues.

I believe I see how to write a module that meets all of these requirements
with the bucket brigades design.

This will just remove my veto.  I will not give this patch a +1, I will
give it a +0, and step out of the way.  I think that is fair.  Greg, this
is the best that I can do.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------




Re: summary: issues for my filtering patch

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Jun 28, 2000 at 12:43:53PM -0700, rbb@covalent.net wrote:
>...
> This is why I have asked for the module.  When I see that the current
> patch can allow modules to cache the entire request without copying the
> entire request (I don't really even mind if the module keeps the whole
> thing in memory, although a comment explaining how it would flush to disk 
> would be nice), I will remove my veto.  I personally do not believe this
> can be done.
> 
> I see how this module would be written with the bucket brigade scheme.  I
> do not see it with the current patch.

I can definitely demonstrate this.

It will be easier to outline the structures and the pseudo-code than to
truly fold it into the patch. Why? Simply because it also requires the
delay-header stuff, which I would like to save for a second pass (e.g. check
in patch one, then take your delayed-header stuff and check that in for a
second pass).

Second: since ap_rwrite(), ap_rputs(), etc, do not allow for "passing
memory" along, these will need to copy data regardless. The API that a
content-generator uses for allowing the zero-copy set-aside behavior would
be new. Is that okay?

If I provide the structures and detailed pseudo-code, will that be
sufficient? I can do this today.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

RE: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> > Exactly!  The filter can't avoid it.  So, then the question is how does
> > the filter buffer the data.  There are two answers to this:
> > 
> > 1)  The filter just comes up with its own method for filtering, and if
> > that means keeping the whole request in memory, then so be it.
> 
> If this happens, and no filter -wants- to manipulate the output, the
> CGI's behavior may be async, and we can start showing users the results
> of their inquiry immediately (or at least the table headers for data
> to come in 15 seconds or so.)

This will happen in any approach.  Neither approach puts a limit on how
little memory is passed to the next layer.  However, I believe the buckets
approach does encourage filter writers to send more data down to the next
filter at once.  This is a gut feeling, and is not proven, so it's really
not worth arguing.

> > 2)  The core imposes some sane restrictions on how much of the request is
> > in memory by controlling the buffering.
> 
> Then it sounds like we can't see async behavior?  Unless some filter really
> -must- hold the headers until the entire data stream is cached, then this
> sounds like a very bad behavior.  Who -must- hold headers until the
> entire body is constructed?  Perhaps my own mod_robots, and it only cares
> about reparsing the <HTML><HEAD>[<META ...>...]</HEAD> stream, so even it 
> should release the request fairly quickly.  

ANY module that wants or needs to modify a header MUST cache the entire
request until it knows that it is done modifying headers.  This means that
I believe mod_auth_digest will need to cache the entire request, because
it needs to modify a header to have the md5sum.  Any module that needs to
modify the content length needs to cache the entire request.
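
The mod_auth_digest case can be seen in miniature with a digest-style filter (a trivial additive checksum stands in for MD5 here; purely illustrative, not mod_auth_digest's code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Why such a filter must see the whole body: the digest covers every
 * byte, yet the header carrying it precedes the body on the wire.
 * The filter updates incrementally, but can fix the header value
 * only after the final chunk -- hence the caching. */
typedef struct digest_ctx {
    uint32_t sum;       /* real MD5 state would live here */
    size_t seen;
} digest_ctx;

void digest_update(digest_ctx *d, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        d->sum += buf[i];
    d->seen += len;
}

/* Only now, after the last chunk, can the header be written, which
 * is why every earlier chunk had to be held back. */
uint32_t digest_final(digest_ctx *d)
{
    return d->sum;
}
```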

> Few filters should even need to do what this suggests.  I agree the server 
> ought to become responsible and prohibit really bad memory behavior by
> laying out the rules and mechanisms, but I strongly believe either patch
> can and should provide the api to address this issue.  But the abuser of
> this feature should be the one to jump through hoops, not the rest of the
> module writing world :)

This is why I have asked for the module.  When I see that the current
patch can allow modules to cache the entire request without copying the
entire request (I don't really even mind if the module keeps the whole
thing in memory, although a comment explaining how it would flush to disk 
would be nice), I will remove my veto.  I personally do not believe this
can be done.

I see how this module would be written with the bucket brigade scheme.  I
do not see it with the current patch.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


RE: summary: issues for my filtering patch

Posted by "William A. Rowe, Jr." <wr...@lnd.com>.
> From: rbb@covalent.net [mailto:rbb@covalent.net]
> Sent: Wednesday, June 28, 2000 11:17 AM
> 
> > If a filter is going to need to possibly modify a header based on
> > content, I don't see any way that filter can avoid buffering the
> > data. Unless the filter simply passes a flag on to a lower level
> 
> Exactly!  The filter can't avoid it.  So, then the question is how does
> the filter buffer the data.  There are two answers to this:
> 
> 1)  The filter just comes up with its own method for filtering, and if
> that means keeping the whole request in memory, then so be it.

If this happens, and no filter -wants- to manipulate the output, the
CGI's behavior may be async, and we can start showing users the results
of their inquiry immediately (or at least the table headers for data
to come in 15 seconds or so.)
 
> 2)  The core imposes some sane restrictions on how much of the request is
> in memory by controlling the buffering.

Then it sounds like we can't see async behavior?  Unless some filter really
-must- hold the headers until the entire data stream is cached, then this
sounds like a very bad behavior.  Who -must- hold headers until the
entire body is constructed?  Perhaps my own mod_robots, and it only cares
about reparsing the <HTML><HEAD>[<META ...>...]</HEAD> stream, so even it 
should release the request fairly quickly.  

Few filters should even need to do what this suggests.  I agree the server 
ought to become responsible and prohibit really bad memory behavior by
laying out the rules and mechanisms, but I strongly believe either patch
can and should provide the api to address this issue.  But the abuser of
this feature should be the one to jump through hoops, not the rest of the
module writing world :)

> Greg is proposing #1, I am proposing #2 (after much prompting and head
> beating from Roy).  My feeling, after having looked at this problem for a
> VERY long time, is that trying to just fit option 2 in after the current
> patch is applied is almost impossible.
> 
> Using option #1 is not a good idea IMNSHO, because it allows stupid
> filters to take down the server.
> 
> > that says "hey, here's the data to send, but don't send it until
> > I tell you its okay, because I'm going to be adding a header". But
> > all you've gained in that case is the ability for the later layers
> > to start processing the data instead of having to wait for the
> > layer that thinks it needs to see all of its input before it can
> > send on its output. (Which it will presumably do by sending all
> > the accumulated data at once, so each later layer gets called
> > once and doesn't have to do any buffering.)
> 
> By having a top level filter do the buffering it requires, we can actually
> get single or zero copy in.
> 
> > But it also just seems like an optimization. Like I said, PHP
> > didn't have any sort of output buffering until the latest version,
> > and although there was some grousing from users, they generally
> > accepted the limitation of not being able to add headers after
> > non-header data had been sent.
> 
> Again, it is how the buffering is done that is important.  Asking each
> module to implement their own filtering is asking for trouble.
> 
> Ryan
> 
> ______________________________________________________________
> _________________
> Ryan Bloom                        	rbb@apache.org
> 406 29th St.
> San Francisco, CA 94131
> --------------------------------------------------------------
> -----------------
> 

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> If a filter is going to need to possibly modify a header based on
> content, I don't see any way that filter can avoid buffering the
> data. Unless the filter simply passes a flag on to a lower level

Exactly!  The filter can't avoid it.  So, then the question is how does
the filter buffer the data.  There are two answers to this:

1)  The filter just comes up with its own method for buffering, and if
that means keeping the whole request in memory, then so be it.

2)  The core imposes some sane restrictions on how much of the request is
in memory by controlling the buffering.
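Option #2 can be sketched roughly as follows.  This is illustrative Python
only, not code from either patch; the class name, the limit, and the
set_aside/release API are all hypothetical.  The point is that the core,
not the filter, owns the buffer and caps how much of the response may sit
in memory at once:

```python
class CoreBuffer:
    """Hypothetical core-owned buffer (option #2): the core enforces a
    cap on how much response data a filter may set aside in memory,
    instead of each filter growing the heap without bound."""

    def __init__(self, limit=64 * 1024):
        self.limit = limit      # core-imposed ceiling, in bytes
        self.chunks = []        # data the filter has set aside
        self.used = 0

    def set_aside(self, data):
        # A filter that tries to hold more than the limit is refused,
        # rather than silently consuming the server's memory.
        if self.used + len(data) > self.limit:
            raise MemoryError("filter exceeded core buffering limit")
        self.chunks.append(data)
        self.used += len(data)

    def release(self):
        # The filter is done; hand back everything that was set aside.
        data = b"".join(self.chunks)
        self.chunks, self.used = [], 0
        return data
```

Under option #1, by contrast, each filter would roll this logic itself,
with no ceiling the core can enforce.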

Greg is proposing #1, I am proposing #2 (after much prompting and head
beating from Roy).  My feeling, after having looked at this problem for a
VERY long time, is that trying to just fit option 2 in after the current
patch is applied is almost impossible.

Using option #1 is not a good idea IMNSHO, because it allows stupid
filters to take down the server.

> that says "hey, here's the data to send, but don't send it until
> I tell you its okay, because I'm going to be adding a header". But
> all you've gained in that case is the ability for the later layers
> to start processing the data instead of having to wait for the
> layer that thinks it needs to see all of its input before it can
> send on its output. (Which it will presumably do by sending all
> the accumulated data at once, so each later layer gets called
> once and doesn't have to do any buffering.)

By having a top level filter do the buffering it requires, we can actually
get single or zero copy in.

> But it also just seems like an optimization. Like I said, PHP
> didn't have any sort of output buffering until the latest version,
> and although there was some grousing from users, they generally
> accepted the limitation of not being able to add headers after
> non-header data had been sent.

Again, it is how the buffering is done that is important.  Asking each
module to implement their own filtering is asking for trouble.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> The requirement is still valid IMHO.  The ability to modify a header based
> on data in the response is a big part of filtering.  
> 
> This requirement is just as valid for the content-length, which Greg has
> said will be implemented in a later patch.  My argument is that this can
> not be done as an add-on.  This must be done as part of the original
> patch, or the original patch is incomplete and not the correct patch.

It seemed like the approach for a time was that the content-length
would simply get stripped out. The later enhancement I see is then
allowing filters to say up-front "I won't screw up the content-length"
or "I'll be adding 300 bytes to content-length". Requiring
content-length to always be preserved would force buffering all of
the content, and is contrary to how content-length is handled even
today.

If a filter is going to need to possibly modify a header based on
content, I don't see any way that filter can avoid buffering the
data. Unless the filter simply passes a flag on to a lower level
that says "hey, here's the data to send, but don't send it until
I tell you it's okay, because I'm going to be adding a header". But
all you've gained in that case is the ability for the later layers
to start processing the data instead of having to wait for the
layer that thinks it needs to see all of its input before it can
send on its output. (Which it will presumably do by sending all
the accumulated data at once, so each later layer gets called
once and doesn't have to do any buffering.)

But it also just seems like an optimization. Like I said, PHP
didn't have any sort of output buffering until the latest version,
and although there was some grousing from users, they generally
accepted the limitation of not being able to add headers after
non-header data had been sent.

Jim

Re: summary: issues for my filtering patch

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> The requirement is still valid IMHO.  The ability to modify a
> header based on data in the response is a big part of filtering.

I disagree.  Please name two real-world cases where filter
functionality requires this.  Any time this was necessary, such as
digestification, the content has had to be buffered.  I feel
that saying a filter can't merely be as good as a content handler
is excessively restrictive.
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
> > Warning:  I am not a PHP programmer, but I am using one of your examples
> > from earlier.  
> > 
> > PHP allows a programmer to add a cookie to the request.  If that AddCookie
> > function is the last line in the program, then PHP will have to cache all
> > of the data to ensure that it is set before the data is sent.  Because PHP
> > has no way of knowing if this will happen or not, PHP will always have to
> > cache the data.
> 
> To be fair, PHP currently does not allow this scenario (calling a
> function to add headers after data is sent) without turning on its
> own output buffering first (in PHP4 -- in PHP3 you have to send
> headers before you send anything else, period).

The requirement is still valid IMHO.  The ability to modify a header based
on data in the response is a big part of filtering.  

This requirement is just as valid for the content-length, which Greg has
said will be implemented in a later patch.  My argument is that this can
not be done as an add-on.  This must be done as part of the original
patch, or the original patch is incomplete and not the correct patch.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: summary: issues for my filtering patch

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Jun 28, rbb@covalent.net wrote:
> Warning:  I am not a PHP programmer, but I am using one of your examples
> from earlier.  
> 
> PHP allows a programmer to add a cookie to the request.  If that AddCookie
> function is the last line in the program, then PHP will have to cache all
> of the data to ensure that it is set before the data is sent.  Because PHP
> has no way of knowing if this will happen or not, PHP will always have to
> cache the data.

To be fair, PHP currently does not allow this scenario (calling a
function to add headers after data is sent) without turning on its
own output buffering first (in PHP4 -- in PHP3 you have to send
headers before you send anything else, period).

Jim

Re: summary: issues for my filtering patch

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 29, 2000 at 02:31:48PM -0700, rbb@covalent.net wrote:
> On Thu, 29 Jun 2000, Greg Stein wrote:
> 
> > I agree with Ken on the usefulness/utility of the offered challenege (read:
> > low/marginal utility). However, Ryan made it seem that this is the shortest
> > route to dropping his veto. I'd rather write a bit of code than continuing
> > to spew hundreds of lines of email :-)
> > 
> > Now... this does assume that Ryan is doing the evaluation in a timely
> > manner. Given that he has a veto in place, I would expect him to be timely.
> > Ryan? What is the status of your review of the challenge code?
> 
> The review was made and it was posted earlier today.  I can re-post if you

I missed it, sorry... Ingo pointed me at it.

Please see my response. I think the angle of critique was incorrect, and we
need to resolve that before continuing.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
On Thu, 29 Jun 2000, Greg Stein wrote:

> I agree with Ken on the usefulness/utility of the offered challenege (read:
> low/marginal utility). However, Ryan made it seem that this is the shortest
> route to dropping his veto. I'd rather write a bit of code than continuing
> to spew hundreds of lines of email :-)
> 
> Now... this does assume that Ryan is doing the evaluation in a timely
> manner. Given that he has a veto in place, I would expect him to be timely.
> Ryan? What is the status of your review of the challenge code?

The review was made and it was posted earlier today.  I can re-post if you
would like.  The long and short of it is that this patch didn't meet the
challenge.  The exact details are in the review.

You may see this as a bogus challenge, but I don't.  This challenge
goes directly to how easy it is to write a module that can buffer an
entire request, without copying all of the data, and without holding all
of the data in memory at once.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------



Re: summary: issues for my filtering patch

Posted by Greg Stein <gs...@lyra.org>.
I agree with Ken on the usefulness/utility of the offered challenge (read:
low/marginal utility). However, Ryan made it seem that this is the shortest
route to dropping his veto. I'd rather write a bit of code than continuing
to spew hundreds of lines of email :-)

Now... this does assume that Ryan is doing the evaluation in a timely
manner. Given that he has a veto in place, I would expect him to be timely.
Ryan? What is the status of your review of the challenge code?

Cheers,
-g

On Thu, Jun 29, 2000 at 02:22:41PM -0400, Rodent of Unusual Size wrote:
> rbb@covalent.net wrote:
> > 
> > The challenge:
> > 
> > Write a filter that needs to set a header based on information found
> > in the data passed to the filter.  This data can be ANYWHERE in the
> > response, and that must be taken into account.
> 
> Um, yeah?  So what?  Why is this suddenly a problem that MUST be
> resolved for filters, when it isn't for current content handlers?
> Sure, it might bleed core, but "doctor it hurts when I do that."
> This strikes me as rather arbitrary; you're adding a requirment
> for filters that has never existed before.  I think that's unfair;
> if you must set a minimum bar height, say "a filter model is only
> acceptable if each filter has no (or only marginally) more impact
> than a content-handler doing the same thing."
> 
> > it is possible/likely that the thing that makes
> > you change the header could come in the last block of data.
> 
> Make it even simpler: require the behavioural typification of
> a filter that sets response header field X-LastFour to the hex
> representation of the last four bytes of content.
> -- 
> #ken    P-)}
> 
> Ken Coar                    <http://Golux.Com/coar/>
> Apache Software Foundation  <http://www.apache.org/>
> "Apache Server for Dummies" <http://Apache-Server.Com/>
> "Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

-- 
Greg Stein, http://www.lyra.org/

Re: summary: issues for my filtering patch

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
rbb@covalent.net wrote:
> 
> The challenge:
> 
> Write a filter that needs to set a header based on information found
> in the data passed to the filter.  This data can be ANYWHERE in the
> response, and that must be taken into account.

Um, yeah?  So what?  Why is this suddenly a problem that MUST be
resolved for filters, when it isn't for current content handlers?
Sure, it might bleed core, but "doctor it hurts when I do that."
This strikes me as rather arbitrary; you're adding a requirement
for filters that has never existed before.  I think that's unfair;
if you must set a minimum bar height, say "a filter model is only
acceptable if each filter has no (or only marginally) more impact
than a content-handler doing the same thing."

> it is possible/likely that the thing that makes
> you change the header could come in the last block of data.

Make it even simpler: require the behavioural typification of
a filter that sets response header field X-LastFour to the hex
representation of the last four bytes of content.
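Ken's X-LastFour example can be sketched in a few lines.  This is an
illustrative Python sketch, not code from either patch; the class name and
the sink callable are hypothetical.  It shows that the filter itself needs
only four bytes of state; the cost is that the header value, and therefore
the headers, cannot be emitted until the body has been seen in full:

```python
class XLastFour:
    """Hypothetical filter for the X-LastFour challenge: pass each block
    straight through to a downstream sink while keeping only a four-byte
    rolling tail.  Constant-size filter state; the header is simply not
    known until end-of-stream."""

    def __init__(self, sink):
        self.sink = sink   # downstream consumer, e.g. list.append
        self.tail = b""    # at most four bytes of state, ever

    def write(self, block):
        self.sink(block)                      # body streams through unchanged
        self.tail = (self.tail + block)[-4:]  # keep only the last four bytes

    def header_value(self):
        # Hex of the last four bytes of content, for X-LastFour.
        return self.tail.hex()
```

Whether the framework can hold the headers open that long is exactly the
question under dispute; the filter's own memory use is not the problem here.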
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: summary: issues for my filtering patch

Posted by rb...@covalent.net.
I will tell you right now exactly what is required for me to remove my -1.  

The challenge:

Write a filter that needs to set a header based on information found
in the data passed to the filter.  This data can be ANYWHERE in the
response, and that must be taken into account.


What I think will happen:

Your module will have to cache all of the data in the response so that it
can be sure to change the headers before the first block of data is
written to the network.

The other option is that the filter will end up writing huge blocks of
code to malloc/free memory and will in the end actually have the
beginnings of the bucket brigades patch.

I also think it is possible/likely that the thing that makes
you change the header could come in the last block of data.  This means
your filter ends up caching all of the data for the response.
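As a rough illustration of what "the beginnings of the bucket brigades
patch" would look like, here is a hypothetical Python sketch (none of these
names come from the actual patch): a brigade is a list of buckets of mixed
types, so a filter can set aside an entire response while keeping only a
bounded amount of it in memory:

```python
import tempfile

class MemBucket:
    """Bucket whose payload lives in the heap."""
    def __init__(self, data):
        self.data = data
    def read(self):
        return self.data

class FileBucket:
    """Bucket whose payload has been spilled to a temp file."""
    def __init__(self, data):
        self.f = tempfile.TemporaryFile()
        self.f.write(data)
    def read(self):
        self.f.seek(0)
        return self.f.read()

class Brigade:
    """A list of buckets of mixed types: once the in-memory budget is
    spent, further data goes to file buckets instead of the heap."""
    def __init__(self, mem_limit=8192):
        self.buckets = []
        self.mem_limit = mem_limit
        self.mem_used = 0

    def append(self, data):
        if self.mem_used + len(data) <= self.mem_limit:
            self.buckets.append(MemBucket(data))
            self.mem_used += len(data)
        else:
            self.buckets.append(FileBucket(data))

    def read_all(self):
        # Replay the whole set-aside response, in order.
        return b"".join(b.read() for b in self.buckets)
```

The heap usage stays bounded by mem_limit no matter how large the cached
response grows, which is the property the challenge is probing for.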


Example:

Warning:  I am not a PHP programmer, but I am using one of your examples
from earlier.  

PHP allows a programmer to add a cookie to the request.  If that AddCookie
function is the last line in the program, then PHP will have to cache all
of the data to ensure that it is set before the data is sent.  Because PHP
has no way of knowing if this will happen or not, PHP will always have to
cache the data.

If you put a second module in line that has the same restriction, you will
make two copies of the data, and neither will be freed until the request
is finished.


Ending Comments:

If I am correct, then this patch does not solve the problems and cannot
be committed.  If I am incorrect, then I look forward to being proven
wrong and I will happily remove my veto.

I look forward to the code.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------