Posted to apreq-dev@httpd.apache.org by Joe Schaefer <jo...@sunstarsys.com> on 2002/08/21 20:49:05 UTC

dev question: apreq 2 as a filter?

During the ongoing discussion on dev regarding apreq-2,
William Rowe suggested we might implement apreq-2 as an
apache filter.  In principle, I think this is a great idea.
But I have a feeling it will require substantial reworking of
all our parser-based code (basically everything in 
apreq_parser.c).

Here's why:  running apreq-2 as an input filter means (I think)
that we'd need to reimplement the parsers as callbacks which
relinquish control after they've read a few chunks of data.
Currently they consume everything, but there's really no
reason they can't stop after locating a urlword or a block of 
data from a file upload.  It's possible that we could rework
the parsers themselves to be filters, instead of maintaining
our own internal parser stack (req->parsers).  Unfortunately
I've no experience with the apache filter API, so this may
all be a lot of hot air.

Thoughts on API changes?  Should apreq_parser_register() register
an apache filter instead?



Re: dev question: apreq 2 as a filter?

Posted by "David N. Welton" <da...@dedasys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:

> I've no experience with the apache filter API, so this may all be a
> lot of hot air.
> 
> Thoughts on API changes?  Should apreq_parser_register() register an
> apache filter instead?

Neither do I - let's get more information before we decide anything.
Unfortunately, I don't have much (any?) spare time.

-- 
David N. Welton
   Consulting: http://www.dedasys.com/
     Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
   Apache Tcl: http://tcl.apache.org/

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:

>>As mentioned before, I'd rather wait until httpd-dev decides how they
>>want apreq to be in order to accept it.
> 
> 
> Agreed, but I don't think it hurts anything to *discuss* potential
> implications of a filter based approach.  

Of course ;)

> FWIW, I think filters
> are a non-starter if it means that a +100MB file upload will balloon
> the httpd process size by +100MB.

Meaning that ideally it should be hookable both ways, the old way and as 
a filter. If we have that, we aren't restricted in exploring the filter 
option while keeping the known-working "normal" interface.

Plus it will probably need to be more configurable. For example if 
acting as a filter, we may need an option to suck the body in or copy 
it. And be able to limit the body size like we do now to avoid DOS attacks.

p.s. As you can see on the httpd-dev list, so far more reaction was 
generated regarding the silly macro, which is a one-sec fix, rather than 
the much bigger issue we are trying to resolve here.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

[...]

> As mentioned before, I'd rather wait until httpd-dev decides how they
> want apreq to be in order to accept it.

Agreed, but I don't think it hurts anything to *discuss* potential
implications of a filter based approach.  FWIW, I think filters
are a non-starter if it means that a +100MB file upload will balloon
the httpd process size by +100MB.

[...]

> Joe, see my preliminary diagrams/explanations at 
> http://perl.apache.org/docs/2.0/user/handlers/handlers.html

Thanks Stas, I'll take a look.

> The point is that you don't need to relinquish control since you never 
> take it in the first place; think of a passive filter and read my perl 
> implementation of MyApache::FilterSnoop, which perfectly shows how this 
> works:
> http://perl.apache.org/docs/2.0/user/handlers/handlers.html#All_in_One_Filter

Ditto.

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> During the ongoing discussion on dev regarding apreq-2,
> William Rowe suggested we might implement apreq-2 as an
> apache filter.  In principle, I think this is a great idea.
> But I have a feeling it will require substantial reworking of
> all our parser-based code (basically everything in 
> apreq_parser.c).
> 
> Here's why:  running apreq-2 as an input filter means (I think)
> that we'd need to reimplement the parsers as callbacks which
> relinquish control after they've read a few chunks of data.
> Currently they consume everything, but there's really no
> reason they can't stop after locating a urlword or a block of 
> data from a file upload.  It's possible that we could rework
> the parsers themselves to be filters, instead of maintaining
> our own internal parser stack (req->parsers).  Unfortunately
> I've no experience with the apache filter API, so this may
> all be a lot of hot air.
> 
> Thoughts on API changes?  Should apreq_parser_register() register
> an apache filter instead?

As mentioned before, I'd rather wait until httpd-dev decides how they
want apreq to be in order to accept it.

Regarding filters, I think there shouldn't be a lot to change regarding 
the parsing. You simply suck in all the data as before and parse it as 
before. Look at mod_deflate, which does a more complicated thing since 
it also modifies the data, whereas we don't. Once the data is parsed it 
should be stuck into the connection context, so it's available to the 
later stages.

Joe, see my preliminary diagrams/explanations at 
http://perl.apache.org/docs/2.0/user/handlers/handlers.html
to get a quick intro on how things work in 2.0. This is rather mod_perl 
2.0 specific but not much different from Apache itself, which does more 
things. The concepts are the same.

The point is that you don't need to relinquish control since you never 
take it in the first place; think of a passive filter and read my perl 
implementation of MyApache::FilterSnoop, which perfectly shows how this 
works:
http://perl.apache.org/docs/2.0/user/handlers/handlers.html#All_in_One_Filter

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:

[...]

> > 
> > snoop ("connection", ...
> > snoop ("request", ...
> > snoop ("connection", ...
> > snoop ("request", ...
> > snoop ("connection", ...
> > snoop ("request", ...
> > 
> > but NOT in a store-and-forward 'ish
> > 
> > snoop ("connection", ...
> > snoop ("connection", ...
> > snoop ("connection", ...
> > snoop ("request", ...
> > snoop ("request", ...
> > snoop ("request", ...
> > 
> > It looks to me (based on the webpage) like the second case 
> > is what's really happening.  Is that right?

Yes, I am wrong. :-)

> I guess my explanations aren't good enough :( Dump is a response 
> handler, it has nothing to do with filters. it just reads the query 
> string and the body and echos them back as a response. it could just say 
> "hello". The point of Dump is that it calls $r->content which invokes 
> request input filters. If you don't call $r->content request input 
> filter will never be called at all.

Oops- I'm sorry for not looking at the example more carefully.
I'll try to do better in the future.

> All filters are inside FilterSnoop, try removing the request filters and 
> see how the connection filters work alone, then just the request 
> filters, then both. Filters never consume more than one brigade unless 
> you want to buffer up, usually they process the brigade and forward it 
> further.

OK, I think it's starting to gel now.  The input filter's 
control flow (in C) centers around ap_get_brigade.  I think
the upshot for us means that converting the parsers to filters
amounts to

  1) reworking apreq_list_read to read from an arbitrary filter, 
     not just r->filters_in.  It also has to pass along the brigade
     instead of clearing it.  The necessary changes to apreq_list.[ch]
     are trivial.

  2) literally removing the for(;;) loops from the parsers in
     apreq_parser.c.  All parsers take their input from apreq_list,
     so the only modifications would be to have them operate as 
     callbacks.  I don't think that's much of an issue at all.
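
A minimal sketch of the filter shape that points (1) and (2) imply: the
brigade comes up from f->next via ap_get_brigade, a parser callback gets a
look at it, and the untouched brigade goes back to the caller.
apreq_parser_run() and the ctx struct are invented names here, not existing
apreq code:

/* Sketch only: a minimal input filter feeding a hypothetical incremental
 * parser callback, then passing the data along untouched. */
#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct apreq_filter_ctx {
    void *parser_state;              /* opaque per-request parser state */
} apreq_filter_ctx;

/* hypothetical: consumes whatever is in bb, remembers partial tokens */
extern apr_status_t apreq_parser_run(void *state, apr_bucket_brigade *bb);

static apr_status_t apreq_input_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                       ap_input_mode_t mode,
                                       apr_read_type_e block,
                                       apr_off_t readbytes)
{
    apreq_filter_ctx *ctx = f->ctx;
    apr_status_t rv;

    /* pull the next chunk of the body from the filter below us */
    rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
    if (rv != APR_SUCCESS) {
        return rv;
    }

    /* let the parser look at (but not modify) this brigade ... */
    apreq_parser_run(ctx->parser_state, bb);

    /* ... and hand the same brigade to whoever called us */
    return APR_SUCCESS;
}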

[...]

> in reality filters interleave because they are stacked and each brigade 
> goes through all filters in the stack.

Yup- that's what I needed to know.  Do I have it right now?

> Give me some more time, I'll add more filter examples tomorrow.

No need to rush,  I came across an onlamp article by Ryan Bloom
that gives lots of juicy details:

http://www.onlamp.com/pub/a/apache/2001/09/20/apache_2.html

I'm just enjoying this substantive discussion that we're *not* 
having on dev :-)

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, Aug 25, 2002 at 01:03:52AM -0500, William A. Rowe, Jr. wrote:
> The fact that the input filtering schema is a bit clumsy for apreq
> is a perfect example of WHY we should incorporate apreq into
> the Apache core.  Without a good use case, input filters will
> never become as polished as they ought to be.

Can someone please explain how input filtering is clumsy for
apreq (i.e. use cases)?  I'm not sure what apreq should be doing
as a filter - since it is only a set of helper functions.

It seems like this thread is happening on another list...  -- justin

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> [...]
> 
> 
>>as you can see the input filter that saw the body was invoked *after* 
>>the response phase has finished. So my question was, how to force the 
>>connection filter to request the next brigades which include the body, 
>>if nobody else does that. This part can be very tricky if you understand 
>>what I mean. I hope Bill can see the problem here, unless I miss something.
> 
> 
> I see the problem.  However, don't we have the exact same problem
> with the current code?  I mean, if the reported Content-Length is
> too big, WE don't attempt to read any POST data.  We also give up
> if we've accumulated too much data.

No, the problem I'm referring to is how to invoke the filter in the first 
place. It won't be invoked if the response handler doesn't call 
ap_get_brigade. Hmm, I think I know how this should work.

Any time anybody does any enquiry from apreq, we check a flag whether we 
have the data consumed and parsed (which is done already). If it wasn't 
consumed yet, apreq inserts its input filter and performs the 
ap_get_brigade call.

Bill, please correct me if I'm wrong as I see the corrected picture in 
my mind:

apreq is split into 2 parts: the warehouse and the filter.

The warehouse is invoked from the HTTP response handler by simply performing 
*any* call into apreq_, which essentially asks for something. The 
warehouse checks whether the body has been consumed already; if it was 
consumed and parsed, it answers the query. If the data wasn't consumed yet, the 
warehouse inserts the apreq filter as the last request input filter and 
immediately calls ap_get_brigade till it gets an EOS bucket or it decides 
to terminate the sucking action (e.g. because the POST limit was exceeded).

The filter is really just a sucker which feeds the warehouse which does 
the parsing and storing of the parsed data.
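
A rough sketch of that lazy pull, assuming a filter registered under the
made-up name "APREQ" and an assumed default POST limit; in this model the
filter itself is what copies data into the warehouse, so the loop only pumps
and counts:

/* Sketch only: the "warehouse" side lazily draining the body the first time
 * any apreq_* call needs it. */
#include "httpd.h"
#include "http_protocol.h"
#include "util_filter.h"
#include "apr_buckets.h"

#define APREQ_POST_LIMIT (1024 * 1024)   /* assumed default cap against DoS */

static apr_status_t apreq_prefetch_body(request_rec *r)
{
    apr_bucket_brigade *bb =
        apr_brigade_create(r->pool, r->connection->bucket_alloc);
    apr_off_t total = 0;
    int saw_eos = 0;

    /* make sure our filter is in the chain; it stocks the "warehouse"
     * with everything that flows past it */
    ap_add_input_filter("APREQ", NULL, r, r->connection);

    while (!saw_eos && total < APREQ_POST_LIMIT) {
        apr_off_t len = 0;
        apr_status_t rv = ap_get_brigade(r->input_filters, bb,
                                         AP_MODE_READBYTES, APR_BLOCK_READ,
                                         HUGE_STRING_LEN);
        if (rv != APR_SUCCESS)
            return rv;

        if (!APR_BRIGADE_EMPTY(bb)
            && APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb)))
            saw_eos = 1;                 /* whole body has been seen */

        apr_brigade_length(bb, 1, &len);
        total += len;                    /* enforce the POST limit */
        apr_brigade_cleanup(bb);         /* the filter already kept its copy */
    }
    return APR_SUCCESS;
}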

hmm, for some reason I think that we end up using the current apreq 
model, just that it gets fed from its own filter, which can be 
eliminated altogether.

the point is that you cannot invoke the apreq filter by itself, somebody 
has to invoke it (inserting is not enough), that somebody is the 
response handler, so we return to where we have started, not really 
needing any filter stuff at all.

> In the 1.3-ish past, I'd assumed that the proper course of action for 
> these situations was to instruct apache to shut down the 
> connection.  Otherwise (say with keepalives on) the client will
> send the post data and apache will treat it as a new, malformed 
> http request.

I think that this part is of a later concern, but as Bill has mentioned 
before discard_request_body() will probably take care of it.

For future optimizations, I can see a situation where a lazy mode 
can be used, e.g. don't consume the whole body as long as you have 
satisfied the query. E.g. the form data follows the file upload, but 
the form wasn't filled in properly, so we don't care about the file because 
we want to return the form to the user to complete again.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

[...]

> Bill was saying that he won't grab more than 64k of the body by default. 
> I suggested that this should be configurable by apreq.
> 
> I think that we are all saying a similar thing, the only confusion is 
> about what happens if apreq injects its filter, but somebody else calls 
> ap_get_brigade. 

Yes, I think this is exactly the situation where we disagree.

> This probably should never happen if apreq first consumes(/copies) all
> the body, assuming that it's configured that way and the body is not
> too big.

But we don't *need* the filter to consume everything- the apreq filter 
can just grab enough input data to keep the parser (and the previous
filter) happy.  Usually that'll be 8KB or less, even say in the case 
of a 100MB file upload.  Yes we need to worry about pathological cases, 
but the necessary security features are pretty much already there.
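
For the file-upload case specifically, the parser can keep memory flat by
spooling each chunk to a temp file as it arrives.  A sketch, with the
surrounding parser plumbing and tmpfile setup assumed:

/* Sketch only: write upload-field data out bucket by bucket so a 100MB
 * upload never accumulates in RAM. */
#include "apr_buckets.h"
#include "apr_file_io.h"

static apr_status_t spool_upload_chunk(apr_file_t *tmpfile,
                                       apr_bucket_brigade *bb)
{
    apr_bucket *e;

    for (e = APR_BRIGADE_FIRST(bb);
         e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e))
    {
        const char *data;
        apr_size_t len;
        apr_status_t rv;

        if (APR_BUCKET_IS_EOS(e))
            break;

        rv = apr_bucket_read(e, &data, &len, APR_BLOCK_READ);
        if (rv != APR_SUCCESS)
            return rv;

        /* write this ~8KB (or smaller) chunk out; memory stays bounded */
        rv = apr_file_write_full(tmpfile, data, len, NULL);
        if (rv != APR_SUCCESS)
            return rv;
    }
    return APR_SUCCESS;
}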

> And I agree with Issac's two descriptions, where 2) simply consumes all 
> the body without using filters, which can be done from 1) where the 
> filter simply consumes the data and doesn't pass anything but EOS further.

I'd rather the apreq filter behaved like a block-buffered filter would.
AFAICT mod_deflate doesn't consume the whole body before passing it 
along,  so why do you think we need to?

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Ian Holsman <ia...@apache.org>.
William A. Rowe, Jr. wrote:
> Just a quick observation.
> 
> The fact that the input filtering schema is a bit clumsy for apreq
> is a perfect example of WHY we should incorporate apreq into
> the Apache core.  Without a good use case, input filters will
> never become as polished as they ought to be.
> 
> Bill
> 
+1
any helpers to the input filters would definitely be a good thing


Re: dev question: apreq 2 as a filter?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Just a quick observation.

The fact that the input filtering schema is a bit clumsy for apreq
is a perfect example of WHY we should incorporate apreq into
the Apache core.  Without a good use case, input filters will
never become as polished as they ought to be.

Bill


Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ma...@beamartyr.net>.
----- Original Message -----
From: "Stas Bekman" <st...@stason.org>
To: "William A. Rowe, Jr." <wr...@rowe-clan.net>
Cc: "Joe Schaefer" <jo...@sunstarsys.com>; "apreq list"
<ap...@httpd.apache.org>
Sent: Monday, August 26, 2002 9:52 AM
Subject: Re: dev question: apreq 2 as a filter?


> William A. Rowe, Jr. wrote:
> > At 11:38 PM 8/24/2002, Joe Schaefer wrote:
>
> >> I'd rather the apreq filter behaved like a block-buffered filter would.
> >> AFAICT mod_deflate doesn't consume the whole body before passing it
> >> along,  so why do you think we need to?
> >
> >
> > No, we don't need the whole body.  The idea of pre-fetching 8kb, 16kb or
> > even 64kb is so that the variables are known up-front.  If someone needs
> > more, they simply need to call ap_get_brigade again until they have the
> > content they need.  And then it's their job to set aside the content for
> > the
> > next upstream filter or the final handler destination.
>
> What happens if you have a form with big input values, so that (e.g.)
> the last key/value won't fit into 8/16/64k. So when you call 
> $r->params() (which is supposed to return all keys) you won't be able to
> get them all, meaning that you really have to suck the whole data in,
> before you know whether you have all the keys or not.

This is true, but it's a limitation of the content-type of the POSTed data,
not of HTTP or apreq.

  Issac



Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
William A. Rowe, Jr. wrote:
> At 11:38 PM 8/24/2002, Joe Schaefer wrote:

>> I'd rather the apreq filter behaved like a block-buffered filter would.
>> AFAICT mod_deflate doesn't consume the whole body before passing it
>> along,  so why do you think we need to?
> 
> 
> No, we don't need the whole body.  The idea of pre-fetching 8kb, 16kb or
> even 64kb is so that the variables are know up-front.  If someone needs
> more, they simply need to call ap_get_brigade again until they have the
> content they need.  And then it's their job to set aside the content for 
> the
> next upstream filter or the final handler destination.

What happens if you have a form with big input values, so that (e.g.) 
the last key/value won't fit into 8/16/64k. So when you call 
$r->params() (which is supposed to return all keys) you won't be able to 
get them all, meaning that you really have to suck the whole data in, 
before you know whether you have all the keys or not.



__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> "Issac Goldstand" <ma...@beamartyr.net> writes:
> 
> [...]
> 
> 
>>Joe Schaefer wrote:
> 
> 
>>>I don't agree.  IMO (using your terminology) the warehouse should
>>>be off-limits until the POST data has been parsed *completely*.  That
>>>means *only* the content handler should be making any enquiries.
>>>
>>>Furthermore, if the content handler wants to call ap_get_brigade
>>>itself to get at a portion of the POST stream, it should do that
>>>*before* ever visiting our warehouse.  Otherwise apreq_request_parse
>>>should just gobble it all up.
>>
>>I don't get it...  Correct me if I'm wrong, but I see two possible scenarios
>>for apreq2 implementation:
>>
>>1)  Installed as filter - In this case, any call to ap_get_brigade will
>>cause data to pass through apreq (which will duly save of copy of the data
>>it recieves in its "warehouse").  Alternatively, any implicit or explicit
>>call to $q->parse will trigger apreq to call ap_get_brigade internally to
>>grab the data.
> 
> 
> That is *exactly* what I'm saying; I think part of the confusion we're 
> having centers around *when* the apreq filter gets installed.  The 
> content-handler needs the ability to inject our apreq filter at runtime.  
> IMO, the injection should take place in the apreq_request_new 
> call, and if the content-handler wants to call ap_get_brigade, it should
> do it between apreq_request_new() and apreq_request_parse().  
> 
> I think Stas is arguing that the apreq filter could be injected
> later on, perhaps inside the apreq_request_parse call, but I think 
> that makes things too complicated.

Ah, no, I'm not arguing about that... or anything at all :)

Bill was saying that he won't grab more than 64k of the body by default. 
I suggested that this should be configurable by apreq.

I think that we are all saying a similar thing, the only confusion is 
about what happens if apreq injects its filter, but somebody else calls 
ap_get_brigade. This probably should never happen if apreq first 
consumes(/copies) all the body, assuming that it's configured that way 
and the body is not too big.

And I agree with Issac's two descriptions, where 2) simply consumes all 
the body without using filters, which can be done from 1) where the 
filter simply consumes the data and doesn't pass anything but EOS further.
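
Roughly, such a consume-everything filter (pass nothing but EOS further)
would look like this; apreq_consume() is an assumed hook into the
parser/warehouse, not existing API:

/* Sketch only: the 1.x-style variant -- the filter eats the body itself
 * and lets only EOS through to the caller. */
#include "util_filter.h"
#include "apr_buckets.h"

extern void apreq_consume(ap_filter_t *f, apr_bucket_brigade *bb);

static apr_status_t apreq_eat_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                     ap_input_mode_t mode,
                                     apr_read_type_e block,
                                     apr_off_t readbytes)
{
    apr_status_t rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
    apr_bucket *e, *next;

    if (rv != APR_SUCCESS)
        return rv;

    apreq_consume(f, bb);            /* parser reads/copies what it needs */

    for (e = APR_BRIGADE_FIRST(bb); e != APR_BRIGADE_SENTINEL(bb); e = next) {
        next = APR_BUCKET_NEXT(e);
        if (!APR_BUCKET_IS_EOS(e)) {
            APR_BUCKET_REMOVE(e);    /* drop data buckets */
            apr_bucket_destroy(e);
        }
    }
    return APR_SUCCESS;             /* the caller only ever sees EOS */
}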

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Issac Goldstand" <ma...@beamartyr.net> writes:

[...]

> Joe Schaefer wrote:

> > I don't agree.  IMO (using your terminology) the warehouse should
> > be off-limits until the POST data has been parsed *completely*.  That
> > means *only* the content handler should be making any enquiries.
> >
> > Furthermore, if the content handler wants to call ap_get_brigade
> > itself to get at a portion of the POST stream, it should do that
> > *before* ever visiting our warehouse.  Otherwise apreq_request_parse
> > should just gobble it all up.
> 
> I don't get it...  Correct me if I'm wrong, but I see two possible scenarios
> for apreq2 implementation:
> 
> 1)  Installed as filter - In this case, any call to ap_get_brigade will
> cause data to pass through apreq (which will duly save a copy of the data
> it receives in its "warehouse").  Alternatively, any implicit or explicit
> call to $q->parse will trigger apreq to call ap_get_brigade internally to
> grab the data.

That is *exactly* what I'm saying; I think part of the confusion we're 
having centers around *when* the apreq filter gets installed.  The 
content-handler needs the ability to inject our apreq filter at runtime.  
IMO, the injection should take place in the apreq_request_new 
call, and if the content-handler wants to call ap_get_brigade, it should
do it between apreq_request_new() and apreq_request_parse().  
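
In handler terms, that ordering would look something like this; the
apreq_request_new()/apreq_request_parse() signatures are guesses based only
on the names used in this thread:

/* Sketch only: proposed call ordering inside a content handler. */
#include "httpd.h"
#include "http_protocol.h"

extern void *apreq_request_new(request_rec *r);     /* injects the apreq filter */
extern apr_status_t apreq_request_parse(void *req); /* drains & parses the body */

static int my_handler(request_rec *r)
{
    void *req = apreq_request_new(r);  /* apreq filter is in place from here on */

    /*
     * If the handler wants raw access to part of the POST stream, it must
     * do its own ap_get_brigade() calls *here*, between _new() and _parse();
     * the apreq filter still sees (and copies) whatever flows past it.
     */

    apreq_request_parse(req);          /* consume the rest and fill the tables */

    ap_set_content_type(r, "text/plain");
    return OK;
}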

I think Stas is arguing that the apreq filter could be injected
later on, perhaps inside the apreq_request_parse call, but I think 
that makes things too complicated.

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ma...@beamartyr.net>.
----- Original Message -----
From: "William A. Rowe, Jr." <wr...@rowe-clan.net>
To: "Issac Goldstand" <ma...@beamartyr.net>
Cc: "William A. Rowe, Jr." <wr...@rowe-clan.net>; "apreq list"
<ap...@httpd.apache.org>; "Joe Schaefer" <jo...@sunstarsys.com>; "Stas
Bekman" <st...@stason.org>
Sent: Monday, August 26, 2002 5:30 PM
Subject: Re: dev question: apreq 2 as a filter?


> At 02:57 AM 8/25/2002, Issac Goldstand wrote:
>
> >Let me restate this whole process, as I see it happening, using the
> >warehouse lingo that we were using before.  I believe that we're on the same
> >wavelength here, but want to make sure...  I see three major components
> >here:  The filter, the parser, and the "warehouse manager"...
>
> I'll split out your points then and answer specific issues.
>
> >(1) First of all, the only real way that apreq should be installed is as an
> >input filter.
> >(2) The filter should be installed as early as possible and
> >(3) immediately create an empty data structure in memory - I'm not going to
> >say where (notes table should be fine if it's still there in Apache2),
> >because that's probably an entire conversation on its own.  In any case,
> >user intervention SHOULD take place as early as possible in Apache's
> >request-phase chain as possible.
> >((4)Frankly, we may find useful to provide
> >httpd.conf directives to enable users to somewhat tweak the necessary
> >configurations, and provide a handler that runs as early as possible to scan
> >directives for each location before it starts.  This
> >should include a directive to *uninstall* (or disable, or whatever) the
> >apreq filter, too).
>
> The more I consider what apreq must accomplish, the more I'm against
> user 'config' of the apreq filter.  It should be programmatically configured,
> by all of the modules that want it injected.
>
> That means one filter module could supercede another module's
> requirements.  That is a bad thing.  So we need to use a greatest
> common denominator configuration scheme.
>
> So module A. expects some POST variables that it expects are no
> greater than 8kb.  module B. expects to deal with 64kb or greater,
> and is willing to handle a multipart-form upload.  In this case, module
> B registered a file upload callback and requests that the set-aside
> or 'prescan' limit is 64kb.  Those should override module A's miserly
> 8kb expectation.
>
> Even if module A calls apreq to inject itself after module B, the GCD
> needs to win.
>

Yes, assuming we're talking about the same request.  My point was that each
request should re-configure apreq to prepare the data in the best possible
manner based on the parameters of that particular request.  My visualization
of your example would be like this:

A) apreq injected into the filter chain.  Configuration table is created in
notes and set to default values.
B) Module A initializes itself.  Calls on apreq to get instance of request
object and attempts to set 'prebuffer' flag to 8kb.  Default value is, say,
4kb.  8 > 4, therefore apreq changes the value and returns OK.
C) Module B initializes itself.  Calls on apreq to get instance of request
object and attempts to set
'upload OK' to true.  Since it defaults to false, apreq changes the value.
Next it attempts to change prebuffer to 64kb.  Since this is greater than 8
it also changes and returns OK.

Now, the reverse situation is (A) then (C) and then:

D) Module A initializes itself.  Calls on apreq to get instance of request
object and attempts to set 'prebuffer' flag to 8kb.  Value is already 64.  8
< 64, so it does NOT change the value.  Returns OK.
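
A sketch of that "largest request wins" negotiation; apreq_cfg and the field
names are illustrative, not real apreq structures:

/* Sketch only: each module states its needs; apreq only ever widens them. */
#include "apr.h"
#include "apr_errno.h"

typedef struct apreq_cfg {
    apr_off_t prebuffer;   /* how much body to prefetch/set aside */
    int       uploads_ok;  /* has anyone registered a file-upload consumer? */
} apreq_cfg;

static apr_status_t apreq_cfg_set_prebuffer(apreq_cfg *cfg, apr_off_t want)
{
    if (want > cfg->prebuffer)
        cfg->prebuffer = want;   /* module B's 64k overrides module A's 8k */
    return APR_SUCCESS;          /* a smaller request is not an error */
}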

> >Moving along, (5) mechanisms to override the default input method, which is
> >input and share with other filters, should be provided, and can be invoked
> >anywhere up until the first call to ap_get_brigade.  Frankly, it ought to
> >work afterwards also, but then we run the risk of other filters choking on
> >mangled data.  We wouldn't want another filter to do that to us, so we
> >oughtn't do it to them.
>
> NO!  You cannot supersede the defined behavior!  Otherwise the oddball
> module will -break- all other installed filters for the request!!!
>
> Think of superclassing a C object.  In this HTTP schema, we have variables
> and values.  We need a clean definition of how to read/retrieve them.  If that
> definition is reasonably extensible, the modules don't have to know that the
> received data was sent in multipart-form or in XML format.
>
> Extensible and large values will have to be supported in an abstract way.
> This is what makes brigades and metadata so [potentially] appealing.
> Those already have some definitions we can extend, that I think would
> cover nearly any future potential use cases.

Er, I think I'm missing the gist of your argument.  Maybe just because I'm
tired just now.  I'll try to read this again tomorrow and send you mail
off-list if I need you to clarify...

> >In any case, (6) the configuration for the
> >request-specific parameters of the apreq call should be read during the
> >first callback of the actual filter (eg, first time ap_get_brigade is called
> >from anywhere).
>
> I'm suggesting the other filters and handler that want the data spell out
> their requirements.  If those can be flexible [one says pre-cache 8kb,
> and another asks to precache 64kb, let's let the 64kb requestor win.]

Agreed - see above.

> >(7)At that point, a flag is set in our little
> >request-specific apreq notepad to tell us that we've started munching data,
> >and that (7a) requests for behavior changes for the request-specific apreq
> >call should fail (not silently - it should return failure status to user -
> >possibly with a reason) and (7b) the warehouse doors are now open, but the
> >warehouse is flagged as being "stocking in progress" (the warehouse should
> >most likely NOT be in the same place as the configuration directives - the
> >former potentially needs lots of room, while the latter doesn't).
>
> 7a), why?  If we could keep pre-fetching in order to satisfy a given request,
> let's do so.  Remember that several filters and a handler may all be looking
> for apreq fields.  It's best to play nicely with all of them.
>
> As long as our input filter keeps setting the data aside for subsequent
> ap_get_brigade() calls, and can later satisfy them, we should be fine.

Well, I'm not going to say "you're wrong".  What I'm trying to do here,
though, is to avoid any module A getting any sort of nasty surprise as to
how apreq is handling its data just because module B decided to screw around
with apreq once it started slurping in the data...  Let's just say that 7a
is because I'm being cautious...

> >(8)If the
> >"exclusive mode" flag is set for this request (file upload? It doesn't
> >matter - what matters is that this is the Apache 1.x style apreq that
> >everyone's so keen on having in apreq2), then we simply don't pass the
> >brigade on to the next filter, unless, of course, it's EOS.
>
> Bzzt.... that one's wrong.  There is no exclusive mode in this model.  All
> consumers must have access to all the input data.  When a collection of
> PHP, Perl and some special purpose filters all want to see the variables,
> they will all see them.  The only problem, huge posts [e.g. file uploads]
> are messy.  I have a thought on that one, too.
>
> I'll propose a suggestion for the worst case [file upload variable] in another
> post to show that this isn't a problem.

Well, to be honest, I don't see a need for it either - but it seemed that
everyone was trying to get an Apache 1.x-like model in apreq2.  I was trying
to meet that want.  If no one wants it, then chuck it...

> >(9)Also at this
> >point (this is still *first* ap_get_brigade call only), we check to see if
> >the "populate-at-once" flag is set for this request.   We can have a
> >mechanism where we continuously call ap_get_brigade until we hit EOS to do
> >this.  Note that the "populate-at-once" and "exclusive" modes can thus run
> >independently of one-another.
>
> If we give the apreq consumer a simple call to ask for a given variable, and
> it's not yet present, we can continue to consume the client body and set it
> aside for the filter chain, until that 'variable' has been read complete or
> the entire client request body is read..

I said exactly that below.  It could be, though, that some module calls
$q->parse, which tells apreq to finish reading the entire request.  That
would be an example of populate-at-once mode.  I'm sure I could think of
others too...

> >(10) Lastly, once EOS is received, we mark the
> >warehouse as "warehouse full" in the request-specific configuration notepad.
> >  What remains is the warehouse manager.
>
> Yup.  We definitely need the NOT_READ, IN_PROGRESS, COMPLETE
> and NO_BODY placeholder :-)

Forgive me on my lack of knowledge of Apache 2.0 internals, but...  What's
that?

> >(11)I think we need a 3-key system
> >to manage the warehouse entries: "Data/Name", "Value" and some flag (bit?)
> >"Status".  To do this, the parser would start populating entries in the
> >warehouse as it comes in (from the filter).
>
> Sounds right.  I was picturing each residing in a metadata + data buckets,
> which I will write up a description for.
>
> >(12)As soon as each entry is
> >completed in the warehouse, the status flag should be set to indicate
> >"in-stock".  (13)An entry in the per-request configuration "notepad" can
> >contain the name of the current "item" being imported into the warehouse.
> >(14)Calls to get data from warehouse (this is the "warehouse manager" part)
> >should scan the warehouse entries.  (14a)If an item is "in-stock", no
> >additional data-collection is needed.  (14b) If an item is in, but not
> >flagged, we call ap_get_brigade until it's flagged "in-stock" by the parser
> >(ONLY the parser can import to the warehouse, whereas ONLY the warehouse
> >manager can actually read items from the warehouse).  (14c) If the data is
> >not found and the "warehouse full" flag is set, the call fails.  (14d)
> >Otherwise, we continue to call ap_get_brigade (either explicitly from the
> >parser, or implicitly by simply setting the "populate-at-once" flag and
> >calling ap_get_brigade once from the parser [I'd say explicit is better,
> >simply because it allows us to continually check the warehouse for the
> >addition of our data and stop calling ap_get_brigade once our data is
> >"in-stock"].)  Once we hit "warehouse full" (note that the warehouse manager
> >doesn't care about EOS - all it cares about is "warehouse full") and haven't
> >found our data, the call fails.
>
> This all sounds about right.  The biggest problem is consuming the occasional
> huge item that exceeds a sanity threshold, e.g. a file upload item, and that
> case I'll spell out in another post when I have a few minutes.
>
> >I think that about covers the lifespan of an apreq call.  What do you people
> >think?
>
> I hate new metaphors if we require programmers to code to them :-)
> I don't mind them at all for illustration though, yours works pretty well.

Never said we have to officially *document* the warehouse metaphor, but if
we start using this for planning here, it gives us all a clear shared view
on which component of apreq we're dealing with. :-)

  Issac


Re: dev question: apreq 2 as a filter?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 02:57 AM 8/25/2002, Issac Goldstand wrote:

>Let me restate this whole process, as I see it happening, using the
>warehouse lingo that we were using before.  I believe that we're on the same
>wavelength here, but want to make sure...  I see three major components
>here:  The filter, the parser, and the "warehouse manager"...

I'll split out your points then and answer specific issues.

>(1) First of all, the only real way that apreq should be installed is as an
>input filter.
>(2) The filter should be installed as early as possible and
>(3) immediately create an empty data structure in memory - I'm not going to
>say where (notes table should be fine if it's still there in Apache2),
>because that's probably an entire conversation on its own.  In any case,
>user intervention SHOULD take place as early as possible in Apache's
>request-phase chain as possible.
>((4)Frankly, we may find useful to provide
>httpd.conf directives to enable users to somewhat tweak the necessary
>configurations, and provide a handler that runs as early as possible to scan
>directives for each location before it starts.  This
>should include a directive to *uninstall* (or disable, or whatever) the
>apreq filter, too).

The more I consider what apreq must accomplish, the more I'm against
user 'config' of the apreq filter.  It should be programmatically configured,
by all of the modules that want it injected.

That means one filter module could supercede another module's
requirements.  That is a bad thing.  So we need to use a greatest
common denominator configuration scheme.

So module A. expects some POST variables that it expects are no
greater than 8kb.  module B. expects to deal with 64kb or greater,
and is willing to handle a multipart-form upload.  In this case, module
B registered a file upload callback and requests that the set-aside
or 'prescan' limit is 64kb.  Those should override module A's miserly
8kb expectation.

Even if module A calls apreq to inject itself after module B, the GCD
needs to win.

>Moving along, (5) mechanisms to override the default input method, which is
>input and share with other filters, should be provided, and can be invoked
>anywhere up until the first call to ap_get_brigade.  Frankly, it ought to
>work afterwards also, but then we run the risk of other filters choking on
>mangled data.  We wouldn't want another filter to do that to us, so we
>oughtn't do it to them.

NO!  You cannot supersede the defined behavior!  Otherwise the oddball
module will -break- all other installed filters for the request!!!

Think of superclassing a C object.  In this HTTP schema, we have variables
and values.  We need a clean definition of how to read/retrieve them.  If that
definition is reasonably extensible, the modules don't have to know that the
received data was sent in multipart-form or in XML format.

Extensible and large values will have to be supported in an abstract way.
This is what makes brigades and metadata so [potentially] appealing.
Those already have some definitions we can extend, that I think would
cover nearly any future potential use cases.

>In any case, (6) the configuration for the
>request-specific parameters of the apreq call should be read during the
>first callback of the actual filter (eg, first time ap_get_brigade is called
>from anywhere).

I'm suggesting the other filters and handler that want the data spell out
their requirements.  If those can be flexible [one says pre-cache 8kb,
and another asks to precache 64kb, let's let the 64kb requestor win.]

>(7)At that point, a flag is set in our little
>request-specific apreq notepad to tell us that we've started munching data,
>and that (7a) requests for behavior changes for the request-specific apreq
>call should fail (not silently - it should return failure status to user -
>possibly with a reason) and (7b) the warehouse doors are now open, but the
>warehouse is flagged as being "stocking in progress" (the warehouse should
>most likely NOT be in the same place as the configuration directives - the
>former potentially needs lots of room, while the latter doesn't).

7a), why?  If we could keep pre-fetching in order to satisfy a given request,
let's do so.  Remember that several filters and a handler may all be looking
for apreq fields.  It's best to play nicely with all of them.

As long as our input filter keeps setting the data aside for subsequent
ap_get_brigade() calls, and can later satisfy them, we should be fine.
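
One way that set-aside could look inside the filter: saved buckets are
replayed to later callers before any new network read.  The ctx setup is
assumed, and a real filter would also honour mode/readbytes instead of
handing back the whole spool at once:

/* Sketch only: satisfy later ap_get_brigade() callers from data the apreq
 * filter already pre-fetched. */
#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct {
    apr_bucket_brigade *spool;   /* body data read during prefetch */
} apreq_ctx;

static apr_status_t apreq_input_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                       ap_input_mode_t mode,
                                       apr_read_type_e block,
                                       apr_off_t readbytes)
{
    apreq_ctx *ctx = f->ctx;

    /* replay anything we set aside earlier before touching the network */
    if (ctx->spool && !APR_BRIGADE_EMPTY(ctx->spool)) {
        APR_BRIGADE_CONCAT(bb, ctx->spool);   /* moves the saved buckets */
        return APR_SUCCESS;
    }

    /* nothing saved up: behave as a transparent pass-through */
    return ap_get_brigade(f->next, bb, mode, block, readbytes);
}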

>(8)If the
>"exclusive mode" flag is set for this request (file upload? It doesn't
>matter - what matters is that this is the Apache 1.x style apreq that
>everyone's so keen on having in apreq2), then we simply don't pass the
>brigade on to the next filter, unless, of course, it's EOS.

Bzzt.... that one's wrong.  There is no exclusive mode in this model.  All
consumers must have access to all the input data.  When a collection of
PHP, Perl and some special purpose filters all want to see the variables,
they will all see them.  The only problem, huge posts [e.g. file uploads]
are messy.  I have a thought on that one, too.

I'll propose a suggestion for the worst case [file upload variable] in another
post to show that this isn't a problem.

>(9)Also at this
>point (this is still *first* ap_get_brigade call only), we check to see if
>the "populate-at-once" flag is set for this request.   We can have a
>mechanism where we continuously call ap_get_brigade until we hit EOS to do
>this.  Note that the "populate-at-once" and "exclusive" modes can thus run
>independently of one-another.

If we give the apreq consumer a simple call to ask for a given variable, and
it's not yet present, we can continue to consume the client body and set it
aside for the filter chain, until that 'variable' has been read complete or 
the entire client request body is read..

>(10) Lastly, once EOS is received, we mark the
>warehouse as "warehouse full" in the request-specific configuration notepad.
>  What remains is the warehouse manager.

Yup.  We definitely need the NOT_READ, IN_PROGRESS, COMPLETE
and NO_BODY placeholder :-)

>(11)I think we need a 3-key system
>to manage the warehouse entries: "Data/Name", "Value" and some flag (bit?)
>"Status".  To do this, the parser would start populating entries in the
>warehouse as it comes in (from the filter).

Sounds right.  I was picturing each residing in a metadata + data buckets,
which I will write up a description for.

>(12)As soon as each entry is
>completed in the warehouse, the status flag should be set to indicate
>"in-stock".  (13)An entry in the per-request configuration "notepad" can
>contain the name of the current "item" being imported into the warehouse.
>(14)Calls to get data from warehouse (this is the "warehouse manager" part)
>should scan the warehouse entries.  (14a)If an item is "in-stock", no
>additional data-collection is needed.  (14b) If an item is in, but not
>flagged, we call ap_get_brigade until it's flagged "in-stock" by the parser
>(ONLY the parser can import to the warehouse, whereas ONLY the warehouse
>manager can actually read items from the warehouse).  (14c) If the data is
>not found and the "warehouse full" flag is set, the call fails.  (14d)
>Otherwise, we continue to call ap_get_brigade (either explicitly from the
>parser, or implicitly by simply setting the "populate-at-once" flag and
>calling ap_get_brigade once from the parser [I'd say explicit is better,
>simply because it allows us to continually check the warehouse for the
>addition of our data and stop calling ap_get_brigade once our data is
>"in-stock"].)  Once we hit "warehouse full" (note that the warehouse manager
>doesn't care about EOS - all it cares about is "warehouse full") and haven't
>found our data, the call fails.

This all sounds about right.  The biggest problem is consuming the occasional
huge item that exceeds a sanity threshold, e.g. a file upload item, and that
case I'll spell out in another post when I have a few minutes.

>I think that about covers the lifespan of an apreq call.  What do you people
>think?

I hate new metaphors if we require programmers to code to them :-)
I don't mind them at all for illustration though, yours works pretty well.



Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ma...@beamartyr.net>.
----- Original Message -----
From: "William A. Rowe, Jr." <wr...@rowe-clan.net>
To: "Joe Schaefer" <jo...@sunstarsys.com>
Cc: "Issac Goldstand" <ma...@beamartyr.net>; "apreq list"
<ap...@httpd.apache.org>
Sent: Sunday, August 25, 2002 9:02 AM
Subject: Re: dev question: apreq 2 as a filter?


> At 07:20 PM 8/24/2002, you wrote:
> >...
> >That is *exactly* what I'm saying; I think part of the confusion we're
> >having centers around *when* the apreq filter gets installed.  The
> >content-handler needs the ability to inject our apreq filter at runtime.
> >IMO, the injection should take place in the apreq_request_new
> >call, and if the content-handler wants to call ap_get_brigade, it should
> >do it between apreq_request_new() and apreq_request_parse().
>
> Hmmm.  I'm suggesting we constantly parse the client input body
> in the same manner (once we have injected the apreq filter) without
> pausing for more data.  Until the entire body has been ap_get_brigade()'d
> the results are somehow tagged 'incomplete'.
>
> In other words, fill up those variables that parse 'complete', set aside
> the incomplete chunks, and continue to parse on the next brigade read from
> where we left off.

I really like this method.  I'm going to make one huge reply to this, and
other points, below...
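
A sketch of that "parse what's complete, keep the rest" idea for the
urlencoded case.  parse_pair() and the state struct are assumed; at EOS,
whatever remains in leftover would be flushed through parse_pair() as the
final pair:

/* Sketch only: incremental x-www-form-urlencoded parsing that carries a
 * possibly-truncated key=value fragment across brigade boundaries. */
#include "apr_strings.h"
#include "apr_tables.h"

typedef struct {
    apr_pool_t  *pool;
    apr_table_t *params;     /* completed key=value pairs live here */
    char        *leftover;   /* trailing incomplete "key=val" fragment */
} urlenc_state;

extern void parse_pair(apr_table_t *t, apr_pool_t *p, const char *pair);

/* Feed one decoded chunk of the body; may be called many times per request. */
static void urlenc_feed(urlenc_state *s, const char *data, apr_size_t len)
{
    char *buf, *last, *pair;

    if (len == 0)
        return;

    /* glue the new chunk onto whatever was incomplete last time */
    buf = apr_pstrcat(s->pool, s->leftover ? s->leftover : "",
                      apr_pstrndup(s->pool, data, len), NULL);
    pair = apr_strtok(buf, "&", &last);

    s->leftover = NULL;
    while (pair) {
        char *next = apr_strtok(NULL, "&", &last);
        if (next == NULL && data[len - 1] != '&') {
            s->leftover = pair;      /* might be cut off mid-value: keep it */
        }
        else {
            parse_pair(s->params, s->pool, pair);  /* complete: store it */
        }
        pair = next;
    }
}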

> >I think Stas is arguing that the apreq filter could be injected
> >later on, perhaps inside the apreq_request_parse call, but I think
> >that makes things too complicated.
>
> It cannot be injected after ap_get_brigade, and I see no case where
> it ever should be.  It must be injected beforehand, and we have any
> number of places that would be a good idea (such as the beginning
> of the handler hook, or the insert_filter hook, or any hook prior to that.)
>

I agree, but have a slightly different approach on how to accomplish this.
(See below :))

> To allow filters to 'peek' at the first 8+k of body before we hit the
> handler phase is also good, but I wouldn't get to the point that we
> slurp the entire POST body, only what might be necessary (the first
> few posted variables) before we hit the handler.
>
> In any case, we can never destroy the POSTed data in the input
> stack, so we need to set some arbitrary and sane limit on how much
> data can be pre-fetched before we hit the handlers, which ultimately
> are the final consumers.


Let me restate this whole process, as I see it happening, using the
warehouse lingo that we were using before.  I believe that we're on the same
wavelength here, but want to make sure...  I see three major components
here:  The filter, the parser, and the "warehouse manager"...

(1) First of all, the only real way that apreq should be installed is as an
input filter.  (2) The filter should be installed as early as possible and
(3) immediately create an empty data structure in memory - I'm not going to
say where (notes table should be fine if it's still there in Apache2),
because that's probably an entire conversation on its own.  In any case,
user intervention SHOULD take place as early as possible in Apache's
request-phase chain as possible.  ((4)Frankly, we may find it useful to provide
httpd.conf directives to enable users to somewhat tweak the necessary
configurations, and provide a handler that runs as early as possible to scan
directives for each location before it starts.  This
should include a directive to *uninstall* (or disable, or whatever) the
apreq filter, too).
Moving along, (5) mechanisms to override the default input method, which is
input and share with other filters, should be provided, and can be invoked
anywhere up until the first call to ap_get_brigade.  Frankly, it ought to
work afterwards also, but then we run the risk of other filters choking on
mangled data.  We wouldn't want another filter to do that to us, so we
oughtn't do it to them.  In any case, (6) the configuration for the
request-specific parameters of the apreq call should be read during the
first callback of the actual filter (eg, first time ap_get_brigade is called
from anywhere).  (7)At that point, a flag is set in our little
request-specific apreq notepad to tell us that we've started munching data,
and that (7a) requests for behavior changes for the request-specific apreq
call should fail (not silently - it should return failure status to user -
possibly with a reason) and (7b) the warehouse doors are now open, but the
warehouse is flagged as being "stocking in progress" (the warehouse should
most likely NOT be in the same place as the configuration directives - the
former potentially needs lots of room, while the latter doesn't).  (8)If the
"exclusive mode" flag is set for this request (file upload? It doesn't
matter - what matters is that this is the Apache 1.x style apreq that
everyone's so keen on having in apreq2), then we simply don't pass the
brigade on to the next filter, unless, of course, it's EOS.  (9)Also at this
point (this is still *first* ap_get_brigade call only), we check to see if
the "populate-at-once" flag is set for this request.   We can have a
mechanism where we continuously call ap_get_brigade until we hit EOS to do
this.  Note that the "populate-at-once" and "exclusive" modes can thus run
independently of one-another. (10) Lastly, once EOS is received, we mark the
warehouse as "warehouse full" in the request-specific configuration notepad.
 What remains is the warehouse manager.  (11)I think we need a 3-key system
to manage the warehouse entries: "Data/Name", "Value" and some flag (bit?)
"Status".  To do this, the parser would start populating entries in the
warehouse as it comes in (from the filter).  (12)As soon as each entry is
completed in the warehouse, the status flag should be set to indicate
"in-stock".  (13)An entry in the per-request configuration "notepad" can
contain the name of the current "item" being imported into the warehouse.
(14)Calls to get data from warehouse (this is the "warehouse manager" part)
should scan the warehouse entries.  (14a)If an item is "in-stock", no
additional data-collection is needed.  (14b) If an item is in, but not
flagged, we call ap_get_brigade until it's flagged "in-stock" by the parser
(ONLY the parser can import to the warehouse, whereas ONLY the warehouse
manager can actually read items from the warehouse).  (14c) If the data is
not found and the "warehouse full" flag is set, the call fails.  (14d)
Otherwise, we continue to call ap_get_brigade (either explicitly from the
parser, or implicitly by simply setting the "populate-at-once" flag and
calling ap_get_brigade once from the parser [I'd say explicit is better,
simply because it allows us to continually check the warehouse for the
addition of our data and stop calling ap_get_brigade once our data is
"in-stock"].)  Once we hit "warehouse full" (note that the warehouse manager
doesn't care about EOS - all it cares about is "warehouse full") and haven't
found our data, the call fails.
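
The (14a)-(14d) lookup above could reduce to a loop like this;
warehouse_status()/warehouse_value() and the flag values are illustrative
names only, and the apreq filter/parser is assumed to stock the warehouse as
a side effect of each read:

/* Sketch only: demand-driven lookup that keeps pulling brigades until the
 * wanted item is in stock or the warehouse is full. */
#include "httpd.h"
#include "http_protocol.h"
#include "util_filter.h"
#include "apr_buckets.h"

#define WH_IN_STOCK   1
#define WH_FULL       2   /* EOS seen: nothing more will ever arrive */

extern int         warehouse_status(request_rec *r, const char *name);
extern const char *warehouse_value(request_rec *r, const char *name);

static const char *warehouse_get(request_rec *r, const char *name)
{
    apr_bucket_brigade *bb =
        apr_brigade_create(r->pool, r->connection->bucket_alloc);

    for (;;) {
        int status = warehouse_status(r, name);

        if (status & WH_IN_STOCK)           /* 14a/14b: got it */
            return warehouse_value(r, name);

        if (status & WH_FULL)               /* 14c: body finished, not there */
            return NULL;

        /* 14d: pull another brigade; the parser stocks the warehouse */
        if (ap_get_brigade(r->input_filters, bb, AP_MODE_READBYTES,
                           APR_BLOCK_READ, HUGE_STRING_LEN) != APR_SUCCESS)
            return NULL;
        apr_brigade_cleanup(bb);
    }
}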

I think that about covers the lifespan of an apreq call.  What do you people
think?

  Issac


Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ma...@beamartyr.net>.
----- Original Message -----
From: "Joe Schaefer" <jo...@sunstarsys.com>
To: "Stas Bekman" <st...@stason.org>
Cc: "apreq list" <ap...@httpd.apache.org>
Sent: Saturday, August 24, 2002 10:43 AM
Subject: Re: dev question: apreq 2 as a filter?


>
> Stas Bekman <st...@stason.org> writes:
>
> > William A. Rowe, Jr. wrote:
> > > At 02:46 AM 8/23/2002, Stas Bekman wrote:
>
> [...]
>
> > >> No, the problem I'm referring to is how to invoke the filter in first
> > >> place. It won't be invoked if the response handler won't call
> > >> ap_get_brigade. Hmm, I think I know how this should work.
> > >>
> > >> Any time anybody does any enquiry from apreq, we check a flag whether
> > >> we have the data consumed and parsed (which is done already). If it
> > >> wasn't consumed yet, apreq inserts its input filter and performs the
> > >> ap_get_brigade call.
> > >
> > >
> > > Up to some, sane limit.  I wouldn't want us pulling more than 64k or so
> > > without
> > > some extra thought.
> >
> > of course.
>
> I don't agree.  IMO (using your terminology) the warehouse should
> be off-limits until the POST data has been parsed *completely*.  That
> means *only* the content handler should be making any enquiries.
>
> Furthermore, if the content handler wants to call ap_get_brigade
> itself to get at a portion of the POST stream, it should do that
> *before* ever visiting our warehouse.  Otherwise apreq_request_parse
> should just gobble it all up.

I don't get it...  Correct me if I'm wrong, but I see two possible scenarios
for apreq2 implementation:

1)  Installed as filter - In this case, any call to ap_get_brigade will
cause data to pass through apreq (which will duly save a copy of the data
it receives in its "warehouse").  Alternatively, any implicit or explicit
call to $q->parse will trigger apreq to call ap_get_brigade internally to
grab the data.

2) Installed Apache 1.x-style - In this case, apreq, when called upon, will
simply start slurping in the data all by itself.  However, in such a case,
it will either not use ap_get_brigade to get the data, or continuously call
ap_get_brigade until it gets EOS.  But implementing apreq like this will not
allow another application to call ap_get_brigade at its own pace, or if it
does, apreq will choke on the missing data.

Joe, you seem to be seeing some third possibility which I must be missing,
as your comment doesn't fit either of these scenarios...  (Please don't
flame me too bad if I'm making some stupid error - this is the first time
I'm being brave enough to comment on anything Apache2 API related :-))

  Issac

-----
Maybe in order to understand mankind, we have to look at the word itself:
"Mankind". Basically, it's made up of two separate words - "mank" and "ind".
What do these words mean ? It's a mystery, just as is mankind.
 --Unknown


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> "William A. Rowe, Jr." <wr...@rowe-clan.net> writes:
> 
> 
>>At 11:18 AM 8/26/2002, Joe Schaefer wrote:
> 
> 
> [...]
> 
> 
>>>For instance, it would be a bad thing if the content-handler injects
>>>apreq at the end of the filter chain, then "does something" to cause
>>>apreq to prefetch some post DATA, and *then* wants to inject utf-8
>>>somewhere upstream from the apreq filter.
>>
>>That's easy... when you insert a filter, you can choose to insert it before
>>or after another filter.  Any filter that wants apreq results before processing
>>it's own input filtering MUST insert itself behind the apreq filter, after 
>>calling
>>the fn to inject and initialize the apreq filter.
> 
> 
> That's not the case I'm worried about.  I'm worried about the case
> where the to-be-inserted filter wants to modify the input stream 
> *before* apreq starts parsing it.  The to-be-inserted filter isn't
> interested in the apreq data whatsoever.
> 
> For example, someone may write a filter whose job is to run a SAX-ish
> XSL transform on the incoming "text/xml" data.  (Perhaps even as a fixup
> for apreq's xml parser).  We had better not have prefetched any of the 
> POST before that filter is injected upstream.

If the apreq filter copies the data it reads elsewhere, without changing 
anything passing through it, any other filters coming after it should be 
just fine, no?


In data => filterA => filterAPREQ => filter B => content handler =>
                            \                        /
                             -->--->------------->---

If that's correct, apreq filter should be placed on the list of filters, 
*after* filters that convert network packed/encoded data into normal 
data (ssl, deflate filters), but *before* any special purpose filters 
(e.g. XSL transform filters, utf8, etc.) because the content handler 
wants the body as it was sent by the client, without making any 
transformations on it.

Therefore it should be possible to insert special purpose filters after 
the apreq filter, which may mangle the original body, but this shouldn't 
affect the body seen by apreq and any content handler calls to apreq_*().
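
A sketch of such a copy-only ("tee") apreq filter; the spool/ctx setup is
assumed, and buckets that can't be copied directly (e.g. socket or pipe
buckets) would need an apr_bucket_read() first:

/* Sketch only: duplicate every bucket into a private spool and hand the
 * original brigade to the caller unchanged. */
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct {
    apr_bucket_brigade *spool;   /* apreq's private copy of the body */
} apreq_tee_ctx;

static apr_status_t apreq_tee_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                     ap_input_mode_t mode,
                                     apr_read_type_e block,
                                     apr_off_t readbytes)
{
    apreq_tee_ctx *ctx = f->ctx;
    apr_status_t rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
    apr_bucket *e;

    if (rv != APR_SUCCESS)
        return rv;

    for (e = APR_BRIGADE_FIRST(bb);
         e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e))
    {
        apr_bucket *copy;
        if (apr_bucket_copy(e, &copy) == APR_SUCCESS)
            APR_BRIGADE_INSERT_TAIL(ctx->spool, copy);
    }

    /* downstream (filter B, the content handler) sees the body untouched */
    return APR_SUCCESS;
}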

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
> 
> 
> [...]
> 
> 
>>>That's not the case I'm worried about.  I'm worried about the case
>>>where the to-be-inserted filter wants to modify the input stream 
>>>*before* apreq starts parsing it.  The to-be-inserted filter isn't
>>>interested in the apreq data whatsoever.
>>>
>>>For example, someone may write a filter whose job is to run a SAX-ish
>>>XSL transform on the incoming "text/xml" data.  (Perhaps even as a fixup
>>>for apreq's xml parser).  We had better not have prefetched any of the 
>>>POST before that filter is injected upstream.
>>
>>If the apreq filter copies the data it reads elsewhere, without changing 
>>anything passing through it, any other filters coming after it should be 
>>just fine, no?
> 
> 
>>In data => filterA => filterAPREQ => filter B => content handler =>
>>                            \                        /
>>                             -->--->------------->---
> 
> 
> Right.  But imagine that the content handler injects filterA at runtime, 
> *after* filterAPREQ has prefetched some data. (In my example, filterA 
> was the XSL transformer).  This is clearly an error, but by whom?
> 
> Two specific questions:
> 
>   1) How would this error condition be communicated to the content
>      handler?  Would filterA's injection call be responsible for 
>      reporting it?
> 
>   2) Is this error condition predictable, in the sense that the 
>      content-handler knows when filterAPREQ might prefetch some
>      data?
> 
> I'm guessing the answer to these questions is yes, and question
> (2)'s answer might be along the lines of:  
> 
>   IO by the content handler, in *either* direction, may cause 
>   apreq to prefetch data. Until then, it should be safe to modify 
>   the input filters.
> 
> But I'm not sure, so I'd like to know what others think.

I suppose we are going to hear about similar problems with various 
other filters; it's getting harder to make things play nice with each other.

The only way to arbitrate the insertion of the filters is to specify the 
filter names after and before which your filter should go. 
Meaning that we might be forced to provide configuration directives 
which make things flexible and configurable by the end user.

>>If that's correct, apreq filter should be placed on the list of filters, 
>>*after* filters that convert network packed/encoded data into normal 
>>data (ssl, deflate filters), 
> 
> 
> Yes...
> 
> 
>>but *before* any special purpose filters (.e.g. XSL transform filters,
>>utf8, etc) because the content handler wants the body as it was sent
>>by the client, without making any transformations on it.
> 
> 
> This may not always be possible/desirable.  A perfect example is
> the botched empty-file-upload bug of 0.9.7 (it's missing a CRLF
> in any empty file upload block.)  Instead of writing a special-case 
> parser to deal with this, (or worse, allow our mfd parser to work around 
> it) it's far better to have an upstream filter just restore the 
> missing CRLFs.

that's a bit different. The filter you are talking about is part of the 
apreq bundle, so apreq can easily decide what's right for it and insert 
this filter before the main consuming apreq filter. I still think of it 
as a single filter, even though it could be that it's better to have a 
special filter in front. Though be careful here, adding too many filters 
will hurt performance. If it's possible to merge some filters, it'll 
probably make things faster.

But I was talking about *other* special purpose filters, which weren't 
designed to play nice with apreq.

Here is the adjusted data flow diagram:

=> filterA => APREQ filter(s) => filter B => content handler =>
                     \                        /
                      -->--->------------->---


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:

[...]

> > That's not the case I'm worried about.  I'm worried about the case
> > where the to-be-inserted filter wants to modify the input stream 
> > *before* apreq starts parsing it.  The to-be-inserted filter isn't
> > interested in the apreq data whatsoever.
> > 
> > For example, someone may write a filter whose job is to run a SAX-ish
> > XSL transform on the incoming "text/xml" data.  (Perhaps even as a fixup
> > for apreq's xml parser).  We had better not have prefetched any of the 
> > POST before that filter is injected upstream.
> 
> If the apreq filter copies the data it reads elsewhere, without changing 
> anything passing through it, any other filters coming after it should be 
> just fine, no?

> 
> In data => filterA => filterAPREQ => filter B => content handler =>
>                             \                        /
>                              -->--->------------->---

Right.  But imagine that the content handler injects filterA at runtime, 
*after* filterAPREQ has prefetched some data. (In my example, filterA 
was the XSL transformer).  This is clearly an error, but by whom?

Two specific questions:

  1) How would this error condition be communicated to the content
     handler?  Would filterA's injection call be responsible for 
     reporting it?

  2) Is this error condition predictable, in the sense that the 
     content-handler knows when filterAPREQ might prefetch some
     data?

I'm guessing the answer to these questions is yes, and question
(2)'s answer might be along the lines of:  

  IO by the content handler, in *either* direction, may cause 
  apreq to prefetch data. Until then, it should be safe to modify 
  the input filters.

But I'm not sure, so I'd like to know what others think.
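
If a rule like that holds, the code that injects filterA could at least fail
loudly instead of silently corrupting the parse; a minimal sketch, where
apreq_body_status() and APREQ_NOT_READ are hypothetical names borrowed from
the status-flag idea elsewhere in this thread:

#include "httpd.h"
#include "http_log.h"
#include "util_filter.h"

/* Hypothetical: reports how much of the body apreq has already read. */
extern int apreq_body_status(request_rec *r);
#define APREQ_NOT_READ 0

static int insert_filter_a(request_rec *r)
{
    if (apreq_body_status(r) != APREQ_NOT_READ) {
        /* apreq has prefetched part of the body; an upstream transform
         * added now would never see the bytes already consumed. */
        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                      "too late to insert FILTER-A: body already being read");
        return HTTP_INTERNAL_SERVER_ERROR;
    }
    ap_add_input_filter("FILTER-A", NULL, r, r->connection);
    return OK;
}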

> If that's correct, apreq filter should be placed on the list of filters, 
> *after* filters that convert network packed/encoded data into normal 
> data (ssl, deflate filters), 

Yes...

> but *before* any special purpose filters (.e.g. XSL transform filters,
> utf8, etc) because the content handler wants the body as it was sent
> by the client, without making any transformations on it.

This may not always be possible/desirable.  A perfect example is
the botched empty-file-upload bug of 0.9.7 (it's missing a CRLF
in any empty file upload block.)  Instead of writing a special-case 
parser to deal with this (or worse, allowing our mfd parser to work around 
it), it's far better to have an upstream filter just restore the 
missing CRLFs.
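
The guts of such a fixup would just be splicing a CRLF bucket back in ahead
of the offending boundary; a tiny sketch, with the detection of the broken
empty-upload block elided and fix_empty_upload a made-up name:

#include "apr_buckets.h"

/* 'boundary' is assumed to be the bucket holding the "--boundary" line that
 * follows the (CRLF-less) empty upload block; put the missing CRLF back in
 * front of it. */
static void fix_empty_upload(apr_bucket_brigade *bb, apr_bucket *boundary)
{
    apr_bucket *crlf = apr_bucket_immortal_create("\r\n", 2,
                                                  bb->bucket_alloc);
    APR_BUCKET_INSERT_BEFORE(boundary, crlf);
}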

-- 
Joe Schaefer


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> writes:

> At 11:18 AM 8/26/2002, Joe Schaefer wrote:

[...]

> >For instance, it would be a bad thing if the content-handler injects
> >apreq at the end of the filter chain, then "does something" to cause
> >apreq to prefetch some post DATA, and *then* wants to inject utf-8
> >somewhere upstream from the apreq filter.
> 
> That's easy... when you insert a filter, you can choose to insert it before
> or after another filter.  Any filter that wants apreq results before processing
> its own input filtering MUST insert itself behind the apreq filter, after
> calling the fn to inject and initialize the apreq filter.

That's not the case I'm worried about.  I'm worried about the case
where the to-be-inserted filter wants to modify the input stream 
*before* apreq starts parsing it.  The to-be-inserted filter isn't
interested in the apreq data whatsoever.

For example, someone may write a filter whose job is to run a SAX-ish
XSL transform on the incoming "text/xml" data.  (Perhaps even as a fixup
for apreq's xml parser).  We had better not have prefetched any of the 
POST before that filter is injected upstream.

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 11:18 AM 8/26/2002, Joe Schaefer wrote:
>I think the apreq filter belongs somewhere towards the end of the
>input filters (closer to the content-handler than the HTTP_INPUT filter).
>What concerns me here is that whatever mechanism we use to prefetch
>some of the POST data may cause a problem with other filters that are
>injected by the content handler.
>
>For instance, it would be a bad thing if the content-handler injects
>apreq at the end of the filter chain, then "does something" to cause
>apreq to prefetch some post DATA, and *then* wants to inject utf-8
>somewhere upstream from the apreq filter.

That's easy... when you insert a filter, you can choose to insert it before
or after another filter.  Any filter that wants apreq results before processing
its own input filtering MUST insert itself behind the apreq filter, after
calling the fn to inject and initialize the apreq filter.

Bill



Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> writes:

> At 02:07 AM 8/26/2002, Stas Bekman wrote:

[...]

> >The things that I see weird about this is that the normal filter is not 
> >supposed to call ap_get_brigade more than once. our apreq_ filter calls 
> >ap_get_brigade more than once, because if it doesn't, there is no way to 
> >consume the data (the response handler) will usually not ask for the raw 
> >body. So apreq_ is really a semi-filter, since it acts as a filter and 
> >consumer at the same time.
> 
> I'm suggesting we have a broken assumption... until you have pulled x bytes
> of data, you won't have x bytes of information.

Right; please see the appended steps (g), (h), and (i) that I sent in
my followup to Stas.

> This all implies that the body must be slurped by the handler, complete,
> before the filters can trust that they have all the variables.  That's
> why I'd add a status flag, NOT_READ, IN_PROGRESS, COMPLETE and NO_BODY
> (in that order so that <COMPLETE has a well-defined meaning.)

+1 to adopting Bill's conventions for the status flag.

> Because filters work in-line, this means filter authors will have to trust
> handler authors to slurp the client's body before sending their response.
> Since we can't trust that, a given filter can insist we continue to read and
> set aside the client's body until it's complete [or hits some arbitrary len]
> so that filter can 'react'.

OK, that makes sense.  *Now* I see why we may need to set aside
some of the client's body.

> That could happen after the handler sends the first output brigade.
> Since the filter isn't prepared to process that brigade without
> complete knowledge of the post input body, it will have to block on
> some apreq_complete_get_brigade() sort of call.  That apreq filter
> will have to set aside the read so that it's fulfilled when the handler
> resumes reading the client body.

Got it.

[...]

> This adds one interesting question... where in the filter stack does the
> apreq filter belong?  After other filters transform the input?  I'd presume
> so [if we are trying to normalize the input to, say, utf-8, we would want to
> do so before we parse the input into apreq variables.]

I think the apreq filter belongs somewhere towards the end of the 
input filters (closer to the content-handler than the HTTP_INPUT filter).
What concerns me here is that whatever mechanism we use to prefetch
some of the POST data may cause a problem with other filters that are
injected by the content handler.

For instance, it would be a bad thing if the content-handler injects
apreq at the end of the filter chain, then "does something" to cause
apreq to prefetch some post DATA, and *then* wants to inject utf-8
somewhere upstream from the apreq filter.
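
Assuming the filter ends up registered through the stock 2.0 calls, the
placement described here mostly falls out of the filter type; a sketch (the
names and the choice of AP_FTYPE_RESOURCE are illustrative, not a settled
decision):

#include "httpd.h"
#include "util_filter.h"

static apr_status_t apreq_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                 ap_input_mode_t mode, apr_read_type_e block,
                                 apr_off_t readbytes);   /* body elided */

/* AP_FTYPE_RESOURCE places the filter near the content handler, downstream
 * of HTTP_IN and the connection-level filters (SSL etc.). */
static void apreq_register(apr_pool_t *p)
{
    ap_register_input_filter("APREQ", apreq_filter, NULL, AP_FTYPE_RESOURCE);
}

/* Per request, e.g. from apreq_request_new():
 *     ap_add_input_filter("APREQ", NULL, r, r->connection);
 */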

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 02:07 AM 8/26/2002, Stas Bekman wrote:
>Joe Schaefer wrote:
>>Joe Schaefer <jo...@sunstarsys.com> writes:
>>[...]
>>
>>>I think the apreq filter can/should operate in a completely
>>>transparent way, since all it has to do is read a copy of the buckets
>>>into the apreq_list _as the upstream_ _filters dictate_.  Every time
>>>our filter is invoked, it can make a stab at parsing the apreq_list
>>>data, so the list should never get very big.
>>
>>Um, you may need to s/upstream/downstream/g in everything I wrote in
>>the aforementioned post.  It'd be nice if what I write actually matched 
>>the picture in my head :-)
>
>The things that I see weird about this is that the normal filter is not 
>supposed to call ap_get_brigade more than once. our apreq_ filter calls 
>ap_get_brigade more than once, because if it doesn't, there is no way to 
>consume the data (the response handler) will usually not ask for the raw 
>body. So apreq_ is really a semi-filter, since it acts as a filter and 
>consumer at the same time.

I'm suggesting we have a broken assumption... until you have pulled x bytes
of data, you won't have x bytes of information.

This all implies that the body must be slurped by the handler, complete,
before the filters can trust that they have all the variables.  That's why I'd
add a status flag, NOT_READ, IN_PROGRESS, COMPLETE and NO_BODY
(in that order so that <COMPLETE has a well-defined meaning.)
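
Spelled out (hypothetical type name; the values are the ones proposed just
above), the ordering makes the "less than COMPLETE" test trivial:

/* Ordered so that (status < APREQ_COMPLETE) means "body not all here yet". */
typedef enum {
    APREQ_NOT_READ = 0,
    APREQ_IN_PROGRESS,
    APREQ_COMPLETE,
    APREQ_NO_BODY
} apreq_body_status_t;

/* e.g.:  if (req->body_status < APREQ_COMPLETE)
 *            ... keep reading before trusting the parsed tables ...
 */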

Because filters work in-line, this means filter authors will have to trust
handler authors to slurp the client's body before sending their response.
Since we can't trust that, a given filter can insist we continue to read and
set aside the client's body until it's complete [or hits some arbitrary len]
so that filter can 'react'.

That could happen after the handler sends the first output brigade.
Since the filter isn't prepared to process that brigade without
complete knowledge of the post input body, it will have to block on
some apreq_complete_get_brigade() sort of call.  That apreq filter
will have to set aside the read so that it's fulfilled when the handler
resumes reading the client body.
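
What such a blocking apreq_complete_get_brigade()-style call might reduce to,
as a sketch (apreq_ctx, its spool brigade and saw_eos flag are invented here):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct {
    apr_bucket_brigade *spool;   /* body data set aside for the handler */
    int saw_eos;
} apreq_ctx;

/* Block until the whole body has been read, parsing as we go and setting
 * everything aside so the handler can still consume it later. */
static apr_status_t apreq_complete_read(ap_filter_t *f)
{
    apreq_ctx *ctx = f->ctx;
    apr_bucket_brigade *tmp =
        apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    apr_status_t rv = APR_SUCCESS;

    while (!ctx->saw_eos && rv == APR_SUCCESS) {
        rv = ap_get_brigade(f->next, tmp, AP_MODE_READBYTES,
                            APR_BLOCK_READ, HUGE_STRING_LEN);
        if (rv != APR_SUCCESS)
            break;
        if (!APR_BRIGADE_EMPTY(tmp) &&
            APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(tmp)))
            ctx->saw_eos = 1;
        /* ... run the parsers over tmp here ... */
        rv = ap_save_brigade(f, &ctx->spool, &tmp, f->r->pool);
    }
    return rv;
}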

>Not sure why have you added a note about s/upstream/downstream/g, any 
>filter cares only about the upstream filter (which may block), because 
>that's where the data is coming from. it passes through the data to the 
>downstream filter, but it doesn't care about it.

This adds one interesting question... where in the filter stack does the
apreq filter belong?  After other filters transform the input?  I'd presume
so [if we are trying to normalize the input to, say, utf-8, we would want to
do so before we parse the input into apreq variables.]

Bill


Re: dev question: apreq 2 as a filter?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 03:09 AM 8/26/2002, Issac Goldstand wrote:

> > The things that I see weird about this is that the normal filter is not
> > supposed to call ap_get_brigade more than once. our apreq_ filter calls
> > ap_get_brigade more than once, because if it doesn't, there is no way to
> > consume the data (the response handler) will usually not ask for the raw
> > body. So apreq_ is really a semi-filter, since it acts as a filter and
> > consumer at the same time.
>
>Not necessarily.  My example from yesterday proposed two distinct operating
>"modes"; one which will do this, and the other default one, which will not.

Note that all handlers are required to consume the post body.  The dev@httpd
list already had to pound through that issue more than once.  This should
assure we are safe.

In the end-game, there will be a very limited number of 'handlers'.  The handler
to serve filesystem documents, a handler to generate autoindexes, handlers
to serve data from SQL or .tar.gz content warehouses, etc.  For the most part,
many 'handlers' today are really filters around a data store.  Those cases
should become true filters, then we have a very limited number of real handlers
to enforce a given behavior upon.

If that behavior is that the client's body must be read, then that is the rule.
In the general case today, this apreq input filter needs to be smart enough
to read and set aside the input until the handler is willing to consume it.
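
The "set aside until the handler is willing to consume it" half could look
roughly like this inside the filter callback, ignoring readbytes for brevity
(headers and the invented apreq_ctx/spool are as in the sketch a little
earlier in this thread):

typedef struct { apr_bucket_brigade *spool; } apreq_ctx;   /* trimmed */

static apr_status_t apreq_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                 ap_input_mode_t mode, apr_read_type_e block,
                                 apr_off_t readbytes)
{
    apreq_ctx *ctx = f->ctx;

    /* If a prefetch already set body data aside, hand that back first so
     * the handler still gets to read everything the client sent. */
    if (ctx->spool && !APR_BRIGADE_EMPTY(ctx->spool)) {
        APR_BRIGADE_CONCAT(bb, ctx->spool);
        return APR_SUCCESS;
    }

    /* Otherwise behave like a plain pass-through. */
    return ap_get_brigade(f->next, bb, mode, block, readbytes);
}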

Bill



Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> "Issac Goldstand" <ma...@beamartyr.net> writes:
> 
> [...]
> 
> 
>>>the user of apreq() generally has no idea that ap_get_brigade() exists
>>>at all. It just calls apreq_*() methods and the handler part of the
>>>apreq calls ap_get_brigade.
>>>
>>>Think of it this way: the mod_perl 1.0 handlers using Apache::Request
>>>should work under 2.0 unmodified. Now you get the idea.
>>
>>No, quite the contrary, you proved my point...  Consider the two possible
>>scenarios:
>>
>>1) Old apreq_1-style: A simple handler calls $q->parse, or
>>$q->param('somevalue').  In this case, the user is making a request from the
>>"warehouse manager".  If the entry is not "in stock", apreq will continue to
>>call ap_get_brigade() until the warehouse manager can return something
>>(either a value or an error if the parser set "warehouse full" in response
>>to EOS).
>>2) New Apache 2-style:  A more complex handler or filter wants to start
>>reading in the post body.  Since content-handlers are supposed to read the
>>data, a programmer might call ap_get_brigade() on his own.  
> 
>                      ^^^^^
> 
> Or s/he might not, which is certainly the case for the current user 
> base.  It is important that we not *force* them to deal with the 
> additional complexity that a filter-based implementation will require.
> If they *want* to, that's a different story :-).  That's what I see 
> Stas as saying here.

Yes, the user API should be very simple and transparent. If you call 
$q->parse, which you really don't need to, as it'll be done behind the 
scenes, it'll block till it gets all the data.

> btw- let's not call our existing API "apreq 1" style.  I'm very 
> happy to consider completely revamping our current implementation,
> and even broadening the scope of apreq beyond httpd.  ( Based on
> the dev@httpd comments so far, it looks to me like the parser-related 
> code may get "generified" and wind up going in an APR-* project. 
> I'm cool with that :-)
> 
> However, forced modifications to the core API aren't likely to get 
> my vote.  To be specific, I'd be against any implementation that 
> would *require* significant modification of the existing test suite.
> So far, I don't see any problem here, and based on the discussion so
> far, I'm not expecting this will ever become a problem.

I'm fine with any of your decisions. My point is that the end user should 
have a similar experience to apreq 1, in terms of simplicity of usage, 
and ideally the basic API (param, upload, etc.) shouldn't change. If it does, 
we (the perl side) will make the necessary adjustments so it'll stay the 
same. So no problem here.



__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Issac Goldstand" <ma...@beamartyr.net> writes:

[...]

> > the user of apreq() generally has no idea that ap_get_brigade() exists
> > at all. It just calls apreq_*() methods and the handler part of the
> > apreq calls ap_get_brigade.
> >
> > Think of it this way: the mod_perl 1.0 handlers using Apache::Request
> > should work under 2.0 unmodified. Now you get the idea.
> 
> No, quite the contrary, you proved my point...  Consider the two possible
> scenarios:
> 
> 1) Old apreq_1-style: A simple handler calls $q->parse, or
> $q->param('somevalue').  In this case, the user is making a request from the
> "warehouse manager".  If the entry is not "in stock", apreq will continue to
> call ap_get_brigade() until the warehouse manager can return something
> (either a value or an error if the parser set "warehouse full" in response
> to EOS).
> 2) New Apache 2-style:  A more complex handler or filter wants to start
> reading in the post body.  Since content-handlers are supposed to read the
> data, a programmer might call ap_get_brigade() on his own.  
                     ^^^^^

Or s/he might not, which is certainly the case for the current user 
base.  It is important that we not *force* them to deal with the 
additional complexity that a filter-based implementation will require.
If they *want* to, that's a different story :-).  That's what I see 
Stas as saying here.

btw- let's not call our existing API "apreq 1" style.  I'm very 
happy to consider completely revamping our current implementation,
and even broadening the scope of apreq beyond httpd.  ( Based on
the dev@httpd comments so far, it looks to me like the parser-related 
code may get "generified" and wind up going in an APR-* project. 
I'm cool with that :-)

However, forced modifications to the core API aren't likely to get 
my vote.  To be specific, I'd be against any implementation that 
would *require* significant modification of the existing test suite.
So far, I don't see any problem here, and based on the discussion so
far, I'm not expecting this will ever become a problem.

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ma...@beamartyr.net>.
----- Original Message -----
From: "Stas Bekman" <st...@stason.org>
To: "Issac Goldstand" <ig...@followmecorp.com>
Cc: "Joe Schaefer" <jo...@sunstarsys.com>; "William A. Rowe, Jr."
<wr...@rowe-clan.net>; "Issac Goldstand" <ma...@beamartyr.net>; "apreq
list" <ap...@httpd.apache.org>
Sent: Tuesday, August 27, 2002 2:16 PM
Subject: Re: dev question: apreq 2 as a filter?


> Issac Goldstand wrote:
> > ----- Original Message -----
> > From: "Stas Bekman" <st...@stason.org>
> > To: "Joe Schaefer" <jo...@sunstarsys.com>
> > Cc: "William A. Rowe, Jr." <wr...@rowe-clan.net>; "Issac Goldstand"
> > <ma...@beamartyr.net>; "apreq list" <ap...@httpd.apache.org>
> > Sent: Tuesday, August 27, 2002 6:29 AM
> > Subject: Re: dev question: apreq 2 as a filter?
> >
> >
> >
> >>Joe Schaefer wrote:
> >>
> >>>Stas Bekman <st...@stason.org> writes:
> >>>
> >>>
> >>>
> >>>>Joe Schaefer wrote:
> >>>>
> >>>>
> >>>>>Joe Schaefer <jo...@sunstarsys.com> writes:
> >>>>>
> >>>>>[...]
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>I think the apreq filter can/should operate in a completely
> >>>>>>transparent way, since all it has to do is read a copy of the buckets
> >>>>>>into the apreq_list _as the upstream_ _filters dictate_.  Every time
> >>>>>>our filter is invoked, it can make a stab at parsing the apreq_list
> >>>>>>data, so the list should never get very big.
> >>>>>
> >>>>>
> >>>>>Um, you may need to s/upstream/downstream/g in everything I wrote in
> >>>>>the aforementioned post.  It'd be nice if what I write actually matched
> >>>>>the picture in my head :-)
> >>>>
> >>>>The things that I see weird about this is that the normal filter is
not
> >>>>supposed to call ap_get_brigade more than once.
> >>>
> >>>
> >>>It is, I just didn't spell it out.  In the filter section, there
> >>>should've been a
> >>>
> >>>  (g) content-handler calls ap_get_brigade again, and winds up
> >>>      engaging the apreq filter again.  The filter picks up where
> >>>      it last left off.
> >>
> >>it's really the apreq part of the content handler is the one that calls
> >>ap_get_brigade, the user code only calls apreq_*() calls and shouldn't
> >>have to do anything with ap_get_brigade.
> >
> >
> > Wait - I thought that the *user* calls ap_get_brigade() which causes
data to
> > pass through the apreq filter?  The only time apreq should directly call
> > ap_get_brigade() on its own is in reponse to a query made by the
"warehouse
> > manager" if the entry in question is not marked "in stock"...
>
> the user of apreq() generally has no idea that ap_get_brigade() exists
> at all. It just calls apreq_*() methods and the handler part of the
> apreq calls ap_get_brigade.
>
> Think of it this way: the mod_perl 1.0 handlers using Apache::Request
> should work under 2.0 unmodified. Now you get the idea.

No, quite the contrary, you proved my point...  Consider the two possible
scenarios:

1) Old apreq_1-style: A simple handler calls $q->parse, or
$q->param('somevalue').  In this case, the user is making a request from the
"warehouse manager".  If the entry is not "in stock", apreq will continue to
call ap_get_brigade() until the warehouse manager can return something
(either a value or an error if the parser set "warehouse full" in response
to EOS).
2) New Apache 2-style:  A more complex handler or filter wants to start
reading in the post body.  Since content-handlers are supposed to read the
data, a programmer might call ap_get_brigade() on his own.  This presents no
problem as the data will go through our filter anyway, which will in turn
happily pass the data to the apreq_parser.  Technically, as far as I've
understood, this is actually proper behavior for Apache 2 handlers...  Am I
missing something here?
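
For concreteness, the "programmer might call ap_get_brigade() on his own"
path is just the usual brigade loop a 2.0 content handler runs; with an apreq
filter in the request chain, each brigade it pulls also flows past the parser
(sketch, error handling trimmed):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

/* A content handler draining its own request body.  Each call to
 * ap_get_brigade() goes through the request input filters, so an apreq
 * filter sitting in that chain sees the same data. */
static apr_status_t drain_body(request_rec *r)
{
    apr_bucket_brigade *bb =
        apr_brigade_create(r->pool, r->connection->bucket_alloc);
    int seen_eos = 0;
    apr_status_t rv;

    do {
        rv = ap_get_brigade(r->input_filters, bb, AP_MODE_READBYTES,
                            APR_BLOCK_READ, HUGE_STRING_LEN);
        if (rv != APR_SUCCESS)
            return rv;
        if (!APR_BRIGADE_EMPTY(bb) &&
            APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb)))
            seen_eos = 1;
        /* ... do something with the raw body here ... */
        apr_brigade_cleanup(bb);
    } while (!seen_eos);

    return APR_SUCCESS;
}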


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Issac Goldstand wrote:
> ----- Original Message -----
> From: "Stas Bekman" <st...@stason.org>
> To: "Joe Schaefer" <jo...@sunstarsys.com>
> Cc: "William A. Rowe, Jr." <wr...@rowe-clan.net>; "Issac Goldstand"
> <ma...@beamartyr.net>; "apreq list" <ap...@httpd.apache.org>
> Sent: Tuesday, August 27, 2002 6:29 AM
> Subject: Re: dev question: apreq 2 as a filter?
> 
> 
> 
>>Joe Schaefer wrote:
>>
>>>Stas Bekman <st...@stason.org> writes:
>>>
>>>
>>>
>>>>Joe Schaefer wrote:
>>>>
>>>>
>>>>>Joe Schaefer <jo...@sunstarsys.com> writes:
>>>>>
>>>>>[...]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I think the apreq filter can/should operate in a completely
>>>>>>transparent way, since all it has to do is read a copy of the buckets
>>>>>>into the apreq_list _as the upstream_ _filters dictate_.  Every time
>>>>>>our filter is invoked, it can make a stab at parsing the apreq_list
>>>>>>data, so the list should never get very big.
>>>>>
>>>>>
>>>>>Um, you may need to s/upstream/downstream/g in everything I wrote in
>>>>>the aforementioned post.  It'd be nice if what I write actually matched
>>>>>the picture in my head :-)
>>>>
>>>>The things that I see weird about this is that the normal filter is not
>>>>supposed to call ap_get_brigade more than once.
>>>
>>>
>>>It is, I just didn't spell it out.  In the filter section, there
>>>should've been a
>>>
>>>  (g) content-handler calls ap_get_brigade again, and winds up
>>>      engaging the apreq filter again.  The filter picks up where
>>>      it last left off.
>>
>>it's really the apreq part of the content handler is the one that calls
>>ap_get_brigade, the user code only calls apreq_*() calls and shouldn't
>>have to do anything with ap_get_brigade.
> 
> 
> Wait - I thought that the *user* calls ap_get_brigade() which causes data to
> pass through the apreq filter?  The only time apreq should directly call
> ap_get_brigade() on its own is in reponse to a query made by the "warehouse
> manager" if the entry in question is not marked "in stock"...

the user of apreq() generally has no idea that ap_get_brigade() exists 
at all. It just calls apreq_*() methods and the handler part of the 
apreq calls ap_get_brigade.

Think of it this way: the mod_perl 1.0 handlers using Apache::Request 
should work under 2.0 unmodified. Now you get the idea.

>>>  (h) the content-handler repeats (g) until it has whatever portion
>>>      of the POST data it wants.
>>>
>>>  (i) the content-handler wants some data from our warehouse.  The
>>>      apreq library calls apreq_request_parse to complete the parsing
>>>      of POST data (should it need to), and *then* fetches the data
>>>      requested.
>>>
>>>In the model I'm  presenting, the apreq filter *never* consumes
>>>more data than the downstream filter has requested of it.  If
>>>the downstream filter asks for 2KB, we should read in at most
>>>2KB, and use the filter's state mechanism to keep track of where
>>>we left off.  However, as soon as the content-handler wants something
>>>from the warehouse, it must abandoned any future claims to the remaining
>>>POST data, since apreq needs the full amount in order to access the
>>>warehouse and may call apreq_request_parse (prior to any access) to
>>>enforce that.
>>
>>ok. just need to remember to apreq now is really two parts.
> 
> 
> Even 3 parts according to my approach - the filter, the parser and the
> warehouse manager.

that's a secondary separation. The first one is into the filter and the 
response handler part; the latter can then be split further.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ig...@followmecorp.com>.
----- Original Message -----
From: "Stas Bekman" <st...@stason.org>
To: "Joe Schaefer" <jo...@sunstarsys.com>
Cc: "William A. Rowe, Jr." <wr...@rowe-clan.net>; "Issac Goldstand"
<ma...@beamartyr.net>; "apreq list" <ap...@httpd.apache.org>
Sent: Tuesday, August 27, 2002 6:29 AM
Subject: Re: dev question: apreq 2 as a filter?


> Joe Schaefer wrote:
> > Stas Bekman <st...@stason.org> writes:
> >
> >
> >>Joe Schaefer wrote:
> >>
> >>>Joe Schaefer <jo...@sunstarsys.com> writes:
> >>>
> >>>[...]
> >>>
> >>>
> >>>
> >>>>I think the apreq filter can/should operate in a completely
> >>>>transparent way, since all it has to do is read a copy of the buckets
> >>>>into the apreq_list _as the upstream_ _filters dictate_.  Every time
> >>>>our filter is invoked, it can make a stab at parsing the apreq_list
> >>>>data, so the list should never get very big.
> >>>
> >>>
> >>>Um, you may need to s/upstream/downstream/g in everything I wrote in
> >>>the aforementioned post.  It'd be nice if what I write actually matched
> >>>the picture in my head :-)
> >>
> >>The things that I see weird about this is that the normal filter is not
> >>supposed to call ap_get_brigade more than once.
> >
> >
> > It is, I just didn't spell it out.  In the filter section, there
> > should've been a
> >
> >   (g) content-handler calls ap_get_brigade again, and winds up
> >       engaging the apreq filter again.  The filter picks up where
> >       it last left off.
>
> it's really the apreq part of the content handler is the one that calls
> ap_get_brigade, the user code only calls apreq_*() calls and shouldn't
> have to do anything with ap_get_brigade.

Wait - I thought that the *user* calls ap_get_brigade() which causes data to
pass through the apreq filter?  The only time apreq should directly call
ap_get_brigade() on its own is in response to a query made by the "warehouse
manager" if the entry in question is not marked "in stock"...

> >   (h) the content-handler repeats (g) until it has whatever portion
> >       of the POST data it wants.
> >
> >   (i) the content-handler wants some data from our warehouse.  The
> >       apreq library calls apreq_request_parse to complete the parsing
> >       of POST data (should it need to), and *then* fetches the data
> >       requested.
> >
> > In the model I'm  presenting, the apreq filter *never* consumes
> > more data than the downstream filter has requested of it.  If
> > the downstream filter asks for 2KB, we should read in at most
> > 2KB, and use the filter's state mechanism to keep track of where
> > we left off.  However, as soon as the content-handler wants something
> > from the warehouse, it must abandoned any future claims to the remaining
> > POST data, since apreq needs the full amount in order to access the
> > warehouse and may call apreq_request_parse (prior to any access) to
> > enforce that.
>
> ok. just need to remember to apreq now is really two parts.

Even 3 parts according to my approach - the filter, the parser and the
warehouse manager.

> > [...]
> >
> >
> >>Not sure why have you added a note about s/upstream/downstream/g, any
> >>filter cares only about the upstream filter (which may block), because
> >>that's where the data is coming from. it passes through the data to the
> >>downstream filter, but it doesn't care about it.
> >
> >
> > Does this clear things up now?
>
> I guess so. Having a spec will help, as the current emails with
> scenarios are getting longer and longer and many repeat themselves using
> different wordings :)
>

Agreed.

  Issac


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
>>
>>>Joe Schaefer <jo...@sunstarsys.com> writes:
>>>
>>>[...]
>>>
>>>
>>>
>>>>I think the apreq filter can/should operate in a completely
>>>>transparent way, since all it has to do is read a copy of the buckets
>>>>into the apreq_list _as the upstream_ _filters dictate_.  Every time
>>>>our filter is invoked, it can make a stab at parsing the apreq_list
>>>>data, so the list should never get very big.
>>>
>>>
>>>Um, you may need to s/upstream/downstream/g in everything I wrote in
>>>the aforementioned post.  It'd be nice if what I write actually matched 
>>>the picture in my head :-)
>>
>>The things that I see weird about this is that the normal filter is not 
>>supposed to call ap_get_brigade more than once. 
> 
> 
> It is, I just didn't spell it out.  In the filter section, there
> should've been a 
> 
>   (g) content-handler calls ap_get_brigade again, and winds up
>       engaging the apreq filter again.  The filter picks up where 
>       it last left off.

it's really the apreq part of the content handler that calls 
ap_get_brigade; the user code only makes apreq_*() calls and shouldn't 
have to do anything with ap_get_brigade.

>   (h) the content-handler repeats (g) until it has whatever portion
>       of the POST data it wants.
> 
>   (i) the content-handler wants some data from our warehouse.  The
>       apreq library calls apreq_request_parse to complete the parsing
>       of POST data (should it need to), and *then* fetches the data 
>       requested.
>         
> In the model I'm  presenting, the apreq filter *never* consumes 
> more data than the downstream filter has requested of it.  If 
> the downstream filter asks for 2KB, we should read in at most 
> 2KB, and use the filter's state mechanism to keep track of where 
> we left off.  However, as soon as the content-handler wants something 
> from the warehouse, it must abandoned any future claims to the remaining 
> POST data, since apreq needs the full amount in order to access the 
> warehouse and may call apreq_request_parse (prior to any access) to 
> enforce that.

ok. just need to remember that apreq is now really two parts.

> [...]
> 
> 
>>Not sure why have you added a note about s/upstream/downstream/g, any 
>>filter cares only about the upstream filter (which may block), because 
>>that's where the data is coming from. it passes through the data to the 
>>downstream filter, but it doesn't care about it.
> 
> 
> Does this clear things up now?

I guess so. Having a spec will help, as the current emails with 
scenarios are getting longer and longer and many repeat themselves using 
different wordings :)

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Issac Goldstand <ma...@beamartyr.net>.
----- Original Message -----
From: "Stas Bekman" <st...@stason.org>
To: "Joe Schaefer" <jo...@sunstarsys.com>
Cc: "William A. Rowe, Jr." <wr...@rowe-clan.net>; "Issac Goldstand"
<ma...@beamartyr.net>; "apreq list" <ap...@httpd.apache.org>
Sent: Monday, August 26, 2002 10:07 AM
Subject: Re: dev question: apreq 2 as a filter?


> Joe Schaefer wrote:
> > Joe Schaefer <jo...@sunstarsys.com> writes:
> >
> > [...]
> >
> >
> >>I think the apreq filter can/should operate in a completely
> >>transparent way, since all it has to do is read a copy of the buckets
> >>into the apreq_list _as the upstream_ _filters dictate_.  Every time
> >>our filter is invoked, it can make a stab at parsing the apreq_list
> >>data, so the list should never get very big.
> >
> >
> > Um, you may need to s/upstream/downstream/g in everything I wrote in
> > the aforementioned post.  It'd be nice if what I write actually matched
> > the picture in my head :-)
>
> The things that I see weird about this is that the normal filter is not
> supposed to call ap_get_brigade more than once. our apreq_ filter calls
> ap_get_brigade more than once, because if it doesn't, there is no way to
> consume the data (the response handler) will usually not ask for the raw
> body. So apreq_ is really a semi-filter, since it acts as a filter and
> consumer at the same time.

Not necessarily.  My example from yesterday proposed two distinct operating
"modes"; one which will do this, and the other default one, which will not.

  Issac



Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:
> > Joe Schaefer <jo...@sunstarsys.com> writes:
> > 
> > [...]
> > 
> > 
> >>I think the apreq filter can/should operate in a completely
> >>transparent way, since all it has to do is read a copy of the buckets
> >>into the apreq_list _as the upstream_ _filters dictate_.  Every time
> >>our filter is invoked, it can make a stab at parsing the apreq_list
> >>data, so the list should never get very big.
> > 
> > 
> > Um, you may need to s/upstream/downstream/g in everything I wrote in
> > the aforementioned post.  It'd be nice if what I write actually matched 
> > the picture in my head :-)
> 
> The things that I see weird about this is that the normal filter is not 
> supposed to call ap_get_brigade more than once. 

It is, I just didn't spell it out.  In the filter section, there
should've been a 

  (g) content-handler calls ap_get_brigade again, and winds up
      engaging the apreq filter again.  The filter picks up where 
      it last left off.

  (h) the content-handler repeats (g) until it has whatever portion
      of the POST data it wants.

  (i) the content-handler wants some data from our warehouse.  The
      apreq library calls apreq_request_parse to complete the parsing
      of POST data (should it need to), and *then* fetches the data 
      requested.
        
In the model I'm  presenting, the apreq filter *never* consumes 
more data than the downstream filter has requested of it.  If 
the downstream filter asks for 2KB, we should read in at most 
2KB, and use the filter's state mechanism to keep track of where 
we left off.  However, as soon as the content-handler wants something 
from the warehouse, it must abandon any future claims to the remaining 
POST data, since apreq needs the full amount in order to access the 
warehouse and may call apreq_request_parse (prior to any access) to 
enforce that.

[...]

> Not sure why have you added a note about s/upstream/downstream/g, any 
> filter cares only about the upstream filter (which may block), because 
> that's where the data is coming from. it passes through the data to the 
> downstream filter, but it doesn't care about it.

Does this clear things up now?

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Joe Schaefer <jo...@sunstarsys.com> writes:
> 
> [...]
> 
> 
>>I think the apreq filter can/should operate in a completely
>>transparent way, since all it has to do is read a copy of the buckets
>>into the apreq_list _as the upstream_ _filters dictate_.  Every time
>>our filter is invoked, it can make a stab at parsing the apreq_list
>>data, so the list should never get very big.
> 
> 
> Um, you may need to s/upstream/downstream/g in everything I wrote in
> the aforementioned post.  It'd be nice if what I write actually matched 
> the picture in my head :-)

The thing that I see as weird about this is that the normal filter is not 
supposed to call ap_get_brigade more than once. our apreq_ filter calls 
ap_get_brigade more than once, because if it doesn't, there is no way to 
consume the data (the response handler will usually not ask for the raw 
body). So apreq_ is really a semi-filter, since it acts as a filter and 
consumer at the same time.

Not sure why you have added a note about s/upstream/downstream/g; any 
filter cares only about the upstream filter (which may block), because 
that's where the data is coming from. it passes the data through to the 
downstream filter, but it doesn't care about it.


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:

[...]

> I think the apreq filter can/should operate in a completely
> transparent way, since all it has to do is read a copy of the buckets
> into the apreq_list _as the upstream_ _filters dictate_.  Every time
> our filter is invoked, it can make a stab at parsing the apreq_list
> data, so the list should never get very big.

Um, you may need to s/upstream/downstream/g in everything I wrote in
the aforementioned post.  It'd be nice if what I write actually matched 
the picture in my head :-)

ENOCAFFEINE, sorry about that.

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> writes:

[...]

> Let me be clear, I would strongly oppose (veto? dunno) the suggestion of
> two truly distinct mechanisms for apreq to operate.  

So would I.

> Why?  Because we earn two places for lingering bugs and security holes
> instead of one.

Exactly right.

> As a filter, any filter and the final handler can share the apreq results.
> Even if we provide a 1.3-style lookalike, it should still be implemented
> in terms of the same filter we are discussing.

That's what I'm after also.

> >Joe, you seem to be seeing some third possibility which I must be missing,
> >as your comment doesn't fit either of these scenarios...  (Please don't
> >flame me too bad if I'm making some stupid error - this is the first time
> >I'm being brave enough to comment on anything Apache2 API related :-))
> 
> No, I read his suggestion as two courses, and suggest we narrow it down
> to the one truly generic solution.

I don't see why you're reading me that way.  I think the apreq filter
can/should operate in a completely transparent way, since all it has to 
do is read a copy of the buckets into the apreq_list _as the upstream
filters dictate_.  Every time our filter is invoked, it can make a
stab at parsing the apreq_list data, so the list should never get
very big.

Let me try to explain how we handle a file upload right now, and 
how I think we can do it using a filter.  I'm just trying to flesh
out a bit of the details so we can focus on where our viewpoints
diverge.

NOW: (apreq_list as a sink)

  1) apreq_request_new() sets up the parser stack and returns 
     the address of our warehouse.
  2) the content-handler does some intermediary work unrelated to
     apreq ...
  3) it now wants access to the warehouse; winds up calling 
     apreq_request_parse()
     a) the apreq_parser_mfd parser is engaged
     b) the mfd parser initializes itself from the request headers
     c) the parser asks apreq_list_read to locate a header block
        * apreq_list_read calls ap_get_brigade, asking for ~ 8KB
        * apreq_list_read flattens the resultant brigade into its list
        * apreq_list_read clears the brigade
        * apreq_list_read scans the list for a CRLF CRLF marker.
        * if it hasn't found one yet, it repeats the cycle.
     d) the parser parses the header block and determines that
        we're about to read a file upload.
     e) the parser enters a for(;;) loop, calling apreq_list_read
        until it returns 0 bytes.
        * apreq_list_read calls ap_get_brigade, asking for ~8KB 
        * apreq_list_read flattens the brigade into the list
        * apreq_list_read destroys the buckets in the brigade
        * apreq_list_read scans the list for an "end of data" marker.
        * apreq_list_read returns whatever it's got so far 
          (up to the "end of data" marker).
        * the parser writes that returned data to a tempfile.
     f) goto (c), which exits the parser and sets the req->status = OK
        since there's no headers left to parse.

FILTER: (apreq_list as a ``non blocking'' pass-thru filter)

  1) apreq_request_new() injects the apreq filter, sets up the parser
     stack inside the filter, and returns the address of our warehouse.
  2) the content-handler does some intermediary work unrelated to
     apreq ...
  3) it makes a call to ap_get_brigade, which engages the apreq filter
     a) the filter engages the mfd parser
     b) the parser initializes itself from the request headers
    c1) the parser enters a MFD_HEADER state.
    c2) in MFD_HEADER state, the parser asks apreq_list_read to 
        locate a header block
        * apreq_list_read calls ap_get_brigade, asking for some 
          amount depending on the upstream filter
        * apreq_list_read flattens the resultant brigade into its list
        * apreq_list_read clears its brigade, but leaves the buckets alone
        * apreq_list_read scans the list for a CRLF CRLF marker.
        * if it hasn't found one yet, returns an "EWOULDBLOCK"
          condition to the parser.
    d1) on a successful return, the parser parses the headers
        from the list, and enters a MFD_DATA state.  Otherwise, 
        it returns control to the upstream parser here.
    d2) in MFD_DATA state, the parser determines we're about to
        read a file upload.
     e) the parser asks apreq_list_read to fetch a block of data:
        * apreq_list_read calls ap_get_brigade, asking for some 
          amount depending on the upstream filter
        * apreq_list_read flattens the resultant brigade into its list
        * apreq_list_read clears its brigade, but leaves the buckets alone
        * apreq_list_read scans the list for an "end of data" marker.
        * return whatever it's got so far (up to the "end of data" marker).
        * the parser writes that returned data to a temp file
    f1) if the list returned 0 bytes, goto (c1).  In this case,
        we'll need to reduce the amount we ask for in (c2)

    f2) Otherwise return control to the upstream parser here.

In this scenario, the apreq filter never consumes more data than the
upstream filter requests.  Even if the file upload is huge, the
associated apreq_list will never get larger than ~ 32KB, and the
mfd parser will be writing the data blocks directly to disk.
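
The "EWOULDBLOCK" return in (c2) doesn't need anything exotic; a scan helper
over the spooled data that reports "not enough yet" would do, for example
(made-up name, with APR_INCOMPLETE standing in for the would-block
condition):

#include "apr.h"
#include "apr_errno.h"

/* Look for the CRLF CRLF that terminates a header block in the data spooled
 * so far.  Return APR_SUCCESS and the header length if found, or
 * APR_INCOMPLETE if the parser should yield and wait for more data. */
static apr_status_t find_header_block(const char *buf, apr_size_t len,
                                      apr_size_t *hdr_len)
{
    apr_size_t i;

    for (i = 3; i < len; ++i) {
        if (buf[i-3] == '\r' && buf[i-2] == '\n' &&
            buf[i-1] == '\r' && buf[i]   == '\n') {
            *hdr_len = i + 1;
            return APR_SUCCESS;
        }
    }
    return APR_INCOMPLETE;   /* "would block": caller returns control and
                                retries on the next invocation */
}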

I readily admit I still don't understand how ap_get_brigade
really works, and am still muddy about the relationship between 
filters, buckets, and brigades, so some of my hopes for the
apreq filter may be somewhat naive.

How does your proposal differ?

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> William A. Rowe, Jr. wrote:
> > At 02:46 AM 8/23/2002, Stas Bekman wrote:

[...]

> >> No, the problem I'm referring to is how to invoke the filter in first 
> >> place. It won't be invoked if the response handler won't call 
> >> ap_get_brigade. Hmm, I think I know how this should work.
> >>
> >> Any time anybody does any enquiry from apreq, we check a flag whether 
> >> we have the data consumed and parsed (which is done already). If it 
> >> wasn't consumed yet, apreq inserts its input filter and performs the 
> >> ap_get_brigade call.
> > 
> > 
> > Up to some, sane limit.  I wouldn't want us pulling more than 64k or so
> > without some extra thought.
> 
> of course.

I don't agree.  IMO (using your terminology) the warehouse should 
be off-limits until the POST data has been parsed *completely*.  That
means *only* the content handler should be making any enquiries.  

Furthermore, if the content handler wants to call ap_get_brigade 
itself to get at a portion of the POST stream, it should do that
*before* ever visiting our warehouse.  Otherwise apreq_request_parse
should just gobble it all up.

I still think this invocation issue you're grappling with is a red 
herring for apreq.  The content-handler simply must adhere to the 
HTTP protocol- if it decides to ignore the request body when it
shouldn't have, that's not the apreq library's fault.   Moreover, 
AFAICT this problem does not appear to be exclusive to our hypothetical 
apreq filter.

[...]

> How do we protect from injecting the filter too late, if something has 
> already pulled the data in? just document this potential problem?

Right- that's a (hopefully) detectable error.

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
William A. Rowe, Jr. wrote:
> At 02:46 AM 8/23/2002, Stas Bekman wrote:
> 
>> Joe Schaefer wrote:
>>
>>> Stas Bekman <st...@stason.org> writes:
>>> [...]
>>>
>>>> as you can see the input filter that saw the body was invoked 
>>>> *after* the response phase has finished. So my question was, how to 
>>>> force the connection filter to request the next brigades which 
>>>> include the body, if nobody else does that. This part can be very 
>>>> tricky if you understand what I mean. I hope Bill can see the 
>>>> problem here, unless I miss something.
>>>
>>>
>>> I see the problem.  However, don't we have the exact same problem
>>> with the current code?  I mean, if the reported Content-Length is
>>> too big, WE don't attempt to read any POST data.  We also give up
>>> if we've accumulated too much data.
>>
>>
>> No, the problem I'm referring to is how to invoke the filter in first 
>> place. It won't be invoked if the response handler won't call 
>> ap_get_brigade. Hmm, I think I know how this should work.
>>
>> Any time anybody does any enquiry from apreq, we check a flag whether 
>> we have the data consumed and parsed (which is done already). If it 
>> wasn't consumed yet, apreq inserts its input filter and performs the 
>> ap_get_brigade call.
> 
> 
> Up to some, sane limit.  I wouldn't want us pulling more than 64k or so
> without some extra thought.

of course.

>> Bill, please correct me if I'm wrong as I see the corrected picture in 
>> my mind:
>>
>> apreq is split into 2 parts: the warehouse and the filter.
>>
>> The warehouse is invoked from HTTP response handler by simply 
>> performing *any* call into apreq_, which essentially asks for 
>> something. the warehouse looks whether the body has been consumed 
>> already, if it was and parsed it answers the query. If the data 
>> wasn't consumed yet, the warehouse inserts apreq filter as the last 
>> request input filter and immediately calls ap_get_brigade till it gets 
>> EOS bucket or it decides to terminate the sucking action (e.g. because 
>> of POST limit was exceeded).
> 
> 
> Sounds sane.
> 
>> The filter is really just a sucker which feeds the warehouse which 
>> does the parsing and storing of the parsed data.
> 
> 
> That was the direction I was thinking.

cool

>> hmm, for some reason I think that we end up using the current apreq 
>> model, just that it gets fed from its own filter, which can be 
>> eliminated altogether.
> 
> 
> And that POST data is still passed down the filter chain to be consumed
> in other interesting ways by modules like cgi [passed on to the cgi app.]
> It really isn't consumed, it's more like your snoop filter.

As I suggested before, this can be configurable; it'll probably save some 
memory if you know that you don't want the body anywhere but in 
apreq's warehouse.

>> the point is that you cannot invoke the apreq filter by itself, 
>> somebody has to invoke it (inserting is not enough), that somebody is 
>> the response handler, so we return to where we have started, not 
>> really needing any filter stuff at all.
> 
> 
> Agreed, I don't want folks inserting it themselves.  You might end up with
> three copies in the filter stack.  They simply need to call the apreq_ method
> which will then inject the filter as-needed.  Still, several modules [filters]
> can all look at the same body, and we still pass the POST data on.  This
> is significantly more thorough than the current apreq model.

ok

How do we protect against injecting the filter too late, if something has 
already pulled the data in? Just document this potential problem?

>>> In the 1.3-ish past, I'd assumed that the proper course of action for 
>>> these situations was to instruct apache to shut down the connection.  
>>> Otherwise (say with keepalives on) the client will
>>> send the post data and apache will treat it as a new, malformed http 
>>> request.
>>
>>
>> I think that this part is of a later concern, but as Bill has 
>> mentioned before discard_request_body() will probably take care of it.
>>
>> For future optimizations, I can see the situation where the lazy mode 
>> can be used, e.g. don't consume the whole body as long as you have 
>> satisfied the query. e.g. the form data followed the file upload, but 
>> the form wasn't filled properly so we don't care about the file 
>> because we want to return user the form to complete again.
> 
> 
> For reasons I stated before, such a module is not a healthy module.  Let us
> presume [for now] that the body is sucked by the handler in time for us to
> react and deal with the POSTed body.

ok



__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

[...]

> as you can see the input filter that saw the body was invoked *after* 
> the response phase has finished. So my question was, how to force the 
> connection filter to request the next brigades which include the body, 
> if nobody else does that. This part can be very tricky if you understand 
> what I mean. I hope Bill can see the problem here, unless I miss something.

I see the problem.  However, don't we have the exact same problem
with the current code?  I mean, if the reported Content-Length is
too big, WE don't attempt to read any POST data.  We also give up
if we've accumulated too much data.

In the 1.3-ish past, I'd assumed that the proper course of action for 
these situations was to instruct apache to shut down the 
connection.  Otherwise (say with keepalives on) the client will
send the post data and apache will treat it as a new, malformed 
http request.


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> [...]
> 
> 
>>Also I think that we should keep the old mechanism as well. In case 
>>someone wants to decide at run time whether to run apreq or not. The 
>>more flexible apreq is the better.
> 
> 
> I'm viewing the current implementation in this way (please suspend 
> your disbelief until the end :) -
> 
> --------------------------------------------------
> When a content-handler calls apreq_request_new, that does two things: 
> creates a placeholder for the parsed data (apreq_request_t *), and 
> installs the "virtual apreq filter" at the very end of the input filters 
> chain.  The "virtual apreq filter" holds the stack of registered parsers 
> (imagine we've replaced req->parsers with req->filter in apreq_request_t).
> 
> The next thing the content-handler does is call apreq_request_parse 
> (implicitly or explicitly).  Its only job is to pull all the data buckets 
> through the input filters.  The content-handler will block here until 
> apreq_request_parse's job is done.
> 
> Although this is a fairly bizarre portrait of how the current code
> "works",  do you see any conceptual problems here?  If not, then reworking
> the "virtual apreq filter" into a real input filter should give us the
> current mechanism for free.

I think all we need is to split the feeder from the parser, so the 
parser always gets the request object and something that it can stick 
the parsed data into (or return it), but the parser could be fed in 
various ways: via a filter, or with data passed explicitly if the client has 
already copied r->content so it's impossible to run it through the filters again.

> --------------------------------------------------
> 
> Extending support from content-handlers to filters:
> 
> If we s/content-handler/output filter/g above, we need to figure out

output filter? you mean input, no? apreq can only play as a request input 
filter.

> how to make apreq_request_new locate the parsed apreq_request_t object.
> If the "apreq filter" object is still around, maybe we could pull it
> from that?

You store the parsed data in the request pool's context. See mod_deflate 
for an example (though that one stores it in the connection pool's 
context, it's similar).
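
In C that's just a couple of one-liners around the module config vector
(sketch only; apreq_module is a placeholder for whatever module
structure apreq exports):

#include "httpd.h"
#include "http_config.h"

extern module AP_MODULE_DECLARE_DATA apreq_module;    /* hypothetical */

typedef struct apreq_request_t apreq_request_t;

/* stash the parsed request where it lives and dies with r->pool */
static void apreq_store(request_rec *r, apreq_request_t *req)
{
    ap_set_module_config(r->request_config, &apreq_module, req);
}

/* any later handler or filter can fetch the same object back */
static apreq_request_t *apreq_fetch(request_rec *r)
{
    return ap_get_module_config(r->request_config, &apreq_module);
}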

> I'm still pondering the s/content-handler/input filter/g case.
> At the moment I'm wondering why another input filter would ever
> want to call apreq_request_parse().

I don't think that's what Bill meant. I think Bill was talking about the 
apreq filter just being there and calling apreq_request_parse once it has 
consumed or copied the request body.

>>I have one blind spot though. If a response phase doesn't call 
>>ap_get_brigade to get the body, the request input filters won't be 
>>invoked. The connection input filters will be invoked, when httpd will 
>>suck the rest of the request in before generating the output, but... 
>>that's too late, the response phase will be over by that time. So how do 
>>we do that?
> 
> 
> I don't see what you're concerned about here.  Can you make up
> an example for me?

Certainly. Take the FilterSnoop example and configure it as:

Listen 8008
<VirtualHost _default_:8008>
     PerlModule MyApache::FilterSnoop
     PerlModule MyApache::Dump

     # Connection filters
     PerlInputFilterHandler  MyApache::FilterSnoop::connection
     PerlOutputFilterHandler MyApache::FilterSnoop::connection

     <Location /dump>
         SetHandler perl-script
         PerlResponseHandler MyApache::Dump
#        PerlInputFilterHandler  MyApache::FilterSnoop::request
#        PerlOutputFilterHandler MyApache::FilterSnoop::request
     </Location>

</VirtualHost>

and change MyApache::Dump so that it doesn't read the request body, 
leaving it unconsumed:

sub content {
     return "";
}

now when doing:

echo "mod_perl rules" | POST 'http://localhost:8008/dump?foo=1&bar=2'

you are going to see something like this:

 >>> connection output filter
     o bucket 1: TRANSIENT
[args:
foo=1&bar=2
content:

]

<<< connection input filter
     o bucket 1: HEAP
[mod_perl rules
]

As you can see, the input filter that saw the body was invoked *after* 
the response phase had finished. So my question was: how do we force the 
connection filter to request the next brigades, which include the body, 
if nobody else does that? This part can be very tricky, if you understand 
what I mean. I hope Bill can see the problem here, unless I'm missing 
something.
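
To illustrate: somebody has to pull the body through r->input_filters
explicitly, even a handler that throws it away, e.g. (untested sketch):

#include "httpd.h"
#include "http_protocol.h"

/* make the request input filters (and thus an apreq filter sitting
 * among them) see the body, even though the response doesn't use it */
static int dump_like_handler(request_rec *r)
{
    int rv = ap_discard_request_body(r); /* pulls through r->input_filters */

    if (rv != OK)
        return rv;

    ap_set_content_type(r, "text/plain");
    ap_rputs("hello\n", r);
    return OK;
}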

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
William A. Rowe, Jr. wrote:
> At 12:39 AM 8/23/2002, Stas Bekman wrote:
> 
>> Joe Schaefer wrote:
>>
>>> Stas Bekman <st...@stason.org> writes:
>>> [...]
>>>
>>>> Also I think that we should keep the old mechanism as well. In case 
>>>> someone wants to decide at run time whether to run apreq or not. The 
>>>> more flexible apreq is the better.
>>>
>>>
>>> I'm viewing the current implementation in this way (please suspend 
>>> your disbelief until the end :) -
>>> --------------------------------------------------
>>> When a content-handler calls apreq_request_new, that does two things: 
>>> creates a placeholder for the parsed data (apreq_request_t *), and 
>>> installs the "virtual apreq filter" at the very end of the input 
>>> filters chain.  The "virtual apreq filter" holds the stack of 
>>> registered parsers (imagine we've replaced req->parsers with 
>>> req->filter in apreq_request_t).
>>
> 
> First, the module (content-handler or other filter) simply needs to call
> some apreq_capture() function that can do all that extra work.  Other
> modules performing the same call simply hook into the already-inserted
> filter.
> 
> Second, you must do this call before you actually begin request processing,
> e.g. at the very beginning of your handler, or better yet, back in the 
> pre-handler
> processing phases.  All filters that need apreq data will have to do so 
> in the
> pre-handler phases or in the insert_filter phase.
> 
> The filter's apreq_capture() function itself can do all the prep work 
> and take
> the responsibility off of the handler/other filters.

Looks like you are saying the same thing I am, meaning that you don't 
really need it to be a filter. Right?

Also, any chance that we can stick the parsed data into the request 
object? Just like we now have r->args, we could have r->parsed_body, with 
apreq manipulating them both.

If you use meta buckets, you have a problem fishing the data out during 
the response phase. Since the body data is interesting only during the 
request phases + request output filters, we'd better associate it with 
the request object.
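
For the record, the insert-once part of your apreq_capture() could be as
small as this (untested sketch; the filter name "APREQ" is made up, and
it assumes the filter was registered with ap_register_input_filter at
module init):

#include <strings.h>          /* strcasecmp */
#include "httpd.h"
#include "util_filter.h"

/* add the apreq input filter exactly once per request, no matter how
 * many modules ask for it */
static void apreq_capture(request_rec *r)
{
    ap_filter_t *f;

    for (f = r->input_filters; f != NULL; f = f->next) {
        if (strcasecmp(f->frec->name, "APREQ") == 0)
            return;                      /* already hooked in */
    }
    ap_add_input_filter("APREQ", NULL, r, r->connection);
}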

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
>>>output filter? you mean input, no? apreq can play only as request input 
>>>filter.
>>
>>Correct.
> 
> 
> Actually I am referring to the nut that uses apreq everywhere,
> and wants to use an output filter to pretty-print the already-parsed 
> apreq data at the bottom of every page.  Given the number of people
> that complain on the modperl list that Apache::Request has "lost" the
> post data in their stacked content handlers, do we really want to 
> exclude this possibility?

That's simple. All request output filters will see that data if the 
response handler does.


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> writes:

[...]

> Second, you must do this call before you actually begin request processing,
> e.g. at the very beginning of your handler, or better yet, back in the 
> pre-handler
> processing phases.  All filters that need apreq data will have to do so in the
> pre-handler phases or in the insert_filter phase.

Thanks for clearing this up for me.

> The filter's apreq_capture() function itself can do all the prep work and take
> the responsibility off of the handler/other filters.
> 
> >>The next thing the content-handler does is call apreq_request_parse 
> >>(implicitly or explicitly).  Its only job is to pull all the data buckets 
> >>through the input filters.  The content-handler will block here until 
> >>apreq_request_parse's job is done.
> 
> This should be an optional phase.  In the usual case, simply consuming the
> post data through ap_get_brigade will have the very same effect.

Agreed.


[...]

> >output filter? you mean input, no? apreq can play only as request input 
> >filter.
> 
> Correct.

Actually I am referring to the nut that uses apreq everywhere,
and wants to use an output filter to pretty-print the already-parsed 
apreq data at the bottom of every page.  Given the number of people
that complain on the modperl list that Apache::Request has "lost" the
post data in their stacked content handlers, do we really want to 
exclude this possibility?

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
[rerouting Bill's reply to apreq-dev, with my followup]

William A. Rowe, Jr. wrote:

>> looks like you are saying the same thing as I do. Meaning that you 
>> don't really need it to be a filter. Right?
> 
> 
> No.  I'm saying that a parser has multiple uses, while what we are 
> trying to
> accomplish is apreq-filter the post body, just once, for all of the 
> filters that
> want to review the post data.
> 
> So yes, a filter works best.  And yes, there might be other uses for the 
> parsers,
> why lock them into doing one and only one sort of parsing mechanics?

+1

>> Also any chance that we can stick the parsed data into the request 
>> object? just like now we have r->args, we can have r->parsed_body, and 
>> apreq manipulating them both.
> 
> 
> Again, metadata buckets might work even better than adding to r-> foo.
> 
> Be warned that several httpd'ers are bent on sacking most of the r-> 
> structure
> because much of its data would have better homes elsewhere.

understood.

>> if you use meta buckets, you have a problem to fish the data out 
>> during the response phase. since the body data is interesting only 
>> during the request phases + request output filters, we better 
>> associate it with the request object.
> 
> 
> I see that as one possible argument, if we were to have folks slurping 
> the post
> data during the request-header processing.  I still need to be convinced 
> that
> metadata can't solve such problems :-)

I haven't played with metabuckets yet, so I trust your thoughts ;)



__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

[...]

> Also I think that we should keep the old mechanism as well. In case 
> someone wants to decide at run time whether to run apreq or not. The 
> more flexible apreq is the better.

I'm viewing the current implementation in this way (please suspend 
your disbelief until the end :) -

--------------------------------------------------
When a content-handler calls apreq_request_new, that does two things: 
creates a placeholder for the parsed data (apreq_request_t *), and 
installs the "virtual apreq filter" at the very end of the input filters 
chain.  The "virtual apreq filter" holds the stack of registered parsers 
(imagine we've replaced req->parsers with req->filter in apreq_request_t).

The next thing the content-handler does is call apreq_request_parse 
(implicitly or explicitly).  Its only job is to pull all the data buckets 
through the input filters.  The content-handler will block here until 
apreq_request_parse's job is done.

Although this is a fairly bizarre portrait of how the current code
"works",  do you see any conceptual problems here?  If not, then reworking
the "virtual apreq filter" into a real input filter should give us the
current mechanism for free.
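
In code, the "pull everything through" part of apreq_request_parse would
be little more than this untested sketch (it assumes the apreq filter
already sits in r->input_filters):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

/* drain the request body through the input filter chain; the apreq
 * filter in that chain parses the data as it streams past */
static apr_status_t apreq_request_parse(request_rec *r)
{
    apr_bucket_brigade *bb = apr_brigade_create(r->pool,
                                                r->connection->bucket_alloc);
    apr_status_t rv;
    int done = 0;

    while (!done) {
        rv = ap_get_brigade(r->input_filters, bb, AP_MODE_READBYTES,
                            APR_BLOCK_READ, HUGE_STRING_LEN);
        if (rv != APR_SUCCESS)
            return rv;

        if (APR_BRIGADE_EMPTY(bb)
            || APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb)))
            done = 1;

        apr_brigade_cleanup(bb);   /* we only care about the side effect */
    }
    return APR_SUCCESS;
}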
--------------------------------------------------

Extending support from content-handlers to filters:

If we s/content-handler/output filter/g above, we need to figure out
how to make apreq_request_new locate the parsed apreq_request_t object.
If the "apreq filter" object is still around, maybe we could pull it
from that?

I'm still pondering the s/content-handler/input filter/g case.
At the moment I'm wondering why another input filter would ever
want to call apreq_request_parse().

> I have one blind spot though. If a response phase doesn't call 
> ap_get_brigade to get the body, the request input filters won't be 
> invoked. The connection input filters will be invoked, when httpd will 
> suck the rest of the request in before generating the output, but... 
> that's too late, the response phase will be over by that time. So how do 
> we do that?

I don't see what you're concerned about here.  Can you make up
an example for me?

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> "William A. Rowe, Jr." <wr...@rowe-clan.net> writes:
> 
> 
>>At 12:23 PM 8/22/2002, Joe Schaefer wrote:
>>
>>>OK, I think it's starting to gel now.  The input filter's
>>>control flow (in C) centers around ap_get_brigade.  I think
>>>the upshot for us means that converting the parsers to filters
>>>amounts to
>>>
>>>  1) reworking apreq_list_read to read from an arbitrary filter,
>>>     not just r->filters_in.  It also has to pass along the brigade
>>>     instead of clearing it.  The necessary changes to apreq_list.[ch]
>>>     are trivial.
>>>
>>>  2) literally removing the for(;;) loops from the parsers in
>>>     apreq_parser.c.  All parsers take their input from apreq_list,
>>>     so the only modifications would be to have them operate as
>>>     callbacks.  I don't think that's much of an issue at all.
>>
>>It sounds like you have it right on target ;-)

Seconded.

Also I think that we should keep the old mechanism as well. In case 
someone wants to decide at run time whether to run apreq or not. The 
more flexible apreq is the better.

>>Finally, about the storage for the returned body chunks.  I'm not clear
>>why you propose the connection pool?  
> 
> 
> I don't recall ever proposing that; maybe when I was talking about
> the control flow of Stas' perl examples?  Can you be more specific?

Bill is talking about my suggestion to store the parsed data in the 
connection pool, which was obviously wrong, because the body is parsed 
during request phases and therefore should be stored in the request pool.

There is one question regarding the request pool. apreq works with the 
query string as well; should it then use r->args when it needs that data? 
I thought it could grab it from the request headers since it's a filter, no?

I have one blind spot though. If the response phase doesn't call 
ap_get_brigade to get the body, the request input filters won't be 
invoked. The connection input filters will be invoked when httpd sucks 
the rest of the request in before generating the output, but... that's 
too late, the response phase will be over by that time. So how do we do 
that?



__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
William A. Rowe, Jr. wrote:

>> Also I think that we should keep the old mechanism as well. In case 
>> someone wants to decide at run time whether to run apreq or not. The 
>> more flexible apreq is the better.
> 
> Well, if it's done correctly, they could believe they are invoking it
> at 'run time' while really inserting the filter.  It would appear that it's
> the old fashioned method, but it would be more secure if we have one
> mechanism to audit.

That might not always work, because you don't know whether the body has 
already been parsed, even partially.

>> There is one question regarding the request pool. apreq works with the 
>> query string as well; should it then use r->args when it needs that 
>> data? I thought it could grab it from the request headers since it's a 
>> filter, no?
> 
> 
> I would still choose r->args.  This filter would go in between the 
> HTTP_IN filter,
> and the handler plus other content filters.  It would never see the 
> headers in
> raw form.

sounds fine to me.
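
Something like this untested sketch would do for the query string -- no
filter involved at all (note ap_unescape_url balks at encoded slashes,
so the real code may want its own decoder):

#include "httpd.h"
#include "apr_strings.h"
#include "apr_tables.h"

/* split r->args ("foo=1&bar=2") into an apr_table_t */
static apr_table_t *parse_query_string(request_rec *r)
{
    apr_table_t *params = apr_table_make(r->pool, 8);
    const char *args = r->args;

    while (args && *args) {
        char *pair = ap_getword(r->pool, &args, '&');
        const char *rest = pair;
        char *key = ap_getword(r->pool, &rest, '=');
        char *val = apr_pstrdup(r->pool, rest);
        char *p;

        /* '+' means space in form data; then undo the %XX escapes
         * (return codes ignored here) */
        for (p = key; *p; ++p) if (*p == '+') *p = ' ';
        for (p = val; *p; ++p) if (*p == '+') *p = ' ';
        ap_unescape_url(key);
        ap_unescape_url(val);

        apr_table_add(params, key, val);
    }
    return params;
}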

>> I have one blind spot though. If a response phase doesn't call 
>> ap_get_brigade to get the body, the request input filters won't be 
>> invoked. The connection input filters will be invoked, when httpd will 
>> suck the rest of the request in before generating the output, but... 
>> that's too late, the response phase will be over by that time. So how 
>> do we do that?
> 
> 
> Yup.  It isn't pretty.  But if we are clever, we may be able to force
> the body to be read once we are inserted in the chain, up to some
> arbitrary length (say 64kb, for example).  The apreq filter would then
> set aside the data if the read was for only 0 bytes, so the next read
> from ap_get_brigade (for the real body, from the handler) would return
> the data we set aside.

Can you do that? I don't remember if there is such a hook.

> This points to another optimization.  Wouldn't it be nice to be able
> to optimize ap_discard_request_body() by simply passing a note
> down to the apreq filter (if it's present and did its work) to toss the
> data once it has parsed the input?

It depends. If the apreq filter copies the data it parses and passes the 
brigade on unmodified, there is no need to do anything at all, as the 
generic mechanism will just work. If, on the other hand, apreq consumes 
the body and doesn't leave anything in the brigades, then yes, it should 
short-cut and bring the request to completion faster, assuming the body 
is big (e.g. for HEAD requests).
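
The "copy and pass through" case would look roughly like this (untested
sketch; apreq_parse_chunk() is a made-up name for whatever ends up
feeding the parser):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

/* hypothetical: hand one chunk of body data to the parser */
void apreq_parse_chunk(request_rec *r, const char *data, apr_size_t len);

/* look at every data bucket, but leave the brigade exactly as we got it,
 * so ap_discard_request_body() and friends keep working unchanged */
static apr_status_t apreq_tee_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                     ap_input_mode_t mode,
                                     apr_read_type_e block,
                                     apr_off_t readbytes)
{
    apr_status_t rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
    apr_bucket *e;

    if (rv != APR_SUCCESS)
        return rv;

    for (e = APR_BRIGADE_FIRST(bb); e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e)) {
        const char *data;
        apr_size_t len;

        if (APR_BUCKET_IS_EOS(e))
            break;
        if (apr_bucket_read(e, &data, &len, APR_BLOCK_READ) == APR_SUCCESS
            && len > 0) {
            apreq_parse_chunk(f->r, data, len);   /* copy, don't consume */
        }
    }
    return APR_SUCCESS;
}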



__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> writes:

> At 12:23 PM 8/22/2002, Joe Schaefer wrote:
> >OK, I think it's starting to gel now.  The input filter's
> >control flow (in C) centers around ap_get_brigade.  I think
> >the upshot for us means that converting the parsers to filters
> >amounts to
> >
> >   1) reworking apreq_list_read to read from an arbitrary filter,
> >      not just r->filters_in.  It also has to pass along the brigade
> >      instead of clearing it.  The necessary changes to apreq_list.[ch]
> >      are trivial.
> >
> >   2) literally removing the for(;;) loops from the parsers in
> >      apreq_parser.c.  All parsers take their input from apreq_list,
> >      so the only modifications would be to have them operate as
> >      callbacks.  I don't think that's much of an issue at all.
> 
> It sounds like you have it right on target ;-)

Ahh, glad to hear that.

> About your 'slurping everything from a huge response, e.g. upload'
> issue, we obviously need some threshold where a given post variable
> will be tagged as present, but too large to process.

I'm no longer concerned that this will even be an issue,
but we can discuss that when you have more free time (and
I've thought about it a little bit more). 

[...]

> Finally, about the storage for the returned body chunks.  I'm not clear
> why you propose the connection pool?  

I don't recall ever proposing that; maybe when I was talking about
the control flow of Stas' perl examples?  Can you be more specific?
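
To make 2) above a bit more concrete, here's the shape I have in mind
for, say, the urlencoded parser (untested sketch; apreq_urlword() is a
made-up helper that pops one complete "key=value" token off the spool,
or returns NULL if the token isn't complete yet):

#include "apr_buckets.h"
#include "apr_tables.h"

typedef struct {
    apr_bucket_brigade *spool;   /* leftover, not-yet-parsed buckets */
    apr_table_t *params;         /* where decoded pairs end up */
} urlenc_ctx;

/* hypothetical helper: pop one complete pair off the spool, or NULL */
const char *apreq_urlword(apr_pool_t *p, apr_bucket_brigade *spool);

/* no more for(;;) around ap_get_brigade: we parse whatever this call's
 * brigade holds, stash the remainder, and return until we're fed again */
static apr_status_t urlenc_parse(urlenc_ctx *ctx, apr_pool_t *p,
                                 apr_bucket_brigade *bb)
{
    const char *word;

    APR_BRIGADE_CONCAT(ctx->spool, bb);     /* take ownership of the data */

    while ((word = apreq_urlword(p, ctx->spool)) != NULL) {
        /* decode word and add it to ctx->params (details elided) */
    }

    /* an EOS bucket on the spool would flip this to APR_SUCCESS */
    return APR_INCOMPLETE;
}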

-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> [...]
> 
> 
>>I'll mention again this URL:
>>http://perl.apache.org/docs/2.0/user/handlers/handlers.html#All_in_One_Filter
>>which demonstrates that filters are quite simple once you understand how 
>>they work, and that's the goal of that URL -- to help you understand that.
> 
> 
> 
> I agree that Dump.pm is very simple, but I don't *like* the 
> idea of using a store-and-forward type filter (which is only 
> called one time per request, and consumes its entire input brigade
> before passing control to the next filter in the chain).  
> 
> The FilterSnoop.pm code looks much more promising to me,
> which appears to only consume $readbytes of data per call.
> In that example, what I'd naively hope to happen is that apache 
> would execute the filters in this order:
> 
> snoop ("connection", ...
> snoop ("request", ...
> snoop ("connection", ...
> snoop ("request", ...
> snoop ("connection", ...
> snoop ("request", ...
> 
> but NOT in a store-and-forward 'ish
> 
> snoop ("connection", ...
> snoop ("connection", ...
> snoop ("connection", ...
> snoop ("request", ...
> snoop ("request", ...
> snoop ("request", ...
> 
> It looks to me (based on the webpage) like the second case 
> is what's really happening.  Is that right?

I guess my explanations aren't good enough :( Dump is a response 
handler; it has nothing to do with filters. It just reads the query 
string and the body and echoes them back as a response. It could just say 
"hello". The point of Dump is that it calls $r->content, which invokes 
the request input filters. If you don't call $r->content, the request 
input filters will never be called at all.

All the filters are inside FilterSnoop. Try removing the request filters 
and see how the connection filters work alone, then just the request 
filters, then both. Filters never consume more than one brigade unless 
you want to buffer up; usually they process the brigade and forward it 
on.

Also remember that the C implementation will be a "bit" longer, so it's 
a good idea to grasp the concepts with the Perl implementations ;)

Though currently there is a problem with Perl filters: you don't have 
access to the connection context, so you cannot set aside data, which is 
easily done in C because you have raw access to the filters. I guess 
this will be fixed in Perl at some point. But for apreq this doesn't 
matter, as it's pure C.

one more thing. Look at the two diagrams
http://perl.apache.org/docs/2.0/user/handlers/handlers.html#HTTP_Request_Cycle_Phases
and
http://perl.apache.org/docs/2.0/user/handlers/handlers.html#Connection_Cycle_Phases

Connection filters see all the data, including headers. Request filters 
see only the request and response *bodies*. So, loosely, this can be 
shown as:

connection_filter(headers_in)
connection_filter(body_in)
request_filter(body_in)
...
request_filter(body_out)
connection_filter(headers_out)
connection_filter(body_out)

In reality the filters interleave, because they are stacked and each 
brigade goes through all the filters in the stack.

Once you grok how this works and know how to improve my explanations, 
patches or just comments are very welcome.

Give me some more time, I'll add more filter examples tomorrow.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

[...]

> I'll mention again this URL:
> http://perl.apache.org/docs/2.0/user/handlers/handlers.html#All_in_One_Filter
> which demonstrates that filters are quite simple once you understand how 
> they work, and that's the goal of that URL -- to help you understand that.


I agree that Dump.pm is very simple, but I don't *like* the 
idea of using a store-and-forward type filter (which is only 
called one time per request, and consumes its entire input brigade
before passing control to the next filter in the chain).  

The FilterSnoop.pm code looks much more promising to me,
which appears to only consume $readbytes of data per call.
In that example, what I'd naively hope to happen is that apache 
would execute the filters in this order:

snoop ("connection", ...
snoop ("request", ...
snoop ("connection", ...
snoop ("request", ...
snoop ("connection", ...
snoop ("request", ...

but NOT in a store-and-forward 'ish

snoop ("connection", ...
snoop ("connection", ...
snoop ("connection", ...
snoop ("request", ...
snoop ("request", ...
snoop ("request", ...

It looks to me (based on the webpage) like the second case 
is what's really happening.  Is that right?

> Ryan's book should be of help too
> http://www.amazon.com/exec/obidos/ASIN/0072223448
> it covers the filters development details.

Thanks again, Stas- I'll have a look at that as well.
-- 
Joe Schaefer

Re: dev question: apreq 2 as a filter?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Joshua Moore-Oliva <ch...@mediapow.com> writes:
> 
> [...]
> 
> 
>>In addition, considering that it appears there are no filter-expert
>>developers on the apreq team, 
> 
> 
> Stas knows apache 2 inside and out; we're fortunate to have him.

I wish I knew apache 2 inside out... I just needed to go down to the 
source to understand how things work, so I know it somewhat from 
stepping through with gdb. But I mostly do the mod_perl side, so I'm far 
from being an expert on the apache internals. In fact, I doubt there is 
a person who knows 2.0 inside out; it's a huge beast. Maybe Ryan and a 
few other folks who worked on it for the last 4 years are the only ones...

I'll mention again this URL:
http://perl.apache.org/docs/2.0/user/handlers/handlers.html#All_in_One_Filter
which demonstrates that filters are quite simple once you understand how 
they work, and that's the goal of that URL -- to help you understand that.

Ryan's book should be of help too
http://www.amazon.com/exec/obidos/ASIN/0072223448
it covers the filters development details.

>>and presuming there is a good reason to port apreq to act as a filter,
>>should we not concentrate on getting an official 2.0 compatible
>>version of apreq out there to increase use of apreq and lure
>>developers into our circle?
> 
> 
> Hold that thought- we're now waiting to see what the apache
> developers finally say about the current code.  If we do
> incorporate a filter API into the parsers, it will have 
> absolutely no effect on apreq_cookie, and probably *only* 
> impact the apreq_parser*- related stuff in apreq_request.  
> In all likelihood, a current user (aka "alpha-tester" :) 
> won't even notice the change.

Seconded.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: dev question: apreq 2 as a filter?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Joshua Moore-Oliva <ch...@mediapow.com> writes:

[...]

> In addition, considering that it appears there are no filter-expert
> developers on the apreq team, 

Stas knows apache 2 inside and out; we're fortunate to have him.

> and presuming there is a good reason to port apreq to act as a filter,
> should we not concentrate on getting an official 2.0 compatible
> version of apreq out there to increase use of apreq and lure
> developers into our circle?

Hold that thought- we're now waiting to see what the apache
developers finally say about the current code.  If we do
incorporate a filter API into the parsers, it will have 
absolutely no effect on apreq_cookie, and probably *only* 
impact the apreq_parser*- related stuff in apreq_request.  
In all likelihood, a current user (aka "alpha-tester" :) 
won't even notice the change.

-- 
Joe Schaefer