Posted to apreq-dev@httpd.apache.org by Eli Marmor <ma...@netmask.it> on 2004/07/30 14:51:36 UTC

Earliest Hook to Inspect POST Params

Hi,

Sorry if it's a newbie question, but I wondered what is the earliest
hook that can be used for inspection of the POST parameters (apreq2, C).

I'm not creating any content (so a normal handler is not needed), and
I'm transparent and not modifying the request/response (so no filter is
needed), but just logging the information and want to know what is the
earliest phase that is already safe enough to parse the POST
parameters.

If there is an existing example, you may point me at that example
instead of explaining.

Thanks,
-- 
Eli Marmor
marmor@netmask.it
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.:   +972-9-766-1020          8 Yad-Harutzim St.
Fax.:   +972-9-766-1314          P.O.B. 7004
Mobile: +972-50-23-7338          Kfar-Saba 44641, Israel

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:
> > Stas Bekman <st...@stason.org> writes:
> >
> >>Joe Schaefer wrote:
> > [...]
> >
> >>>No actual data gets copied by mod_apreq, only the buckets are copied.
> >>
> >>So what happens if the buckets are destroyed in the response handler phase?
> > Any buckets that actually need to be kept around (eg buckets
> > representing a file upload) are set aside by the parser.
> 
> Meaning that if a consumer has manually parsed the buckets and
> destroyed them, the set-aside buckets' content will be fully copied.

Incorrect: the data in a heap bucket, which is the bucket type that
normally comes down the input filter chain, is *refcounted*.  Destroying
a heap bucket is just like destroying a reference in Perl: the object
it points to doesn't go away until its refcount is zero, and making
a copy of the reference does not induce a copy of the object, it just
bumps the refcount.  Also note: setaside is a no-op on heap buckets;
we only need to call apr_bucket_setaside because it's not a no-op
with other bucket types.
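
As a rough illustration of the refcounting described above, here is a toy
model in C.  It is not APR code: toy_heap, toy_bucket and the toy_*
functions are hypothetical names invented for this sketch, and the real
apr_bucket API is considerably more involved.  The sketch only shows that
copying a handle bumps a counter instead of duplicating the data, and that
the data is freed when the last handle goes away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these are NOT the real APR bucket structures. */
typedef struct {
    char  *data;      /* the shared payload */
    size_t len;
    int    refcount;  /* number of bucket handles pointing here */
} toy_heap;

typedef struct {
    toy_heap *heap;   /* a bucket is just a handle onto shared data */
} toy_bucket;

static int toy_heaps_freed = 0;  /* bookkeeping for the demo below */

/* Create a bucket owning a fresh copy of s (refcount starts at 1). */
static toy_bucket *toy_bucket_create(const char *s)
{
    toy_bucket *b = malloc(sizeof *b);
    toy_heap *h = malloc(sizeof *h);
    if (b == NULL || h == NULL) abort();
    h->len = strlen(s);
    h->data = malloc(h->len + 1);
    if (h->data == NULL) abort();
    memcpy(h->data, s, h->len + 1);
    h->refcount = 1;
    b->heap = h;
    return b;
}

/* Copy the handle only: the payload is shared, the refcount is bumped. */
static toy_bucket *toy_bucket_copy(const toy_bucket *src)
{
    toy_bucket *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->heap = src->heap;
    b->heap->refcount++;
    return b;
}

/* Drop one handle; free the payload only when no handle is left. */
static void toy_bucket_destroy(toy_bucket *b)
{
    if (--b->heap->refcount == 0) {
        free(b->heap->data);
        free(b->heap);
        toy_heaps_freed++;
    }
    free(b);
}

/* Returns 1 if the data was shared, survived the first destroy,
 * and was freed exactly once after the last destroy. */
static int toy_demo(void)
{
    toy_heaps_freed = 0;
    toy_bucket *orig = toy_bucket_create("name=value");
    assert(orig->heap->refcount == 1);

    toy_bucket *copy = toy_bucket_copy(orig);
    int shared = (copy->heap == orig->heap);   /* no payload duplicated */

    toy_bucket_destroy(orig);                  /* one consumer lets go */
    int survived = (toy_heaps_freed == 0) &&
                   (memcmp(copy->heap->data, "name=value",
                           copy->heap->len) == 0);

    toy_bucket_destroy(copy);                  /* last handle: payload freed */
    return shared && survived && toy_heaps_freed == 1;
}
```

In this model, toy_demo() returns 1: destroying the first handle leaves
the payload readable through the copy, which is the behaviour attributed
to heap buckets above.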

-- 
Joe Schaefer


Re: Earliest Hook to Inspect POST Params

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
>>
>>>Stas Bekman <st...@stason.org> writes:
>>>
>>>
>>>>Joe Schaefer wrote:
>>>
>>>[...]
>>>
>>>
>>>>>No actual data gets copied by mod_apreq, only the buckets are copied.
>>>>
>>>>So what happens if the buckets are destroyed in the response handler phase?
>>>
>>>Any buckets that actually need to be kept around (eg buckets
>>>representing a file upload) are set aside by the parser.
>>
>>Meaning that if a consumer has manually parsed the buckets and
>>destroyed them, the set-aside buckets' content will be fully copied.
> 
> 
> Incorrect- the data in a heap bucket, which is the bucket type that
> normally comes down the input filter chain, is *refcounted*.  Destroying 
> a heap bucket is just like destroying a reference in perl: the object 
> it points to doesn't go away until its refcount is zero, and making
> a copy of the reference does not induce a copy of the object, it just
> bumps the refcount.  Also note: setaside is a noop on heap buckets,
> we only need to call apr_bucket_setaside because it's not a noop
> with other bucket types.

Excellent then!

-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: Earliest Hook to Inspect POST Params

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
> 
> 
> [...]
> 
> 
>>>No actual data gets copied by mod_apreq, only the buckets are copied.
>>
>>So what happens if the buckets are destroyed in the response handler phase?
> 
> 
> Any buckets that actually need to be kept around (eg buckets
> representing a file upload) are set aside by the parser.

Meaning that if a consumer has manually parsed the buckets and destroyed 
them, the set-aside buckets' content will be fully copied.

>>Also copying just the buckets could be quite an overhead too, when
>>there is a lot of incoming data.
> 
> Nope, because the copies are fed right into the parser, which 
> usually consumes them, so they go right back into the bucket 
> allocator for reuse.  Unless the parser truly needs to set them 
> aside (because they represent parsed data, eg a file upload), 
> the copies are normally available for reuse on the next 
> ap_get_brigade cycle.

I think we are talking about different things here. I'm talking about a
consumer of POST data that does not use the apreq API, while the apreq API
might be invoked at a later stage. In that case *all* the POST buckets
would have to be copied and set aside by then. I'm not talking about
temporary bucket allocation when you traverse a single bucket brigade.

-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:

[...]

> > No actual data gets copied by mod_apreq, only the buckets are copied.
> 
> So what happens if the buckets are destroyed in the response handler phase?

Any buckets that actually need to be kept around (eg buckets
representing a file upload) are set aside by the parser.

> Also copying just the buckets could be quite an overhead too, when
> there is a lot of incoming data.

Nope, because the copies are fed right into the parser, which 
usually consumes them, so they go right back into the bucket 
allocator for reuse.  Unless the parser truly needs to set them 
aside (because they represent parsed data, eg a file upload), 
the copies are normally available for reuse on the next 
ap_get_brigade cycle.

-- 
Joe Schaefer


Re: Earliest Hook to Inspect POST Params

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> [...]
> 
> 
>>I'd think that if any consumer has requested the POST data bypassing
>>the apreq API (even though apreq's filter is installed), apreq 
>>should do nothing. 
>           ^^^^^^^
> 
> That's what it does- almost.  When nobody actually *uses* libapreq2
> until a post-content-handler phase, the apreq filter just hands the 
> parser a copy (nb it just copies the buckets, *not* the data) of the 
> incoming buckets.
> 
> 
>>Otherwise the data gets unnecessary copied.
> 
> 
> No actual data gets copied by mod_apreq, only the buckets are copied.

So what happens if the buckets are destroyed in the response handler phase?

Also copying just the buckets could be quite an overhead too, when there 
is a lot of incoming data.

-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

[...]

> I'd think that if any consumer has requested the POST data bypassing
> the apreq API (even though apreq's filter is installed), apreq 
> should do nothing. 
            ^^^^^^^

That's what it does- almost.  When nobody actually *uses* libapreq2
until a post-content-handler phase, the apreq filter just hands the 
parser a copy (nb it just copies the buckets, *not* the data) of the 
incoming buckets.

> Otherwise the data gets unnecessary copied.

No actual data gets copied by mod_apreq, only the buckets are copied.

-- 
Joe Schaefer


Re: Earliest Hook to Inspect POST Params

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
>>
>>>Eli Marmor <ma...@netmask.it> writes:
>>
>>[...]
>>
>>>>I'm not creating any content (so a normal handler is not needed), and
>>>>I'm transparent and not modifying the request/response (so no filter is
>>>>needed), but just logging the information and want to know what is the
>>>>earliest phase that is already safe enough to parse the POST
>>>>parameters.
>>>
>>>Sounds to me like you really want to write a logging handler, which
>>>is sort of the opposite question.
>>>  "What's the latest hook that can be used?"
>>>Any hook that runs before the content handler does should be fine.
>>>If you're writing a log handler, all your pre-content-handler hook needs to
>>>do is call
>>>  req = apreq_request(r, NULL);
>>>This will register mod_apreq's input filter.  By the time your logging
>>>handler runs, the request body will be parsed, so the same call will
>>>provide your log handler with the fully parsed data.
>>
>>That's very risky, unless you have a full control over the modules that you
>>use. Anybody can grab the POST data before the log phase is happening
>>w/o using libapreq2 and you will be left with no POST when the logging
>>phase will come. 
> 
> 
> Could you give a specific example about when you believe this is risky?
> (I've left my full quote intact so you can point out the problem).
> 
> mod_apreq will always see exactly the same POST data that the 
> content-handler sees; and it does not matter what the content 
> handler decides to do with the data.

In which case my comment was incorrect. I thought that it's not enough 
to register a filter, but one needs to invoke libapreq's API to get the 
POST data, and otherwise the filter won't do anything.

Thanks Joe for correcting me. But do you think that this default 
behavior is the optimal one? I'd think that if any consumer has 
requested the POST data bypassing the apreq API (even though apreq's 
filter is installed), apreq should do nothing. Otherwise the data gets
unnecessarily copied. Or am I wrong again?

-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:
> > Eli Marmor <ma...@netmask.it> writes:
> [...]
> >>I'm not creating any content (so a normal handler is not needed), and
> >>I'm transparent and not modifying the request/response (so no filter is
> >>needed), but just logging the information and want to know what is the
> >>earliest phase that is already safe enough to parse the POST
> >>parameters.
> > Sounds to me like you really want to write a logging handler, which
> > is sort of the opposite question.
> >   "What's the latest hook that can be used?"
> > Any hook that runs before the content handler does should be fine.
> > If you're writing a log handler, all your pre-content-handler hook needs to
> > do is call
> >   req = apreq_request(r, NULL);
> > This will register mod_apreq's input filter.  By the time your logging
> > handler runs, the request body will be parsed, so the same call will
> > provide your log handler with the fully parsed data.
> 
> That's very risky, unless you have a full control over the modules that you
> use. Anybody can grab the POST data before the log phase is happening
> w/o using libapreq2 and you will be left with no POST when the logging
> phase will come. 

Could you give a specific example about when you believe this is risky?
(I've left my full quote intact so you can point out the problem).

mod_apreq will always see exactly the same POST data that the 
content-handler sees; and it does not matter what the content 
handler decides to do with the data.

-- 
Joe Schaefer


Re: Earliest Hook to Inspect POST Params

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Eli Marmor <ma...@netmask.it> writes:
[...]
>>I'm not creating any content (so a normal handler is not needed), and
>>I'm transparent and not modifying the request/response (so no filter is
>>needed), but just logging the information and want to know what is the
>>earliest phase that is already safe enough to parse the POST
>>parameters.
> 
> 
> Sounds to me like you really want to write a logging handler, which
> is sort of the opposite question.
> 
>   "What's the latest hook that can be used?"
> 
> Any hook that runs before the content handler does should be fine.
> If you're writing a log handler, all your pre-content-handler hook 
> needs to do is call
> 
>   req = apreq_request(r, NULL);
> 
> This will register mod_apreq's input filter.  By the time your logging 
> handler runs, the request body will be parsed, so the same call will 
> provide your log handler with the fully parsed data.

That's very risky unless you have full control over the modules that
you use. Anybody can grab the POST data before the log phase happens,
without using libapreq2, and you will be left with no POST data when the
logging phase comes.


-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: Earliest Hook to Inspect POST Params

Posted by Eli Marmor <ma...@netmask.it>.
First I want to thank you both (Joe and Stas).

I'm not really a newbie as I wrote, and actually I think that I was the
first one to encourage you to take advantage of the filtering feature
of Apache, when RB was still busy in the design of the bucket-brigades.

(by the way, I must note that from what I've seen so far, apreq2 could
not be designed better...)

But I still feel unsafe about when the POST data is ready for
inspection, and don't want to code something that will work in a simple
case (i.e. my test) and will fail in the "field" (which is still my
desktop too... ;-)

Joe Schaefer wrote:

> Sounds to me like you really want to write a logging handler, which
> is sort of the opposite question.
> 
>   "What's the latest hook that can be used?"

I'm afraid that Stas understood my purpose better, but that is my fault:
I probably didn't describe what I meant well.

Imagine 2 xterms: one with "tail -f" of the output of the stuff I'm
trying to code, and the other with "tail -f" of the responses. Now, if
a response is delayed, I can glance at the 1st xterm
and see which request caused the delay. So the logging must be done as
early as possible, before anything else can screw things up.

I know that there are other ways to do it, but this way allows me to
integrate it with other stuff I have.

> > If there is an existing example, you may point me at that example
> > instead of explaining.
> 
> There are a few example modules in httpd-apreq-2/env/c-modules

Yes, I know them. But the modules I looked at were standard filters
that modified the content of the response, so I thought that things
might be different with a read-only module.

Anyway, your answers really helped me. I think that I'll leave the
exact phase to parse the POST data to a configuration parameter that I
can set before any test. All I still need to decide is what the default
will be...

Thanks again,
-- 
Eli Marmor
marmor@netmask.it
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.:   +972-9-766-1020          8 Yad-Harutzim St.
Fax.:   +972-9-766-1314          P.O.B. 7004
Mobile: +972-50-23-7338          Kfar-Saba 44641, Israel

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Eli Marmor <ma...@netmask.it> writes:

[...]

> But this note, saying
> "don't worry, you should not be concerned about adding the filter
> because apreq_request() does it for you",

That's one way to read it; another is

  "You should NOT try to add the filter yourself,
   because you may fuck things up by adding an extra
   apreq filter after some other module already added it."

-- 
Joe Schaefer


Re: Earliest Hook to Inspect POST Params

Posted by Eli Marmor <ma...@netmask.it>.
Joe Schaefer wrote:
> 
> Eli Marmor <ma...@netmask.it> writes:
> 
> > Now a more serious question:
> >
> > If I'm a (reverse) proxy, and I want to parse POST requests passing
> > through me, where is the right place to call apreq_request()?
>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> It depends on who is asking for the parsed data.  If an output filter
> wants to use apreq, it should register a filter_init function that calls
> apreq_request().  See FAQ.pod and env/c-modules/apreq_output_filter_test
> for details.

Oops, you're right...
Silly of me...
It seems that I must follow the CVS more closely (the FAQ is quite new)

By the way, I saw somewhere a note that "there is no need to
AddInputFilter APREQ explicitly", which I find very confusing
and which leads to questions like mine. Without this note, I would add
the filter explicitly, and everything would work. But this note, saying
"don't worry, you should not be concerned about adding the filter
because apreq_request() does it for you", may cause people to think that
there is a miracle that saves them from the need to add the filter,
which is not the case.

Thanks,
-- 
Eli Marmor
marmor@netmask.it
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.:   +972-9-766-1020          8 Yad-Harutzim St.
Fax.:   +972-9-766-1314          P.O.B. 7004
Mobile: +972-50-23-7338          Kfar-Saba 44641, Israel

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Eli Marmor <ma...@netmask.it> writes:

> Now a more serious question:
> 
> If I'm a (reverse) proxy, and I want to parse POST requests passing
> through me, where is the right place to call apreq_request()?
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It depends on who is asking for the parsed data.  If an output filter
wants to use apreq, it should register a filter_init function that calls 
apreq_request().  See FAQ.pod and env/c-modules/apreq_output_filter_test
for details.

-- 
Joe Schaefer


Re: Earliest Hook to Inspect POST Params

Posted by Eli Marmor <ma...@netmask.it>.
Now a more serious question:

If I'm a (reverse) proxy, and I want to parse POST requests passing
through me, where is the right place to call apreq_request()?

Currently I have an output filter (doing other things); When reaching
the EOS, it's too late (in any case, the "body" member of the structure
pointer returned by apreq_request is NULL even if there was a body,
which makes sense, because it's too late).

So where exactly should I call apreq_request()?

How do I ensure that the input filter would run before the call, so
when apreq_request() is called, it already has something to parse?

Thanks,
-- 
Eli Marmor
marmor@netmask.it
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.:   +972-9-766-1020          8 Yad-Harutzim St.
Fax.:   +972-9-766-1314          P.O.B. 7004
Mobile: +972-50-23-7338          Kfar-Saba 44641, Israel

Re: Earliest Hook to Inspect POST Params

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Eli Marmor <ma...@netmask.it> writes:

> Hi,
> 
> Sorry if it's a newbie question, but I wondered what is the earliest
> hook that can be used for inspection of the POST parameters (apreq2, C).

It should be safe to use mod_apreq as soon as the request headers are 
parsed (which happens during the "Request Parsing Phase"):

    http://httpd.apache.org/docs-2.0/developer/request.html

> I'm not creating any content (so a normal handler is not needed), and
> I'm transparent and not modifying the request/response (so no filter is
> needed), but just logging the information and want to know what is the
> earliest phase that is already safe enough to parse the POST
> parameters.

Sounds to me like you really want to write a logging handler, which
is sort of the opposite question.

  "What's the latest hook that can be used?"

Any hook that runs before the content handler does should be fine.
If you're writing a log handler, all your pre-content-handler hook 
needs to do is call

  req = apreq_request(r, NULL);

This will register mod_apreq's input filter.  By the time your logging 
handler runs, the request body will be parsed, so the same call will 
provide your log handler with the fully parsed data.

> If there is an existing example, you may point me at that example
> instead of explaining.


There are a few example modules in httpd-apreq-2/env/c-modules


-- 
Joe Schaefer