Posted to dev@httpd.apache.org by Aaron Bannert <aa...@clove.org> on 2002/05/07 22:09:06 UTC

Question about input filters and request bodies

I have a module that needs to suck in the request body. I'm trying to
use ap_get_brigade to do this, as this seems most natural (as opposed
to the ap_*_client_block() family). The semantic that I'm looking for
is something simple like this: "retrieve all the buckets that represent
a request body". I would expect there to be a way to determine if an
input body was not present, without causing the server to block on the
socket waiting for input. As it is I see no simple way to do this. The
only parallel I see is much more complicated (call ap_get_brigade with
the AP_MODE_READBYTES mode, forcing buckets into n-sized chunks even
if that is not an optimal form, stopping when an EOS comes across (?)).
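
(For concreteness, that complicated version would look something like the
sketch below -- written against the current ap_get_brigade and
apr_brigade_create signatures, untested, with error handling trimmed:)

    apr_bucket_brigade *bb = apr_brigade_create(r->pool,
                                                r->connection->bucket_alloc);
    int seen_eos = 0;

    while (!seen_eos) {
        apr_bucket *b;
        apr_status_t rv = ap_get_brigade(r->input_filters, bb,
                                         AP_MODE_READBYTES, APR_BLOCK_READ,
                                         HUGE_STRING_LEN);
        if (rv != APR_SUCCESS) {
            break;              /* error, or nothing more to read */
        }
        for (b = APR_BRIGADE_FIRST(bb); b != APR_BRIGADE_SENTINEL(bb);
             b = APR_BUCKET_NEXT(b)) {
            if (APR_BUCKET_IS_EOS(b)) {
                seen_eos = 1;
                break;
            }
            /* a data bucket: apr_bucket_read() it, append to the body */
        }
        apr_brigade_cleanup(bb);    /* discard what we consumed */
    }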

Is there a better way to deal with this? Shouldn't the most prevalent mode
of an input filter be something as simple as AP_MODE_GIVE_ME_A_BUCKET?

-aaron

Re: Mod_deflate

Posted by Joshua Slive <jo...@slive.ca>.
On Tue, 7 May 2002, Ian Holsman wrote:

> As for excluding browsers, you will need to set the 'no-gzip' environment
> variable (via a BrowserMatch directive, I guess).

Hmmm... Is that documented anywhere?

Joshua.


Re: Mod_deflate

Posted by Ian Holsman <ia...@apache.org>.

Jobarr wrote:
> I have been using mod_gzip with Apache 1.3.x and I would like to continue to
> compress my html, so I am looking into mod_deflate with Apache 2. I compiled
> it myself and it seems to be working (running 2.0.36 on Windows XP/2000). I
> know it is "experimental", but are there any real known problems with it
> that would prevent me from using it? Also, with mod_gzip, I had to manually
> exclude some browsers because they could not support the compression even
> though they claimed they could. Is this possible with this module in its
> current state or is it needed?
> 

I think mod_deflate just got moved out of experimental and into filters
in 2.0.37-dev (with some minor performance patches added).

As for excluding browsers, you will need to set the 'no-gzip' environment
variable (via a BrowserMatch directive, I guess).
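
Something along these lines, say (the patterns here are illustrative and
would need checking against the browsers you actually care about):

    # Netscape 4.x mishandles compressed content
    BrowserMatch ^Mozilla/4 no-gzip
    # MSIE also identifies itself as Mozilla/4, but copes fine
    BrowserMatch \bMSIE !no-gzip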

Cheers
Ian

> thanks
> -Jobarr
> 



Re: Mod_deflate

Posted by Jobarr <jo...@herzeleid.com>.
----- Original Message -----
> Try mod_gzip for Apache 2.x

But has Mod_gzip even been updated to work with the latest Apache2 API and
whatnot? I didn't think it had yet.
-Jobarr


RE: Mod_deflate

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Wed, 8 May 2002, Peter J. Cranstone wrote:

> Try mod_gzip for Apache 2.x
> 
> http://www.remotecommunications.com/apache/mod_gzip/archive1.htm
> 
> There are still a number of browsers out there that do not support HTTP
> 1.1 content encoding. The only "safe" thing to do is manually exclude
> (filter) them. 

MSIE 4.x can handle gzipped content but has two rare bugs. mod_gzip has no
way to work around them, so your only option is to disable compression for
MSIE 4.x entirely.

Igor Sysoev
http://sysoev.ru


RE: Mod_deflate

Posted by "Peter J. Cranstone" <cr...@msn.com>.
Jobarr,

Try mod_gzip for Apache 2.x

http://www.remotecommunications.com/apache/mod_gzip/archive1.htm

There are still a number of browsers out there that do not support HTTP
1.1 content encoding. The only "safe" thing to do is manually exclude
(filter) them. 

As for how mod_deflate does this I'm not sure. It got moved out of
experimental very quickly and I'm not sure how much testing has taken
place.

Regards,


Peter J. Cranstone


-----Original Message-----
From: Jobarr [mailto:jobarr@herzeleid.com] 
Sent: Tuesday, May 07, 2002 7:25 PM
To: dev@httpd.apache.org
Subject: Mod_deflate

I have been using mod_gzip with Apache 1.3.x and I would like to continue to
compress my html, so I am looking into mod_deflate with Apache 2. I compiled
it myself and it seems to be working (running 2.0.36 on Windows XP/2000). I
know it is "experimental", but are there any real known problems with it
that would prevent me from using it? Also, with mod_gzip, I had to manually
exclude some browsers because they could not support the compression even
though they claimed they could. Is this possible with this module in its
current state or is it needed?

thanks
-Jobarr


Mod_deflate

Posted by Jobarr <jo...@herzeleid.com>.
I have been using mod_gzip with Apache 1.3.x and I would like to continue to
compress my html, so I am looking into mod_deflate with Apache 2. I compiled
it myself and it seems to be working (running 2.0.36 on Windows XP/2000). I
know it is "experimental", but are there any real known problems with it
that would prevent me from using it? Also, with mod_gzip, I had to manually
exclude some browsers because they could not support the compression even
though they claimed they could. Is this possible with this module in its
current state or is it needed?

thanks
-Jobarr


Re: How I Think Filters Should Work

Posted by Greg Stein <gs...@lyra.org>.
On Thu, May 09, 2002 at 05:18:42PM -0700, Aaron Bannert wrote:
>...
> The basic pattern for any input filter (which is pull-based at the moment
> in Apache) would be the following:
> 
> 1. retrieve next "abstract data unit"
> 2. inspect "abstract data unit", can we operate on it?
> 3. if yes, operate_on(unit) and pass the result to the next filter.
> 4. if no, pass the current unit to the next filter.
> 5. go to #1
> 
> In this model, the operate_on() behavior has been separated from the
> mechanics of passing data around. I believe this would improve filter

That's fine, as long as you ensure that the retrieval can be bounded. When
the HTTP processor realizes that it can only read 100 more bytes from the
next-filter, then you're outside of "abstract data unit" and into "concrete
100 bytes."

Due to the presence of the Upgrade: header, an HTTP processing filter must
always be per-request, and must never read past the end of its request. That
enforces a number of limitations on your design.

[ unless you go for "pushback", which I believe is a poor design. ]

What would be neat is to have a connection-level filter that does HTTP
processing, but can be signalled to morph itself into a simple buffer. For
example, let's say that filter pulls 10k from next-filter ("pull" here,
remember). It parses the data into some headers and a 500-byte body. It
has 9k left over, which it holds to the side.

Now, the request processor sees an "Upgrade" and switches protocols to
something else entirely. The connection filter gets demoted to a simple
buffer, returning the 9k without processing. When the buffer is empty, it
removes itself from the filter stack.

The implication here is that filters need to register with particular hooks
in the server. In particular, with a hook to state that a protocol change
has occurred on <this> connection (also implying an input and an output
filter stack). The protocol-related filters in the stack can then take
appropriate action (in the above example, to disable HTTP processing and
just be a buffer). Other subsystems may have also registered with the hook
and will *install* new protocol handler filters.

You could even use this protocol-change hook to set up the initial HTTP
processing filters. Go from "null" protocol to "http", and that installs
your chunking, http processing, etc. It could even be the mechanism which
tells the MPM to call ap_run_request (??) (the app-level thing which starts
sucking input from the filter stack and processing it).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: How I Think Filters Should Work

Posted by Greg Stein <gs...@lyra.org>.
On Thu, May 09, 2002 at 05:57:16PM -0700, Justin Erenkrantz wrote:
>...
> I really think you're talking about a push-based filter system.
> However, it seems that there was a conscious decision to use
> pull for input-filters.  I wasn't around when that discussion
> was made.  I'd like to hear the rationale for using pull for
> input filters.

Historical. Handlers "pull" input data. Thus, the input filter stack also
needed to be a pull mechanism.

In apr-serf, I've advocated providing both models to the application. The
app can push content at the network, or the network can pull content from
the app. Also, the app can pull input from the network, or the network can
push input at the app.

Note that a push-based filter stack can be used in a pull-fashion. When the
app wants to pull content, the subsystem tells the network endpoint to push
data into the filter stack. The data is then captured on the other side, and
an appropriate amount is returned to the app (and the rest is buffered off
to the side).

>...
> Sending metadata down is a big change.  Again, I *think* this was
> discussed before, but it was determined that this wasn't the right way.

No. We think it is right, but it was too big of a change for Apache. Too
much code simply likes to write to r->output_headers.

>...
> (If we do this for input filters, I think we need to do the
> same for output filters.)

The filter stack "should" transport all metadata. The request_rec is an
out-of-band data delivery that hurts us quite a bit in a filter-stack world.

> > around in my head for a long time. When they become clear enough I will
> > write up a more formal and concise proposal on how I think the future
> > filter system should work (possibly for 2.1 or beyond). I think the
> > apr-serf project is a perfect place to play with some of these ideas. I
> > would appreciate any constructive comments to the above. ]

I would totally agree. My hope is that apr-serf can establish a new
substrate for the filter systems. It is only a client, though, so it would
be used by proxy, but not by the MPM/listener stuff in Apache (the filter
stack code would be; just not the standard HTTP client endpoints).

> I'm not sure I'm happy that, so early in the 2.0 series, we're
> already concerned about input filtering.  I don't think it's
> ever been "right" - probably because it was ignored for so long.
> It's showing now.  If this prevents people from writing good
> input filters, I think we need to fix this sooner rather than
> later.  -- justin

The input stuff works, but it could probably be better. At a minimum, it
probably makes some sense to have a mode that says "give me as much of the
request as you feel cozy giving me." That would allow the input filters to
return a SOCKET rather than a bunch of 8k buckets. However, to really make
it work (at all, and "best"), we would need a variant of the SOCKET bucket.
It would allow us to share the apr_socket_t and apply a read-limit on the
thing. Thus, you could say "here are 1000 bytes, read from a socket." That
would give you delayed read from the socket (and possible later optimization
of doing a sendfile() from the socket fd into a file fd), yet apply the
appropriate request-boundary limitations.
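
(A hypothetical shape for that variant, with invented names, just to make
the idea concrete:)

    /* hypothetical: a socket bucket that will not read past a limit */
    typedef struct {
        apr_socket_t *sock;   /* the apr_socket_t shared across requests */
        apr_off_t remaining;  /* request-body bytes this bucket may read */
    } limited_socket_data;

    /* its read function would hand out at most 'remaining' bytes,
       decrementing the count, and become zero-length at the boundary */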

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: How I Think Filters Should Work

Posted by Greg Ames <gr...@apache.org>.
sorry for the fat finger post.

Justin Erenkrantz wrote:

> (As Manoj kidded me last night, you and I seem to retrace old
> discussions coming to the same conclusions Ryan and he did.)
> So, I think we need some of the old people to tell us why we
> aren't doing this.

dang!  if Manoj & Ryan are old people, I'm a friggin' mummy.

Greg

Re: How I Think Filters Should Work

Posted by Greg Ames <gr...@apache.org>.
Justin Erenkrantz wrote:
> 
> On Thu, May 09, 2002 at 05:18:42PM -0700, Aaron Bannert wrote:
> > Let me be more precise. I'm not saying that we shouldn't use
> > brigades. What I'm saying is we shouldn't be dealing with specific types
> > of data at this level. Right now, by requiring a filter to request
> > "bytes" or "lines", we are seriously constraining the performance of
> > the filters. A filter should only inspect the types of the buckets it
> > retrieves and then move on. The bytes should only come into play once
> > we have actually retrieved a bucket of a certain type that we are able
> > to process.
> 
> I really think you're talking about a push-based filter system.
> However, it seems that there was a conscious decision to use
> pull for input-filters.  I wasn't around when that discussion
> was made.  I'd like to hear the rationale for using pull for
> input filters.
> 
> > HEADER
> > HEADER
> > HEADER
> > DATA (extra data read past headers)
> > SOCKET
> > EOS
> 
> Sending metadata down is a big change.  Again, I *think* this was
> discussed before, but it was determined that this wasn't the right way.
> I think we're going down a path that was discussed before.
> (As Manoj kidded me last night, you and I seem to retrace old
> discussions coming to the same conclusions Ryan and he did.)
> So, I think we need some of the old people to tell us why we
> aren't doing this.
> 
> (If we do this for input filters, I think we need to do the
> same for output filters.)
> 
> > around in my head for a long time. When they become clear enough I will
> > write up a more formal and concise proposal on how I think the future
> > filter system should work (possibly for 2.1 or beyond). I think the
> > apr-serf project is a perfect place to play with some of these ideas. I
> > would appreciate any constructive comments to the above. ]
> 
> I'm not sure I'm happy that, so early in the 2.0 series, we're
> already concerned about input filtering.  I don't think it's
> ever been "right" - probably because it was ignored for so long.
> It's showing now.  If this prevents people from writing good
> input filters, I think we need to fix this sooner rather than
> later.  -- justin

Re: How I Think Filters Should Work

Posted by Justin Erenkrantz <je...@apache.org>.
On Thu, May 09, 2002 at 05:18:42PM -0700, Aaron Bannert wrote:
> Let me be more precise. I'm not saying that we shouldn't use
> brigades. What I'm saying is we shouldn't be dealing with specific types
> of data at this level. Right now, by requiring a filter to request
> "bytes" or "lines", we are seriously constraining the performance of
> the filters. A filter should only inspect the types of the buckets it
> retrieves and then move on. The bytes should only come into play once
> we have actually retrieved a bucket of a certain type that we are able
> to process.

I really think you're talking about a push-based filter system.
However, it seems that there was a conscious decision to use
pull for input-filters.  I wasn't around when that discussion
was made.  I'd like to hear the rationale for using pull for
input filters.

> HEADER
> HEADER
> HEADER
> DATA (extra data read past headers)
> SOCKET
> EOS

Sending metadata down is a big change.  Again, I *think* this was
discussed before, but it was determined that this wasn't the right way.
I think we're going down a path that was discussed before.
(As Manoj kidded me last night, you and I seem to retrace old
discussions coming to the same conclusions Ryan and he did.)
So, I think we need some of the old people to tell us why we
aren't doing this.

(If we do this for input filters, I think we need to do the
same for output filters.)

> around in my head for a long time. When they become clear enough I will
> write up a more formal and concise proposal on how I think the future
> filter system should work (possibly for 2.1 or beyond). I think the
> apr-serf project is a perfect place to play with some of these ideas. I
> would appreciate any constructive comments to the above. ]

I'm not sure I'm happy that, so early in the 2.0 series, we're
already concerned about input filtering.  I don't think it's
ever been "right" - probably because it was ignored for so long.
It's showing now.  If this prevents people from writing good
input filters, I think we need to fix this sooner rather than
later.  -- justin

How I Think Filters Should Work

Posted by Aaron Bannert <aa...@clove.org>.
> > That just sounds like the same thing with a blocking or non-blocking*
> > flag. To be honest, I don't see how any input filters would need anything
> > except one bucket at a time. If the filter doesn't need it, it passes
> > it downstream, otherwise it chugs and spits out other buckets. What else
> > is there?
> 
> Yuck.  I think it'd be possible for input filters to buffer up or
> modify data and then pass it up as multiple buckets in a
> brigade rather than one bucket.  Think of a mod_deflate input
> filter.  -- justin

Let me be more precise. I'm not saying that we shouldn't use
brigades. What I'm saying is we shouldn't be dealing with specific types
of data at this level. Right now, by requiring a filter to request
"bytes" or "lines", we are seriously constraining the performance of
the filters. A filter should only inspect the types of the buckets it
retrieves and then move on. The bytes should only come into play once
we have actually retrieved a bucket of a certain type that we are able
to process.

Furthermore, we should be using a dynamic type system, and liberally
creating new bucket types as we invent new implementations. Filters need
not know which filters are upstream or downstream from them, but they
should be strategically placed to consume certain buckets from
upstream filters and to produce certain buckets required by downstream
filters.
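
(The consuming side of that might look like the sketch below, where
my_mime_type_bucket and operate_on() are invented for illustration:)

    apr_bucket *b;
    for (b = APR_BRIGADE_FIRST(bb); b != APR_BRIGADE_SENTINEL(bb);
         b = APR_BUCKET_NEXT(b)) {
        if (b->type == &my_mime_type_bucket) {
            operate_on(b);      /* a bucket type we know how to process */
        }
        /* any other type flows through untouched */
    }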


[Warning: long-winded brainstorm follows:]


I want a typical filter chain to look like this:

input_source  --->  protocol filters  -->  sub-protocol filters  --> handlers

an input socket would produce this:

SOCKET
EOS

an http header parser filter would produce these:

HEADER
HEADER
HEADER
DATA (extra data read past headers)
SOCKET
EOS

an http request parser would only work at the request level, performing
dechunking, handling content-length, and dealing with pipelined
requests. It would produce these:

BEGIN_OF_REQUEST
HEADERS
BEGIN_OF_BODY_DATA
BODY_DATA
BODY_DATA
BODY_DATA
BODY_DATA
END_OF_BODY_DATA
TRAILERS...
END_OF_REQUEST
... and so on

a multipart input handler would then pass all types except BODY_DATA,
which it could use to produce:

...
MULTIPART_SECTION_BEGIN
BODY_DATA
MULTIPART_SECTION_END
...

or a magic mime filter could simply buffer enough BODY_DATA buckets until
it knew the type, prepending a MIME_TYPE to the front and sending
the whole thing downstream.

...
MIME_TYPE
BODY_DATA
BODY_DATA
...


The basic pattern for any input filter (which is pull-based at the moment
in Apache) would be the following:

1. retrieve next "abstract data unit"
2. inspect "abstract data unit", can we operate on it?
3. if yes, operate_on(unit) and pass the result to the next filter.
4. if no, pass the current unit to the next filter.
5. go to #1

In this model, the operate_on() behavior has been separated from the
mechanics of passing data around. I believe this would improve filter
performance as well as simplify the implementation details that
module authors must understand. I also think this would dramatically
improve the extensibility of the Apache filter system.
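
(As a sketch, using the current input filter signature and hypothetical
can_operate_on()/operate_on() helpers, the pattern boils down to:)

    static apr_status_t my_input_filter(ap_filter_t *f,
                                        apr_bucket_brigade *bb,
                                        ap_input_mode_t mode,
                                        apr_read_type_e block,
                                        apr_off_t readbytes)
    {
        apr_bucket *b;
        apr_status_t rv;

        rv = ap_get_brigade(f->next, bb, mode, block, readbytes); /* 1 */
        if (rv != APR_SUCCESS) {
            return rv;
        }
        for (b = APR_BRIGADE_FIRST(bb); b != APR_BRIGADE_SENTINEL(bb);
             b = APR_BUCKET_NEXT(b)) {
            if (can_operate_on(b)) {   /* 2: inspect the bucket type */
                operate_on(b);         /* 3: transform it in place */
            }                          /* 4: otherwise leave it alone */
        }
        return APR_SUCCESS;            /* 5: the caller pulls again */
    }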

[Sorry for the long brain dump. Some of these ideas have been floating
around in my head for a long time. When they become clear enough I will
write up a more formal and concise proposal on how I think the future
filter system should work (possibly for 2.1 or beyond). I think the
apr-serf project is a perfect place to play with some of these ideas. I
would appreciate any constructive comments to the above. ]

-aaron



Re: Question about input filters and request bodies

Posted by Justin Erenkrantz <je...@apache.org>.
On Tue, May 07, 2002 at 01:26:54PM -0700, Aaron Bannert wrote:
> > > Is there a better way to deal with this? Shouldn't the most prevalent mode
> > > of an input filter be something as simple as AP_MODE_GIVE_ME_A_BUCKET?
> > 
> > That would be a nice mode. An alternative is to peek, find nothing, then
> > block on getting *some* input.
> 
> That just sounds like the same thing with a blocking or non-blocking*
> flag. To be honest, I don't see how any input filters would need anything
> except one bucket at a time. If the filter doesn't need it, it passes
> it downstream, otherwise it chugs and spits out other buckets. What else
> is there?

Yuck.  I think it'd be possible for input filters to buffer up or
modify data and then pass it up as multiple buckets in a
brigade rather than one bucket.  Think of a mod_deflate input
filter.  -- justin

Re: Question about input filters and request bodies

Posted by Aaron Bannert <aa...@clove.org>.
> > Is there a better way to deal with this? Shouldn't the most prevalent mode
> > of an input filter be something as simple as AP_MODE_GIVE_ME_A_BUCKET?
> 
> That would be a nice mode. An alternative is to peek, find nothing, then
> block on getting *some* input.

That just sounds like the same thing with a blocking or non-blocking*
flag. To be honest, I don't see how any input filters would need anything
except one bucket at a time. If the filter doesn't need it, it passes
it downstream, otherwise it chugs and spits out other buckets. What else
is there?


* Note: non-blocking mode is really only useful if there is a way to
multiplex the I/O from multiple filter chains, and we don't have that.

-aaron

Re: Question about input filters and request bodies

Posted by Greg Stein <gs...@lyra.org>.
On Tue, May 07, 2002 at 01:09:06PM -0700, Aaron Bannert wrote:
> I have a module that needs to suck in the request body. I'm trying to
> use ap_get_brigade to do this, as this seems most natural (as opposed
> to the ap_*_client_block() family). The semantic that I'm looking for
> is something simple like this: "retrieve all the buckets that represent
> a request body". I would expect there to be a way to determine if an
> input body was not present, without causing the server to block on the
> socket waiting for input. As it is I see no simple way to do this. The
> only parallel I see is much more complicated (call ap_get_brigade with
> the AP_MODE_READBYTES mode, forcing buckets into n-sized chunks even
> if that is not an optimal form, stopping when an EOS comes across (?)).
> 
> Is there a better way to deal with this? Shouldn't the most prevalent mode
> of an input filter be something as simple as AP_MODE_GIVE_ME_A_BUCKET?

That would be a nice mode. An alternative is to peek, find nothing, then
block on getting *some* input.
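
(With the speculative mode now in the tree, the peek might look like this
sketch, untested:)

    /* peek: a non-blocking, non-destructive read of at most one byte */
    rv = ap_get_brigade(f->next, bb, AP_MODE_SPECULATIVE,
                        APR_NONBLOCK_READ, 1);
    if (rv == APR_SUCCESS && !APR_BRIGADE_EMPTY(bb)) {
        /* input is waiting: now do the real, blocking read */
    }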

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Question about input filters and request bodies

Posted by Justin Erenkrantz <je...@apache.org>.
On Thu, May 09, 2002 at 03:37:47PM -0700, Aaron Bannert wrote:
> On Thu, May 09, 2002 at 03:16:48PM -0700, Justin Erenkrantz wrote:
> > No.  I think the best way to handle this situation is to teach
> > HTTP_IN about what requests may have input bodies.  If the request
> > doesn't indicate a body, an EOS bucket is returned.
> 
> That sounds reasonable enough to me. From the perspective of a module
> author, will I be able to assume that an EOS will occur at the end of
> a brigade? It would be nice to simply loop over the incoming buckets
> (fetching more brigades as necessary) and processing until I reach an EOS.
> Please stop me if this is not the pattern you're envisioning.

We're agreeing here.  =)

If there *is* a body right now, that's how it should be.  The
only difference is that we'll return EOS when we believe there
is no body.

> Looks correct to me. I think if there is a body, unless the RFC explicitly
> forbids us from accepting a body with a particular method, then we should
> pass it down the filter stack. I don't at all see how multipart/byterange
> would work with this (but then again I still think that we aren't using
> sentinel buckets liberally enough in our filter implementations.)

That's possible as well.  So, if there is a C-L or T-E, we process
the body (regardless of method), but if there isn't a specified
C-L or T-E, we assume there is no body.  That might work out
better, in fact.  -- justin

Re: Question about input filters and request bodies

Posted by Aaron Bannert <aa...@clove.org>.
On Thu, May 09, 2002 at 03:16:48PM -0700, Justin Erenkrantz wrote:
> No.  I think the best way to handle this situation is to teach
> HTTP_IN about what requests may have input bodies.  If the request
> doesn't indicate a body, an EOS bucket is returned.

That sounds reasonable enough to me. From the perspective of a module
author, will I be able to assume that an EOS will occur at the end of
a brigade? It would be nice to simply loop over the incoming buckets
(fetching more brigades as necessary) and processing until I reach an EOS.
Please stop me if this is not the pattern you're envisioning.

> The rules I can come up with off the top of my head are (need to be
> double-checked):
> - GET and HEAD can't have bodies (I think there are others that may
>   do this.)  The RFC says that the bodies should be ignored.  So,
>   there may need to be some way to discard the bodies.  But, it also
>   says that proxies SHOULD forward them.  We may have to think about
>   this.
> - Must have either C-L or a valid T-E.
> - Have to support multipart/byteranges (we aren't doing this now!)
> 
> So, something like this in ap_http_filter:
> if (!ctx) {
>     if method is GET or HEAD { return EOS bucket in brigade; }
>     if T-E { set body type to be T-E }
>     else if C-L { set body length to C-L }
>     else { return EOS bucket in brigade; }
>     ..normal init..
> }
> 
> ..normal code...
> 
> What do you think of this?  I think this means making BODY_NONE
> state invalid.  It was primarily there before because we were
> using ap_getline in HTTP_IN - now we use the brigade calls, so this
> state is no longer needed.  -- justin

Looks correct to me. I think if there is a body, unless the RFC explicitly
forbids us from accepting a body with a particular method, then we should
pass it down the filter stack. I don't at all see how multipart/byterange
would work with this (but then again I still think that we aren't using
sentinel buckets liberally enough in our filter implementations.)

-aaron

Re: Question about input filters and request bodies

Posted by Justin Erenkrantz <je...@apache.org>.
On Tue, May 07, 2002 at 01:09:06PM -0700, Aaron Bannert wrote:
> I have a module that needs to suck in the request body. I'm trying to
> use ap_get_brigade to do this, as this seems most natural (as opposed
> to the ap_*_client_block() family). The semantic that I'm looking for
> is something simple like this: "retrieve all the buckets that represent
> a request body". I would expect there to be a way to determine if an
> input body was not present, without causing the server to block on the
> socket waiting for input. As it is I see no simple way to do this. The
> only parallel I see is much more complicated (call ap_get_brigade with
> the AP_MODE_READBYTES mode, forcing buckets into n-sized chunks even
> if that is not an optimal form, stopping when an EOS comes across (?)).
> 
> Is there a better way to deal with this? Shouldn't the most prevalent mode
> of an input filter be something as simple as AP_MODE_GIVE_ME_A_BUCKET?

No.  I think the best way to handle this situation is to teach
HTTP_IN about what requests may have input bodies.  If the request
doesn't indicate a body, an EOS bucket is returned.

The rules I can come up with off the top of my head are (need to be
double-checked):
- GET and HEAD can't have bodies (I think there are others that may
  do this.)  The RFC says that the bodies should be ignored.  So,
  there may need to be some way to discard the bodies.  But, it also
  says that proxies SHOULD forward them.  We may have to think about
  this.
- Must have either C-L or a valid T-E.
- Have to support multipart/byteranges (we aren't doing this now!)

So, something like this in ap_http_filter:
if (!ctx) {
    if method is GET or HEAD { return EOS bucket in brigade; }
    if T-E { set body type to be T-E }
    else if C-L { set body length to C-L }
    else { return EOS bucket in brigade; }
    ..normal init..
}

..normal code...

What do you think of this?  I think this means making BODY_NONE
state invalid.  It was primarily there before because we were
using ap_getline in HTTP_IN - now we use the brigade calls, so this
state is no longer needed.  -- justin
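
(In real terms that init check would presumably boil down to something like
the sketch below, where b is the brigade passed into the filter and the
BODY_* names mirror the existing body-state enum; the ctx plumbing is
hand-waved:)

    request_rec *r = f->r;
    const char *tenc = apr_table_get(r->headers_in, "Transfer-Encoding");
    const char *lenp = apr_table_get(r->headers_in, "Content-Length");

    if (r->method_number == M_GET) {     /* covers HEAD too (header_only) */
        APR_BRIGADE_INSERT_TAIL(b,
            apr_bucket_eos_create(f->c->bucket_alloc));
        return APR_SUCCESS;
    }
    if (tenc && !strcasecmp(tenc, "chunked")) {
        ctx->state = BODY_CHUNK;         /* dechunk as we read */
    }
    else if (lenp) {
        ctx->state = BODY_LENGTH;        /* read exactly C-L bytes */
        ctx->remaining = apr_atoi64(lenp);
    }
    else {
        APR_BRIGADE_INSERT_TAIL(b,       /* no body indicated: EOS now */
            apr_bucket_eos_create(f->c->bucket_alloc));
        return APR_SUCCESS;
    }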