You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Justin Erenkrantz <je...@ebuilt.com> on 2001/12/31 11:40:19 UTC

Input Filtering and Pushback.

I've attempted to think some more about how we should handle 
"pushback" in the input filtering (say that a caller does a 
speculative read - i.e. MIME-continuation in ap_[r]getline).

I think the best strategy would be to introduce a new input filter
mode (call it PEEK and rename the current PEEK mode to EATCRLF) that 
will indicate to input filters that this data may not be consumed
and that it should hold on to the data if possible so that it can
be read again (on a non-PEEK call).  Using this strategy wisely 
should remove all instances of a req_cfg->bb (such as what 
ap_[r]getline uses).  Data may only be stored inside the filter
chain (no external brigades with unread input are allowed!).

What follows is my high-level algorithm on how ap_[r]getline would 
work with this strategy.  Please provide any feedback or thoughts
you may have on this.  Thanks.  -- justin

ap_getline with buckets and brigades:
1. call ap_get_brigade with 0 to indicate want LF-line.
   - ap_get_brigade may return with a line that does not have
     a LF-ending (i.e. len == HUGE_STRING_LEN).
2. Check to see if we got an error value back (!= APR_SUCCESS)
3. Call apr_brigade_length to determine length of brigade.
4. Do we have enough space in our caller's buffer to handle 
   this?  (If we're not doing the allocation for them, that is.)
5. Run through all buckets in the brigade and read/copy into
   the caller's buffer.  (What happened to zero-copy?)
6. Does last bucket's data (APR_BRIGADE_LAST) have LF char?
   - If no LF, go to step 1.
7. We now have a LF-line (otherwise, we'd error out).
8. Attempt to compress the EOL by removing any CR or whitespace.
   - This would maintain backwards compatibility, but may no 
     longer be worthwhile.
9. Are we asked to handling folding?
   - If not, then translate the buffer from ascii and return.
10. Call ap_get_brigade with PEEK.
   - Should this PEEK call block?  Is that an option that the
     caller to ap_get_brigade specifies?  I think ap_getline
     would indeed want it to be a blocking PEEK.
   - This peek is not the same as current AP_MODE_PEEK (which 
     just eats CRLFs).  It would require some modifications to 
     the current input filters to ensure that each filter does
     *NOT* delete the returned data from its input stack, but
     instead copies the returned buckets.  It seems that this 
     would be possible by doing a bucket copy (which should 
     always succeed since we are dealing with already-read 
     data).
        - Let's consider how core_input_filter would handle this:
          Have it treat it as a *readbytes != 0 case with the 
          following exception: it will copy *all* buckets in b
          back to ctx->b before returning.
        - mod_ssl: It would convert it from an AP_MODE_PEEK read
          to a *readbytes (AP_IOBUFSIZE) and read the data 
          normally.  In ssl_io_filter_Input, it will do as 
          core_input_filter does and leave the data in ctx->b 
          to be read again (and return it too).  Note that 
          core_input_filter will *not* see this read as a PEEK, 
          but as a normal ap_get_brigade call (*readbytes != 0).
        - HTTP_IN: Gets out of its way and lets it be passed 
          down without any verification on its part.  (Should
          it be concerned about reading past the end-of-request??)
        - xlate_in_filter: Can save the buffer in its context
          for later reading.
11. Was any data read?
   - If not, translate the buffer from ascii and return.
12. Is the first character a tab or space?
   - If not, translate the buffer from ascii and return.
13. Go to step 1.


Re: Input Filtering and Pushback.

Posted by Brian Pane <br...@cnet.com>.
Ryan Bloom wrote:

>We don't do zero-copy on input.
>

We've been getting closer to zero-copy on input with recent
optimizations like ap_rgetline().  We still do at least two
copies of each request header.  I don't know whether it's
worthwhile to try to further reduce the copying, but I'll
find out the next time I have a chance to look at profiler
data.  (Now that a lot of bigger performance problems in 2.0
have been solved, little things like this sometimes account
for a large enough percentage of CPU time to be worth fixing.)

--Brian



Re: Input Filtering and Pushback.

Posted by Greg Stein <gs...@lyra.org>.
[ I'm still trying to catch up on several weeks of email... just a quick
  note on something here... ]

On Mon, Dec 31, 2001 at 10:02:41AM -0800, Ryan Bloom wrote:
> On Monday 31 December 2001 02:40 am, Justin Erenkrantz wrote:
>...
> > I think the best strategy would be to introduce a new input filter
> > mode (call it PEEK and rename the current PEEK mode to EATCRLF) that 
> 
> I would recommend not re-using the PEEK mode, because that will
> cause confusion to people who have written input filters already.

Regardless, the existing PEEK ought to be renamed EATCRLF. Or even tossed
entirely somehow.

>...
> > 5. Run through all buckets in the brigade and read/copy into
> >    the caller's buffer.  (What happened to zero-copy?)
> 
> We don't do zero-copy on input.  The real benefit to zero-copy
> is on output, because the data tends to be larger.

There *should* be a benefit on input, too. Think about a PUT of a gigabyte
file that you want to dump onto the disk. In the best of all worlds, this
would map into a sendfile() from the socket to a file descriptor.

>...
> > 10. Call ap_get_brigade with PEEK.
> >    - Should this PEEK call block?  Is that an option that the
> >      caller to ap_get_brigade specifies?  I think ap_getline
> >      would indeed want it to be a blocking PEEK.
> 
> If we are going to conitnue to extend input filtering, we should
> probably split the mode from the blocking.

Absolutely agree.

Whether you have two parameters, or a single param with bitfields... *shrug*
IMO, I like two params.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Input Filtering and Pushback.

Posted by Ryan Bloom <rb...@covalent.net>.
On Monday 31 December 2001 02:40 am, Justin Erenkrantz wrote:
> I've attempted to think some more about how we should handle 
> "pushback" in the input filtering (say that a caller does a 
> speculative read - i.e. MIME-continuation in ap_[r]getline).

First of all, this isn't pushback at all.  Pushback means that you are
taking data that you have read, and are handing it back to a lower
filter.  That doesn't exist in Apache's filtering model, and it isn't
necessary.

> I think the best strategy would be to introduce a new input filter
> mode (call it PEEK and rename the current PEEK mode to EATCRLF) that 

I would recommend not re-using the PEEK mode, because that will
cause confusion to people who have written input filters already.

> will indicate to input filters that this data may not be consumed
> and that it should hold on to the data if possible so that it can
> be read again (on a non-PEEK call).  Using this strategy wisely 
> should remove all instances of a req_cfg->bb (such as what 
> ap_[r]getline uses).  Data may only be stored inside the filter
> chain (no external brigades with unread input are allowed!).
> 
> What follows is my high-level algorithm on how ap_[r]getline would 
> work with this strategy.  Please provide any feedback or thoughts
> you may have on this.  Thanks.  -- justin
> 
> ap_getline with buckets and brigades:
> 1. call ap_get_brigade with 0 to indicate want LF-line.
>    - ap_get_brigade may return with a line that does not have
>      a LF-ending (i.e. len == HUGE_STRING_LEN).
> 2. Check to see if we got an error value back (!= APR_SUCCESS)
> 3. Call apr_brigade_length to determine length of brigade.
> 4. Do we have enough space in our caller's buffer to handle 
>    this?  (If we're not doing the allocation for them, that is.)
> 5. Run through all buckets in the brigade and read/copy into
>    the caller's buffer.  (What happened to zero-copy?)

We don't do zero-copy on input.  The real benefit to zero-copy
is on output, because the data tends to be larger.

> 6. Does last bucket's data (APR_BRIGADE_LAST) have LF char?
>    - If no LF, go to step 1.
> 7. We now have a LF-line (otherwise, we'd error out).
> 8. Attempt to compress the EOL by removing any CR or whitespace.
>    - This would maintain backwards compatibility, but may no 
>      longer be worthwhile.
> 9. Are we asked to handling folding?
>    - If not, then translate the buffer from ascii and return.
> 10. Call ap_get_brigade with PEEK.
>    - Should this PEEK call block?  Is that an option that the
>      caller to ap_get_brigade specifies?  I think ap_getline
>      would indeed want it to be a blocking PEEK.

If we are going to conitnue to extend input filtering, we should
probably split the mode from the blocking.

>    - This peek is not the same as current AP_MODE_PEEK (which 
>      just eats CRLFs).  It would require some modifications to 
>      the current input filters to ensure that each filter does
>      *NOT* delete the returned data from its input stack, but
>      instead copies the returned buckets.  It seems that this 
>      would be possible by doing a bucket copy (which should 
>      always succeed since we are dealing with already-read 
>      data).
>         - Let's consider how core_input_filter would handle this:
>           Have it treat it as a *readbytes != 0 case with the 
>           following exception: it will copy *all* buckets in b
>           back to ctx->b before returning.
>         - mod_ssl: It would convert it from an AP_MODE_PEEK read
>           to a *readbytes (AP_IOBUFSIZE) and read the data 
>           normally.  In ssl_io_filter_Input, it will do as 
>           core_input_filter does and leave the data in ctx->b 
>           to be read again (and return it too).  Note that 
>           core_input_filter will *not* see this read as a PEEK, 
>           but as a normal ap_get_brigade call (*readbytes != 0).
>         - HTTP_IN: Gets out of its way and lets it be passed 
>           down without any verification on its part.  (Should
>           it be concerned about reading past the end-of-request??)
>         - xlate_in_filter: Can save the buffer in its context
>           for later reading.
> 11. Was any data read?
>    - If not, translate the buffer from ascii and return.
> 12. Is the first character a tab or space?
>    - If not, translate the buffer from ascii and return.
> 13. Go to step 1.

Since the PEEK function is returning data now, you will need to compare
the data read through the PEEK mode to the next chunk of data read to
be sure that they are identical.

Other than that, it looks good.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------