Posted to modules-dev@httpd.apache.org by Graham Dumpleton <gr...@gmail.com> on 2008/01/30 01:28:46 UTC

Reading of input after headers sent and 100-continue.

A question about HTTP output filter and 100-continue.

The HTTP output filter will send a 100 result back to a client when
the first attempt to read input occurs and an Expect header with
100-continue was received. I.e., from http_filters.c we have:

        /* Since we're about to read data, send 100-Continue if needed.
         * Only valid on chunked and C-L bodies where the C-L is > 0. */
        if ((ctx->state == BODY_CHUNK ||
            (ctx->state == BODY_LENGTH && ctx->remaining > 0)) &&
            f->r->expecting_100 && f->r->proto_num >= HTTP_VERSION(1,1)) {
            char *tmp;
            apr_bucket_brigade *bb;

            tmp = apr_pstrcat(f->r->pool, AP_SERVER_PROTOCOL, " ",
                              ap_get_status_line(100), CRLF CRLF, NULL);
            bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
            e = apr_bucket_pool_create(tmp, strlen(tmp), f->r->pool,
                                       f->c->bucket_alloc);
            APR_BRIGADE_INSERT_HEAD(bb, e);
            e = apr_bucket_flush_create(f->c->bucket_alloc);
            APR_BRIGADE_INSERT_TAIL(bb, e);

            ap_pass_brigade(f->c->output_filters, bb);
        }

Now, if one generates response content before having read any input,
and the response output is not buffered because it is explicitly
flushed in some way, this triggers the sending of any response
headers which have been set up.

The problem is that if one goes to read the input only after having
sent some response content, and so triggered the response headers to
be sent, the HTTP output filter above still sends the 100 status
response string. In other words, the 100 response status string
appears in the middle of the actual response content.

My question then is, what should a handler do if it is trying to
generate response content (non-buffered) before having attempted to
read any input, i.e., what is the correct way to stop Apache still
sending the 100 status response for the 100-continue header? I know
that setting r->expecting_100 to 0 at the time that first response
content is being sent will prevent it, but is there something else
that should be done instead?
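
To make the effect of clearing r->expecting_100 concrete, the quoted
condition can be modelled outside Apache. This is a minimal sketch
only, not httpd code: struct model_request, MODEL_HTTP_VERSION,
read_would_send_100 and on_first_response_output are hypothetical
names, and the 1000 * major + minor encoding is an assumption about
how HTTP_VERSION packs its arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed encoding of a protocol version (hypothetical stand-in). */
#define MODEL_HTTP_VERSION(major, minor) (1000 * (major) + (minor))

/* Hypothetical stand-ins for the fields the quoted condition tests;
 * these are NOT the real request_rec or filter context types. */
enum body_state { BODY_NONE, BODY_LENGTH, BODY_CHUNK };

struct model_request {
    enum body_state state;   /* how the request body is framed */
    long remaining;          /* bytes left when framed by Content-Length */
    bool expecting_100;      /* client sent "Expect: 100-continue" */
    int proto_num;           /* encoded protocol version */
    bool response_started;   /* hypothetical: response headers already flushed */
};

/* Mirrors the quoted test: true when a first read would emit the 100. */
static bool read_would_send_100(const struct model_request *r)
{
    return (r->state == BODY_CHUNK ||
            (r->state == BODY_LENGTH && r->remaining > 0)) &&
           r->expecting_100 &&
           r->proto_num >= MODEL_HTTP_VERSION(1, 1);
}

/* The workaround from the question: clear the flag when the first
 * response content goes out, so a later read stays silent. */
static void on_first_response_output(struct model_request *r)
{
    assert(r != NULL);
    r->response_started = true;
    r->expecting_100 = false;
}
```

In this model, calling on_first_response_output() before any read
makes read_would_send_100() return false, which is the behaviour the
workaround relies on.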

BTW, this is partly theoretical in that I have no actual code that is
doing this, but technically in systems like mod_python or mod_wsgi
where one doesn't know what the Python application code running on top
is doing, a user could trigger this situation.

This is occurring when testing with Apache 2.2.4.

Any ideas appreciated.

Graham

Re: Reading of input after headers sent and 100-continue.

Posted by Graham Dumpleton <gr...@gmail.com>.
For those on the Python web sig who might be thinking they missed part
of the conversation, you have. This is the second half of a
conversation started on Apache modules-dev list about Apache
100-continue processing. If interested, you can see the first half of
the conversation at:

  http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/200801.mbox/browser

Graham

On 31/01/2008, Brian Smith <br...@briansmith.org> wrote:
> Graham Dumpleton wrote:
> > Effectively, if a 200 response came back, it seems to suggest
> > that the client still should send the request body, just that
> > it 'SHOULD NOT wait for an indefinite period'. It doesn't say
> > explicitly for the client that it shouldn't still send the
> > request body if another response code comes back.
>
> This behavior is to support servers that don't understand the Expect:
> header.
>
> Basically, if the server responds with a 100, the client must send the
> request body. If the server responds with a 4xx or 5xx, the client must
> not send the request body. If the server responds with a 2xx or a 3xx,
> then the client must send (the rest of) the request body, on the
> assumption that the server doesn't understand "Expect:". To be
> completely compliant, a server should always respond with a 100 in front
> of a 2xx or 3xx, I guess. Thanks for clarifying that for me. I guess the
> rules make sense after all.
>
> > So technically, if the client has to still send the request
> > content, something could still read it. It would not be ideal
> > that there is a delay depending on what the client does, but
> > would still be possible from what I read of this section.
>
> You are right. To avoid confusion, you should probably force mod_wsgi to
> send a 100-continue in front of any 2xx or 3xx response.
>
> > It MUST NOT perform the requested method if it returns a final status
> > code.
>
> The implication is that the only time it will avoid sending a 100 is
> when it is sending a 4xx, and it should never perform the requested
> method if it already said the method failed. The only excuse for not
> sending a 100 is that you don't know about "Expect: 100-continue". But,
> that can't be true if you are reading this part of the spec!
>
> >        """If it responds with a final status
> >         code, it MAY close the transport connection or it MAY continue
> >         to read and discard the rest of the request."""
>
> If the client receives a 2xx or 3xx without a 100 first, it has to send
> the request body (well, depending on which 3xx it is, that is not true).
> But, the server doesn't have to read it! But, again, the assumption is
> that the server will only send a response without a 100 if it is a 4xx
> or 5xx.
>
> > It seems by what you are saying that if 100-continue is
> > present this wouldn't be allowed, and that to ensure correct
> > behaviour the handler would have to read at least some of the
> > request body before sending back the response headers.
>
> You are right, I was wrong.
>
> > > Since ap_http_filter is an input filter only, it should be enough to
> > > just avoid reading from the input brigade. (AFAICT, anyway.)
> >
> > In other words block the handler from reading, potentially
> > raise an error in the process. Except to be fair and
> > consistent, you would have to apply the same rule even if
> > 100-continue isn't present. Whether that would break some
> > existing code in doing that is the concern I have, even if it
> > is some simple test program that just echoes back the request
> > body as the response body.
>
> Technically, even if the server returns a 4xx, it can still read the
> request body, but it might not get anything or it might only get part of
> it. I guess, the change to the WSGI spec that is needed is to say that
> the gateway must not send the "100 continue" if it has already sent some
> headers, and that it should send a "100 continue" before any 2xx or 3xx
> code, which is basically what James Knight suggested (sorry James). The
> gateway must indicate EOF if only a partial request body was received. I
> don't think the gateway should be required to provide any of the partial
> request content on a 4xx, though.
>
> - Brian
>
>

RE: Reading of input after headers sent and 100-continue.

Posted by Brian Smith <br...@briansmith.org>.
Graham Dumpleton wrote:
> Effectively, if a 200 response came back, it seems to suggest 
> that the client still should send the request body, just that 
> it 'SHOULD NOT wait for an indefinite period'. It doesn't say 
> explicitly for the client that it shouldn't still send the 
> request body if another response code comes back.

This behavior is to support servers that don't understand the Expect:
header. 

Basically, if the server responds with a 100, the client must send the
request body. If the server responds with a 4xx or 5xx, the client must
not send the request body. If the server responds with a 2xx or a 3xx,
then the client must send (the rest of) the request body, on the
assumption that the server doesn't understand "Expect:". To be
completely compliant, a server should always respond with a 100 in front
of a 2xx or 3xx, I guess. Thanks for clarifying that for me. I guess the
rules make sense after all.
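
The three rules in the paragraph above can be written down as a small
decision function. This is only a sketch of the summary as stated,
not of any real client: enum client_action and
client_action_for_status are hypothetical names, and a later
paragraph of this message notes that some 3xx codes are exceptions.

```c
#include <assert.h>

/* What the client does with the request body after seeing a status. */
enum client_action {
    CLIENT_SEND_BODY,
    CLIENT_WITHHOLD_BODY,
    CLIENT_UNSPECIFIED      /* not covered by the summary above */
};

/* Encodes the rules exactly as summarised; real clients may differ. */
static enum client_action client_action_for_status(int status)
{
    assert(status >= 100 && status <= 599);   /* three-digit HTTP status */

    if (status == 100)
        return CLIENT_SEND_BODY;      /* server asked for the body */
    if (status >= 400)
        return CLIENT_WITHHOLD_BODY;  /* 4xx or 5xx: request refused */
    if (status >= 200)
        return CLIENT_SEND_BODY;      /* 2xx or 3xx: assume Expect was ignored */
    return CLIENT_UNSPECIFIED;        /* other 1xx codes */
}
```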

> So technically, if the client has to still send the request 
> content, something could still read it. It would not be ideal 
> that there is a delay depending on what the client does, but 
> would still be possible from what I read of this section.

You are right. To avoid confusion, you should probably force mod_wsgi to
send a 100-continue in front of any 2xx or 3xx response.

> It MUST NOT perform the requested method if it returns a final status
> code.

The implication is that the only time it will avoid sending a 100 is
when it is sending a 4xx, and it should never perform the requested
method if it already said the method failed. The only excuse for not
sending a 100 is that you don't know about "Expect: 100-continue". But,
that can't be true if you are reading this part of the spec!

>        """If it responds with a final status
>         code, it MAY close the transport connection or it MAY continue
>         to read and discard the rest of the request."""

If the client receives a 2xx or 3xx without a 100 first, it has to send
the request body (well, depending on which 3xx it is, that is not true).
But, the server doesn't have to read it! But, again, the assumption is
that the server will only send a response without a 100 if it is a 4xx
or 5xx.

> It seems by what you are saying that if 100-continue is 
> present this wouldn't be allowed, and that to ensure correct 
> behaviour the handler would have to read at least some of the 
> request body before sending back the response headers.

You are right, I was wrong. 

> > Since ap_http_filter is an input filter only, it should be enough to
> > just avoid reading from the input brigade. (AFAICT, anyway.)
> 
> In other words block the handler from reading, potentially 
> raise an error in the process. Except to be fair and 
> consistent, you would have to apply the same rule even if 
> 100-continue isn't present. Whether that would break some 
> existing code in doing that is the concern I have, even if it 
> is some simple test program that just echoes back the request
> body as the response body.

Technically, even if the server returns a 4xx, it can still read the
request body, but it might not get anything or it might only get part of
it. I guess, the change to the WSGI spec that is needed is to say that
the gateway must not send the "100 continue" if it has already sent some
headers, and that it should send a "100 continue" before any 2xx or 3xx
code, which is basically what James Knight suggested (sorry James). The
gateway must indicate EOF if only a partial request body was received. I
don't think the gateway should be required to provide any of the partial
request content on a 4xx, though.
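
The proposed gateway rule (no "100 continue" once headers have gone
out, otherwise a "100 continue" in front of any 2xx or 3xx) might be
sketched as a single predicate. The function name and parameters are
hypothetical and this is not mod_wsgi code; it only restates the rule
as described in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether a gateway should emit an interim "100 continue"
 * before the given final status, under the rule proposed above. */
static bool gateway_should_send_100(bool expecting_100,
                                    bool headers_sent,
                                    int final_status)
{
    assert(final_status >= 200 && final_status <= 599);  /* final codes only */

    if (!expecting_100)
        return false;           /* the client never asked for one */
    if (headers_sent)
        return false;           /* too late: it would land inside the response */
    return final_status < 400;  /* 2xx and 3xx get a 100 in front */
}
```

The part of the proposal about indicating EOF on a partial request
body is not modelled here.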

- Brian


Re: Reading of input after headers sent and 100-continue.

Posted by Graham Dumpleton <gr...@gmail.com>.
On 31/01/2008, Brian Smith <br...@briansmith.org> wrote:
>
>
> > -----Original Message-----
> > From: Graham Dumpleton [mailto:graham.dumpleton@gmail.com]
> > Sent: Tuesday, January 29, 2008 4:29 PM
> > To: modules-dev@httpd.apache.org
> > Subject: Reading of input after headers sent and 100-continue.
> >
> > The HTTP output filter will send a 100 result back to a
> > client when the first attempt to read input occurs and an
> > Expect header with 100-continue was received. I.e., from
> > http_filters.c we have:
> >
> > if ((ctx->state == BODY_CHUNK ||
> >   (ctx->state == BODY_LENGTH && ctx->remaining > 0)) &&
> >   f->r->expecting_100 && f->r->proto_num >= HTTP_VERSION(1,1)) {
>
> This is from ap_http_filter(). If you look at http_core.c, you can see
> that it is registered as an input filter, not an output filter.

I knew what I meant, it just didn't come out right. I blame the keyboard. :-)

> So, if
> you never read from the input brigade, the "100 continue" will never be
> sent. I'm not sure if the module needs to just ignore the input brigade,
> or actively throw it away, though.
>
> > The problem then is if only after having sent some response
> > content and triggering the response headers to be sent one
> > actually goes to read the input, then the HTTP output filter
> > above is still sending the 100 status response string. In
> > other words, the 100 response status string is appearing in
> > the middle of the actual response content.
>
> "Doctor, it hurts when I do this!" :)
>
> If a module is sending a response before a 100 continue has been sent,
> then it shouldn't read from the input brigade, because it is going
> against the HTTP spec.

Can you point to the specific bit of the HTTP specification which says that?

Section 8.2.3 of RFC 2616 appears to me to have slightly conflicting statements.

In particular it says:

"""Because of the presence of older implementations, the protocol
allows ambiguous situations in which a client may send "Expect: 100-
continue" without receiving either a 417 (Expectation Failed) status
or a 100 (Continue) status. Therefore, when a client sends this header
field to an origin server (possibly via a proxy) from which it has
never seen a 100 (Continue) status, the client SHOULD NOT wait for an
indefinite period before sending the request body."""

Effectively, if a 200 response came back, it seems to suggest that the
client still should send the request body, just that it 'SHOULD NOT
wait for an indefinite period'. It doesn't say explicitly for the
client that it shouldn't still send the request body if another
response code comes back.

This is what I have seen with curl as a client. If one sends back a
200 response without reading any input, curl still sends the request
content, but only after a slight pause while some timeout expires. In
other words, curl doesn't send it as soon as it sees the 200 response,
but it does still send it.
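
The behaviour observed here, where the body is held back until either
a 100 arrives or a wait limit runs out, could be modelled roughly as
follows. This is a sketch of the observation only and not curl's
actual implementation; client_sends_body_now and both timing
parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True when a waiting client would start sending the request body.
 * A final 200 on its own does not release the body in this model;
 * only an interim 100 or the wait limit expiring does. */
static bool client_sends_body_now(bool got_100,
                                  long waited_ms,
                                  long wait_limit_ms)
{
    assert(waited_ms >= 0 && wait_limit_ms >= 0);

    if (got_100)
        return true;                  /* server explicitly asked for the body */
    return waited_ms >= wait_limit_ms; /* otherwise send once the wait runs out */
}
```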

So technically, if the client has to still send the request content,
something could still read it. It would not be ideal that there is a
delay depending on what the client does, but would still be possible
from what I read of this section.

But then, later it says:

"""    Upon receiving a request which includes an Expect request-header
        field with the "100-continue" expectation, an origin server MUST
        either respond with 100 (Continue) status and continue to read
        from the input stream, or respond with a final status code. The
        origin server MUST NOT wait for the request body before sending
        the 100 (Continue) response. If it responds with a final status
        code, it MAY close the transport connection or it MAY continue
        to read and discard the rest of the request.  It MUST NOT
        perform the requested method if it returns a final status code."""

The critical bit here I guess is:

       """If it responds with a final status
        code, it MAY close the transport connection or it MAY continue
        to read and discard the rest of the request."""

This suggests that the server can discard the request body if the handler
didn't try to read it before returning a response. What it means by:

     """It MUST NOT
        perform the requested method if it returns a final status code."""

I am not quite sure, because if the response headers were returned by
the handler you are already in the process of performing the requested
method, so how can you now not do it?

What is also a bit worrying to me is that what might be allowed by a
handler for a request can be changed based on the presence of
100-continue, something which is out of the control of the handler and
the web server receiving the request.

Specifically, if 100-continue is not present and the client therefore
sent the request body anyway, then technically there is nothing to
stop the handler reading the input after the response headers have
been sent. For example, the handler may generate response headers
with the same content length as the request and only then start
reading input and returning it as the response body.

It seems by what you are saying that if 100-continue is present this
wouldn't be allowed, and that to ensure correct behaviour the handler
would have to read at least some of the request body before sending
back the response headers.

Thus it doesn't seem that clear to me what can and can't be done unless
there is some other section in the RFC which describes it.

> > My question then is, what should a handler do if it is trying
> > to generate response content (non buffered), before having
> > attempted to read any input, ie., what is the correct way to
> > stop Apache still sending the 100 status response for the
> > 100-continue header? I know that setting r->expecting_100 to
> > 0 at time that first response content is being sent will
> > prevent it, but is there something else that should be done
> > instead?
>
> Since ap_http_filter is an input filter only, it should be enough to
> just avoid reading from the input brigade. (AFAICT, anyway.)

In other words, block the handler from reading, potentially raising an
error in the process. Except that, to be fair and consistent, you would
have to apply the same rule even if 100-continue isn't present. Whether
doing that would break some existing code is the concern I have, even
if it is some simple test program that just echoes back the request
body as the response body.

> > BTW, this is partly theoretical in that I have no actual code
> > that is doing this, but technically in systems like
> > mod_python or mod_wsgi where one doesn't know what the Python
> > application code running on top is doing, a user could
> > trigger this situation.
>
> The module can provide an interface to the input and output brigades
> that prevents the application from doing this. mod_wsgi is doing this
> already. As I mentioned on the Web-SIG list, it is difficult to have a
> uniform, automatic mechanism for doing this for all request methods, or
> even a uniform way of doing it for a particular method. So, it basically
> has to be left up to the handler/application.

All too confusing. :-(

Graham

RE: Reading of input after headers sent and 100-continue.

Posted by Brian Smith <br...@briansmith.org>.
 

> -----Original Message-----
> From: Graham Dumpleton [mailto:graham.dumpleton@gmail.com] 
> Sent: Tuesday, January 29, 2008 4:29 PM
> To: modules-dev@httpd.apache.org
> Subject: Reading of input after headers sent and 100-continue.
> 
> The HTTP output filter will send a 100 result back to a 
> client when the first attempt to read input occurs and an 
> Expect header with 100-continue was received. I.e., from
> http_filters.c we have:
> 
> if ((ctx->state == BODY_CHUNK ||
>   (ctx->state == BODY_LENGTH && ctx->remaining > 0)) &&
>   f->r->expecting_100 && f->r->proto_num >= HTTP_VERSION(1,1)) {

This is from ap_http_filter(). If you look at http_core.c, you can see
that it is registered as an input filter, not an output filter. So, if
you never read from the input brigade, the "100 continue" will never be
sent. I'm not sure if the module needs to just ignore the input brigade,
or actively throw it away, though.

> The problem then is if only after having sent some response 
> content and triggering the response headers to be sent one 
> actually goes to read the input, then the HTTP output filter 
> above is still sending the 100 status response string. In 
> other words, the 100 response status string is appearing in 
> the middle of the actual response content.

"Doctor, it hurts when I do this!" :)

If a module is sending a response before a 100 continue has been sent,
then it shouldn't read from the input brigade, because it is going
against the HTTP spec. 

> My question then is, what should a handler do if it is trying 
> to generate response content (non buffered), before having 
> attempted to read any input, ie., what is the correct way to 
> stop Apache still sending the 100 status response for the 
> 100-continue header? I know that setting r->expecting_100 to 
> 0 at time that first response content is being sent will 
> prevent it, but is there something else that should be done
> instead?

Since ap_http_filter is an input filter only, it should be enough to
just avoid reading from the input brigade. (AFAICT, anyway.)

> BTW, this is partly theoretical in that I have no actual code
> that is doing this, but technically in systems like 
> mod_python or mod_wsgi where one doesn't know what the Python 
> application code running on top is doing, a user could 
> trigger this situation.

The module can provide an interface to the input and output brigades
that prevents the application from doing this. mod_wsgi is doing this
already. As I mentioned on the Web-SIG list, it is difficult to have a
uniform, automatic mechanism for doing this for all request methods, or
even a uniform way of doing it for a particular method. So, it basically
has to be left up to the handler/application.
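
An interface of that kind might look roughly like the following
guarded read, which refuses input once a response has started while
an interim 100 is still pending. This is a minimal sketch under
stated assumptions, not mod_wsgi's actual code: struct guarded_input
and guarded_read are hypothetical, and the body is a plain in-memory
buffer rather than a bucket brigade.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper state around a request body. */
struct guarded_input {
    const char *body;       /* request body bytes */
    size_t len;             /* total body length */
    size_t pos;             /* bytes already handed to the application */
    bool expecting_100;     /* interim 100 still pending */
    bool response_started;  /* response headers already sent */
};

/* Returns the number of bytes copied (0 at end of body), or -1 when
 * the read is refused because it would put a 100 inside the response. */
static long guarded_read(struct guarded_input *in, char *buf, size_t cap)
{
    size_t n;

    assert(in != NULL && buf != NULL);

    if (in->expecting_100 && in->response_started)
        return -1;

    n = in->len - in->pos;
    if (n > cap)
        n = cap;
    memcpy(buf, in->body + in->pos, n);
    in->pos += n;

    /* In this model the first permitted read is where the interim
     * 100 would have gone out, so it is no longer pending. */
    in->expecting_100 = false;
    return (long)n;
}
```

A stricter policy could drop the expecting_100 check and refuse every
read after the response has started, whether or not 100-continue was
requested.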

- Brian