You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mod_python-dev@quetz.apache.org by Alex Cichowski <al...@gmail.com> on 2008/07/10 16:33:53 UTC

Chunked request content patch

Hi mod_python-dev,

I have created a patch for mod_python that fixes up the following issues:

  MODPYTHON-212
  req.read() with no arguments doesn't return all data where input 
filter inserts extra data in input stream

  MODPYTHON-222
  Support for chunked transfer encoding on request content

It does this by:

(1) Making read() and readline() ignore the content length in the case 
where no size argument is specified, and instead call 
ap_get_client_block() in a loop until it returns zero. The result buffer 
is dynamically expanded as necessary with _PyString_Resize().

(2) Adding a request.set_read_policy() call to specify whether 
REQUEST_NO_BODY, REQUEST_CHUNKED_ERROR or REQUEST_CHUNKED_DECHUNK is 
passed to ap_setup_client_block() on the first read()/readline() call. 
The default is REQUEST_CHUNKED_DECHUNK, but could be set to 
REQUEST_CHUNKED_ERROR for compatibility with the current behaviour.

This patch is against the mod_python-3.3.1 release. Please find it 
attached as "modpython-212-222-fix.patch". I have tested it manually and 
run test.py on it as well.


 --- Possible Objections ---

Aside from content-length-related objections that are obviously 
addressed by the above, I also noted the following objection to enabling 
the use of REQUEST_CHUNKED_DECHUNK on the mailing list:

Graham Dumpleton, 2007-02-23,
http://www.webservertalk.com/archive309-2007-2-1820928.html

 > [...] when using REQUEST_CHUNKED_DECHUNK it is important
 > that the read buffer be large enough to handle any chunk being read. Ie.,
 > from Apache source code:
 >
 > [...]
 > * In order to use the last two options, the caller MUST provide a buffer
 > * large enough to hold a chunk-size line, including any extensions.
 >
 > If chunked is used the mod_python source can't know what the buffer size
 > it would need to use is as that is going to be dependent on the 
higher level
 > application and what it is doing.

If you look carefully here, for example,

http://mail-archives.apache.org/mod_mbox/httpd-cvs/199611.mbox/
%3C199611251122.DAA06400@taz.hyperreal.com%3E

you can see old get_client_block() code that reads the chunk header line 
into the client-provided buffer before applying chunk decoding, hence 
the requirement that "the caller MUST provide a buffer large enough to 
hold a chunk-size line, including any extensions". However, the current 
ap_get_client_block() has completely replaced this with some APR "bucket 
brigade" magic. It seems to me that they just forgot to remove this comment.


 --- Substitute Patch ---

If this patch is not satisfactory, I would suggest at least the changes 
in the attached patch "requestobject.c.fix1.patch". This corrects the 
following problems in req_read():

(1) The first ap_get_client_block() call will currently overwrite any 
data retrieved by the preceding "If anything left in the readline 
buffer" code.

(2) The return value of the first call to ap_get_client_block() is not 
checked for errors (though it does look like it will try again in the 
loop below, where error checking is in fact done).

An additional problem NOT corrected by "requestobject.c.fix1.patch" (but 
corrected by the main patch):

(3) It seems that req_readline() allocates a buffer the size of the 
whole response just to return one line if the size argument is unspecified.


 --- Mailing List Problems ---

By the way, it looks like the mod_python web page needs to be updated 
regarding the address of the -dev email list. I had to go through the 
following redirects to actually find the list:

"<py...@httpd.apache.org>: This mailing list has moved to 
python-dev at quetz.apache.org."

"<py...@quetz.apache.org>: This mailing list has moved to 
mod_python-dev at quetz.apache.org."


Regards,
Alex


Re: Chunked request content patch

Posted by Alex Cichowski <al...@gmail.com>.
Graham Dumpleton wrote:
> Can you explain more how it would be a security problem?
>   
I doubt it would really be much of a problem, I'm just guessing at 
obscure things that could break. An example would be something like...
 - Malicious client sends chunked request to server that doesn't expect it.
 - Server passes request.clength to its custom C module.
 - I don't know what clength is when the content-length is unspecified, 
but let's say it's zero.
 - The C module mallocs clength (= zero) bytes.
 - Server calls request.read(), and passes the data block to the C module.
 - C module appends the block to its buffer, causing a buffer-overrun.

I don't really think it's the responsibility of mod_python to protect 
users from this, and as I mentioned before I suspect this type of attack 
may already be possible even without chunked requests. I'm just saying 
it's the only type of example I can think of that would be broken by 
making the default "dechunk" rather than "error on chunked request", so 
I don't think making the default "dechunk" should be a problem.

>> I thought someone already submitted a patch because read(len) used to return
>> < len bytes in the past, and it was considered to be breaking the file-like
>> object API.
>>     
> Never been the case with mod_python that I know of.
Just dug it up again:
  http://www.modpython.org/pipermail/mod_python/2000-August/011329.html
Obviously this patch was accepted long ago. I guess it doesn't really 
matter as long as we don't change the behaviour of read() in blocking mode.

>> And if you restrict it to returning either len bytes or a "would block"
>> exception, it would not solve the original problem I was intending to solve
>> - having to process and respond to < len bytes from the client before the
>> client will send any more.
>>     
> Yes it would. When blocking turned off, the intention would be that
> read() with or without arguments would always return only what it
> found. The returning of flag to indicate it would block, would only
> occur if not data at all available. Thus, disabling blocking would
> make it behave like a non blocking socket.
>   
I was addressing the possible design where read() was not permitted to 
return partial data, ever. OK, if you give me nonblocking reads and 
partial reads when I set a "nonblocking" flag, that will solve my 
problem and I will be fine. And non-blocking mode is certainly a very 
useful feature. But I still think they are two independent features. In 
my application, I just want to ensure I have processed and responded to 
all client data that is available in the buffer, otherwise the client 
may hang waiting for the response and therefore send no more data to the 
server, yielding deadlock. And I would actually prefer the read to block 
- if you attach both the "nonblocking" and "partial read" feature to the 
"nonblocking" API flag, it is an incovenience for me since I'm just 
going to have to sleep and read again whenever I get a "would block" 
return. Yes, this is a minor inconvenience, but I believe this 
demonstrates that we are talking about two different features here, and 
that it is useful to enable partial reads without enabling non-blocking 
mode.

> Rather than an exception, the other option is to return None instead.
> Thus, returning empty string means end of input, and returning None
> means would block.
>   
Ah, yes, I like idea of None a lot. Especially since this is not really 
an exceptional condition - in fact one might expect that a "would block" 
return would be more common than actually returning data. I think it is 
a good general principle that normal behaviour should not be represented 
with exceptions.

If I get some time I might be able to try implementing nonblocking mode 
and/or timeouts, but I have no idea what the Apache API for this is. 
Could you point me at the API I would need to use?

Alex




Re: Chunked request content patch

Posted by Graham Dumpleton <gr...@gmail.com>.
2008/7/19 Alex Cichowski <al...@gmail.com>:
> OK, I will investigate the LimitRequestBody problem.
>
> Graham Dumpleton wrote:
>>
>> What probably needs to be investigated is whether the default can't
>> just be REQUEST_CHUNKED_DECHUNK all the time and code behave just as
>> it did before. If this can be done, then all that needs to be solved
>> is your ability to do non blocking reads.
>>
>
> I suppose the only problem with making it the default is exposing security
> holes in people's apps that were previously protected against by
> CHUNKED_ERROR, since it's unlikely anyone will rely on getting a chunked
> failure in normal operation. The only thing I can think of is something
> depending on the content length (request.clength), but I would guess it's
> already possible for attackers to pass in requests without content lengths
> by using "Connection: close" or HTTP/1.0. Don't have any ideas of what else
> to investigate.

Can you explain more how it would be a security problem?

For mod_wsgi at least, a WSGI application is required to use the
Content-Length because it isn't meant to read more than that amount of
content. Thus, in WSGI you cannot call read() with no arguments to
read all available input. Thus in conforming WSGI applications, if
something sent content as chunked, they would pick it up and reject it
with an error about no content length.

So, only applications specifically written to be non conforming WSGI
applications would do a read() with no arguments and be able to read
all input of chunked response. This assumes of course that client has
sent all request content at once and not expecting a response before
sending more. In this latter case, will need to use the setblocking()
feature I mentioned.

>> For non blocking reads, I rather don't like the idea of a special read
>> function. What I have instead thought about for this in the past for
>> mod_wsgi, even though support for chunked and non blocking is outside
>> of WGSI specification, is to have a setblocking() function just like
>> sockets do. Technically one may even be able to implement the more
>> general settimeout() since timeouts can be specified for Apache
>> connections if you do it right.
>>
>
> I thought someone already submitted a patch because read(len) used to return
> < len bytes in the past, and it was considered to be breaking the file-like
> object API.

Never been the case with mod_python that I know of. In early pre 1.0
builds of mod_wsgi I was playing with that at one point for read()
with no arguments, since that was outside of WSGI specification
anyway, but was made to conform when 1.0 came out.

> And if you restrict it to returning either len bytes or a "would block"
> exception, it would not solve the original problem I was intending to solve
> - having to process and respond to < len bytes from the client before the
> client will send any more.

Yes it would. When blocking turned off, the intention would be that
read() with or without arguments would always return only what it
found. The returning of flag to indicate it would block, would only
occur if not data at all available. Thus, disabling blocking would
make it behave like a non blocking socket.

> Perhaps setblocking()/settimeout() is orthogonal to partial read support. In
> some cases you might like to read() in non-blocking mode to retrieve blocks
> of known size (or get an exception), and in some cases you might like to
> read_partial() in non-blocking mode too.

Intention is that non blocking mode would always be partial reads,
just like non blocking socket. If you want known block size you loop
and if subsequent call after partial read would block, then you would
deal with that.

> I think partial read support is useful even if you don't have full
> non-blocking support, and non-blocking could be significantly more work to
> add.

For mod_wsgi at least, don't believe at this point that non blocking
support is that hard. Just that users of mod_wsgi don't like it when I
start doing stuff which is outside of WSGI itself even though it would
make a difference to their application.

>> The only thing is what exception you use to indicate socket would
>> block and/or timeout. Do you reuse socket exception and use it in same
>> way as sockets do for indicating this, or use a new exception.
>
> I would suggest a new exception, as it looks like mod_python code is
> currently completely isolated from any use of sockets or the Python socket
> module by Apache.

Rather than an exception, the other option is to return None instead.
Thus, returning empty string means end of input, and returning None
means would block.

For mod_wsgi at least, not having an exception may be preferable as
WSGI doesn't currently define exception types for when there are input
errors and so no standardisation. Partly this is because there is no
standard 'wsgi' module which could hold the exception types. So,
returning None would make it easier.

Graham

Re: Chunked request content patch

Posted by Alex Cichowski <al...@gmail.com>.
OK, I will investigate the LimitRequestBody problem.

Graham Dumpleton wrote:
> What probably needs to be investigated is whether the default can't
> just be REQUEST_CHUNKED_DECHUNK all the time and code behave just as
> it did before. If this can be done, then all that needs to be solved
> is your ability to do non blocking reads.
>   
I suppose the only problem with making it the default is exposing 
security holes in people's apps that were previously protected against 
by CHUNKED_ERROR, since it's unlikely anyone will rely on getting a 
chunked failure in normal operation. The only thing I can think of is 
something depending on the content length (request.clength), but I would 
guess it's already possible for attackers to pass in requests without 
content lengths by using "Connection: close" or HTTP/1.0. Don't have any 
ideas of what else to investigate.

> For non blocking reads, I rather don't like the idea of a special read
> function. What I have instead thought about for this in the past for
> mod_wsgi, even though support for chunked and non blocking is outside
> of WGSI specification, is to have a setblocking() function just like
> sockets do. Technically one may even be able to implement the more
> general settimeout() since timeouts can be specified for Apache
> connections if you do it right.
>   
I thought someone already submitted a patch because read(len) used to 
return < len bytes in the past, and it was considered to be breaking the 
file-like object API.

And if you restrict it to returning either len bytes or a "would block" 
exception, it would not solve the original problem I was intending to 
solve - having to process and respond to < len bytes from the client 
before the client will send any more.

Perhaps setblocking()/settimeout() is orthogonal to partial read 
support. In some cases you might like to read() in non-blocking mode to 
retrieve blocks of known size (or get an exception), and in some cases 
you might like to read_partial() in non-blocking mode too.

I think partial read support is useful even if you don't have full 
non-blocking support, and non-blocking could be significantly more work 
to add.

> The only thing is what exception you use to indicate socket would
> block and/or timeout. Do you reuse socket exception and use it in same
> way as sockets do for indicating this, or use a new exception.
>   
I would suggest a new exception, as it looks like mod_python code is 
currently completely isolated from any use of sockets or the Python 
socket module by Apache.

Alex


Re: Chunked request content patch

Posted by Graham Dumpleton <gr...@gmail.com>.
Hmmm, replied back to the wrong list, user versus developers, for this.

Anyway, I'll admit I haven't looked at actual patches yet, but some
more comments about some of what is being addressed.

In doing changes to this area of code, especially if you are going to
want to support chunked encoding somehow, you also probably need to
address:

  http://issues.apache.org/jira/browse/MODPYTHON-240

That is, deal with the LimitRequestBody directive properly.

To do this though, you have to put the check for it before you even
call into the user handler code, preferably even before you leave the
C code.

The check for this is:

    limit = ap_get_limit_req_body(r);

    if (limit && limit < r->remaining) {
        ap_discard_request_body(r);
        return OK;
    }

To be able to call this though, you must have first called
ap_setup_client_block() else the required attributes of request_rec
will not have been set yet.

If you aren't going to support chunked encoding that is fine, as just use:

    status = ap_setup_client_block(r, REQUEST_CHUNKED_ERROR);

    if (status != OK)
        return status;

prior to check of request body length.

If you want to support chunked however it gets more complicated. This
is because in your way of doing things it sounds like the policy of
how request content is handled is set by calling attribute of
mod_python request object. At the point where the initialisation would
now need to go, because of request body length check, no user code
would have been called and so no opportunity for policy to be set.

What probably needs to be investigated is whether the default can't
just be REQUEST_CHUNKED_DECHUNK all the time and code behave just as
it did before. If this can be done, then all that needs to be solved
is your ability to do non blocking reads.

For non blocking reads, I rather don't like the idea of a special read
function. What I have instead thought about for this in the past for
mod_wsgi, even though support for chunked and non blocking is outside
of WGSI specification, is to have a setblocking() function just like
sockets do. Technically one may even be able to implement the more
general settimeout() since timeouts can be specified for Apache
connections if you do it right.

The only thing is what exception you use to indicate socket would
block and/or timeout. Do you reuse socket exception and use it in same
way as sockets do for indicating this, or use a new exception.

I might have another think about how to do this for mod_wsgi case as
probably still applicable to mod_python.

Graham

2008/7/17 Graham Dumpleton <gr...@gmail.com>:
> Thanks for adding the patches.
>
> If I ever get a chance, I'll compare your patches against completely
> rewritten equivalent code I have in mod_wsgi and see how they compare
> as far as satisfying the required functionality. Although, the
> mod_wsgi code doesn't attempt to handled chunked as WSGI specification
> doesn't allow it.
>
> Graham
>
> 2008/7/16 Alex Cichowski <al...@gmail.com>:
>> OK, I have added the patch under MODPYTHON-222 (chunked requests) and
>> referenced it under the others.
>>
>> For 234 (read() SystemError), if Matthew Woodcraft is correct about the
>> unchecked call to ap_get_client_block() being the problem, then yes this
>> patch will fix it.
>>
>> The original patch did not address 211 (readlines() leak), but for
>> completeness I have added a rewrite of readlines() that is more rigorous in
>> its error handling and memory management to the patch in JIRA.
>>
>> The revised patch also includes a new function, read_partial(), which
>> attempts to ensure that it has returned all buffered data before blocking,
>> by returning less than the requested amount where necessary. This is helpful
>> to avoid deadlock when you would like to have the server read and respond to
>> the first part of the client's request before the client sends the second
>> part. I don't know whether it is strictly valid to use HTTP in this way or
>> not, but I think it's nice to have this feature available anyway.
>>
>> Alex
>>
>>
>>
>> Graham Dumpleton wrote:
>>>
>>> Can you possibly add the suggested patches against the issues in JIRA,
>>> otherwise they will get lost in the mailing list.
>>>
>>> Does, your suggested patches address:
>>>
>>>  http://issues.apache.org/jira/browse/MODPYTHON-234
>>>
>>>  http://issues.apache.org/jira/browse/MODPYTHON-211
>>>
>>> Graham
>>>
>>
>>
>