Posted to modperl@perl.apache.org by André Warnier <aw...@ice-sa.com> on 2012/02/08 10:14:35 UTC

Interrupting a POST with file upload

This refers to and follows another thread originally entitled "mod perl installed but not 
running", started by Mike Cardeiro.

It seemed better to start a new thread with a subject more to the point of this issue.

Perrin Harkins wrote:
 > On Tue, Feb 7, 2012 at 7:26 PM, André Warnier <aw...@ice-sa.com> wrote:
 >> You can also look at $CGI::POST_MAX in the same documentation.
 >
 > See also LimitRequestBody:
 > http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody
 >

As long as we have an expert..

What Mike wants to do (and me too), is to limit the size of a file that a specific user is 
uploading via a POST, in real-time and depending on a limit variable on a per-user, 
per-POST manner.
And he wants to do this in such a way as to interrupt the POST itself, while it is taking 
place (aka if possible while the browser is still sending data to the server), to avoid a 
waste of time and bandwidth when a user is exceeding his quota e.g.

As far as I know, LimitRequestBody is an absolute POST size limit set once and for all in 
the server config, and valid for all POSTs (and PUTs) after server restart. And it is 
calculated on the base of the real bytes being sent by the browser, this including the 
overhead caused by Base64 encoding the content of a file sent for example.
(So that if you set the limit to 1MB, this will actually kick in as soon as the net 
unencoded size of the file being uploaded exceeds 660KB or so.)

Then there is the $CGI::POST_MAX, which may very well be the same server value being 
manipulated by the CGI module, or it may be a private copy by CGI.pm.  What is not really 
clear is if that value is "thread-safe" in all scenarios.

In the normal scenario, when retrieving the uploaded file's handle via the CGI.pm call to 
param(file_input_name) or upload(file_input_name), what one actually gets is a handle onto 
a local temporary file, into which Apache/CGI.pm has already stored the whole content of 
the uploaded file.  By that time, the original file upload from the browser has already 
happened, so doing something at this point would be too late to interrupt the browser POST 
itself (and the bandwidth and time have already been spent).

On the other hand, the CGI.pm documentation seems to say that if one uses the "hook" 
functionality for a file upload, then Apache/CGI.pm do not use a temporary file, and one 
gets a handle directly into the POST body content (so to speak), as it is being received 
by Apache.  And thus this could be a way to achieve what Mike wants.
(I suppose that we can assume that even though we get a handle into the POST body content, 
what we are reading is the decoded data, right ?).

Now the question is, are my above interpretations correct ?




Re: Interrupting a POST with file upload

Posted by André Warnier <aw...@ice-sa.com>.
Torsten Förtsch wrote:
> On Wednesday, 08 February 2012 10:14:35 André Warnier wrote:
>> As far as I know, LimitRequestBody is an absolute POST size limit set once
>> and for all in  the server config, and valid for all POSTs (and PUTs) after
>> server restart.
> 
> If you look at the docs you'll find that LimitRequestBody is valid in "server 
> config, virtual host, directory and .htaccess" contexts. That means you can 
> modify it on a per-request basis via $r->add_config. So, assuming 
> authentication takes place in httpd's authentication phase you can set the 
> limit in a PerlFixupHandler per user.
> 
>> And it is calculated on the base of the real bytes being
>> sent by the browser, this including the overhead caused by Base64 encoding
>> the content of a file sent for example. (So that if you set the limit to
>> 1MB, this will actually kick in as soon as the net unencoded size of the
>> file being uploaded exceeds 660KB or so.)
> 
> True. But with HTTP/1.1 the client can also choose to send the body deflated. 
> Thus, the actual file size may also exceed 1MB.
> 
>> Then there is the $CGI::POST_MAX, which may very well be the same server
>> value being  manipulated by the CGI module, or it may be a private copy by
>> CGI.pm.  What is not really clear is if that value is "thread-safe" in all
>> scenarios.
> 
> CGI.pm is pure perl. So, to make $CGI::POST_MAX shared among threads it has to 
> declare it as such. I doubt that any sane developer would do that.
> 
>> In the normal scenario, when retrieving the uploaded file's handle via the
>> CGI.pm call to  param(file_input_name) or upload(file_input_name), what one
>> actually gets is a handle onto a local temporary file, into which
>> Apache/CGI.pm has already stored the whole content of the uploaded
>> file.  By that time, the original file upload from the browser has already
>> happened, so doing something at this point would be too late to interrupt
>> the browser POST itself (and the bandwidth and time have already been
>> spent).
> 
> True.
> 
>> On the other hand, the CGI.pm documentation seems to say that if one uses
>> the "hook"  functionality for a file upload, then Apache/CGI.pm do not use
>> a temporary file, and one gets a handle directly into the POST body content
>> (so to speak), as it is being received by Apache.  And thus this could be a
>> way to achieve what Mike wants.
> 
> yes and no. It depends upon what exactly you want to limit. On the internet 
> data is buffered by routers, firewalls etc. On your server it is buffered by 
> the kernel. Httpd adds its own buffering. HTTP is TCP-based. So, there may be 
> retransmits that you won't notice. You certainly may abort the transfer when 
> the CGI.pm hook has received a certain amount of data. But that would not mean 
> that your server or your organization has not yet received the whole body.
> 
> So, if you want to limit the disk usage then yes, you can simply stop writing 
> when the limit is reached. If you want to limit the amount of data your server 
> receives then no.
> 
> Best would be if you could make an educated guess based on the Content-Length 
> request header if the uploaded file will exceed the limit. Most clients send 
> an "Expect: 100-continue" header and thus give the server a chance to decline 
> the request *before* the body is sent. If the body is already on the way the 
> only thing you can do is to close the connection. I don't know if httpd does 
> that immediately or if it reads and discards the whole body.
> 
>> (I suppose that we can assume that even
>> though we get a handle into the POST body content, what we are reading is
>> the decoded data, right ?).
> 
> The code below is the relevant piece of CGI.pm. So, yes, the upload hook gets 
> the data as it is written to the temp file.
> 
>   while (defined($data = $buffer->read)) {
>     if (defined $self->{'.upload_hook'}) {
>       $totalbytes += length($data);
>       &{$self->{'.upload_hook'}}($filename ,$data, $totalbytes,
>                                  $self->{'.upload_data'});
>     }
>     print $filehandle $data if ($self->{'use_tempfile'});
>   }
> 
> Torsten Förtsch
> 

Many, many thanks Torsten. This is all precious and usable information.


Re: Interrupting a POST with file upload

Posted by André Warnier <aw...@ice-sa.com>.
Joe Schaefer wrote:
> I don't think people grokked my point very well.  When you POST
> via HTTP/1.1, httpd will send a "100 Continue" interim response before it
> starts doing blocking reads on the client socket (any attempts to
> read from the client will trigger this behavior). If you really
> want to interrupt an upload, the time to do it is *before* httpd
> sends that header.  Afterwards httpd commits to reading the entire
> request in *before it lets you send a response* in order to maintain
> protocol compliance.

So basically, it means that there is no way to stop an upload, once the browser has 
started to send the file, right ?

The best you could do, is before you start reading, check if there was a Content-length 
header in the request, and if there was, check if the size it says is lower/equal to what 
you are prepared to accept (aka what this user is still allowed to upload e.g.).
- if it is ok, then start reading
- if it is not ok, send an error response to block the POST, before you read
(preferably a nice one, to let the user know why it does not work)

All this depends on 2 things :
- that there was a Content-length header in the request (which is not necessarily the case 
if "chunked" encoding is allowed on the part of a client)
- that the fact of retrieving the request header does not automatically cause the reading 
and parsing of the whole request body

All the above just guessing, and awaiting confirmation...

(And all the above, and the rest of this discussion, assuming that the POST can be large 
enough that it all matters (like uploading a multi-MB file e.g.))
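
The check described above can be sketched as a small stand-alone function. This is only an illustration under stated assumptions: the name precheck_upload and the choice of answering 411 when no usable Content-Length is present are mine, not something confirmed in this thread. Note also that Content-Length covers the whole multipart body, so it is an upper-bound estimate of the file size rather than the exact size.

```perl
use strict;
use warnings;

# Decide what to do with a POST before any of its body is read.
# Returns an HTTP status code to send, or 0 to go ahead and read.
# (Hypothetical helper; the 411 policy is an assumption.)
sub precheck_upload {
    my ($content_length, $remaining_quota) = @_;

    # No usable Content-Length (e.g. chunked transfer): 411 Length Required.
    return 411
        unless defined $content_length && $content_length =~ /\A\d+\z/;

    # Declared size is over the user's remaining quota: 413.
    return 413 if $content_length > $remaining_quota;

    return 0;
}

# In a mod_perl 2 handler the first argument would come from
# $r->headers_in->{'Content-Length'}; here it is simulated.
for my $case ([undef, 1000], ['5000', 1000], ['800', 1000]) {
    my ($len, $quota) = @$case;
    printf "%-7s => %d\n",
        (defined $len ? $len : '(none)'), precheck_upload($len, $quota);
}
```

Whether returning the error status at this point really prevents the browser from sending the body depends on the client having sent "Expect: 100-continue" and honouring the answer.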

> 
> 
> For reasons that escape me it doesn't look like mod_perl exposes
> r->remaining, which is the thing to check when looking at the
> pending number of bytes the client wants to send.  If I'm not wrong
> that should be easy enough for us to address. apreq won't read
> anything in in this situation tho, so you're good on that front.
> CGI.pm I'd bet doesn't try to read either if the pending data
> is too big, but I haven't looked at that codebase in a long time.
> 
> 


Re: Interrupting a POST with file upload

Posted by Joe Schaefer <jo...@yahoo.com>.
Sorry slight clarification here after rereading httpd source:

If you send anything other than that "100 Continue" interim
response to the client, httpd will NOT attempt to read the
body, considering it empty.  But even if you do send the
"100 Continue", httpd will NOT block the response from being
sent until the body has been exhausted.  Instead it will
send your response as expected, and then finalize the request by
calling ap_discard_request_body(r) which WILL exhaust the
POST data in order to retain protocol compliance.  OTOH I
have no idea how browsers deal with the timing issues here,
so buyer beware.  My point still stands: interrupt the POST
prior to sending the Continue, not afterwards.




Re: Interrupting a POST with file upload

Posted by Joe Schaefer <jo...@yahoo.com>.
I don't think people grokked my point very well.  When you POST
via HTTP/1.1, httpd will send a "100 Continue" interim response before it
starts doing blocking reads on the client socket (any attempts to
read from the client will trigger this behavior). If you really
want to interrupt an upload, the time to do it is *before* httpd
sends that header.  Afterwards httpd commits to reading the entire
request in *before it lets you send a response* in order to maintain
protocol compliance.


For reasons that escape me it doesn't look like mod_perl exposes
r->remaining, which is the thing to check when looking at the
pending number of bytes the client wants to send.  If I'm not wrong
that should be easy enough for us to address. apreq won't read
anything in in this situation tho, so you're good on that front.
CGI.pm I'd bet doesn't try to read either if the pending data
is too big, but I haven't looked at that codebase in a long time.



Re: Interrupting a POST with file upload

Posted by Vincent Veyron <vv...@wanadoo.fr>.
Le mercredi 08 février 2012 à 05:53 -0800, mike cardeiro a écrit :
>  This is a fantastic list!

Agreed.

On the same note : I was recently presenting the legal case management
app in my sig to an institutional client in the south of France, and the
IT guy said that it had a 'fantastic architecture' (I assume he was
talking about mod_perl).

-- 
Vincent Veyron
http://marica.fr/
Logiciel de gestion des sinistres et des contentieux pour le service juridique


Re: Interrupting a POST with file upload

Posted by mike cardeiro <mc...@yahoo.com>.
> From: Torsten Förtsch <to...@gmx.net>

>
> Best would be if you could make an educated guess based on the Content-Length
> request header if the uploaded file will exceed the limit. Most clients send
> an "Expect: 100-continue" header and thus give the server a chance to decline
> the request *before* the body is sent. If the body is already on the way the
> only thing you can do is to close the connection. I don't know if httpd does
> that immediately or if it reads and discards the whole body.

> The code below is the relevant piece of CGI.pm. So, yes, the upload hook gets 
> the data as it is written to the temp file.
> 
>   while (defined($data = $buffer->read)) {
>     if (defined $self->{'.upload_hook'}) {
>       $totalbytes += length($data);
>       &{$self->{'.upload_hook'}}($filename ,$data, $totalbytes,
>                                  $self->{'.upload_data'});
>     }
>     print $filehandle $data if ($self->{'use_tempfile'});
>   }
> 


Thanks for all the knowledge, Torsten (and everyone else).  This stuff is the holy grail
I have been looking for for years.  This is a fantastic list!


Mike Cardeiro


Re: Interrupting a POST with file upload

Posted by Torsten Förtsch <to...@gmx.net>.
On Wednesday, 08 February 2012 10:14:35 André Warnier wrote:
> As far as I know, LimitRequestBody is an absolute POST size limit set once
> and for all in  the server config, and valid for all POSTs (and PUTs) after
> server restart.

If you look at the docs you'll find that LimitRequestBody is valid in "server 
config, virtual host, directory and .htaccess" contexts. That means you can 
modify it on a per-request basis via $r->add_config. So, assuming 
authentication takes place in httpd's authentication phase you can set the 
limit in a PerlFixupHandler per user.
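
A minimal sketch of that idea follows. The quota table, the default value and the helper name limit_directive_for are hypothetical and only there for illustration; $r->add_config and $r->user are the mod_perl 2 calls meant above, shown in a comment so that the snippet runs without Apache.

```perl
use strict;
use warnings;

# Hypothetical per-user quota table (bytes); a real application would
# look this up in its own user database.
my %QUOTA = (alice => 5 * 1024 * 1024, bob => 512 * 1024);
my $DEFAULT_QUOTA = 1024 * 1024;

# Build the configuration line for a given authenticated user name.
sub limit_directive_for {
    my ($user) = @_;
    my $bytes = (defined($user) && exists $QUOTA{$user})
              ? $QUOTA{$user}
              : $DEFAULT_QUOTA;
    return "LimitRequestBody $bytes";
}

# Inside mod_perl 2 this would be called from a fixup handler, roughly:
#
#   sub handler {
#       my $r = shift;
#       $r->add_config([ limit_directive_for($r->user) ]);
#       return Apache2::Const::OK;
#   }

print limit_directive_for('bob'), "\n";
```

The limit still counts the raw request body, so the per-user number has to allow for the multipart overhead around the file.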

> And it is calculated on the base of the real bytes being
> sent by the browser, this including the overhead caused by Base64 encoding
> the content of a file sent for example. (So that if you set the limit to
> 1MB, this will actually kick in as soon as the net unencoded size of the
> file being uploaded exceeds 660KB or so.)

True. But with HTTP/1.1 the client can also choose to send the body deflated. 
Thus, the actual file size may also exceed 1MB.

> Then there is the $CGI::POST_MAX, which may very well be the same server
> value being  manipulated by the CGI module, or it may be a private copy by
> CGI.pm.  What is not really clear is if that value is "thread-safe" in all
> scenarios.

CGI.pm is pure perl. So, to make $CGI::POST_MAX shared among threads it has to 
declare it as such. I doubt that any sane developer would do that.

> In the normal scenario, when retrieving the uploaded file's handle via the
> CGI.pm call to  param(file_input_name) or upload(file_input_name), what one
> actually gets is a handle onto a local temporary file, into which
> Apache/CGI.pm has already stored the whole content of the uploaded
> file.  By that time, the original file upload from the browser has already
> happened, so doing something at this point would be too late to interrupt
> the browser POST itself (and the bandwidth and time have already been
> spent).

True.

> On the other hand, the CGI.pm documentation seems to say that if one uses
> the "hook"  functionality for a file upload, then Apache/CGI.pm do not use
> a temporary file, and one gets a handle directly into the POST body content
> (so to speak), as it is being received by Apache.  And thus this could be a
> way to achieve what Mike wants.

yes and no. It depends upon what exactly you want to limit. On the internet 
data is buffered by routers, firewalls etc. On your server it is buffered by 
the kernel. Httpd adds its own buffering. HTTP is TCP-based. So, there may be 
retransmits that you won't notice. You certainly may abort the transfer when 
the CGI.pm hook has received a certain amount of data. But that would not mean 
that your server or your organization has not yet received the whole body.

So, if you want to limit the disk usage then yes, you can simply stop writing 
when the limit is reached. If you want to limit the amount of data your server 
receives then no.

Best would be if you could make an educated guess based on the Content-Length 
request header if the uploaded file will exceed the limit. Most clients send 
an "Expect: 100-continue" header and thus give the server a chance to decline 
the request *before* the body is sent. If the body is already on the way the 
only thing you can do is to close the connection. I don't know if httpd does 
that immediately or if it reads and discards the whole body.

> (I suppose that we can assume that even
> though we get a handle into the POST body content, what we are reading is
> the decoded data, right ?).

The code below is the relevant piece of CGI.pm. So, yes, the upload hook gets 
the data as it is written to the temp file.

  while (defined($data = $buffer->read)) {
    if (defined $self->{'.upload_hook'}) {
      $totalbytes += length($data);
      &{$self->{'.upload_hook'}}($filename ,$data, $totalbytes,
                                 $self->{'.upload_data'});
    }
    print $filehandle $data if ($self->{'use_tempfile'});
  }
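
Based on the arguments passed in that excerpt, a hook that gives up once a byte limit is exceeded could look roughly like the sketch below. The factory name make_quota_hook and the limit values are made up for illustration; the CGI->new call is shown only in a comment so that the snippet runs without CGI.pm installed. Aborting here stops the parsing and the writing to disk; it does not by itself stop the client from sending.

```perl
use strict;
use warnings;

# Return an upload hook that aborts once more than $limit bytes were seen.
# CGI.pm calls the hook as ($filename, $buffer, $bytes_read, $data).
sub make_quota_hook {
    my ($limit) = @_;
    return sub {
        my ($filename, $buffer, $bytes_read, $data) = @_;
        die "upload of '$filename' exceeds $limit bytes\n"
            if $bytes_read > $limit;
        return;
    };
}

# With CGI.pm installed, the hook would be registered roughly like this
# (the third argument switches the temp file off):
#
#   my $q = eval { CGI->new(make_quota_hook(1_000_000), undef, 0) };
#   # on failure, $@ holds the message from die

# Stand-alone demonstration with simulated chunks of 4096 bytes.
my $hook  = make_quota_hook(10_000);
my $total = 0;
my $ok = eval {
    for my $n (1 .. 5) {
        $total += 4096;
        $hook->('demo.bin', 'x' x 4096, $total);
    }
    1;
};
print $ok ? "accepted\n" : "aborted: $@";
```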

Torsten Förtsch



Re: Interrupting a POST with file upload

Posted by Joe Schaefer <jo...@yahoo.com>.
You probably don't want to do this with a hook if you can
avoid it.  The reason is that once httpd sends the 100 Continue
it will read the entire upload, even after CGI.pm or apreq
has stopped parsing it.


