Posted to modperl@perl.apache.org by John Dunlap <jo...@lariat.co> on 2015/03/27 23:44:28 UTC

Large File Download

I know that it's possible (and arguably best practice) to use Apache to
download large files efficiently and quickly, without passing them through
mod_perl. However, the data I need to download from my application is both
dynamically generated and sensitive, so I cannot expose it to the internet
for anonymous download via Apache. So I'm wondering whether mod_perl has a
capability similar to the output stream of a Java servlet. Specifically, I
want to return bits and pieces of the file at a time over the wire so that
I can avoid loading the entire file into memory prior to sending it to the
browser. Currently, I'm loading the entire file into memory before sending
it and

Is this possible with mod_perl and, if so, how should I go about
implementing it?

-- 
John Dunlap
*CTO | Lariat *

*Direct:*
*john@lariat.co <jo...@lariat.co>*

*Customer Service:*
877.268.6667
support@lariat.co

Re: Large File Download

Posted by Randolf Richardson <ra...@modperl.pl>.
> Randolf Richardson wrote:
> >> I know that it's possible(and arguably best practice) to use Apache to
> >> download large files efficiently and quickly, without passing them through
> >> mod_perl. However, the data I need to download from my application is both
> >> dynamically generated and sensitive so I cannot expose it to the internet
> >> for anonymous download via Apache. So, I'm wondering if mod_perl has a
> >> capability similar to the output stream of a java servlet. Specifically, I
> >> want to return bits and pieces of the file at a time over the wire so that
> >> I can avoid loading the entire file into memory prior to sending it to the
> >> browser. Currently, I'm loading the entire file into memory before sending
> >> it and
> >>
> >> Is this possible with mod_perl and, if so, how should I go about
> >> implementing it?
> > 
> > 	Yes, it is possible -- instead of loading the entire contents of a 
> > file into RAM, just read blocks in a loop and keep sending them until 
> > you reach EoF (End of File).
> > 
> > 	You can also use $r->flush along the way if you like, but as I 
> > understand it this isn't necessary because Apache HTTPd will send the 
> > data as soon as its internal buffers contain enough data.  Of course, 
> > if you can tune your block size in your loop to match Apache's output 
> > buffer size, then that will probably help.  (I don't know much about 
> > the details of Apache's output buffers because I've not read up too 
> > much on them, so I hope my assumptions about this are correct.)
> > 
> > 	One of the added benefits you get from using a loop is that you can 
> > also implement rate limiting if that becomes useful.  You can 
> > certainly also implement access controls as well by cross-checking 
> > the file being sent with whatever internal database queries you'd 
> > normally use to ensure it's okay to send the file first.
> > 
> 
> You can also :
> 1) write the data to a file
> 2) $r->sendfile(...);
> 3) add a cleanup handler, to delete the file when the request has been served.
> See here for details : http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
> 
> For this to work, there is an Apache configuration directive which must be set to "on"; I
> believe it is called "EnableSendfile".
> Essentially, what sendfile() does is delegate the actual reading and sending of the file
> to Apache httpd and the underlying OS, using code which is specifically optimised for this
> purpose.  It is much more efficient than doing this in a read/write loop by yourself, at
> the cost of having less fine control over the operation.

	Thank you André, this is an excellent solution.

	One comment I'd like to add: it appears that steps 1 and 3 can be
eliminated when the data already exists in its entirety in a file (i.e.,
the request is essentially a file copy), resulting in even better
performance because even fewer resources and CPU cycles are consumed.

Randolf Richardson - randolf@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
http://www.inter-corporate.com/



Re: Large File Download

Posted by Dr James Smith <js...@sanger.ac.uk>.
On 28/03/2015 19:54, Issac Goldstand wrote:
> sendfile is much more efficient than that.  At the most basic level,
> sendfile allows a file to be streamed directly from the block device (or
> OS cache) to the network, all in kernel-space (see sendfile(2)).
>
> What you describe below is less effective, since you need to ask the
> kernel to read the data, chunk-by-chunk, send it to userspace, and then
> from userspace back to kernel space to be sent to the net.
>
> Beyond that, the Apache output filter stack is also spending time
> examining your data, possibly buffering it differently than you are (for
> example to make HTTP chunked-encoding) - by using sendfile, you'll be
> bypassing the output filter chain (for the request, at least;
> connection/protocol filters, such as HTTPS encryption will still get in
> the way, but you probably want that to happen :)) further optimizing the
> output.
>
> If you're manipulating data, you need to stream yourself, but if you
> have data on the disk and can serve it as-is, sendfile will almost
> always perform much, much, much better.
>
In the cases I was pointing out (in line with the original request),
streaming the data is more efficient than writing it to disk and then
using sendfile.... (reading is more efficient than writing - from
experience)

It depends on the cost of producing the file - the end-to-end response
time for the user may well be lower by streaming it out than by writing
it to disk and then sending it to the user with sendfile - they will
already have most of the file (network permitting) before the last chunk
of content is produced....

I won't re-iterate the memory-management benefits already covered, but I
will also point out that if you ever write a file to disk you are putting
your server at risk - either from a security or a DoS point of view - so
if you can avoid it, do so!





-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Re: Large File Download

Posted by Xinhuan Zheng <xz...@christianbook.com>.
Hello,
I have dealt with large file downloads before. I recall that when the
sendfile directive is turned on in httpd.conf, httpd uses the sendfile(2)
system call. It is more efficient than a combination of read(2) and
write(2), since sendfile operates within the kernel.
However, if the large file is served from an NFS mount, I am not sure
whether sendfile(2) behaves the same way as it does on a local file system.
We need to route the file download requests through a kind of load
balancer. Keepalive can be configured both on the load balancer and in
httpd.conf. By default we have keepalive set to 2 minutes on the load
balancer, meaning the connection between the load balancer and the backend
server is kept alive for 2 minutes for the same client's requests. Does
that affect overall download performance? Should we turn keepalive off?
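For reference, the directives in question look roughly like this on our
side (illustrative values only, not our actual configuration):

    # httpd.conf (illustrative)
    EnableSendfile Off     # the httpd docs suggest Off for NFS/network-mounted content
    KeepAlive On
    KeepAliveTimeout 5     # seconds; the load balancer's 2-minute idle timeout is separate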
- xinhuan

On 3/28/15, 3:54 PM, "Issac Goldstand" <ma...@beamartyr.net> wrote:

>sendfile is much more efficient than that.  At the most basic level,
>sendfile allows a file to be streamed directly from the block device (or
>OS cache) to the network, all in kernel-space (see sendfile(2)).
>
>What you describe below is less effective, since you need to ask the
>kernel to read the data, chunk-by-chunk, send it to userspace, and then
>from userspace back to kernel space to be sent to the net.
>
>Beyond that, the Apache output filter stack is also spending time
>examining your data, possibly buffering it differently than you are (for
>example to make HTTP chunked-encoding) - by using sendfile, you'll be
>bypassing the output filter chain (for the request, at least;
>connection/protocol filters, such as HTTPS encryption will still get in
>the way, but you probably want that to happen :)) further optimizing the
>output.
>
>If you're manipulating data, you need to stream yourself, but if you
>have data on the disk and can serve it as-is, sendfile will almost
>always perform much, much, much better.
>
>  Issac
>
>On 3/28/2015 7:40 PM, Dr James Smith wrote:
>> You can effectively stream a file byte by byte - you just need to print
>> a chunk at a time and mod_perl and apache will handle it
>> appropriately... I do this all the time to handle large data downloads
>> (the systems I manage are backed by peta bytes of data)...
>> 
>> The art is often not in the output - but in the way you get and process
>> data before sending it - I have code that will upload/download arbitrary
>> large files (using HTML5's file objects) without using excessive amounts
>> of memory... (all data is stored in chunks in a MySQL database)
>> 
>> Streaming has other advantages with large data - if you wait till you
>> generate all the data then you will find that you often get a time out -
>> I have a script which can take up to 2 hours to generate all the output
>> - but it never times out as it is sending a line of data at a time....
>> and do data is sent every 5-10 seconds... and the memory footprint is
>> trivial - as only data for one line of output is in memory at a time..
>> 
>> 
>> On 28/03/2015 16:25, John Dunlap wrote:
>>> sendfile sounds like its exactly what I'm looking for. I see it in the
>>> API documentation for Apache2::RequestIO but how do I get a reference
>>> to it from the reference to Apache2::RequestRec which is passed to my
>>> handler?
>>>
>>> On Sat, Mar 28, 2015 at 9:54 AM, Perrin Harkins <pharkins@gmail.com
>>> <ma...@gmail.com>> wrote:
>>>
>>>     Yeah, sendfile() is how I've done this in the past, although I was
>>>     using mod_perl 1.x for it.
>>>
>>>     On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <aw@ice-sa.com
>>>     <ma...@ice-sa.com>> wrote:
>>>
>>>         Randolf Richardson wrote:
>>>
>>>                 I know that it's possible(and arguably best practice)
>>>                 to use Apache to
>>>                 download large files efficiently and quickly, without
>>>                 passing them through
>>>                 mod_perl. However, the data I need to download from my
>>>                 application is both
>>>                 dynamically generated and sensitive so I cannot expose
>>>                 it to the internet
>>>                 for anonymous download via Apache. So, I'm wondering
>>>                 if mod_perl has a
>>>                 capability similar to the output stream of a java
>>>                 servlet. Specifically, I
>>>                 want to return bits and pieces of the file at a time
>>>                 over the wire so that
>>>                 I can avoid loading the entire file into memory prior
>>>                 to sending it to the
>>>                 browser. Currently, I'm loading the entire file into
>>>                 memory before sending
>>>                 it and
>>>
>>>                 Is this possible with mod_perl and, if so, how should
>>>                 I go about
>>>                 implementing it?
>>>
>>>
>>>                     Yes, it is possible -- instead of loading the
>>>             entire contents of a file into RAM, just read blocks in a
>>>             loop and keep sending them until you reach EoF (End of
>>>File).
>>>
>>>                     You can also use $r->flush along the way if you
>>>             like, but as I understand it this isn't necessary because
>>>             Apache HTTPd will send the data as soon as its internal
>>>             buffers contain enough data.  Of course, if you can tune
>>>             your block size in your loop to match Apache's output
>>>             buffer size, then that will probably help.  (I don't know
>>>             much about the details of Apache's output buffers because
>>>             I've not read up too much on them, so I hope my
>>>             assumptions about this are correct.)
>>>
>>>                     One of the added benefits you get from using a
>>>             loop is that you can also implement rate limiting if that
>>>             becomes useful.  You can certainly also implement access
>>>             controls as well by cross-checking the file being sent
>>>             with whatever internal database queries you'd normally use
>>>             to ensure it's okay to send the file first.
>>>
>>>
>>>         You can also :
>>>         1) write the data to a file
>>>         2) $r->sendfile(...);
>>>         3) add a cleanup handler, to delete the file when the request
>>>         has been served.
>>>         See here for details :
>>>         
>>>http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
>>>
>>>         For this to work, there is an Apache configuration directive
>>>         which must be set to "on". I believe it is called
>>>         "EnableSendfile".
>>>         Essentially, what sendfile() does is delegate the actual
>>>         reading and sending of the file to Apache httpd and the
>>>         underlying OS, using code which is specifically optimised for
>>>         this purpose.  It is much more efficient than doing this in a
>>>         read/write loop by yourself, at the cost of having less fine
>>>         control over the operation.
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> John Dunlap
>>> /CTO | Lariat /
>>> /
>>> /
>>> /*Direct:*/
>>> /john@lariat.co <ma...@lariat.co>/
>>> /
>>> *Customer Service:*/
>>> 877.268.6667
>>> support@lariat.co <ma...@lariat.co>
>> 
>> 
>> 
>


Re: Large File Download

Posted by Issac Goldstand <ma...@beamartyr.net>.
sendfile is much more efficient than that.  At the most basic level,
sendfile allows a file to be streamed directly from the block device (or
OS cache) to the network, all in kernel-space (see sendfile(2)).

What you describe below is less efficient, since you need to ask the
kernel to read the data chunk by chunk, copy it to userspace, and then
copy it from userspace back to kernel space to be sent to the net.

Beyond that, the Apache output filter stack also spends time examining
your data, possibly buffering it differently than you do (for example, to
produce HTTP chunked encoding). By using sendfile you bypass the output
filter chain (for the request filters, at least; connection/protocol
filters, such as HTTPS encryption, will still get in the way, but you
probably want that to happen :)), further optimizing the output.

If you're manipulating data, you need to stream yourself, but if you
have data on the disk and can serve it as-is, sendfile will almost
always perform much, much, much better.

  Issac

On 3/28/2015 7:40 PM, Dr James Smith wrote:
> You can effectively stream a file byte by byte - you just need to print
> a chunk at a time and mod_perl and apache will handle it
> appropriately... I do this all the time to handle large data downloads
> (the systems I manage are backed by peta bytes of data)...
> 
> The art is often not in the output - but in the way you get and process
> data before sending it - I have code that will upload/download arbitrary
> large files (using HTML5's file objects) without using excessive amounts
> of memory... (all data is stored in chunks in a MySQL database)
> 
> Streaming has other advantages with large data - if you wait till you
> generate all the data then you will find that you often get a time out -
> I have a script which can take up to 2 hours to generate all the output
> - but it never times out as it is sending a line of data at a time....
> and do data is sent every 5-10 seconds... and the memory footprint is
> trivial - as only data for one line of output is in memory at a time..
> 
> 
> On 28/03/2015 16:25, John Dunlap wrote:
>> sendfile sounds like its exactly what I'm looking for. I see it in the
>> API documentation for Apache2::RequestIO but how do I get a reference
>> to it from the reference to Apache2::RequestRec which is passed to my
>> handler?
>>
>> On Sat, Mar 28, 2015 at 9:54 AM, Perrin Harkins <pharkins@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>     Yeah, sendfile() is how I've done this in the past, although I was
>>     using mod_perl 1.x for it.
>>
>>     On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <aw@ice-sa.com
>>     <ma...@ice-sa.com>> wrote:
>>
>>         Randolf Richardson wrote:
>>
>>                 I know that it's possible(and arguably best practice)
>>                 to use Apache to
>>                 download large files efficiently and quickly, without
>>                 passing them through
>>                 mod_perl. However, the data I need to download from my
>>                 application is both
>>                 dynamically generated and sensitive so I cannot expose
>>                 it to the internet
>>                 for anonymous download via Apache. So, I'm wondering
>>                 if mod_perl has a
>>                 capability similar to the output stream of a java
>>                 servlet. Specifically, I
>>                 want to return bits and pieces of the file at a time
>>                 over the wire so that
>>                 I can avoid loading the entire file into memory prior
>>                 to sending it to the
>>                 browser. Currently, I'm loading the entire file into
>>                 memory before sending
>>                 it and
>>
>>                 Is this possible with mod_perl and, if so, how should
>>                 I go about
>>                 implementing it?
>>
>>
>>                     Yes, it is possible -- instead of loading the
>>             entire contents of a file into RAM, just read blocks in a
>>             loop and keep sending them until you reach EoF (End of File).
>>
>>                     You can also use $r->flush along the way if you
>>             like, but as I understand it this isn't necessary because
>>             Apache HTTPd will send the data as soon as its internal
>>             buffers contain enough data.  Of course, if you can tune
>>             your block size in your loop to match Apache's output
>>             buffer size, then that will probably help.  (I don't know
>>             much about the details of Apache's output buffers because
>>             I've not read up too much on them, so I hope my
>>             assumptions about this are correct.)
>>
>>                     One of the added benefits you get from using a
>>             loop is that you can also implement rate limiting if that
>>             becomes useful.  You can certainly also implement access
>>             controls as well by cross-checking the file being sent
>>             with whatever internal database queries you'd normally use
>>             to ensure it's okay to send the file first.
>>
>>
>>         You can also :
>>         1) write the data to a file
>>         2) $r->sendfile(...);
>>         3) add a cleanup handler, to delete the file when the request
>>         has been served.
>>         See here for details :
>>         http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
>>
>>         For this to work, there is an Apache configuration directive
>>         which must be set to "on". I believe it is called "EnableSendfile".
>>         Essentially, what sendfile() does is delegate the actual
>>         reading and sending of the file to Apache httpd and the
>>         underlying OS, using code which is specifically optimised for
>>         this purpose.  It is much more efficient than doing this in a
>>         read/write loop by yourself, at the cost of having less fine
>>         control over the operation.
>>
>>
>>
>>
>>
>> -- 
>> John Dunlap
>> /CTO | Lariat /
>> /
>> /
>> /*Direct:*/
>> /john@lariat.co <ma...@lariat.co>/
>> /
>> *Customer Service:*/
>> 877.268.6667
>> support@lariat.co <ma...@lariat.co>
> 
> 
> 


Re: Large File Download

Posted by Dr James Smith <js...@sanger.ac.uk>.
You can effectively stream a file byte by byte - you just need to print
a chunk at a time and mod_perl and Apache will handle it
appropriately... I do this all the time to handle large data downloads
(the systems I manage are backed by petabytes of data)...

The art is often not in the output - but in the way you get and process
the data before sending it - I have code that will upload/download
arbitrarily large files (using HTML5's file objects) without using
excessive amounts of memory... (all data is stored in chunks in a MySQL
database)

Streaming has other advantages with large data - if you wait until you
have generated all the data, you will often find that you get a timeout -
I have a script which can take up to 2 hours to generate all the output -
but it never times out, as it sends a line of data at a time... so data
is sent every 5-10 seconds... and the memory footprint is trivial, as
only the data for one line of output is in memory at a time.
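For what it's worth, the shape of such a handler is roughly this (a sketch
only; the hard-coded row list stands in for whatever slowly generates your
data - a DBI statement handle, a pipeline, etc.):

    package My::StreamReport;

    use strict;
    use warnings;

    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();   # provides $r->print() and $r->rflush()
    use Apache2::Const -compile => 'OK';

    sub handler {
        my $r = shift;
        $r->content_type('text/csv');

        # Stand-in for the real (slow) data source.
        my @rows = (['a', 1], ['b', 2]);

        for my $row (@rows) {
            $r->print(join(',', @$row), "\n");   # one line in memory at a time
            $r->rflush;                          # push it to the client right away
        }
        return Apache2::Const::OK;
    }

    1;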


On 28/03/2015 16:25, John Dunlap wrote:
> sendfile sounds like its exactly what I'm looking for. I see it in the 
> API documentation for Apache2::RequestIO but how do I get a reference 
> to it from the reference to Apache2::RequestRec which is passed to my 
> handler?
>
> On Sat, Mar 28, 2015 at 9:54 AM, Perrin Harkins <pharkins@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Yeah, sendfile() is how I've done this in the past, although I was
>     using mod_perl 1.x for it.
>
>     On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <aw@ice-sa.com
>     <ma...@ice-sa.com>> wrote:
>
>         Randolf Richardson wrote:
>
>                 I know that it's possible(and arguably best practice)
>                 to use Apache to
>                 download large files efficiently and quickly, without
>                 passing them through
>                 mod_perl. However, the data I need to download from my
>                 application is both
>                 dynamically generated and sensitive so I cannot expose
>                 it to the internet
>                 for anonymous download via Apache. So, I'm wondering
>                 if mod_perl has a
>                 capability similar to the output stream of a java
>                 servlet. Specifically, I
>                 want to return bits and pieces of the file at a time
>                 over the wire so that
>                 I can avoid loading the entire file into memory prior
>                 to sending it to the
>                 browser. Currently, I'm loading the entire file into
>                 memory before sending
>                 it and
>
>                 Is this possible with mod_perl and, if so, how should
>                 I go about
>                 implementing it?
>
>
>                     Yes, it is possible -- instead of loading the
>             entire contents of a file into RAM, just read blocks in a
>             loop and keep sending them until you reach EoF (End of File).
>
>                     You can also use $r->flush along the way if you
>             like, but as I understand it this isn't necessary because
>             Apache HTTPd will send the data as soon as its internal
>             buffers contain enough data.  Of course, if you can tune
>             your block size in your loop to match Apache's output
>             buffer size, then that will probably help.  (I don't know
>             much about the details of Apache's output buffers because
>             I've not read up too much on them, so I hope my
>             assumptions about this are correct.)
>
>                     One of the added benefits you get from using a
>             loop is that you can also implement rate limiting if that
>             becomes useful.  You can certainly also implement access
>             controls as well by cross-checking the file being sent
>             with whatever internal database queries you'd normally use
>             to ensure it's okay to send the file first.
>
>
>         You can also :
>         1) write the data to a file
>         2) $r->sendfile(...);
>         3) add a cleanup handler, to delete the file when the request
>         has been served.
>         See here for details :
>         http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
>
>         For this to work, there is an Apache configuration directive
>         which must be set to "on". I believe it is called "EnableSendfile".
>         Essentially, what sendfile() does is delegate the actual
>         reading and sending of the file to Apache httpd and the
>         underlying OS, using code which is specifically optimised for
>         this purpose. It is much more efficient than doing this in a
>         read/write loop by yourself, at the cost of having less fine
>         control over the operation.
>
>
>
>
>
> -- 
> John Dunlap
> /CTO | Lariat/
> /
> /
> /*Direct:*/
> /john@lariat.co <ma...@lariat.co>/
> /
> *Customer Service:*/
> 877.268.6667
> support@lariat.co <ma...@lariat.co>






-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Re: Large File Download

Posted by John Dunlap <jo...@lariat.co>.
sendfile sounds like it's exactly what I'm looking for. I see it in the API
documentation for Apache2::RequestIO, but how do I get a reference to it
from the Apache2::RequestRec reference which is passed to my handler?
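(From the docs it looks like I may just need to load Apache2::RequestIO,
and the method then becomes callable on the same $r my handler receives -
something like the sketch below, with a made-up path?)

    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();   # loading this makes sendfile() callable on $r
    use Apache2::Const -compile => 'OK';

    sub handler {
        my $r = shift;                      # the Apache2::RequestRec passed to the handler
        $r->content_type('application/octet-stream');
        $r->sendfile('/path/to/file');      # hypothetical path
        return Apache2::Const::OK;
    }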

On Sat, Mar 28, 2015 at 9:54 AM, Perrin Harkins <ph...@gmail.com> wrote:

> Yeah, sendfile() is how I've done this in the past, although I was using
> mod_perl 1.x for it.
>
> On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <aw...@ice-sa.com> wrote:
>
>> Randolf Richardson wrote:
>>
>>> I know that it's possible(and arguably best practice) to use Apache to
>>>> download large files efficiently and quickly, without passing them
>>>> through
>>>> mod_perl. However, the data I need to download from my application is
>>>> both
>>>> dynamically generated and sensitive so I cannot expose it to the
>>>> internet
>>>> for anonymous download via Apache. So, I'm wondering if mod_perl has a
>>>> capability similar to the output stream of a java servlet.
>>>> Specifically, I
>>>> want to return bits and pieces of the file at a time over the wire so
>>>> that
>>>> I can avoid loading the entire file into memory prior to sending it to
>>>> the
>>>> browser. Currently, I'm loading the entire file into memory before
>>>> sending
>>>> it and
>>>>
>>>> Is this possible with mod_perl and, if so, how should I go about
>>>> implementing it?
>>>>
>>>
>>>         Yes, it is possible -- instead of loading the entire contents of
>>> a file into RAM, just read blocks in a loop and keep sending them until you
>>> reach EoF (End of File).
>>>
>>>         You can also use $r->flush along the way if you like, but as I
>>> understand it this isn't necessary because Apache HTTPd will send the data
>>> as soon as its internal buffers contain enough data.  Of course, if you can
>>> tune your block size in your loop to match Apache's output buffer size,
>>> then that will probably help.  (I don't know much about the details of
>>> Apache's output buffers because I've not read up too much on them, so I
>>> hope my assumptions about this are correct.)
>>>
>>>         One of the added benefits you get from using a loop is that you
>>> can also implement rate limiting if that becomes useful.  You can certainly
>>> also implement access controls as well by cross-checking the file being
>>> sent with whatever internal database queries you'd normally use to ensure
>>> it's okay to send the file first.
>>>
>>>
>> You can also :
>> 1) write the data to a file
>> 2) $r->sendfile(...);
>> 3) add a cleanup handler, to delete the file when the request has been
>> served.
>> See here for details: http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
>>
>> For this to work, there is an Apache configuration directive which must
>> be set to "on". I believe it is called "EnableSendfile".
>> Essentially, what sendfile() does is delegate the actual reading and
>> sending of the file to Apache httpd and the underlying OS, using code which
>> is specifically optimised for this purpose.  It is much more efficient than
>> doing this in a read/write loop by yourself, at the cost of having less
>> fine control over the operation.
>>
>
>


-- 
John Dunlap
*CTO | Lariat *

*Direct:*
*john@lariat.co <jo...@lariat.co>*

*Customer Service:*
877.268.6667
support@lariat.co

Re: Large File Download

Posted by Perrin Harkins <ph...@gmail.com>.
Yeah, sendfile() is how I've done this in the past, although I was using
mod_perl 1.x for it.
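From memory, the mod_perl 1.x version looked roughly like this (an untested
sketch; the path is made up):

    use Apache::Constants qw(OK NOT_FOUND);
    use Apache::File ();

    sub handler {
        my $r  = shift;
        my $fh = Apache::File->new('/data/reports/report.bin')   # hypothetical path
            or return NOT_FOUND;

        $r->content_type('application/octet-stream');
        $r->send_http_header;
        $r->send_fd($fh);   # let Apache stream the open filehandle out
        close $fh;

        return OK;
    }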

On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <aw...@ice-sa.com> wrote:

> Randolf Richardson wrote:
>
>> I know that it's possible(and arguably best practice) to use Apache to
>>> download large files efficiently and quickly, without passing them
>>> through
>>> mod_perl. However, the data I need to download from my application is
>>> both
>>> dynamically generated and sensitive so I cannot expose it to the internet
>>> for anonymous download via Apache. So, I'm wondering if mod_perl has a
>>> capability similar to the output stream of a java servlet. Specifically,
>>> I
>>> want to return bits and pieces of the file at a time over the wire so
>>> that
>>> I can avoid loading the entire file into memory prior to sending it to
>>> the
>>> browser. Currently, I'm loading the entire file into memory before
>>> sending
>>> it and
>>>
>>> Is this possible with mod_perl and, if so, how should I go about
>>> implementing it?
>>>
>>
>>         Yes, it is possible -- instead of loading the entire contents of
>> a file into RAM, just read blocks in a loop and keep sending them until you
>> reach EoF (End of File).
>>
>>         You can also use $r->flush along the way if you like, but as I
>> understand it this isn't necessary because Apache HTTPd will send the data
>> as soon as its internal buffers contain enough data.  Of course, if you can
>> tune your block size in your loop to match Apache's output buffer size,
>> then that will probably help.  (I don't know much about the details of
>> Apache's output buffers because I've not read up too much on them, so I
>> hope my assumptions about this are correct.)
>>
>>         One of the added benefits you get from using a loop is that you
>> can also implement rate limiting if that becomes useful.  You can certainly
>> also implement access controls as well by cross-checking the file being
>> sent with whatever internal database queries you'd normally use to ensure
>> it's okay to send the file first.
>>
>>
> You can also :
> 1) write the data to a file
> 2) $r->sendfile(...);
> 3) add a cleanup handler, to delete the file when the request has been
> served.
> See here for details: http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
>
> For this to work, there is an Apache configuration directive which must be
> set to "on". I believe it is called "EnableSendfile".
> Essentially, what sendfile() does is delegate the actual reading and
> sending of the file to Apache httpd and the underlying OS, using code which
> is specifically optimised for this purpose.  It is much more efficient than
> doing this in a read/write loop by yourself, at the cost of having less
> fine control over the operation.
>

Re: Large File Download

Posted by André Warnier <aw...@ice-sa.com>.
Randolf Richardson wrote:
>> I know that it's possible(and arguably best practice) to use Apache to
>> download large files efficiently and quickly, without passing them through
>> mod_perl. However, the data I need to download from my application is both
>> dynamically generated and sensitive so I cannot expose it to the internet
>> for anonymous download via Apache. So, I'm wondering if mod_perl has a
>> capability similar to the output stream of a java servlet. Specifically, I
>> want to return bits and pieces of the file at a time over the wire so that
>> I can avoid loading the entire file into memory prior to sending it to the
>> browser. Currently, I'm loading the entire file into memory before sending
>> it and
>>
>> Is this possible with mod_perl and, if so, how should I go about
>> implementing it?
> 
> 	Yes, it is possible -- instead of loading the entire contents of a 
> file into RAM, just read blocks in a loop and keep sending them until 
> you reach EoF (End of File).
> 
> 	You can also use $r->flush along the way if you like, but as I 
> understand it this isn't necessary because Apache HTTPd will send the 
> data as soon as its internal buffers contain enough data.  Of course, 
> if you can tune your block size in your loop to match Apache's output 
> buffer size, then that will probably help.  (I don't know much about 
> the details of Apache's output buffers because I've not read up too 
> much on them, so I hope my assumptions about this are correct.)
> 
> 	One of the added benefits you get from using a loop is that you can 
> also implement rate limiting if that becomes useful.  You can 
> certainly also implement access controls as well by cross-checking 
> the file being sent with whatever internal database queries you'd 
> normally use to ensure it's okay to send the file first.
> 

You can also:
1) write the data to a file
2) $r->sendfile(...);
3) add a cleanup handler, to delete the file when the request has been served.
See here for details: http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_

For this to work, there is an Apache configuration directive which must be set to "on"; I
believe it is called "EnableSendfile".
Essentially, what sendfile() does is delegate the actual reading and sending of the file
to Apache httpd and the underlying OS, using code which is specifically optimised for this
purpose.  It is much more efficient than doing this in a read/write loop by yourself, at
the cost of having less fine control over the operation.
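As a rough illustration of steps 1-3 above (a sketch only, not tested;
generate_report() is a placeholder for however your application produces
the data):

    package My::GenerateAndSend;

    use strict;
    use warnings;

    use File::Temp qw(tempfile);
    use Apache2::RequestRec  ();
    use Apache2::RequestIO   ();   # provides $r->sendfile()
    use Apache2::RequestUtil ();   # provides $r->push_handlers()
    use Apache2::Response    ();   # provides $r->set_content_length()
    use Apache2::Const -compile => 'OK';

    sub handler {
        my $r = shift;

        # 1) generate the data into a temporary file
        my ($fh, $tmpfile) = tempfile(UNLINK => 0);
        print {$fh} generate_report($r);
        close $fh;

        # 3) delete the file once the response has been served
        $r->push_handlers(PerlCleanupHandler => sub { unlink $tmpfile; Apache2::Const::OK });

        # 2) let httpd and the OS stream the file out
        $r->content_type('application/octet-stream');
        $r->set_content_length(-s $tmpfile);
        $r->sendfile($tmpfile);

        return Apache2::Const::OK;
    }

    # placeholder: whatever produces the sensitive, dynamically generated data
    sub generate_report { return "example payload\n" }

    1;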

Re: Large File Download

Posted by Randolf Richardson <ra...@modperl.pl>.
> I know that it's possible(and arguably best practice) to use Apache to
> download large files efficiently and quickly, without passing them through
> mod_perl. However, the data I need to download from my application is both
> dynamically generated and sensitive so I cannot expose it to the internet
> for anonymous download via Apache. So, I'm wondering if mod_perl has a
> capability similar to the output stream of a java servlet. Specifically, I
> want to return bits and pieces of the file at a time over the wire so that
> I can avoid loading the entire file into memory prior to sending it to the
> browser. Currently, I'm loading the entire file into memory before sending
> it and
> 
> Is this possible with mod_perl and, if so, how should I go about
> implementing it?

	Yes, it is possible -- instead of loading the entire contents of a 
file into RAM, just read blocks in a loop and keep sending them until 
you reach EoF (End of File).

	You can also use $r->flush along the way if you like, but as I 
understand it this isn't necessary because Apache HTTPd will send the 
data as soon as its internal buffers contain enough data.  Of course, 
if you can tune your block size in your loop to match Apache's output 
buffer size, then that will probably help.  (I don't know much about 
the details of Apache's output buffers because I've not read up too 
much on them, so I hope my assumptions about this are correct.)

	One of the added benefits you get from using a loop is that you can 
also implement rate limiting if that becomes useful.  You can 
certainly also implement access controls as well by cross-checking 
the file being sent with whatever internal database queries you'd 
normally use to ensure it's okay to send the file first.
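	For illustration, such a loop might look roughly like this (a sketch
only; the path and the 64KB block size are arbitrary choices):

    use strict;
    use warnings;

    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();   # provides $r->print() and $r->rflush()
    use Apache2::Const -compile => qw(OK NOT_FOUND);

    sub handler {
        my $r = shift;

        my $file = '/data/reports/report.bin';   # hypothetical path
        open my $fh, '<:raw', $file
            or return Apache2::Const::NOT_FOUND;

        $r->content_type('application/octet-stream');

        my $buf;
        while (read($fh, $buf, 64 * 1024)) {     # read 64KB blocks until EoF
            $r->print($buf);
            # $r->rflush;  # optional: force each block out (useful for rate limiting)
        }
        close $fh;

        return Apache2::Const::OK;
    }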

> -- 
> John Dunlap
> *CTO | Lariat *
> 
> *Direct:*
> *john@lariat.co <jo...@lariat.co>*
> 
> *Customer Service:*
> 877.268.6667
> support@lariat.co
> 


Randolf Richardson - randolf@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
http://www.inter-corporate.com/