Posted to modperl@perl.apache.org by John Dunlap <Jo...@lariat.co> on 2019/08/16 17:25:04 UTC

Streaming files

In Java servlets, I can stream a file back to the browser one chunk at a
time. This has 2 benefits which interest me.
1) Files can be stored outside the web root so users cannot download them
unless they are logged in, even if they know the path.
2) Large files can be streamed back to the client without having the entire
file loaded into memory at the same time

How would you recommend achieving similar functionality in mod_perl?

-- 
John Dunlap
*CTO | Lariat *

*Direct:*
*john@lariat.co <jo...@lariat.co>*

*Customer Service:*
877.268.6667
support@lariat.co

Re: Streaming files

Posted by Mithun Bhattacharya <mi...@gmail.com>.
1. Streaming in chunks has nothing to do with security - you should have a
proper authentication and authorization layer wrapping these requests.
2. Handling chunks and large files is something Apache has perfected, and
we usually let Apache do the nitty-gritty.
     After all the validations are complete, I just do return
$r->lookup_file($file)->run(); - lookup_file is provided by
Apache2::SubRequest.
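
A minimal sketch of that approach, in case it helps (the package name and
the two helper functions are hypothetical placeholders for your own
authentication and path-mapping logic):

    package My::Download;

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::SubRequest ();   # provides $r->lookup_file
    use Apache2::Const -compile => qw(FORBIDDEN);

    sub handler {
        my $r = shift;

        # Your own authentication/authorization check goes here.
        return Apache2::Const::FORBIDDEN unless user_may_download($r);

        # Map the request onto a file that lives outside the DocumentRoot.
        my $file = resolve_requested_file($r);

        # Hand the actual delivery back to Apache via a subrequest.
        return $r->lookup_file($file)->run();
    }

    1;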

On Fri, Aug 16, 2019 at 1:37 PM Dan Ullom <da...@gmail.com> wrote:

> You could put the files behind Nginx and use your mod_perl web app as an
> external authentication service. I've never done it but it doesn't look too
> hard.
>
> On Fri, Aug 16, 2019, 12:25 PM John Dunlap <Jo...@lariat.co> wrote:
>
>> In Java servlets, I can stream a file back to the browser one chunk at a
>> time. This has 2 benefits which interest me.
>> 1) Files can be stored outside the web root so users cannot download them
>> unless they are logged in, even if they know the path.
>> 2) Large files can be streamed back to the client without having the
>> entire file loaded into memory at the same time
>>
>> How would you recommend achieving similar functionality in mod_perl?
>>
>> --
>> John Dunlap
>> *CTO | Lariat *
>>
>> *Direct:*
>> *john@lariat.co <jo...@lariat.co>*
>>
>> *Customer Service:*
>> 877.268.6667
>> support@lariat.co
>>
>

Re: Streaming files

Posted by Dan Ullom <da...@gmail.com>.
You could put the files behind Nginx and use your mod_perl web app as an
external authentication service. I've never done it but it doesn't look too
hard.
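
One way to wire that up is nginx's auth_request module pointing at a small
mod_perl handler: nginx issues an internal subrequest before serving the
file, and a 2xx response allows the download while anything else denies it.
A rough sketch of the mod_perl side only (My::AuthCheck and
session_is_valid are made-up names for your own session logic):

    package My::AuthCheck;

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::Const -compile => qw(OK FORBIDDEN);

    sub handler {
        my $r = shift;

        # Hypothetical check against your application's session store.
        return Apache2::Const::OK if session_is_valid($r);
        return Apache2::Const::FORBIDDEN;
    }

    1;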

On Fri, Aug 16, 2019, 12:25 PM John Dunlap <Jo...@lariat.co> wrote:

> In Java servlets, I can stream a file back to the browser one chunk at a
> time. This has 2 benefits which interest me.
> 1) Files can be stored outside the web root so users cannot download them
> unless they are logged in, even if they know the path.
> 2) Large files can be streamed back to the client without having the
> entire file loaded into memory at the same time
>
> How would you recommend achieving similar functionality in mod_perl?
>
> --
> John Dunlap
> *CTO | Lariat *
>
> *Direct:*
> *john@lariat.co <jo...@lariat.co>*
>
> *Customer Service:*
> 877.268.6667
> support@lariat.co
>

Re: Storing images in PostgreSQL (was: Streaming files)

Posted by Thorsten Schöning <ts...@am-soft.de>.
Hello Randolf Richardson,
on Tuesday, 20 August 2019 at 19:06, you wrote:

>         One of the great things about Perl is that there is more than one
> way to do things, and this flows through to how to handle the storing 
> and retrieving of images.  (Everyone else is welcome to share their 
> perspectives and experiences on this in replies to this message.)

For reasons very similar to the ones you already mentioned, we are even
storing user-generated files, sometimes hundreds of MiB each, in
Postgres currently. Most files are far smaller, though. It works, but
it has cons as well, especially in the long term: dumps become very
large, getting the data into Postgres adds I/O overhead, etc.

In my opinion, properly using the RAII pattern and newer file systems
like BTRFS/ZFS is the better way to go today and in the future. RAII
helps manage things cleanly, snapshots plus ACLs can be used to protect
the data, both file systems support checksumming for data integrity,
and replication is fairly easy to implement by sending and receiving
snapshots. ZFS and ZREP are already used to mirror servers in our setup
and work reliably.

Nevertheless, in my opinion your mail is a good collection of
arguments that I could have written myself. :-)

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


Storing images in PostgreSQL (was: Streaming files)

Posted by Randolf Richardson <ra...@modperl.pl>.
	In answer to your question, I recognize that there are many pros and 
cons to storing images in PostgreSQL.  I'm providing a fuller answer 
here and including it in the ModPerl mailing list in the hopes that 
it will be helpful in some way to others who are considering this...

	While we have sites that don't store images in a database, we're 
finding it's a lot more convenient to do so, and not only for backups; 
the advantages include:

		1. transactional operations all the way through

		2. easier clean-up cascading due to referential integrity

		3. simpler multi-server replication

		4. faster serving of individual images, since directories with very 
large numbers of files can take longer to search than an 
efficiently-indexed table (Unix and Linux both generally seem to be 
very efficient in this regard in most instances, and Novell NetWare 
was too when we hosted web sites with it in the 1990s, but many other 
operating systems are not very good at this)

		5. code that's easier to maintain because we don't have to 
deal with data outside of the database separately

	We also run a number of projects that use a mixture of large images 
in the filesystem and small images in the database (e.g., icons and 
user avatars), but we find this hybrid approach to be more of a hassle, 
so we're gradually moving toward keeping everything either in the file 
system or fully in the database.

	Note:  We'll still store general site design images in the file 
system for the web page's logo and other graphics that make the site 
user-friendly for the end-user.  For database storage, our primary 
focus is on storing user-supplied uploaded images in the database.

	From an access control standpoint, since we need to query the 
database from a Perl script anyway (e.g., to determine access 
control, expiry policies, etc.) before serving an image that is 
stored outside of the web site's DocumentRoot, having the image data, 
length (calculated), and mime type in the ResultSet works very well 
for generating the needed HTTP headers.
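
	As an illustration only, a trimmed-down handler along those lines 
might look something like this (the table and column names, the DSN, and 
the URL layout are all hypothetical):

    use strict;
    use warnings;
    use DBI;
    use Apache2::RequestRec ();
    use Apache2::RequestIO ();
    use Apache2::Const -compile => qw(OK NOT_FOUND);

    sub handler {
        my $r = shift;

        # Hypothetical: the image ID arrives as extra path info, e.g. /img/123
        my ($image_id) = ($r->path_info() || '') =~ m{(\d+)};
        return Apache2::Const::NOT_FOUND unless defined $image_id;

        my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', '',
                               { RaiseError => 1, AutoCommit => 1 });
        my ($data, $mime) = $dbh->selectrow_array(
            'SELECT image_data, mime_type FROM user_images WHERE id = ?',
            undef, $image_id,
        );
        return Apache2::Const::NOT_FOUND unless defined $data;

        # Everything needed for the headers comes straight from the result set.
        $r->content_type($mime);
        $r->set_content_length(length $data);
        $r->print($data);
        return Apache2::Const::OK;
    }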

	From a security standpoint, if an image disappears from the file 
system (there are many reasons this can occur), or someone 
accidentally runs ransomware that encrypts some or all of the 
images, then our dataset becomes unreliable.  Instead of 
writing a set of scripts to check that no image files are missing and 
administering that, I find it a lot easier to just store the image 
data in the database and not have to worry about this aspect at all.

	In our experience, PostgreSQL actually handles this very well, and 
the strategy we use is to store images in separate tables that were 
specifically created for storing one image per row along with other 
information, including original filename (if the project calls for 
it), dimensions (e.g., width and height), mime type, etc., along with 
arbitrary additional data such as creation timestamp, user ID (of 
uploader), title, description (which I know we can do anyway with a 
reference to a file), and even a view count (which can be maintained 
by an SQL Trigger).  Storing the dimensions is also advantageous for 
dynamic HTML generation, by the way, since it means that we can 
generate HTML "img" tags with the width, height, and alt values.

	We take this a step further though by also storing a smaller version 
of the image (a.k.a., a thumbnail) that we serve with a different 
script, or with an additional character (or keyword) in the filename 
so that our script knows which image to send to the client.  Both the 
full-size and smaller images exist in the same row in separate 
columns, which is also very convenient because then we have only one 
row to reference for both the full-size and thumbnail image.
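
	For illustration, one hypothetical table layout along those lines, 
created here through DBI (the names and types are only examples, and the 
indented heredoc needs Perl 5.26 or newer):

    # view_count can be kept current by an SQL trigger (not shown).
    $dbh->do(<<~'SQL');
        CREATE TABLE user_images (
            id           BIGSERIAL PRIMARY KEY,
            uploader_id  BIGINT      NOT NULL,
            filename     TEXT,
            mime_type    TEXT        NOT NULL,
            width        INTEGER,
            height       INTEGER,
            title        TEXT,
            description  TEXT,
            view_count   BIGINT      NOT NULL DEFAULT 0,
            created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
            image_data   BYTEA       NOT NULL,
            thumbnail    BYTEA
        );
        SQL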

	One of the great things about Perl is that there is more than one 
way to do things, and this flows through to how to handle the storing 
and retrieving of images.  (Everyone else is welcome to share their 
perspectives and experiences on this in replies to this message.)

> Why would you store images in a database?
> 
> 
> > On 18 Aug 2019, at 23:28, Randolf Richardson <ra...@modperl.pl> wrote:
> > 
> > 	Yes, you can use a Perl handler to do that.  (I do this to send 
> > images stored in PostgreSQL that are retrieved with DBI instead of 
> > reading from files on the file system outside of the document root.)
> > 
> > 	Perl can read X number of bytes into a local variable, then send the 
> > contents of that local variable.  (Flushing the output is optional as 
> > Apache HTTPd should handle this efficiently and automatically.)
> > 
> > 	Make sure you set the "Content-type" header appropriately so that 
> > your users' web browsers can handle the download appropriately:
> > 
> > 		$r->content_type($MIME_TYPE); # --- Set MIME type
> > 
> > 	Within a loop you'll keep reading the next portion of your data into 
> > a variable such as $chunk, and simply send it in the standard way:
> > 
> > 		$r->print($chunk); # --- Send data
> > 
> > 	You can also send other headers (I usually set the content-type 
> > after sending other headers), such as "Content-Disposition" and 
> > whatnot to communicate whatever filename you prefer, and to specify 
> > the file size so that your users can see "percent complete" download 
> > indicators from their web browsers.  (If you'd like an example, feel 
> > free to ask.)
> > 
> >> Using a perl handler as the download link would be ideal. So, if I am
> >> understanding you correctly, so long as I load chunks of the file and call
> >> the perl print function on each chunk individually instead of loading all
> >> of them into a variable and THEN calling print, I should avoid having to
> >> load the complete file into memory?
> >> 
> >> On Fri, Aug 16, 2019 at 2:36 PM Randolf Richardson <ra...@modperl.pl>
> >> wrote:
> >> 
> >>>        One fairly straight-forward approach would be to write a script
> >>> that
> >>> serves as the path for downloads, then have it parse the filename to
> >>> use as a key in determining which content to send to the user.  (The
> >>> AcceptPathInfo directive could be helpful for this, especially if you
> >>> want the script to appear as a subdirectory to the user.)
> >>> 
> >>>        That script could perform all the necessary security checks that
> >>> you
> >>> need, and read portions of the file to be streamed as needed (the
> >>> streaming protocol implementation details would also need to be
> >>> handled by your script, and you might find it helpful to look at
> >>> what's available in CPAN for the streaming protocol you want to use).
> >>> 
> >>>> In Java servlets, I can stream a file back to the browser one chunk at a
> >>>> time. This has 2 benefits which interest me.
> >>>> 1) Files can be stored outside the web root so users cannot download them
> >>>> unless they are logged in, even if they know the path.
> >>>> 2) Large files can be streamed back to the client without having the
> >>> entire
> >>>> file loaded into memory at the same time
> >>>> 
> >>>> How would you recommend achieving similar functionality in mod_perl?
> >>>> 
> >>>> --
> >>>> John Dunlap
> >>>> *CTO | Lariat *
> >>>> 
> >>>> *Direct:*
> >>>> *john@lariat.co <jo...@lariat.co>*
> >>>> 
> >>>> *Customer Service:*
> >>>> 877.268.6667
> >>>> support@lariat.co
> >>>> 
> >>> 
> >>> 
> >>> Randolf Richardson - randolf@inter-corporate.com
> >>> Inter-Corporate Computer & Network Services, Inc.
> >>> Beautiful British Columbia, Canada
> >>> https://www.inter-corporate.com/
> >>> 
> >>> 
> >>> 
> >> 
> >> -- 
> >> John Dunlap
> >> *CTO | Lariat *
> >> 
> >> *Direct:*
> >> *john@lariat.co <jo...@lariat.co>*
> >> 
> >> *Customer Service:*
> >> 877.268.6667
> >> support@lariat.co
> >> 
> > 
> > 
> > Randolf Richardson - randolf@inter-corporate.com
> > Inter-Corporate Computer & Network Services, Inc.
> > Beautiful British Columbia, Canada
> > https://www.inter-corporate.com/
> > 
> > 
> 


Randolf Richardson - randolf@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
https://www.inter-corporate.com/



Re: Streaming files

Posted by Randolf Richardson <ra...@modperl.pl>.
	Yes, you can use a Perl handler to do that.  (I do this to send 
images stored in PostgreSQL that are retrieved with DBI instead of 
reading from files on the file system outside of the document root.)

	Perl can read X number of bytes into a local variable, then send the 
contents of that local variable.  (Flushing the output is optional as 
Apache HTTPd should handle this efficiently and automatically.)

	Make sure you set the "Content-type" header appropriately so that 
your users' web browsers can handle the download correctly:

		$r->content_type($MIME_TYPE); # --- Set MIME type

	Within a loop you'll keep reading the next portion of your data into 
a variable such as $chunk, and simply send it in the standard way:

		$r->print($chunk); # --- Send data

	You can also send other headers (I usually set the content-type 
after sending other headers), such as "Content-Disposition" and 
whatnot to communicate whatever filename you prefer, and to specify 
the file size so that your users can see "percent complete" download 
indicators from their web browsers.  (If you'd like an example, feel 
free to ask.)
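
	If it helps, here is a bare-bones sketch of that loop as a complete 
handler (the file path, chunk size, filename, and MIME type are 
placeholder values):

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO ();    # provides $r->print
    use APR::Table ();            # needed for $r->headers_out->set
    use Apache2::Const -compile => qw(OK NOT_FOUND);

    sub handler {
        my $r = shift;

        # Hypothetical file that lives outside the DocumentRoot.
        my $file = '/srv/private-downloads/report.pdf';
        return Apache2::Const::NOT_FOUND unless -f $file;

        open my $fh, '<:raw', $file
            or return Apache2::Const::NOT_FOUND;

        $r->content_type('application/pdf');
        $r->set_content_length(-s $file);    # lets browsers show progress
        $r->headers_out->set('Content-Disposition',
                             'attachment; filename="report.pdf"');

        # Read and send 64 KiB at a time; only one chunk is in memory at once.
        my $chunk;
        while (read $fh, $chunk, 64 * 1024) {
            $r->print($chunk);
        }
        close $fh;
        return Apache2::Const::OK;
    }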

> Using a perl handler as the download link would be ideal. So, if I am
> understanding you correctly, so long as I load chunks of the file and call
> the perl print function on each chunk individually instead of loading all
> of them into a variable and THEN calling print, I should avoid having to
> load the complete file into memory?
> 
> On Fri, Aug 16, 2019 at 2:36 PM Randolf Richardson <ra...@modperl.pl>
> wrote:
> 
> >         One fairly straight-forward approach would be to write a script
> > that
> > serves as the path for downloads, then have it parse the filename to
> > use as a key in determining which content to send to the user.  (The
> > AcceptPathInfo directive could be helpful for this, especially if you
> > want the script to appear as a subdirectory to the user.)
> >
> >         That script could perform all the necessary security checks that
> > you
> > need, and read portions of the file to be streamed as needed (the
> > streaming protocol implementation details would also need to be
> > handled by your script, and you might find it helpful to look at
> > what's available in CPAN for the streaming protocol you want to use).
> >
> > > In Java servlets, I can stream a file back to the browser one chunk at a
> > > time. This has 2 benefits which interest me.
> > > 1) Files can be stored outside the web root so users cannot download them
> > > unless they are logged in, even if they know the path.
> > > 2) Large files can be streamed back to the client without having the
> > entire
> > > file loaded into memory at the same time
> > >
> > > How would you recommend achieving similar functionality in mod_perl?
> > >
> > > --
> > > John Dunlap
> > > *CTO | Lariat *
> > >
> > > *Direct:*
> > > *john@lariat.co <jo...@lariat.co>*
> > >
> > > *Customer Service:*
> > > 877.268.6667
> > > support@lariat.co
> > >
> >
> >
> > Randolf Richardson - randolf@inter-corporate.com
> > Inter-Corporate Computer & Network Services, Inc.
> > Beautiful British Columbia, Canada
> > https://www.inter-corporate.com/
> >
> >
> >
> 
> -- 
> John Dunlap
> *CTO | Lariat *
> 
> *Direct:*
> *john@lariat.co <jo...@lariat.co>*
> 
> *Customer Service:*
> 877.268.6667
> support@lariat.co
> 


Randolf Richardson - randolf@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
https://www.inter-corporate.com/



Re: Streaming files

Posted by Randolf Richardson <ra...@modperl.pl>.
	One fairly straight-forward approach would be to write a script that 
serves as the path for downloads, then have it parse the filename to 
use as a key in determining which content to send to the user.  (The 
AcceptPathInfo directive could be helpful for this, especially if you 
want the script to appear as a subdirectory to the user.)

	That script could perform all the necessary security checks that you 
need, and read portions of the file to be streamed as needed (the 
streaming protocol implementation details would also need to be 
handled by your script, and you might find it helpful to look at 
what's available in CPAN for the streaming protocol you want to use).
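
	A small fragment showing that filename-as-key idea inside a response 
handler (the download directory is hypothetical, and the Apache2::Const 
constants are assumed to be compiled elsewhere in the handler):

    # With "AcceptPathInfo On" for the location, the requested name
    # arrives as extra path info.
    my $name = $r->path_info();
    $name =~ s{^/+}{};

    # Refuse anything that could escape the download directory.
    return Apache2::Const::FORBIDDEN
        if $name eq '' or $name =~ m{[/\\]} or $name =~ /^\./;

    # Hypothetical directory outside the DocumentRoot.
    my $file = "/srv/private-downloads/$name";
    return Apache2::Const::NOT_FOUND unless -f $file;

    # ... then perform the remaining security checks and stream the file.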

> In Java servlets, I can stream a file back to the browser one chunk at a
> time. This has 2 benefits which interest me.
> 1) Files can be stored outside the web root so users cannot download them
> unless they are logged in, even if they know the path.
> 2) Large files can be streamed back to the client without having the entire
> file loaded into memory at the same time
> 
> How would you recommend achieving similar functionality in mod_perl?
> 
> -- 
> John Dunlap
> *CTO | Lariat *
> 
> *Direct:*
> *john@lariat.co <jo...@lariat.co>*
> 
> *Customer Service:*
> 877.268.6667
> support@lariat.co
> 


Randolf Richardson - randolf@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
https://www.inter-corporate.com/