You are viewing a plain text version of this content. The canonical link for it is here.
Posted to modperl@perl.apache.org by Mark Stosberg <ma...@summersault.com> on 2008/09/29 21:00:41 UTC

Any success with storing photos in a database?

This question isn't so much a mod_perl question, as it is a question
about building high performance websites with Perl. 

We have a large, busy, database application that relates to millions of
photos, which we also need to store and display. We've been keeping the
meta data about the photos in PostgreSQL, and the files on the file
system. (Implemented much like CGI::Uploader ).

This has worked great in terms of performance, but with so much data to
manage, over time we have run into data inconsistency issues between the
file system and the database.

So, I'm asking if anyone has had experience successfully storing photos
(or othe files) directly in database? That would solve the consistency
issue, but may create a performance issue. Perhaps the right kind of
caching layer could solve that.

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Any success with storing photos in a database?

Posted by Mark Stosberg <ma...@summersault.com>.
> Usually people just make sure inserts/updates to the photo table is
> done in a transaction and if that transaction succeeds or fails, it
> does the appropriate write/delete on the file system.  

I could perhaps be better about "rolling back" file system actions
if a DB transaction files. 

I'll be looking more into the cause today. Initial findings suggest that
there was import issue with legacy data long ago, and a significant
issue may not even remain in in the current code.

> But since you're using PostgreSQL ( my favorite database and a large
> part of my consulting practice ) you could even go so far as to write a
> few pl/perl stored procedures to handle keeping the file system in sync 
> with the database. 

Ah, interesting idea. I'll keep that in mind. I'll have to deal with the
hoop that the image storage server is currently not accessible from the
master PostgreSQL server.

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Any success with storing photos in a database?

Posted by Frank Wiles <fr...@wiles.org>.
On Mon, 29 Sep 2008 15:00:41 -0400
Mark Stosberg <ma...@summersault.com> wrote:

> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl. 
> 
> We have a large, busy, database application that relates to millions
> of photos, which we also need to store and display. We've been
> keeping the meta data about the photos in PostgreSQL, and the files
> on the file system. (Implemented much like CGI::Uploader ).
> 
> This has worked great in terms of performance, but with so much data
> to manage, over time we have run into data inconsistency issues
> between the file system and the database.
> 
> So, I'm asking if anyone has had experience successfully storing
> photos (or othe files) directly in database? That would solve the
> consistency issue, but may create a performance issue. Perhaps the
> right kind of caching layer could solve that.

Actually you're already doing it correctly.  Andre already mentioned
many of the pitfalls of trying to store large binary data in a database,
so I won't rehash them again.

The only issue you seem to be having is the inconsistency.  That issue
is going to be much easier to solve than trying to scale by putting the
photos in the database. 

Usually people just make sure inserts/updates to the photo table is
done in a transaction and if that transaction succeeds or fails, it
does the appropriate write/delete on the file system.  

But since you're using PostgreSQL ( my favorite database and a large
part of my consulting practice ) you could even go so far as to write a
few pl/perl stored procedures to handle keeping the file system in sync 
with the database. 

 -------------------------------------------------------
   Frank Wiles, Revolution Systems, LLC. 
     Personal : frank@wiles.org  http://www.wiles.org
     Work     : frank@revsys.com http://www.revsys.com 


Re: Any success with storing photos in a database?

Posted by David Nicol <da...@gmail.com>.
On Mon, Sep 29, 2008 at 3:16 PM, André Warnier <aw...@ice-sa.com> wrote:

> Oh, and I've thought of another advantage, in an Apache/web context : to
> send the content of a file to a browser, you can take advantage of the
> sendfile() call, which is very efficient.  Now if your file is a blob in a
> row of a database, you have to read it yourself in memory, and send it,
> don't you ?

SQLite has an API for opening handles to blobs, but you would have to
write something in C to work with it.

Re: Any success with storing photos in a database?

Posted by Mark Stosberg <ma...@summersault.com>.
> One point I am curious about, is what kind of file structure you use to 
> store the millions of images on the filesystem.  I can't imagine that 
> you do it really into one flat directory ?

Thanks for the response. 

We use the 'md5' scheme in CGI::Uploader. From the docs:  
  "[We] will create three levels of directories based on the first three letters
  of the ID's md5 sum. The result may look like this:"
    2/0/2/123.jpg" 

> And are you storing the real paths directly in the database ?

No, we generate the path based on the unique ID. 

You points were all interesting and steer me back in the direction of trying to better
understand and eliminate how inconsistency is creeping into our system.

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Any success with storing photos in a database?

Posted by Michael Peters <mp...@plusthree.com>.
Himanshu wrote:

> A newbie question in the insightful thread. Can serving static files
> and cookie authentication go together? 

Yes. You can do this with something like mod_auth_tkt in a proxy web server (very light) and have a 
backend mod_perl (or php, or Java, etc) server set the cookie after someone has logged in. Then you 
just configure your auth in your proxy for which groups have access to which files.

> Second what
> are the sizes at which one should start moving to files. e.g. how
> about plain text blog entries containing no more than say 5 KBytes.

It really depends on you database. In MySQL using BLOBS won't really cause
problems unless you use queries that involve full table scans or queries that actually pull or 
manipulate the BLOBS a lot. So if you have the proper indexes and only pull the BLOBS rarely, then 
you should be ok for most uses.

But as far as blogs go, you really shouldn't be pulling the text of the blog out of the DB for every 
request. Since it changes very rarely, especially compared to the number of times it's read then 
storing it in a "published" state on the filesystem is usually best. And if you still need to add 
dynamic bits to the page at run time, then instead of publishing a .html file, you can publish a 
template (for Template Toolkit, HTML::Template, etc).

-- 
Michael Peters
Plus Three, LP


Re: Any success with storing photos in a database?

Posted by Himanshu <hi...@gmail.com>.
2008/9/30 Frank Wiles <fr...@wiles.org>:
> On Mon, 29 Sep 2008 22:16:12 +0200
> André Warnier <aw...@ice-sa.com> wrote:
>
>> We have also looked at various data organisations over time.
>> Regarding storing large objects directly in a database, the one issue
>> is always that (because of the object sizes), it makes any operation
>> on the rows of such a database or table very heavy.  Imagine having
>> to dump or reload a table that contains 500,000 "blobs" of 2-3 MB
>> each. (Don't know about PostgreSQL, but many db systems require a
>> dump and a reload when you change a table structure).
>
> FYI PostgreSQL doesn't require a dump/reload when altering the table
> structure.
>
>> Oh, and I've thought of another advantage, in an Apache/web context :
>> to send the content of a file to a browser, you can take advantage of
>> the sendfile() call, which is very efficient.  Now if your file is a
>> blob in a row of a database, you have to read it yourself in memory,
>> and send it, don't you ?
>
> That is one advantage, but I'm going to take it a step further and say
> by having the files outside of the database you don't even need Apache,
> but should in fact use a lighter weight web server like tux, nginx, or
> lighttpd for serving up static media like that.

A newbie question in the insightful thread. Can serving static files
and cookie authentication go together? If yes, any hints. Second what
are the sizes at which one should start moving to files. e.g. how
about plain text blog entries containing no more than say 5 KBytes.

Thanks,
Himanshu

>
>  -------------------------------------------------------
>   Frank Wiles, Revolution Systems, LLC.
>     Personal : frank@wiles.org  http://www.wiles.org
>     Work     : frank@revsys.com http://www.revsys.com
>
>

Re: Any success with storing photos in a database?

Posted by Frank Wiles <fr...@wiles.org>.
On Mon, 29 Sep 2008 22:16:12 +0200
André Warnier <aw...@ice-sa.com> wrote:

> We have also looked at various data organisations over time.
> Regarding storing large objects directly in a database, the one issue
> is always that (because of the object sizes), it makes any operation
> on the rows of such a database or table very heavy.  Imagine having
> to dump or reload a table that contains 500,000 "blobs" of 2-3 MB
> each. (Don't know about PostgreSQL, but many db systems require a
> dump and a reload when you change a table structure).

FYI PostgreSQL doesn't require a dump/reload when altering the table
structure. 

> Oh, and I've thought of another advantage, in an Apache/web context :
> to send the content of a file to a browser, you can take advantage of
> the sendfile() call, which is very efficient.  Now if your file is a
> blob in a row of a database, you have to read it yourself in memory,
> and send it, don't you ?

That is one advantage, but I'm going to take it a step further and say
by having the files outside of the database you don't even need Apache,
but should in fact use a lighter weight web server like tux, nginx, or
lighttpd for serving up static media like that. 

 -------------------------------------------------------
   Frank Wiles, Revolution Systems, LLC. 
     Personal : frank@wiles.org  http://www.wiles.org
     Work     : frank@revsys.com http://www.revsys.com 


Re: Any success with storing photos in a database?

Posted by André Warnier <aw...@ice-sa.com>.
Mark Stosberg wrote:
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl. 
> 
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
> 
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
> 
> So, I'm asking if anyone has had experience successfully storing photos
> (or othe files) directly in database? That would solve the consistency
> issue, but may create a performance issue. Perhaps the right kind of
> caching layer could solve that.
> 
I am curious about your application, because we have something similar 
and similar volumes, not with photos but with documents in general.

The following is an opinion piece.

We have also looked at various data organisations over time.
Regarding storing large objects directly in a database, the one issue is 
always that (because of the object sizes), it makes any operation on the 
rows of such a database or table very heavy.  Imagine having to dump or 
reload a table that contains 500,000 "blobs" of 2-3 MB each.
(Don't know about PostgreSQL, but many db systems require a dump and a 
reload when you change a table structure).
Or simply take a backup of that table.  And you cannot make an "rsync" 
of a database as easily as of a filesystem.
It also means that any intermediate buffer (which are often used to 
improve retrieval of "nearby" rows) is quickly full, with few rows in it.
(On the other hand, if you just keep thumbnails of a couple of Kb, I 
guess it would not matter much.)

Another issue is that it happens that databases get screwed up, and the 
likelihood probably increases as you push them to their limits (for 
example with very large row sizes). Resolving some inconsistencies 
between database rows and files on disk may be no fun, but resolving 
inconsistencies within a database may be even less so.

One point I am curious about, is what kind of file structure you use to 
store the millions of images on the filesystem.  I can't imagine that 
you do it really into one flat directory ?
And are you storing the real paths directly in the database ?

To get back to your issue of inconsistency : maybe the best strategy is 
just to check for such inconsistencies as early as possible ?  For 
example, when you add or delete objects, write this information 
somewhere in a daily "transactions" file, which is then analysed at 
night by some job which checks that everything is really where it is 
supposed to be, and lets you know when not.

Regarding efficiency : when you think about it, a modern filesystem is 
about the fastest, most efficient in space and most reliable database 
system that one can think of, with the lowest overhead, as long as you 
know the exact path of an object, and as long as all the directories in 
the path are kept to a reasonable size (important).  It has its inherent 
buffering at various levels, optimised to access files.  It has a whole 
bunch of utilities to manipulate it; it it shareable, but can be locked 
when you need it. It is portable.
It does have one inconvenient : it has a single "key" to access an 
object (the path).  But for that, you have your database system.

Oh, and I've thought of another advantage, in an Apache/web context : to 
send the content of a file to a browser, you can take advantage of the 
sendfile() call, which is very efficient.  Now if your file is a blob in 
a row of a database, you have to read it yourself in memory, and send 
it, don't you ?

I've just re-convinced myself not to change our storage structure.

Re: Any success with storing photos in a database?

Posted by James Smith <js...@sanger.ac.uk>.

On Tue, 30 Sep 2008, Cosimo Streppone wrote:

> In data 30 settembre 2008 alle ore 00:09:52, James Smith <js...@sanger.ac.uk> 
> ha scritto:
>
>> On Mon, 29 Sep 2008, Cosimo Streppone wrote:
>> 
>>> In data 29 settembre 2008 alle ore 23:45:05, James Smith 
>>> <js...@sanger.ac.uk> ha scritto:
>>> 
>>>> There are good reasons to store images (especially small ones) in 
>>>> databases (and with careful management of headers in your mod_perl).
>
>>> If you have "proper" metadata, you can go and delete your files.
>> 
>> We have "proper" meta data
>
> Yes, sorry. I was thinking about our case.
>
>> we produce and delete anwhere up to and including 1/2
>> million files per day - and
>> the deletion is the crippling stage on a journalled file system
>
> I see. Again, our case is very different though.
> 99,99% is create/add/modify and we almost never delete.
>
> What filesystem/os do you use?

It has to be a shared filesystem - so at the moment GPFS/red had; we are 
moving over to a memcached system to store and server the temporary images;
otherwise we would be stuck with NFS or Lustre both of which fail
quite badly with small files.. all this is backed by fibre attached
SAN storage

(they really are temporary and can be easily restored)

James

>
> -- 
> Cosimo


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Re: Any success with storing photos in a database?

Posted by Cosimo Streppone <co...@streppone.it>.
In data 30 settembre 2008 alle ore 00:09:52, James Smith  
<js...@sanger.ac.uk> ha scritto:

> On Mon, 29 Sep 2008, Cosimo Streppone wrote:
>
>> In data 29 settembre 2008 alle ore 23:45:05, James Smith  
>> <js...@sanger.ac.uk> ha scritto:
>>
>>> There are good reasons to store images (especially small ones) in  
>>> databases (and with careful management of headers in your mod_perl).

>> If you have "proper" metadata, you can go and delete your files.
>
> We have "proper" meta data

Yes, sorry. I was thinking about our case.

> we produce and delete anwhere up to and including 1/2
> million files per day - and
> the deletion is the crippling stage on a journalled file system

I see. Again, our case is very different though.
99,99% is create/add/modify and we almost never delete.

What filesystem/os do you use?

-- 
Cosimo

Re: Any success with storing photos in a database?

Posted by James Smith <js...@sanger.ac.uk>.

On Mon, 29 Sep 2008, Cosimo Streppone wrote:

> In data 29 settembre 2008 alle ore 23:45:05, James Smith <js...@sanger.ac.uk> 
> ha scritto:
>
>> There are good reasons to store images (especially small ones) in databases 
>> (and with careful management of headers in your mod_perl).
>> 
>> Some of you have missed inherent problems with the file systems
>> even balanced heirarchical tree - ones in a shared server
>> environment which can lead to gross efficiencies - in your cases
>> you may not be doing multiple deletes - but in the examples I work
>> with it is not the creation and storage which breaks the file
>> system, but the requirement to clear our old files before filling
>> up the file system.
>
> If you have "proper" metadata, you can go and delete your files.
> In our case, we chose to hash our paths by basically user-id,
> so every file owned by a user is in the same folder and
> can be deleted without any problems.
>

We have "proper" meta data - deleting files is in all file systems an
expensive operation, if you have a large number of files to delete,
the overhead of deleting files can become excessive - we produce and
delete anwhere up to and including 1/2 million files per day - and
the deletion is the crippling stage on a journalled file system

> Maybe I didn't get your point.
>
> -- 
> Cosimo


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Re: Any success with storing photos in a database?

Posted by Cosimo Streppone <co...@streppone.it>.
In data 29 settembre 2008 alle ore 23:45:05, James Smith  
<js...@sanger.ac.uk> ha scritto:

> There are good reasons to store images (especially small ones) in  
> databases (and with careful management of headers in your mod_perl).
>
> Some of you have missed inherent problems with the file systems
> even balanced heirarchical tree - ones in a shared server
> environment which can lead to gross efficiencies - in your cases
> you may not be doing multiple deletes - but in the examples I work
> with it is not the creation and storage which breaks the file
> system, but the requirement to clear our old files before filling
> up the file system.

If you have "proper" metadata, you can go and delete your files.
In our case, we chose to hash our paths by basically user-id,
so every file owned by a user is in the same folder and
can be deleted without any problems.

Maybe I didn't get your point.

-- 
Cosimo

Re: Any success with storing photos in a database?

Posted by André Warnier <aw...@ice-sa.com>.
James Smith wrote:
> 
> 
> There are good reasons to store images (especially small ones) in 
> databases (and with careful management of headers in your mod_perl).
> 
> Some of you have missed inherent problems with the file systems
> even balanced heirarchical tree - ones in a shared server
> environment which can lead to gross efficiencies - in your cases
> you may not be doing multiple deletes - but in the examples I work
> with it is not the creation and storage which breaks the file
> system, but the requirement to clear our old files before filling
> up the file system.
> 
> 

Finally, someone who does not agree with the general line.
That makes it more interesting.

As for the size, the OP indicated that his objects were pictures (photos 
I believe), which tend to be in the multi-MB range, and growing as 
cameras get more pixels.  The documents we handle also, since there is a 
notable tendency to inflation of office document sizes, what with 
embedded pictures and graphics and such.

I guess indeed it also depends on whether the stored images/documents 
are transient, or stay forever. In our case they stay forever, because 
they are part of a kind of archive. Once loaded, a document is never 
deleted.
That is why we do not just store the original documents in a purely 
hierarchical file structure, but have developed a system based on a 
notion of "logical volumes", which can be moved without affecting the 
link between the database records and the stored objects.
This being said, the organisation in question is still on top of 
classical filesystems.
In recent years, the universal support under Unix/Linux of "LVM's" much 
simplifies the question of space management, physical location etc.
Disk storage per volume unit is also getting ever larger and cheaper, so 
it does not cause us big concern.
At the limit, if a certain document collection is not needed anymore, 
the corresponding records can be moved to an archive table or database, 
but the documents themselves stay.





Re: Any success with storing photos in a database?

Posted by James Smith <js...@sanger.ac.uk>.

There are good reasons to store images (especially small ones) in 
databases (and with careful management of headers in your mod_perl).

Some of you have missed inherent problems with the file systems
even balanced heirarchical tree - ones in a shared server
environment which can lead to gross efficiencies - in your cases
you may not be doing multiple deletes - but in the examples I work
with it is not the creation and storage which breaks the file
system, but the requirement to clear our old files before filling
up the file system.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Re: Any success with storing photos in a database?

Posted by Cosimo Streppone <co...@streppone.it>.
In data 29 settembre 2008 alle ore 21:00:41, Mark Stosberg  
<ma...@summersault.com> ha scritto:

> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).

We have:

a) ~150,000 avatar tiny pictures (50x50);
b) ~300,000 user photos (320x240 originals), but
    also available in 4 more sizes;
c) tens of millions of album pictures in original
    and thumbnail sizes;

We're using MySQL 5.0 with MyISAM storage engine. Yes.
Until recently, a) & b) were stored into a MySQL blob field. Yes.
Did you hear me screaming? :-)

Problems I found when I started working here:

- our mod_perl backends were serving 20-40% of picture requests,
   which is completely insane;

- our picture-serving code was fetching the picture from the database,
   eventually scaling it on the fly (!), saving it in memcached
   and $r->print()ing it out down the wire.
   That's completely insane. The scaling even disabled caching.

- when you update a picture metadata (this is MySQL), you _LOCK_
   the _ENTIRE_ table with hundreds of thousands images.

- fetching from a blob field in MySQL is expensive.

Now instead:

- avatars were our first experiment. They are stored as static resources
   with a hierachical and balanced filesystem structure
   (using digests and splitting them up);

   We completely removed the caching layer from our mod_perl code,
   because caching happens directly in the browser for static
   resources;

   We managed to move away 500,000 req/day from mod_perl to static
   HTTP servers.

- User photos are coming. We wrote a nice application layer that
   can upload a single resource to many pools of static servers and
   in different sizes with automatic thumbnailing. The filesystem path
   scheme can be defined into the resource perl class, but basically
   is the same digest + splitting, like in:

     http://static.myapp.com/pool1/a1/b2c/d3f4g5h6.../123456_m.jpg

   We are mass-exporting pictures from the database blobs to our
   filesystems via DAV. Using DAV is not the most efficient way but
   allows you to attach arbitrary metadata to the filesystem.

   We now use that to resolve inconsistencies and "sync" the
   metadata in the database.
   This should go live with next release. I'll let you know :)

I'm not sure what to do for album pictures. They are already out
of the database, thank god, but the "design" guys now want 3-4
thumbnails even for those pics. Suggestions?

> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.

Can you explain the issues you found?
I'd really like to know, so I'm prepared. :-)

> So, I'm asking if anyone has had experience successfully storing photos
> (or othe files) directly in database? That would solve the consistency
> issue, but may create a performance issue.

Yes, performance issues.

> Perhaps the right kind of caching layer could solve that.

I'm not sure. If you throw caching into the "pics-in-the-db" mess,
IMHO you only make the situation worse.

-- 
Cosimo

Re: Any success with storing photos in a database?

Posted by "W. Tyler Gee" <ge...@gmail.com>.
I'm pretty sure that the consensus is to never actually store the files in
the database.   What actual inconsistencies are you seeing that you are
trying to fix?

On Mon, Sep 29, 2008 at 1:00 PM, Mark Stosberg <ma...@summersault.com> wrote:

>
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing photos
> (or othe files) directly in database? That would solve the consistency
> issue, but may create a performance issue. Perhaps the right kind of
> caching layer could solve that.
>
>    Mark
>
> --
>  . . . . . . . . . . . . . . . . . . . . . . . . . . .
>   Mark Stosberg            Principal Developer
>   mark@summersault.com     Summersault, LLC
>   765-939-9301 ext 202     database driven websites
>  . . . . . http://www.summersault.com/ . . . . . . . .
>
>
>


-- 
~Tyler

Re: Any success with storing photos in a database?

Posted by Jonathan Vanasco <jo...@2xlp.com>.
chiming in a few months late...

from my experience, and responding to some thoughts in the thread:

	- storing photos in mysql/pgsql is not a good idea.  aside from misc  
issues that always arise, you end up stressing the db through searches
	- the better way would be to store photo meta-data in the system,  
and then search for that to serve.   its a much leaner search and  
faster/more lightweight.

	-we played around with a lot of things

	-- mogilefs
  	---  i think flickr also used pre-yahoo. someone mentioned a  
flickrbook, there is also a powerpoint from about 1year after they  
launched

	-- directory hashing
	--- create an md5 and store in db; store on filesystem as ab/cd/ef/ 
gh/abcdefg + suffix ( [original,-thumb ,-medium].jpg )
	---- use rewrite rules on nginx to loadbalance across cluster by  
bucket and serve

	-- i didn't like perlbal, but it can do some neat in-time  
bitshifting on gifs to change the color palatte

	-- the BEST solution we found and upgraded to was.. Amazon s3 +  
Panther Express.  this is before amazon offered cdn services
	--- store files on amazon -- originals in private bucket, copies in  
public buckets
	--- serve files from panther, which was a kickass cdn.  they cache  
everything in their network , which offsets instabilities from amazon
	--- don't worry about disk usage, clustering, load balancing,  
BACKUP, and miscellaneous stuff designed around images
	--- migrating to that setup took us 2 days... and it was a better  
performer than 3 months of code and constant maintenance





Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Eric Howe <er...@pieinsky.ca>.
On Oct 15, 2008, at 11:11, Cosimo Streppone wrote:

> On Wed, Oct 15, 2008 at 12:31 PM, Mark wrote:
[...]
> Then Perrin wrote:
>
>> JavaScript is okay, but can be a problem when people hit back
>> expecting to use the form again and the button is still disabled.
>
> You could always trigger some javascript event on page load
> the re-enables the submit control, even if already enabled,
> resetting the flag to "0". This is what I did.

I do it this way (via jQuery) and it works okay even when faced with  
the browser's back-button:

	function returnFalse() {
	        return false;
	}
	$(document).ready(function() {
	        // Take care of back-button issues first.
         	$('form').find(':submit').unbind('click', returnFalse);
         	// Then wire up the auto-disabling stuff.
         	$('form').submit(function() {
         	        $(this).find(':submit').click(returnFalse);
         	        return true;
         	});
	});

I don't use the 'disabled' attribute as that prevents the browser (at  
least some of them) from sending the button back to the server with  
the rest of the form.

Eric Howe
eric@pieinsky.ca


Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Cosimo Streppone <co...@streppone.it>.
On Wed, Oct 15, 2008 at 12:31 PM, Mark wrote:

> We had a "double submit" bug that allowed a form to be submitted twice  
> when we weren't fully prepared for that. [...]
> One approach we are considering is change the submit action on forms
> with JavaScript, so it disables the submit button

That's the cheapest option IMO, and it works very well
for 95% of the typical double submit cases (like Windows
users with the "double-click" syndrome :).

On submit event I checked for a flag to be "1".
If "1", then form has been already submitted.
If "0", set to "1" and submit.

Then Perrin wrote:

> JavaScript is okay, but can be a problem when people hit back
> expecting to use the form again and the button is still disabled.

You could always trigger some javascript event on page load
the re-enables the submit control, even if already enabled,
resetting the flag to "0". This is what I did.

Then, if you want to be 100% sure, you have to work
on the server side...

-- 
Cosimo

Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Mark Stosberg <ma...@summersault.com>.
> Take some elements from the form and search for an add in the past  
> <time period>. We did one recently where we took the email address,  
> recipient and amount from the form and then made sure that the same  
> data hadn't already been submitted in the last 24 hours - working on  
> the principle that a user is unlikely to gift the exact same amount to  
> the same person on the same day. The check is trivial compared to the  
> commit.
> 
> FWIW, we looked into this and this approach seemed to be the quickest  
> and most reliable available. If you Google on this, you may find an  
> article or two which support this approach.

This does seem workable and reliable. I suppose I'm hoping for an
approach that can easy re-used across many forms, without much per-form
effort required.

Also, I think for some kinds of changes being submitted, the check of
"has this happened recently" won't apply. 

For example, we allow people to re-order a list of photos. They may well
reverse the ordering a few seconds later and put the photos back in the
original order. (Although that example may not be vurnerable to a
double-submit bug, as it would be valid to submit the same ordering
request twice. )

    Mark



Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Simon Forster <si...@ldml.com>.
On 15 Oct 2008, at 18:49, Mark Stosberg wrote:

> On Wed, 15 Oct 2008 12:41:55 -0400
> "Perrin Harkins" <pe...@elem.com> wrote:
>
>> On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg  
>> <ma...@summersault.com> wrote:
>>> We had a "double submit" bug that allowed a form to be submitted  
>>> twice when we
>>> weren't fully prepared for that. We are still researching the best  
>>> practices to
>>> address this a general case. One approach we are considering is  
>>> change the
>>> submit action on forms with JavaScript, so it disables the submit  
>>> button, and
>>> then actually submit the form, preventing one kind of double- 
>>> submission. It
>>> seems like I don't see this approach happening in the wild much,  
>>> though. I
>>> suspect there is a better solution.
>>
>> JavaScript is okay, but can be a problem when people hit back
>> expecting to use the form again and the button is still disabled.
>
> Thanks for the response.
>
> That's what I was concerned about. I don't have a sense of how
> much this would happen in practice.

Take some elements from the form and search for an add in the past  
<time period>. We did one recently where we took the email address,  
recipient and amount from the form and then made sure that the same  
data hadn't already been submitted in the last 24 hours - working on  
the principle that a user is unlikely to gift the exact same amount to  
the same person on the same day. The check is trivial compared to the  
commit.

FWIW, we looked into this and this approach seemed to be the quickest  
and most reliable available. If you Google on this, you may find an  
article or two which support this approach.

HTH

Simon

Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Michael Peters <mp...@plusthree.com>.
Mark Stosberg wrote:

> So how might an implementation look? 

I would either make the uuid the primary key (might affect performance since it's not an integer, 
but a string) or a unique key for the same table. Then you don't have anything else to keep track of 
(no extra tables, etc).

> Every so often, the table could cleaned up via cron, (since we probably don't
> care about seeing the same UUID weeks apart, just seconds or minutes apart).

UUID's should never collide.

> There is still room for a small race condition in between checking to see if we
> used the UUID and inserting it, but I think that may be acceptable.

If you're really worried about someone attacking you in this way then insert the record with the 
uuid first and then let them upload. If you don't find the uuid they are trying to upload to, then 
they changed it so just disallow the upload.

-- 
Michael Peters
Plus Three, LP


Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Mark Stosberg <ma...@summersault.com>.
On Wed, 15 Oct 2008 13:51:21 -0400
Michael Peters <mp...@plusthree.com> wrote:

> Mark Stosberg wrote:
> 
> > At one point in the past I did a variation of this where we put the next ID of
> > a related database sequence in the form, and this would become the new primary
> > key when inserted, and it would of course not allow the same primary key to be
> > used twice. That worked, but I realized was open to abuse if a user tweaked the
> > number to be larger than the sequence. Then, eventually a legitimate user would
> > eventually be assigned that value by the sequence, and it would fail. 
> > 
> > Are their specific modules that you recommend to help with this?
> 
> You can try a UUID (look at Data::UUID). It's not sequential, but someone
> could still tweak the form to create a value that could potentially conflict
> with another value in the future. But I'd say it's much less likely than a
> sequential id.

So how might an implementation look? 

We could create table of "form_uuids".  Each time a form is submitted, we check
if the UUID we have is has already been used.

If not, we insert the new UUID and proceed.

Every so often, the table could cleaned up via cron, (since we probably don't
care about seeing the same UUID weeks apart, just seconds or minutes apart).

There is still room for a small race condition in between checking to see if we
used the UUID and inserting it, but I think that may be acceptable. 

    Mark



Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Michael Peters <mp...@plusthree.com>.
Mark Stosberg wrote:

> At one point in the past I did a variation of this where we put the next ID of
> a related database sequence in the form, and this would become the new primary
> key when inserted, and it would of course not allow the same primary key to be
> used twice. That worked, but I realized was open to abuse if a user tweaked the
> number to be larger than the sequence. Then, eventually a legitimate user would
> eventually be assigned that value by the sequence, and it would fail. 
> 
> Are their specific modules that you recommend to help with this?

You can try a UUID (look at Data::UUID). It's not sequential, but someone could still tweak the form 
to create a value that could potentially conflict with another value in the future. But I'd say it's 
much less likely than a sequential id.

-- 
Michael Peters
Plus Three, LP


Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Mark Stosberg <ma...@summersault.com>.
On Wed, 15 Oct 2008 12:41:55 -0400
"Perrin Harkins" <pe...@elem.com> wrote:

> On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg <ma...@summersault.com> wrote:
> > We had a "double submit" bug that allowed a form to be submitted twice when we
> > weren't fully prepared for that. We are still researching the best practices to
> > address this a general case. One approach we are considering is change the
> > submit action on forms with JavaScript, so it disables the submit button, and
> > then actually submit the form, preventing one kind of double-submission. It
> > seems like I don't see this approach happening in the wild much, though. I
> > suspect there is a better solution.
> 
> JavaScript is okay, but can be a problem when people hit back
> expecting to use the form again and the button is still disabled.

Thanks for the response.

That's what I was concerned about. I don't have a sense of how 
much this would happen in practice. 

> Another approach is a unique ID in the form that you track in the
> user's session (i.e. this ID was seen before).  If the problem is
> large uploads with no feedback until they finish, you can use one of
> the upload progress tools.

At one point in the past I did a variation of this where we put the next ID of
a related database sequence in the form, and this would become the new primary
key when inserted, and it would of course not allow the same primary key to be
used twice. That worked, but I realized was open to abuse if a user tweaked the
number to be larger than the sequence. Then, eventually a legitimate user would
eventually be assigned that value by the sequence, and it would fail. 

Are their specific modules that you recommend to help with this?

    Mark



Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Perrin Harkins <pe...@elem.com>.
On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg <ma...@summersault.com> wrote:
> We had a "double submit" bug that allowed a form to be submitted twice when we
> weren't fully prepared for that. We are still researching the best practices to
> address this a general case. One approach we are considering is change the
> submit action on forms with JavaScript, so it disables the submit button, and
> then actually submit the form, preventing one kind of double-submission. It
> seems like I don't see this approach happening in the wild much, though. I
> suspect there is a better solution.

JavaScript is okay, but can be a problem when people hit back
expecting to use the form again and the button is still disabled.
Another approach is a unique ID in the form that you track in the
user's session (i.e. this ID was seen before).  If the problem is
large uploads with no feedback until they finish, you can use one of
the upload progress tools.

- Perrin

Re: Any success with storing photos in a database? (prevents double-submits)

Posted by Mark Stosberg <ma...@summersault.com>.
On Tue, 30 Sep 2008 10:06:26 -0400
Mark Stosberg <ma...@summersault.com> wrote:

> On Mon, 29 Sep 2008 17:04:29 -0400
> "Perrin Harkins" <pe...@elem.com> wrote:
> 
> > On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg <ma...@summersault.com> wrote:
> > > We have a large, busy, database application that relates to millions of
> > > photos, which we also need to store and display.
> 
> If I find anything "interesting" about our inconsistency, I'll report back. 

I said I'd report back if I found out where the inconsistency was appearing, between files
on the file system and meta-data in the database. 

So far, I've found the following:

We had a "double submit" bug that allowed a form to be submitted twice when we
weren't fully prepared for that. We are still researching the best practices to
address this a general case. One approach we are considering is change the
submit action on forms with JavaScript, so it disables the submit button, and
then actually submit the form, preventing one kind of double-submission. It
seems like I don't see this approach happening in the wild much, though. I
suspect there is a better solution.

Beyond this, I think we've found the remaining inconsistencies to be happening
at extremely low rates, to the point where it might be worth completely
tracking down the final issue. 

In summary, I think the majority advice found on this list holds true: storing
image files on the file system and meta-data in the database is a good way to
go.

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Any success with storing photos in a database?

Posted by Mark Stosberg <ma...@summersault.com>.
On Mon, 29 Sep 2008 17:04:29 -0400
"Perrin Harkins" <pe...@elem.com> wrote:

> On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg <ma...@summersault.com> wrote:
> > We have a large, busy, database application that relates to millions of
> > photos, which we also need to store and display.
> 
> Have you read Cal Henderson's book about how Flickr works?  

It looks familiar, but I don't think I've read it. I've bookmarked it
now. Thanks for the suggestion, Perrin!

If I find anything "interesting" about our inconsistency, I'll report back. 

I was just reviewing the code flow of 'store_upload()' in CGI::Uploader:

http://search.cpan.org/src/MARKSTOS/CGI-Uploader-2.15/lib/CGI/Uploader.pm

The key part I see is for the 'update' case where we:

1. Delete old generated files (like Thumbnails)
2. Run the DB update
3. Regenerate the thumbnails. 

The DB update failed, we couldn't "undelete" the thumbnails. That design could perhaps could be improved. 

Also, I see that this code doesn't use transactions, I think because it expects that the outer application code may wrap this work in a larger action. 

With current DBD::Pg and PostgreSQL versions, I could add some "savepoints" here, to make some parts
of this more transactional (like inserting a photo and all of it's thumbnails in the same in one transactions). 

The savepoint could would be specific to using the PostgreSQL driver, as I'm not aware that other DBs support the same syntax. 

Other ideas for improvements here are welcome. 

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Any success with storing photos in a database?

Posted by Perrin Harkins <pe...@elem.com>.
On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg <ma...@summersault.com> wrote:
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display.

Have you read Cal Henderson's book about how Flickr works?  It's a bit
extreme, but interesting.  A smaller version of the "many photos"
problem is LiveJournal, who use their custom file storage API
(MogileFS) and serve the files with their own web server (perlbal).
Definitely another extreme solution that should probably be considered
a last resort after trying to make easier stuff work, but it sounds
like the easier stuff is not working for you.

- Perrin