Posted to dev@jackrabbit.apache.org by "Alan D. Cabrera" <li...@toolazydogs.com> on 2007/08/24 02:43:28 UTC

A JCR with infinite capacity

I need a JCR that has infinite capacity.  This is for a photo
repository that would store millions of 5-10MB photos.  The photos
themselves would not change.  I need to be able to keep adding photos
indefinitely.

One idea that I've been toying with is to create a persistence
manager that manages multiple stores.  Some of these stores could
even be on other physical servers, maybe even a farm of servers.
I was thinking of managing the node and property states along the
same lines as the Google File System manages blocks; I have some
ideas about how this can be done simply.  Property and node states
could be duplicated for resilience in the face of disk/server
failures.  Another nice thing is that I could map the JCR path to a
URL on the actual file server and have the file server serve up the
file without having to go through the JCR metadata layer.
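
To make the routing idea concrete, here is a rough sketch; the
StoreNode interface and the byte[]-based API are only placeholders
for however the real node and property states get serialized, not
anything that exists in Jackrabbit today:

    import java.util.List;

    /** Placeholder for one physical store (local disk, remote server, ...). */
    interface StoreNode {
        void put(String id, byte[] data) throws Exception;
        byte[] get(String id) throws Exception;
    }

    /** Writes each state to k replicas so a single disk/server failure loses nothing. */
    class ReplicatingStore {
        private final List<StoreNode> nodes;
        private final int replicas;

        ReplicatingStore(List<StoreNode> nodes, int replicas) {
            this.nodes = nodes;
            this.replicas = Math.max(1, Math.min(replicas, nodes.size()));
        }

        void store(String id, byte[] data) throws Exception {
            int first = Math.abs(id.hashCode()) % nodes.size();
            for (int i = 0; i < replicas; i++) {
                // write the same state to the next k nodes in the ring
                nodes.get((first + i) % nodes.size()).put(id, data);
            }
        }

        byte[] load(String id) throws Exception {
            int first = Math.abs(id.hashCode()) % nodes.size();
            Exception last = null;
            for (int i = 0; i < replicas; i++) {
                try {
                    return nodes.get((first + i) % nodes.size()).get(id);
                } catch (Exception e) {
                    last = e;  // replica down, try the next one
                }
            }
            throw last;
        }
    }

The same hash that picks the replicas could also be turned into the
URL of the file server holding a given photo, which is how reads
could bypass the JCR metadata layer entirely.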

Another way might be to graft Hadoop as a FileSystem.
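
Something like the following is what I mean by grafting it on; the
Backend interface below just stands in for whatever stream-based
contract the repository needs, and only the org.apache.hadoop.fs
calls are real:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Stand-in for a stream-based storage contract; not the actual Jackrabbit interface. */
    interface Backend {
        InputStream read(String path) throws IOException;
        OutputStream write(String path) throws IOException;
        boolean exists(String path) throws IOException;
    }

    /** Stores repository blobs in Hadoop DFS instead of the local file system. */
    class HadoopBackend implements Backend {
        private final FileSystem fs;
        private final Path root;

        HadoopBackend(String rootPath) throws IOException {
            this.fs = FileSystem.get(new Configuration()); // picks up hadoop-site.xml
            this.root = new Path(rootPath);
        }

        public InputStream read(String path) throws IOException {
            return fs.open(new Path(root, path));
        }

        public OutputStream write(String path) throws IOException {
            return fs.create(new Path(root, path));
        }

        public boolean exists(String path) throws IOException {
            return fs.exists(new Path(root, path));
        }
    }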

I realize that this may only make sense in the limited application
of storing photos that are not modified.  Thoughts?


Regards,
Alan


Re: A JCR with infinite capacity

Posted by "Alan D. Cabrera" <li...@toolazydogs.com>.
On Aug 23, 2007, at 5:49 PM, Padraic I. Hannon wrote:

> However, you will probably run into search and retrieval issues. I
> am unsure about the guts of the Lucene part of the application, but
> there may be problems there with so many files.

Yeah, I was thinking that a ton of relatively small files, well below
the scale that Hadoop is built to handle, might not be a good
application of Hadoop.  But my reasoning could be tainted because it
would be really fun writing an infinite, redundant store under
Jackrabbit.


Regards,
Alan


Re: A JCR with infinite capacity

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 8/24/07, Padraic I. Hannon <pi...@wasabicowboy.com> wrote:
> ...I am
> unsure as to the guts of the Lucene part of the application, but there
> may be problems there with so many files?...

Indexing the "usual" metadata of millions of images in a Lucene index
shouldn't be a problem. Googling "Lucene millions" shows lots of
working examples.
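
Something along these lines (Lucene 2.x-era API, field names invented
for illustration) should scale, since each Document carries only the
metadata and never the 5-10MB image itself:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class PhotoMetadataIndexer {
        public static void main(String[] args) throws Exception {
            // third argument "true" creates a fresh index at that path
            IndexWriter writer =
                new IndexWriter("/var/index/photos", new StandardAnalyzer(), true);

            // one small Document per photo: metadata only, no binary content
            Document doc = new Document();
            doc.add(new Field("path", "/photos/2007/08/img_0001.jpg",
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("camera", "Canon EOS 5D",
                    Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("taken", "20070823",
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.addDocument(doc);

            writer.optimize();
            writer.close();
        }
    }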

-Bertrand

Re: A JCR with infinite capacity

Posted by "Padraic I. Hannon" <pi...@wasabicowboy.com>.
However, you will probably run into search and retrieval issues. I am
unsure about the guts of the Lucene part of the application, but there
may be problems there with so many files.

-paddy

Re: A JCR with infinite capacity

Posted by "Padraic I. Hannon" <pi...@wasabicowboy.com>.
I started to write a Hadoop DFS-based persistence manager. I can create
a Jira ticket and upload it if you would like. I am unsure whether it
works, as I have been distracted by other things over the last week or so.

-paddy

Alan D. Cabrera wrote:
> I need a JCR that has infinite capacity.  This is for a photo
> repository that would store millions of 5-10MB photos.  The photos
> themselves would not change.  I need to be able to keep adding photos
> indefinitely.
>
> One idea that I've been toying with is to create a persistence manager
> that manages multiple stores.  Some of these stores could even be on
> other physical servers, maybe even a farm of servers.  I was thinking
> of managing the node and property states along the same lines as the
> Google File System manages blocks; I have some ideas about how this
> can be done simply.  Property and node states could be duplicated for
> resilience in the face of disk/server failures.  Another nice thing is
> that I could map the JCR path to a URL on the actual file server and
> have the file server serve up the file without having to go through
> the JCR metadata layer.
>
> Another way might be to graft Hadoop as a FileSystem.
>
> I realize that this may only make sense in the limited application of
> storing photos that are not modified.  Thoughts?
>
>
> Regards,
> Alan