Posted to users@jackrabbit.apache.org by spam no spam <tr...@att.net> on 2009/08/11 18:50:41 UTC

will jackrabbit scale to petabyte repository?




We are considering using Jackrabbit as our JCR repository for a rewrite of our current app. We currently have about 1 PB of content and metadata that we would like to store in a single workspace. Will Jackrabbit scale to this size? Has anyone created a repository of this size with Jackrabbit? Should we limit the size of the workspaces?

We are also considering using the ‘Amazon S3 Persistence Manager Project’ found in the sandbox; has anyone used it in a production environment? Any recommendations on whether or not it’s fit for use would be greatly appreciated.

Thanks.



Re: will jackrabbit scale to petabyte repository?

Posted by Thomas Müller <th...@day.com>.
Hi,

> 100’s of millions of nodes

Maybe you shouldn't use just one repository, unless you have a really
fast computer and storage system (disk / database). Just filling the
repository with that many nodes (on one machine) can take days; the
same goes for backups and for things like data store garbage
collection. You should consider splitting the data.

With Jackrabbit, you could use multiple repositories (probably on
different machines). Other companies that deal with petabytes of data
(like Google, Yahoo, Facebook) don't just use 'one big database'.
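
A rough sketch of what splitting could look like at the application
level (the shard list, key scheme and credentials below are made up
for illustration; this is not something Jackrabbit provides out of the
box):

    import java.util.List;

    import javax.jcr.Credentials;
    import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    // Hypothetical helper: routes each content key to one of several
    // independent Jackrabbit repositories by hashing the key, so the same
    // key always lands on the same shard.
    public class ShardedLogin {

        private final List<Repository> shards;      // one per machine
        private final Credentials credentials;

        public ShardedLogin(List<Repository> shards, Credentials credentials) {
            this.shards = shards;
            this.credentials = credentials;
        }

        public Session loginFor(String contentKey) throws RepositoryException {
            int index = (contentKey.hashCode() & 0x7fffffff) % shards.size();
            return shards.get(index).login(credentials);
        }
    }

Queries that need to span shards would then have to merge their
results at the application level.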

>>How many nodes do you plan for?
> Just curious, is there any guideline on the # of nodes one jackrabbit can support with acceptable performance ?

No. I was just curious. Update performance doesn't change a lot for
larger repositories.

Regards,
Thomas

Re: will jackrabbit scale to petabyte repository?

Posted by go canal <go...@yahoo.com>.
>>How many nodes do you plan for?

Just curious, is there any guideline on the number of nodes one Jackrabbit repository can support with acceptable performance?

Each file will have at least one nt:file node, plus some folder nodes; so can I translate the question into the number of files, for a very rough estimate?
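
For a rough estimate along those lines: each file typically ends up as two nodes, an nt:file plus its jcr:content child of type nt:resource, on top of the folder nodes. A minimal sketch, where the folder node, name, MIME type and input stream are placeholders supplied by the caller:

    import java.io.InputStream;
    import java.util.Calendar;

    import javax.jcr.Node;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public class FileNodes {

        // Stores one file as the usual nt:file / nt:resource pair under an
        // existing folder node, then saves the session.
        static void storeFile(Session session, Node folder, String name,
                String mimeType, InputStream data) throws RepositoryException {
            Node file = folder.addNode(name, "nt:file");
            Node resource = file.addNode("jcr:content", "nt:resource");
            resource.setProperty("jcr:mimeType", mimeType);
            resource.setProperty("jcr:lastModified", Calendar.getInstance());
            resource.setProperty("jcr:data", data);   // the binary content
            session.save();
        }
    }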

Another thought: is there a Jackrabbit + Hadoop configuration (using Hadoop as the DataStore?) to address scalability, and maybe even performance?

 rgds,
canal





Re: will jackrabbit scale to petabyte repository?

Posted by Pete <tr...@att.net>.



>How many nodes do you plan for?

 

Hundreds of millions of nodes, but I am aware of the 10K child node
‘performance limit’, so we will limit child nodes somehow. As long as
we adhere to the 10K child node limit, do you think using a single
workspace will work, or would it be better to use several workspaces?
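
One common way to keep child counts bounded (a sketch of the general
pattern, not a built-in Jackrabbit feature) is to derive intermediate
folder names from a hash of each item's identifier:

    import javax.jcr.Node;
    import javax.jcr.RepositoryException;

    public class Buckets {

        // Illustrative only: derives a two-level "bucket" folder (e.g. 3f/a2)
        // from a hash of an item's identifier, so no single folder has to
        // hold more than a bounded number of children.
        static Node bucketFor(Node root, String id) throws RepositoryException {
            int h = id.hashCode();
            String level1 = String.format("%02x", (h >>> 8) & 0xff);
            String level2 = String.format("%02x", h & 0xff);
            Node first = root.hasNode(level1)
                    ? root.getNode(level1)
                    : root.addNode(level1, "nt:folder");
            return first.hasNode(level2)
                    ? first.getNode(level2)
                    : first.addNode(level2, "nt:folder");
        }
    }

With 256 x 256 buckets, a few hundred million items average out to a
few thousand children per folder, comfortably under the 10K mark.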



> If it's mainly binary data (such as files) I suggest using the data
> store. http://wiki.apache.org/jackrabbit/DataStore - then it shouldn't
> be a problem.
>
> If there is little binary data, the problem might be backup (it
> depends on the persistence manager you use).





It will be mostly binary data; probably 95% of the storage will be the
binary files. As far as backup / export goes, do you think the XML
export could be done? Can you export only specific parts of the
repository with Jackrabbit, or is it all or nothing?
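
For reference, the JCR API can export a single subtree rather than the
whole workspace: Session.exportSystemView (or exportDocumentView) takes
the path of the node to export and can skip binary values. A minimal
sketch, assuming an open session; the subtree path and file name are
placeholders:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public class PartialExport {

        // Exports only the subtree at /content/reports as system-view XML,
        // skipping binary values (path and file name are placeholders).
        static void exportSubtree(Session session)
                throws IOException, RepositoryException {
            try (OutputStream out = new FileOutputStream("reports-export.xml")) {
                // exportSystemView(absPath, out, skipBinary, noRecurse)
                session.exportSystemView("/content/reports", out,
                        true /* skipBinary */, false /* noRecurse */);
            }
        }
    }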



Thanks.

 


Re: will jackrabbit scale to petabyte repository?

Posted by Thomas Müller <th...@day.com>.
Hi,

> considering using jackrabbit as our jcr for a rewrite of our current app. We
> currently have about 1 PB of content and metadata that we would like to store
> in a single workspace. Will jackrabbit scale to this size? Has anyone created a
> repository of this size with jackrabbit? Should we limit the size of the
> workspaces?

How many nodes do you plan for?

If it's mainly binary data (such as files) I suggest using the data
store. http://wiki.apache.org/jackrabbit/DataStore - then it shouldn't
be a problem.
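
The data store is enabled with a DataStore element in repository.xml
(for example org.apache.jackrabbit.core.data.FileDataStore). Once it is
configured, binaries written and read through the normal JCR API are
streamed to and from the data store, so even very large files never
have to be held in memory. A small read-side sketch, with a placeholder
node path:

    import java.io.InputStream;
    import java.io.OutputStream;

    import javax.jcr.Node;
    import javax.jcr.Session;

    public class StreamOut {

        // Copies the binary of an nt:file to the target stream in small
        // chunks, so the whole file is never held in memory at once;
        // the node path is a placeholder.
        static void copyOut(Session session, OutputStream target) throws Exception {
            Node resource = (Node) session.getItem("/docs/report.pdf/jcr:content");
            InputStream in = resource.getProperty("jcr:data").getStream();
            try {
                byte[] buffer = new byte[8192];
                for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                    target.write(buffer, 0, n);
                }
            } finally {
                in.close();
            }
        }
    }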

If there is little binary data, the problem might be backup (it
depends on the persistence manager you use).

> We are
> also considering using the ‘Amazon
> S3 Persistence Manager Project’ found in the sandbox, has anyone used it in a
> production environment?

I haven't used it, but from what I know the performance might be a
problem. You would need to test it yourself.

Regards,
Thomas