Posted to users@jackrabbit.apache.org by "martin.gasbichler@bluewin.ch" <ma...@bluewin.ch> on 2011/10/28 10:22:47 UTC

Jackrabbit performance experiences

Hi,

our application needs to archive documents along with some metadata, and I'm considering using Jackrabbit in
combination with a database and maybe filesystem storage for this task. However, I did not find any benchmarks that
show whether Jackrabbit is capable of hosting a large number of documents (say 20-30 million, 100 KB each, 50
concurrent users). I'm well aware that the requirement is quite fuzzy, so I'm looking more for answers of the kind
"this may work, but...", or "no chance because...", or "we use Jackrabbit to store X number of documents in the
following setting...". I'd also like to learn about any benchmarks, because I was not able to find any.

Thanks,


Martin
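
For a sense of scale, the figures in the question can be turned into a rough raw-content size. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 KB = 1000 bytes) and ignores metadata, indexes, and any database or filesystem overhead. The class and method names are made up for illustration.

```java
class ArchiveSizing {
    // Assumption: decimal units, 1 KB = 1000 bytes, 1 TB = 10^12 bytes.
    static final long BYTES_PER_KB = 1_000L;
    static final long BYTES_PER_TB = 1_000_000_000_000L;

    // Raw content size only; metadata and index overhead are not included.
    static long totalBytes(long documents, long bytesPerDocument) {
        return Math.multiplyExact(documents, bytesPerDocument);
    }

    public static void main(String[] args) {
        long perDocument = 100 * BYTES_PER_KB; // "100 KB each" from the question
        long low = totalBytes(20_000_000L, perDocument);
        long high = totalBytes(30_000_000L, perDocument);
        System.out.println("20 million documents: " + low / BYTES_PER_TB + " TB");
        System.out.println("30 million documents: " + high / BYTES_PER_TB + " TB");
    }
}
```

So the stated requirement amounts to roughly 2 to 3 TB of raw document content before any overhead.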

Re: Jackrabbit performance experiences

Posted by Lukas Kahwe Smith <ml...@pooteeweet.org>.
On Oct 28, 2011, at 10:22 , martin.gasbichler@bluewin.ch wrote:

> Hi,
> 
> our application needs to archive documents along with some metadata, and I'm considering using Jackrabbit in
> combination with a database and maybe filesystem storage for this task. However, I did not find any benchmarks that
> show whether Jackrabbit is capable of hosting a large number of documents (say 20-30 million, 100 KB each, 50
> concurrent users). I'm well aware that the requirement is quite fuzzy, so I'm looking more for answers of the kind
> "this may work, but...", or "no chance because...", or "we use Jackrabbit to store X number of documents in the
> following setting...". I'd also like to learn about any benchmarks, because I was not able to find any.


In our benchmarks with 100 GB of data (not sure how many documents), the only scaling issue we noticed was with range queries on dates when doing SQL2 queries. I'm not sure why, but Jackrabbit doesn't seem to use Lucene there. Other than that, node traversal and SQL2 queries seemed to scale quite nicely, and with multiple cluster nodes, concurrency didn't seem to cause issues either.
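
For readers unfamiliar with the kind of query meant here, a date range query in JCR-SQL2 might look roughly like the statement built below. This is only a sketch, not the query from the benchmark: the node type `[nt:file]`, the property `jcr:created`, the selector name `doc`, and the class and method names are all assumptions chosen for illustration. The code only builds the statement string, so it runs without a repository.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class DateRangeQuery {
    // ISO 8601 timestamp with milliseconds and offset, used inside CAST(... AS DATE).
    private static final DateTimeFormatter JCR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Builds a JCR-SQL2 statement selecting nodes created in [from, to).
    // Assumption: documents are nt:file nodes and are filtered on jcr:created.
    public static String build(OffsetDateTime from, OffsetDateTime to) {
        return "SELECT * FROM [nt:file] AS doc"
                + " WHERE doc.[jcr:created] >= CAST('" + JCR_DATE.format(from) + "' AS DATE)"
                + " AND doc.[jcr:created] < CAST('" + JCR_DATE.format(to) + "' AS DATE)";
    }

    public static void main(String[] args) {
        OffsetDateTime from = OffsetDateTime.of(2011, 10, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        // One-month window starting at 'from'.
        System.out.println(build(from, from.plusMonths(1)));
    }
}
```

Against a live repository, a statement like this would typically be passed to `QueryManager.createQuery(statement, Query.JCR_SQL2)` from the `javax.jcr.query` API and then executed. Whether such a query is served from the Lucene index depends on the Jackrabbit version and configuration.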

regards,
Lukas Kahwe Smith
mls@pooteeweet.org