Posted to dev@jackrabbit.apache.org by Eugeny N Dzhurinsky <eu...@jdevelop.com> on 2005/10/12 14:08:00 UTC

Performance

Hi there!
We are developing a CMS, and we need to learn a bit more about the
performance of Jackrabbit. The thing is, I didn't find anything about
performance either in the wiki or in the documentation.

To be more specific, we are wondering what performance Jackrabbit could
deliver when working with, for example, several hundred thousand
nodes/properties and a very deep node tree.

-- 
Eugene N Dzhurinsky

Re: Performance

Posted by Lee Mallabone <le...@transactgroup.net>.
This should definitely go in the Jackrabbit wiki imho.

Perhaps as a new node, 'getting the best performance out of Jackrabbit', or 
something similar?

Lee.

(In reply to Edgar Poce's message of Wednesday 12 October 2005 20:44,
which follows.)

Re: Performance

Posted by Edgar Poce <ed...@gmail.com>.
hi

  I haven't used jackrabbit with a huge amount of data yet but I did
run some tests and have some impressions I'd like to share, hopefully
someone will correct me if I'm misleading you.

In most cases I agree with David and Marcel, who pointed out that
jackrabbit doesn't seem to be affected by the number of nodes. However,
I think that in order to maintain performance you should take into
account some usage considerations that might affect it seriously.

Regarding the tree structure: since each parent holds references to
its children, each time you add a child the parent becomes heavier.
This degrades the performance of write operations as the number of
children grows. I think it's better to use a deep hierarchy rather
than a flat structure. I would recommend that you do some testing to
establish the limits that suit your needs.
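
One way to get a deeper hierarchy out of what would otherwise be a flat
list of children is to derive intermediate "bucket" levels from each
item's name. The following is only a minimal sketch under stated
assumptions: the class and method names are hypothetical, the bucketing
scheme (two-character hex buckets taken from String.hashCode()) is an
illustrative choice and not something Jackrabbit provides, and the name
is assumed to be a valid node name without "/".

```java
final class BucketedPath {

    // Builds a relative path of `levels` two-character hex buckets followed
    // by the name itself, so items are spread over many small parents
    // instead of one large one.
    static String bucketedPath(String name, int levels) {
        if (levels < 0 || levels > 4) {
            throw new IllegalArgumentException("levels must be between 0 and 4");
        }
        int hash = name.hashCode(); // deterministic for a given string
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            // Take one byte of the hash per level, lowest byte first.
            int bucket = (hash >>> (8 * i)) & 0xff;
            path.append(String.format("%02x", bucket)).append('/');
        }
        return path.append(name).toString();
    }

    public static void main(String[] args) {
        // Prints a path with two bucket segments before the item name.
        System.out.println(bucketedPath("article-42", 2));
    }
}
```

With two levels, each intermediate parent has at most 256 children and
there are at most 256 * 256 = 65,536 leaf buckets, so the number of
children per parent stays small even for several hundred thousand
items. The right depth is something to establish by testing.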

Regarding session handling: a transient item storage is bound to each
session. The transient storage contains its own cache of nodes that
are connected to the underlying persistent storage. The thing is that
each time a node is modified, all the cached transient nodes are
notified. Therefore the more open sessions you have, the more
expensive a write operation will be. I think you should try to have
each session perform write operations on nodes which are not under
heavy load from other sessions. E.g. I think it's good practice to
avoid write operations on the root node if the repository is to be
accessed by a high number of sessions. I also think it's good practice
to share a single anonymous session for read-only access if possible;
it would reduce the time that write operations take.
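
The idea of sharing one read-only session instead of opening one per
reader could be sketched as below. This is a hedged illustration, not
Jackrabbit code: SharedReadOnlySession is a hypothetical helper, and it
is generic so that it runs without the JCR jar. In a real application
the type parameter would presumably be javax.jcr.Session and the
supplier would wrap an anonymous Repository.login() call (handling its
checked exceptions). Whether concurrent reads through one session are
safe is an assumption to verify for your Jackrabbit version.

```java
import java.util.function.Supplier;

final class SharedReadOnlySession<S> {

    private final Supplier<S> login; // how to open the session (assumed cheap to hold)
    private S session;               // created on first use, then reused

    SharedReadOnlySession(Supplier<S> login) {
        this.login = login;
    }

    // Returns the single shared session, opening it on the first call only.
    synchronized S get() {
        if (session == null) {
            session = login.get();
        }
        return session;
    }

    public static void main(String[] args) {
        // A string stands in for a real session in this sketch.
        SharedReadOnlySession<String> shared =
                new SharedReadOnlySession<>(() -> "anonymous-session");
        // Both readers receive the same instance.
        System.out.println(shared.get() == shared.get());
    }
}
```

Keeping the number of open sessions low this way means fewer transient
caches to notify on each write, which is the effect described above.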

Regarding concurrency: currently jackrabbit lacks fine-grained locking
for write operations. So, if the repository will be under heavy load I
would consider an approach like the one used in Magnolia. I'm not sure
if they still use it, but the last time I checked they had one
repository for authoring and another for publishing.

br,
edgar



Re: Performance

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.
On Wed, Oct 12, 2005 at 03:05:38PM +0200, David Nuescheler wrote:
> hi eugeny,
> 
> the problem with performance metrics is that there are many aspects
> of performance. i think your description goes more into the direction
> of scalability. we tested several millions of items of real-life data and
> depending on the persistence manager used we have noticed little to
> no performance degradation.
> 
> the good news about a "standards based" approach is that you can
> develop your cms based on jcr (aka jsr-170) without having direct
> ties to jackrabbit.
> 
> this means that if you have a customer that outgrows the scalability
> of jackrabbit (which should not happen with a couple hundred thousand
> items) that particular customer can simply drop in a commercial
> repository instead of jackrabbit, without changing any of your application
> code.
> 
> does that help?

Yes, definitely!

> since the performance question is amongst the most popular,
> maybe we should start publishing performance and scalability metrics
> on the jackrabbit website? anybody?

That would be great, so I vote "yes" :)

-- 
Eugene N Dzhurinsky

Re: Performance

Posted by David Nuescheler <da...@gmail.com>.
hi eugeny,

the problem with performance metrics is that there are many aspects
of performance. i think your description goes more into the direction
of scalability. we tested several millions of items of real-life data and
depending on the persistence manager used we have noticed little to
no performance degradation.

the good news about a "standards based" approach is that you can
develop your cms based on jcr (aka jsr-170) without having direct
ties to jackrabbit.

this means that if you have a customer that outgrows the scalability
of jackrabbit (which should not happen with a couple hundred thousand
items) that particular customer can simply drop in a commercial
repository instead of jackrabbit, without changing any of your application
code.

does that help?

since the performance question is amongst the most popular,
maybe we should start publishing performance and scalability metrics
on the jackrabbit website? anybody?

regards,
david


