Posted to dev@jackrabbit.apache.org by Laurent Belmonte <la...@gmail.com> on 2006/01/18 12:29:05 UTC

Performance bottleneck

Hi all,

I am doing some performance tests on jackrabbit. One issue shown by my
tests is a big loss of performance when creating many child nodes on a
single node. The problem comes from the way the persistence managers store
the list of children for a node. Is somebody working on this issue?

Re: Performance bottleneck

Posted by Brian Moseley <bc...@osafoundation.org>.
On 1/18/06, Stefan Guggisberg <st...@gmail.com> wrote:

> probably ;) it's not really a matter of level of interest but rather
> of priority.
> i planned to have a look at the issue and try to verify laurent's claim
> of bad performance myself, but right now it's not at the top of my todo list...

got it. off to find that cup of coffee now... :)

Re: Performance bottleneck

Posted by Stefan Guggisberg <st...@gmail.com>.
On 1/18/06, Brian Moseley <bc...@maz.org> wrote:
> On 1/18/06, Stefan Guggisberg <st...@gmail.com> wrote:
>
> > there's no general rule of thumb, but since jcr's model is inherently
> > hierarchical, personally i would try to make use of this feature when
> > designing a content model.
>
> well, yes, that is self evident. but high scale usage suggests that
> even individual levels of the hierarchy can become very large.
>
> you don't seem to be particularly interested in addressing the issue
> of performance degradation with large numbers of child nodes, so i
> guess this thread will serve as documentation that we have to address
> performance issues in our content models by introducing
> implementation-specific workarounds. or am i misrepresenting your
> level of interest?

probably ;) it's not really a matter of level of interest but rather
of priority.
i planned to have a look at the issue and try to verify laurent's claim
of bad performance myself, but right now it's not at the top of my todo list...

>
> i for one certainly expect to support 10,000 users in cosmo. at the
> moment all users are stored as child nodes of the workspace root node.
> i suppose i will introduce some intermediary layers of nodes between
> the root and user nodes to cut down the number of children of any
> particular node. i'm not particularly happy with that solution,
> because it has a ripple effect on every part of my application, but if
> that's what jackrabbit forces me to do to scale, i will.
>

Re: Performance bottleneck

Posted by Stefan Guggisberg <st...@gmail.com>.
On 1/20/06, Stefan Guggisberg <st...@gmail.com> wrote:
> On 1/18/06, Jukka Zitting <ju...@gmail.com> wrote:
> > Hi,
> >
> > You may want to file a Jira issue about the performance with large
> > numbers of child nodes. I think it is a valid concern that should be
> > addressed somehow. If nothing else, the performance constraint should
> > be mentioned in the Jackrabbit documentation.
>
> i posted a jira issue: https://issues.apache.org/jira/browse/JCR-307
> it's not a PersistenceManager issue but a problem of the
> NodeState#ChildNodeEntries
> implementation which is optimized for same name sibling processing
> but doesn't seem to scale well with a large number of (uniquely named) entries.
> i'll look into it asap.

fyi: i committed several NodeState optimizations which resolve this issue
(svn r37592).

cheers
stefan

>
> cheers
> stefan
>
> >
> > BR,
> >
> > Jukka Zitting
> >
> > --
> > Yukatan - http://yukatan.fi/ - info@yukatan.fi
> > Software craftsmanship, JCR consulting, and Java development
> >
>

Re: Performance bottleneck

Posted by Stefan Guggisberg <st...@gmail.com>.
On 1/18/06, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> You may want to file a Jira issue about the performance with large
> numbers of child nodes. I think it is a valid concern that should be
> addressed somehow. If nothing else, the performance constraint should
> be mentioned in the Jackrabbit documentation.

i posted a jira issue: https://issues.apache.org/jira/browse/JCR-307
it's not a PersistenceManager issue but a problem of the
NodeState#ChildNodeEntries
implementation which is optimized for same name sibling processing
but doesn't seem to scale well with a large number of (uniquely named) entries.
i'll look into it asap.
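
to illustrate (a simplified sketch, with made-up names, not the actual
NodeState#ChildNodeEntries code): with a plain list, looking up a child
entry by name means a linear scan, so touching the n-th uniquely named
child costs O(n); keeping a name-keyed map next to the ordered list makes
that lookup O(1):

    // simplified sketch only -- class and method names are illustrative,
    // not jackrabbit's real ChildNodeEntries implementation
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class ChildEntriesSketch {
        private final List<String> ordered = new ArrayList<String>();              // preserves child order
        private final Map<String, Integer> index = new HashMap<String, Integer>(); // name -> position

        void add(String name) {
            index.put(name, ordered.size());
            ordered.add(name);
        }

        // O(1) thanks to the map; scanning 'ordered' instead would be O(n),
        // which dominates once a node has tens of thousands of uniquely
        // named children
        Integer positionOf(String name) {
            return index.get(name);
        }
    }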

cheers
stefan

>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info@yukatan.fi
> Software craftsmanship, JCR consulting, and Java development
>

Re: Performance bottleneck

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

You may want to file a Jira issue about the performance with large
numbers of child nodes. I think it is a valid concern that should be
addressed somehow. If nothing else, the performance constraint should
be mentioned in the Jackrabbit documentation.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: Performance bottleneck

Posted by Brian Moseley <bc...@maz.org>.
On 1/18/06, Edgar Poce <ed...@gmail.com> wrote:

> I posted a few comments regarding performance last month. see
> http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/3994. See
> the comments about using the root node and about handling many
> sessions concurrently.
> I didn't add it to the wiki, as someone proposed, because IMHO those
> issues belong to jira, but I'm not sure if I'm missing something
> important :(.

thanks for the pointer. that's a great summary of performance issues
to investigate. i suggest adding it to the wiki for the time being,
since it can always be removed later if/when any of these issues are
addressed in jira.

Re: Performance bottleneck

Posted by Alexandru Popescu <th...@gmail.com>.
#: Edgar Poce changed the world a bit at a time by saying (astral date: 1/18/2006 7:35 PM) :#
> Hi brian
> 
> On 1/18/06, Brian Moseley <bc...@maz.org> wrote:
>> i for one certainly expect to support 10,000 users in cosmo. at the
>> moment all users are stored as child nodes of the workspace root node.
>> i suppose i will introduce some intermediary layers of nodes between
>> the root and user nodes to cut down the number of children of any
>> particular node. i'm not particularly happy with that solution,
>> because it has a ripple effect on every part of my application, but if
>> that's what jackrabbit forces me to do to scale, i will.
>>
> 
> I posted a few comments regarding performance last month. see
> http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/3994. See
> the comments about using the root node and about handling many
> sessions concurrently.
> I didn't add it to the wiki, as someone proposed, because IMHO those
> issues belong to jira, but I'm not sure if I'm missing something
> important :(.
> 
> br,
> edgar
> 

Excellent entry, Edgar. Something that everybody should keep an eye on. I would recommend adding this
entry somewhere very visible to all users.

./alex
--
.w( the_mindstorm )p.


Re: Performance bottleneck

Posted by Edgar Poce <ed...@gmail.com>.
Hi brian

On 1/18/06, Brian Moseley <bc...@maz.org> wrote:
> i for one certainly expect to support 10,000 users in cosmo. at the
> moment all users are stored as child nodes of the workspace root node.
> i suppose i will introduce some intermediary layers of nodes between
> the root and user nodes to cut down the number of children of any
> particular node. i'm not particularly happy with that solution,
> because it has a ripple effect on every part of my application, but if
> that's what jackrabbit forces me to do to scale, i will.
>

I posted a few comments regarding performance last month. see
http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/3994. See
the comments about using the root node and about handling many
sessions concurrently.
I didn't add it to the wiki, as someone proposed, because IMHO those
issues belong to jira, but I'm not sure if I'm missing something
important :(.

br,
edgar

Re: Performance bottleneck

Posted by Brian Moseley <bc...@maz.org>.
On 1/18/06, Stefan Guggisberg <st...@gmail.com> wrote:

> there's no general rule of thumb, but since jcr's model is inherently
> hierarchical, personally i would try to make use of this feature when
> designing a content model.

well, yes, that is self evident. but high scale usage suggests that
even individual levels of the hierarchy can become very large.

you don't seem to be particularly interested in addressing the issue
of performance degradation with large numbers of child nodes, so i
guess this thread will serve as documentation that we have to address
performance issues in our content models by introducing
implementation-specific workarounds. or am i misrepresenting your
level of interest?

i for one certainly expect to support 10,000 users in cosmo. at the
moment all users are stored as child nodes of the workspace root node.
i suppose i will introduce some intermediary layers of nodes between
the root and user nodes to cut down the number of children of any
particular node. i'm not particularly happy with that solution,
because it has a ripple effect on every part of my application, but if
that's what jackrabbit forces me to do to scale, i will.
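
for example, a minimal sketch of such an intermediary layer (the bucket
scheme, class names and node types are illustrative assumptions, not
cosmo's actual model): derive a small bucket name from the username so
that no single node ends up with tens of thousands of children:

    // sketch only: store users under /users/7f/bcm instead of /users/bcm
    import javax.jcr.Node;
    import javax.jcr.RepositoryException;

    class UserSharding {
        // map a username to one of 256 two-hex-digit bucket names
        static String bucketFor(String username) {
            int hash = username.hashCode() & 0xff;
            return Integer.toHexString(0x100 | hash).substring(1);
        }

        static Node addUser(Node usersRoot, String username) throws RepositoryException {
            String bucket = bucketFor(username);
            Node bucketNode = usersRoot.hasNode(bucket)
                    ? usersRoot.getNode(bucket)
                    : usersRoot.addNode(bucket, "nt:unstructured");
            return bucketNode.addNode(username, "nt:unstructured");
        }
    }

with 10,000 users and 256 buckets, each bucket node holds roughly 40
children instead of the root holding all 10,000.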

Re: Performance bottleneck

Posted by Stefan Guggisberg <st...@gmail.com>.
On 1/18/06, Brian Moseley <bc...@osafoundation.org> wrote:
> On 1/18/06, Stefan Guggisberg <st...@gmail.com> wrote:
>
> > btw: i would argue that a content model with 30'000 child nodes could probably
> > be optimized.
>
> > you seem to be implying that it is incorrect for a node to have a large
> > number of children. why is that incorrect? is there a rule of thumb on
> > how many children a node can support in a well-designed content model?
>

there's no general rule of thumb, but since jcr's model is inherently
hierarchical, personally i would try to make use of this feature when
designing a content model.

Re: Performance bottleneck

Posted by Brian Moseley <bc...@osafoundation.org>.
On 1/18/06, Stefan Guggisberg <st...@gmail.com> wrote:

> btw: i would argue that a content model with 30'000 child nodes could probably
> be optimized.

you seem to be implying that it is incorrect for a node to have a large
number of children. why is that incorrect? is there a rule of thumb on
how many children a node can support in a well-designed content model?

Re: Performance bottleneck

Posted by Laurent Belmonte <la...@gmail.com>.
On 1/18/06, Stefan Guggisberg <st...@gmail.com> wrote:
> hi laurent,
>

>
> this is not a PersistenceManager issue. are you creating same-name sibling
> or uniquely named child nodes? what svn rev. are you using?
I am using the latest svn rev, 370133.

My test is very simple:
for (int i = 0; i < 20000; i++) {
    Node node = rootNode.addNode("node-test" + i, "nt:unstructured");
    // ... set 3 properties on the new node ...
    if ((i + 1) % 1000 == 0) {
        session.save(); // save every 1000 created nodes
    }
}

When a child is created, PersistenceManager#store(NodeState) is called.
Each time a node is stored by the Derby persistence manager, the entire
list of its children is re-written, and the more children the node has,
the longer this operation takes (the same is true for the XML and Object
persistence managers).
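
For illustration, a small self-contained toy (not the actual persistence
manager code) showing why rewriting the whole child list on every save
makes the total cost quadratic in the number of children:

    // toy model: the store rewrites the *entire* child list each time a
    // child is added, so adding the i-th child costs O(i) and 20000 adds
    // cost O(n^2) "bytes written" overall
    import java.util.ArrayList;
    import java.util.List;

    public class ChildListRewriteDemo {
        public static void main(String[] args) {
            List<String> children = new ArrayList<String>();
            long bytesWritten = 0;
            for (int i = 0; i < 20000; i++) {
                children.add("node-test" + i);
                // simulate store(NodeState): serialize the whole list again
                for (String name : children) {
                    bytesWritten += name.length();
                }
            }
            System.out.println("simulated bytes written: " + bytesWritten);
        }
    }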

>
> btw: i would argue that a content model with 30'000 child nodes could probably
> be optimized.

maybe, maybe not. Imagine an application which manages contacts. A user
of this application might want to import his whole address book into the
application, resulting in a node containing 10000 child nodes (yeah, he
knows lots of people ;) ). That import can take time, because 10000
contacts is a huge amount of data. But afterwards, when he wants to add
just one contact, the creation time will not be acceptable to him.


Best regards,
Laurent
>
> cheers
> stefan
>

Re: Performance bottleneck

Posted by Stefan Guggisberg <st...@gmail.com>.
hi laurent,

On 1/18/06, Laurent Belmonte <la...@gmail.com> wrote:
> On 1/18/06, Martin Perez <mp...@gmail.com> wrote:
> > What persistence manager are you using?
> >
> > It has been commented that the XMLPM and ObjectPM persistence managers
> > do not perform very well...
>
> I have done a test with DerbyPM. The results:
> creating 999 nodes on a node already containing 0 children       took 5402 ms
> creating   1 node  on a node already containing 999 children     took 126 ms
> creating 999 nodes on a node already containing 1000 children    took 4292 ms
> creating   1 node  on a node already containing 1999 children    took 137 ms
> creating 999 nodes on a node already containing 2000 children    took 3796 ms
> creating   1 node  on a node already containing 2999 children    took 254 ms
> creating 999 nodes on a node already containing 3000 children    took 4702 ms
> creating   1 node  on a node already containing 3999 children    took 370 ms
> .
> .
> .
> creating 999 nodes on a node already containing 20000 children   took 32841 ms
> creating   1 node  on a node already containing 20999 children   took 9473 ms
> .
> .
> .
> creating 999 nodes on a node already containing 28000 children   took 41058 ms
> creating   1 node  on a node already containing 28999 children   took 19228 ms
>
> As you can see, the more children a node already has, the longer it takes
> to create one more child.
>

this is not a PersistenceManager issue. are you creating same-name sibling
or uniquely named child nodes? what svn rev. are you using?

btw: i would argue that a content model with 30'000 child nodes could probably
be optimized.

cheers
stefan

Re: Performance bottleneck

Posted by Laurent Belmonte <la...@gmail.com>.
On 1/18/06, Martin Perez <mp...@gmail.com> wrote:
> What persistence manager are you using?
>
> It has been commented that the XMLPM and ObjectPM persistence managers
> do not perform very well...

I have done a test with DerbyPM. The results:
creating 999 nodes on a node already containing 0 children       took 5402 ms
creating   1 node  on a node already containing 999 children     took 126 ms
creating 999 nodes on a node already containing 1000 children    took 4292 ms
creating   1 node  on a node already containing 1999 children    took 137 ms
creating 999 nodes on a node already containing 2000 children    took 3796 ms
creating   1 node  on a node already containing 2999 children    took 254 ms
creating 999 nodes on a node already containing 3000 children    took 4702 ms
creating   1 node  on a node already containing 3999 children    took 370 ms
.
.
.
creating 999 nodes on a node already containing 20000 children   took 32841 ms
creating   1 node  on a node already containing 20999 children   took 9473 ms
.
.
.
creating 999 nodes on a node already containing 28000 children   took 41058 ms
creating   1 node  on a node already containing 28999 children   took 19228 ms

As you can see, the more children a node already has, the longer it takes
to create one more child.

Re: Performance bottleneck

Posted by Martin Perez <mp...@gmail.com>.
What persistence manager are you using?

It has been commented that the XMLPM and ObjectPM persistence managers
do not perform very well...

Martin

On 1/18/06, Laurent Belmonte <la...@gmail.com> wrote:
>
> Hi all,
>
> I am doing some performance tests on jackrabbit. One issue shown by my
> tests is a big loss of performance when creating many child nodes on a
> single node. The problem comes from the way the persistence managers store
> the list of children for a node. Is somebody working on this issue?
>