You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/09 17:23:32 UTC

[jira] Created: (JCR-2744) Avoid element arrays in PathImpl

Avoid element arrays in PathImpl
--------------------------------

                 Key: JCR-2744
                 URL: https://issues.apache.org/jira/browse/JCR-2744
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-spi, jackrabbit-spi-commons
            Reporter: Jukka Zitting


The path handling code in spi-commons shows quite often in thread dumps and profiling results, as the current implementation does quite a bit of repetitive allocating and copying of path element arrays. We should be able to streamline and simplify the path handling code by only tracking the latest path element and a reference to the parent path. To do this efficiently we may need to adjust some of the Path and PathFactory method declarations (that currently assume element array -based paths) also in the SPI.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2744) Avoid element arrays in PathImpl

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908783#action_12908783 ] 

Michael Dürig commented on JCR-2744:
------------------------------------

This looks very good to me. Do you have some performance figures already?

However I think that RelativePath#getElements() need to improve. The current implementations recursively calls getElements() on the parent and adds its own element to the returned array. To do so the returned array needs to be copied. Copying the array once for each recursive invocation effectively results in O(n^2) run time characteristics. I suggest to change the recursive implementation into an iterative one which collects the name elements of all ancestors of the current path into a pre allocated array. 

> Avoid element arrays in PathImpl
> --------------------------------
>
>                 Key: JCR-2744
>                 URL: https://issues.apache.org/jira/browse/JCR-2744
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-spi, jackrabbit-spi-commons
>            Reporter: Jukka Zitting
>
> The path handling code in spi-commons shows quite often in thread dumps and profiling results, as the current implementation does quite a bit of repetitive allocating and copying of path element arrays. We should be able to streamline and simplify the path handling code by only tracking the latest path element and a reference to the parent path. To do this efficiently we may need to adjust some of the Path and PathFactory method declarations (that currently assume element array -based paths) also in the SPI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2744) Avoid element arrays in PathImpl

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2744:
-------------------------------

    Attachment: TransientManyChildNodesTest.png

The attached graph shows already some performance improvement that I believe is mostly due to these changes. I see some gains also in other benchmarks, but it's hard to say whether it's this or some of the other recent improvements at work.

Re: getElements(); I'm actually hoping to get rid of this method entirely, which is why I so far haven't put too much effort into optimizing it.

Going even further, I believe we could (and should) merge the Path and Path.Element interfaces to further drop the number of objects we need to instantiate and track.


> Avoid element arrays in PathImpl
> --------------------------------
>
>                 Key: JCR-2744
>                 URL: https://issues.apache.org/jira/browse/JCR-2744
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-spi, jackrabbit-spi-commons
>            Reporter: Jukka Zitting
>         Attachments: TransientManyChildNodesTest.png
>
>
> The path handling code in spi-commons shows quite often in thread dumps and profiling results, as the current implementation does quite a bit of repetitive allocating and copying of path element arrays. We should be able to streamline and simplify the path handling code by only tracking the latest path element and a reference to the parent path. To do this efficiently we may need to adjust some of the Path and PathFactory method declarations (that currently assume element array -based paths) also in the SPI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2744) Avoid element arrays in PathImpl

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2744:
-------------------------------

    Attachment: TransientManyChildNodesTest-2.png

I've now gotten rid of the separate Element objects. The AbstractPath class now implements both the Path and Element interfaces (the latter properly only when dealing with a single-element path).

I also improved the getElements() method to avoid the O(n^2) behaviour.

The attached plot shows the current performance improvements (about 20% on large transient operations) that are mostly due to these changes. There's still room for additional improvements, but I believe the biggest wins have already been achieved here.

> Avoid element arrays in PathImpl
> --------------------------------
>
>                 Key: JCR-2744
>                 URL: https://issues.apache.org/jira/browse/JCR-2744
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-spi, jackrabbit-spi-commons
>            Reporter: Jukka Zitting
>         Attachments: TransientManyChildNodesTest-2.png, TransientManyChildNodesTest.png
>
>
> The path handling code in spi-commons shows quite often in thread dumps and profiling results, as the current implementation does quite a bit of repetitive allocating and copying of path element arrays. We should be able to streamline and simplify the path handling code by only tracking the latest path element and a reference to the parent path. To do this efficiently we may need to adjust some of the Path and PathFactory method declarations (that currently assume element array -based paths) also in the SPI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2744) Avoid element arrays in PathImpl

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908816#action_12908816 ] 

Michael Dürig commented on JCR-2744:
------------------------------------

I'm in favor of further optimizing Path.Element or even trying to get rid of it when it turns out to be feasible. Since getElements() is called from quite many places, we might gain some quick insight into the impact of further optimization from changing the method to an iterative implementation and running the performance test suite. 

Also - if it turn out to be beneficial performance wise - I'd prefer to have an iterative implementation of getElements() in 2.2 if further optimizations won't make it into the release.

> Avoid element arrays in PathImpl
> --------------------------------
>
>                 Key: JCR-2744
>                 URL: https://issues.apache.org/jira/browse/JCR-2744
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-spi, jackrabbit-spi-commons
>            Reporter: Jukka Zitting
>         Attachments: TransientManyChildNodesTest.png
>
>
> The path handling code in spi-commons shows quite often in thread dumps and profiling results, as the current implementation does quite a bit of repetitive allocating and copying of path element arrays. We should be able to streamline and simplify the path handling code by only tracking the latest path element and a reference to the parent path. To do this efficiently we may need to adjust some of the Path and PathFactory method declarations (that currently assume element array -based paths) also in the SPI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.