Posted to dev@jackrabbit.apache.org by "Thomas Mueller (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/19 13:37:38 UTC

[jira] [Issue Comment Edited] (JCR-3263) Consistency checker performance improvements

    [ https://issues.apache.org/jira/browse/JCR-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232582#comment-13232582 ] 

Thomas Mueller edited comment on JCR-3263 at 3/19/12 12:36 PM:
---------------------------------------------------------------

> No, that isn't the problem.

Did you run a benchmark using sequential node ids (patch for JCR-2857)?

> batches of node prop bundles at once

It sounds like in this case you loaded the node bundles sorted by node id, which means the randomly distributed nodes don't hurt performance here.

To reduce the number of network round trips (if this is really the problem), another solution would be a new method 

    Map<NodeId, NodeInfo> getNodeInfos(List<NodeId> ids).

This method could be used to load any number of nodes, and it could also be used to traverse just a part of the repository (whereas getAllNodeInfos could only be used to read the repository completely).
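
A minimal sketch of how such a batch method could be used to traverse only a subtree, one call per tree level. Everything below is an assumption made for illustration: String ids stand in for NodeId, the NodeInfo fields are guessed from the issue description, and NodeInfoSource, InMemoryNodeInfoSource and SubtreeTraversal are hypothetical names, not Jackrabbit classes.

```java
import java.util.*;

// Hypothetical stand-in for the proposed NodeInfo: structural data only.
final class NodeInfo {
    final String id;              // stand-in for NodeId
    final String parentId;        // null for the root node
    final List<String> childIds;

    NodeInfo(String id, String parentId, List<String> childIds) {
        this.id = id;
        this.parentId = parentId;
        this.childIds = childIds;
    }
}

// Hypothetical interface carrying the batch method suggested above.
interface NodeInfoSource {
    Map<String, NodeInfo> getNodeInfos(List<String> ids);
}

// In-memory stand-in for a persistence manager; each call models one round trip.
final class InMemoryNodeInfoSource implements NodeInfoSource {
    private final Map<String, NodeInfo> store = new HashMap<>();
    int roundTrips = 0;

    void put(NodeInfo info) {
        store.put(info.id, info);
    }

    @Override
    public Map<String, NodeInfo> getNodeInfos(List<String> ids) {
        roundTrips++;
        Map<String, NodeInfo> result = new LinkedHashMap<>();
        for (String id : ids) {
            NodeInfo info = store.get(id);
            if (info != null) {
                result.put(id, info);   // unknown ids are simply left out
            }
        }
        return result;
    }
}

final class SubtreeTraversal {
    // Breadth-first traversal that issues one getNodeInfos call per level.
    static List<String> collectSubtree(NodeInfoSource source, String rootId) {
        Set<String> visited = new LinkedHashSet<>();
        List<String> level = Collections.singletonList(rootId);
        while (!level.isEmpty()) {
            Map<String, NodeInfo> infos = source.getNodeInfos(level);
            List<String> next = new ArrayList<>();
            for (String id : level) {
                NodeInfo info = infos.get(id);
                // Skip missing nodes and nodes already seen (guards against cycles).
                if (info == null || !visited.add(id)) {
                    continue;
                }
                for (String childId : info.childIds) {
                    if (!visited.contains(childId)) {
                        next.add(childId);
                    }
                }
            }
            level = next;
        }
        return new ArrayList<>(visited);
    }
}
```

With this shape, the number of calls grows with the depth of the subtree rather than with the number of nodes, which is the round-trip saving the method is meant to provide.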
                
> Consistency checker performance improvements
> --------------------------------------------
>
>                 Key: JCR-3263
>                 URL: https://issues.apache.org/jira/browse/JCR-3263
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>            Reporter: Unico Hommes
>
> Currently the consistency checker loads a batch of node ids and, for each node id, fetches the corresponding bundle, its child bundles, and its parent bundle separately. This makes the consistency checker perform less than optimally, and it may take hours (days?) to complete for large repositories.
> I've been able to make the checker execute about 20 times faster on my local machine by loading in batches of node prop bundles at once. For 17000 nodes in the workspace the current implementation ran for about 23 seconds whereas with the enhancements I made it finished in 1.2 seconds.
> Now the problem is that loading node prop bundles in batches may require a lot of memory, and how much memory a given batch size needs is hard to predict because the sizes of the individual bundles are unpredictable.
> Also the node prop bundle contains much more information than is needed for a consistency check.
> What would be ideal in this situation is to introduce a new type - call it NodeInfo - that contains only the structural information the checker needs to do its work: the node id, the parent id and the child ids. To allow for a possible future referential integrity check, it could perhaps also contain the node's reference type properties.
> The IterablePersistenceManager interface would then get an additional method:
> Map<NodeId, NodeInfo> getAllNodeInfos();
> If this is an acceptable proposal I would like to work on this and contribute a patch.
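
To make the proposal above concrete, here is a minimal sketch of a NodeInfo type and a parent/child cross-check over the kind of map getAllNodeInfos() would return. It is an assumption-laden illustration only: String ids stand in for NodeId, the field names are guessed from the description, StructureCheck is a hypothetical helper, and the checks shown are not taken from Jackrabbit's actual consistency checker.

```java
import java.util.*;

// Hypothetical shape of the proposed NodeInfo: structural data only.
final class NodeInfo {
    final String id;              // stand-in for NodeId
    final String parentId;        // null for the root node
    final List<String> childIds;

    NodeInfo(String id, String parentId, List<String> childIds) {
        this.id = id;
        this.parentId = parentId;
        this.childIds = childIds;
    }
}

// Hypothetical checker working purely on NodeInfo, without loading bundles.
final class StructureCheck {
    // Returns one message per structural problem found; empty if consistent.
    static List<String> check(Map<String, NodeInfo> infos) {
        List<String> problems = new ArrayList<>();
        for (NodeInfo node : infos.values()) {
            // The parent must exist and must list this node as a child.
            if (node.parentId != null) {
                NodeInfo parent = infos.get(node.parentId);
                if (parent == null) {
                    problems.add(node.id + ": missing parent " + node.parentId);
                } else if (!parent.childIds.contains(node.id)) {
                    problems.add(node.id + ": not listed as child of " + node.parentId);
                }
            }
            // Every child must exist and must point back to this node.
            for (String childId : node.childIds) {
                NodeInfo child = infos.get(childId);
                if (child == null) {
                    problems.add(node.id + ": missing child " + childId);
                } else if (!node.id.equals(child.parentId)) {
                    problems.add(node.id + ": child " + childId
                            + " has parent " + child.parentId);
                }
            }
        }
        return problems;
    }
}
```

Because only ids are held in memory, the footprint per node is roughly bounded by the number of children rather than by the size of the property bundle.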
