Posted to dev@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2009/07/16 16:02:14 UTC

[jira] Created: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

NodeEntryImpl.getWorkspaceId() very inefficient 
------------------------------------------------

                 Key: JCR-2218
                 URL: https://issues.apache.org/jira/browse/JCR-2218
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-jcr2spi
            Reporter: Michael Dürig


NodeEntryImpl.getWorkspaceId() recalculates its path on every call by calling itself recursively. In addition, each call to getWorkspaceId() results in several calls to the path and item factories, which might be somewhat expensive in themselves.

In my test scenario I have a RepositoryService.getItemInfos() call returning ~1000 items. Processing these items results in about 2,700,000 (!) calls to getWorkspaceId(). Profiler data shows that 98% of the time needed to process the 1000 items is spent in getWorkspaceId() and related calls.
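
To illustrate why an uncached, recursive id calculation adds up so quickly, here is a minimal sketch. It does not use the jcr2spi classes: Entry, callsForChain and the string ids are hypothetical stand-ins, and the counts it produces are not meant to reproduce the figures above. It only shows that asking every entry of a chain for its id once costs a number of calls that grows quadratically with the depth.

```java
final class RecursiveIdCostSketch {
    // Counts every invocation of getWorkspaceId(), including recursive ones.
    static long calls = 0;

    // Hypothetical stand-in for a node entry: only a parent link and a name.
    static final class Entry {
        final Entry parent;
        final String name;

        Entry(Entry parent, String name) {
            this.parent = parent;
            this.name = name;
        }

        // Uncached: walks up to the root on every single call.
        String getWorkspaceId() {
            calls++;
            return parent == null ? "" : parent.getWorkspaceId() + "/" + name;
        }
    }

    // Builds a chain of 'depth' entries below the root and asks each one for its id once.
    static long callsForChain(int depth) {
        calls = 0;
        Entry e = new Entry(null, "");
        for (int i = 0; i < depth; i++) {
            e = new Entry(e, "n" + i);
            e.getWorkspaceId();
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsForChain(10)); // prints 65
    }
}
```

An entry at level k needs k + 1 calls, so a chain of depth d needs d(d+1)/2 + d calls in total when every entry is queried once.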

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Dürig updated JCR-2218:
-------------------------------

    Attachment: JCR-2218.patch

Proposed patch implementing the approach from my last comment. While this shouldn't have any negative performance impact on flat hierarchies, it shows an overall (*) performance gain of up to 20% on deep hierarchies.

(*) Session.getItem() on a deep hierarchy. 

[jira] Commented: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739985#action_12739985 ] 

Michael Dürig commented on JCR-2218:
------------------------------------

Another alternative approach (due to Angela): instead of constructing the ids from the parent id and the name (which is quite expensive), traverse up to either the root or an entry with a uuid and collect the respective path elements along the way. Then use the root or the uuid, respectively, together with the collected path elements to construct the id.
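
A rough sketch of this traversal, under some assumptions: Entry and SimpleNodeId are hypothetical stand-ins rather than the jcr2spi or SPI types, paths are plain strings, and same-name sibling indexes are left out for brevity. It is not the code from the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IdFromAncestorSketch {
    // Hypothetical stand-in for a node entry; uniqueID is null if the node has none.
    static final class Entry {
        final Entry parent;
        final String name;
        final String uniqueID;

        Entry(Entry parent, String name, String uniqueID) {
            this.parent = parent;
            this.name = name;
            this.uniqueID = uniqueID;
        }
    }

    // Hypothetical stand-in for an id: an optional uuid plus an optional path.
    static final class SimpleNodeId {
        final String uniqueID; // null if the path is absolute
        final String path;     // null if the id is the uuid alone

        SimpleNodeId(String uniqueID, String path) {
            this.uniqueID = uniqueID;
            this.path = path;
        }
    }

    static SimpleNodeId buildId(Entry entry) {
        // Walk up until the root or an entry with a uuid is reached,
        // collecting the names in root-to-leaf order.
        Deque<String> elements = new ArrayDeque<>();
        Entry e = entry;
        while (e.parent != null && e.uniqueID == null) {
            elements.addFirst(e.name);
            e = e.parent;
        }
        if (e.uniqueID != null) {
            // uuid plus relative path, or the uuid alone if nothing was collected
            String relPath = elements.isEmpty() ? null : String.join("/", elements);
            return new SimpleNodeId(e.uniqueID, relPath);
        }
        // Reached the root: absolute path
        return new SimpleNodeId(null, "/" + String.join("/", elements));
    }

    public static void main(String[] args) {
        Entry root = new Entry(null, "", null);
        Entry a = new Entry(root, "a", null);
        Entry b = new Entry(a, "b", "u-1");
        Entry c = new Entry(b, "c", null);
        SimpleNodeId id = buildId(c);
        System.out.println(id.uniqueID + " " + id.path);
    }
}
```

The id is assembled once from the collected elements, so no intermediate parent ids have to be created along the way.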

[jira] Resolved: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Dürig resolved JCR-2218.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0

Applied patch in revision 803164

[jira] Commented: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735039#action_12735039 ] 

Michael Dürig commented on JCR-2218:
------------------------------------

I just checked the effect of an alternative approach: call site caching of IdFactory calls. 

public NodeId getId() throws InvalidItemStateException, RepositoryException {
    IdFactory idFactory = getIdFactory();
    PathFactory pathFactory = getPathFactory();
    IdCache idCache = getIdCache();

    if (uniqueID != null) {
        // Entry has a unique id: cache the NodeId under that id
        NodeId nodeId = idCache.get(uniqueID);
        if (nodeId == null) {
            nodeId = idFactory.createNodeId(uniqueID);
            idCache.put(uniqueID, nodeId);
        }
        return nodeId;
    } else if (parent == null) {
        // Root entry: cache the NodeId under a fixed key
        NodeId nodeId = idCache.get("ROOT");
        if (nodeId == null) {
            nodeId = idFactory.createNodeId((String) null, pathFactory.getRootPath());
            idCache.put("ROOT", nodeId);
        }
        return nodeId;
    } else {
        // Any other entry: cache the NodeId under (parent id, name, index)
        NodeId parentId = parent.getId();
        Name name = getName();
        int index = getIndex();
        NodeId nodeId = idCache.get(parentId, name, index);
        if (nodeId == null) {
            Path path = pathFactory.create(name, index);
            nodeId = idFactory.createNodeId(parentId, path);
            idCache.put(parentId, name, index, nodeId);
        }
        return nodeId;
    }
}

My profiling shows that there is not much to be gained from this. This is in line with an earlier observation that looking up ItemIds in a hash map costs about the same as creating new ItemIds. The main cost comes from the equals and hashCode methods of the various classes involved in comparing ItemIds.

[jira] Assigned: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Dürig reassigned JCR-2218:
----------------------------------

    Assignee: Michael Dürig

[jira] Commented: (JCR-2218) NodeEntryImpl.getWorkspaceId() very inefficient

Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731953#action_12731953 ] 

Michael Dürig commented on JCR-2218:
------------------------------------

To fix this I propose caching the NodeId and/or path per NodeEntry. Certain operations (such as move; further operations remain to be identified) need to invalidate the cache. To avoid having to invalidate the cache of every entry in the subtree rooted at a specific item, I propose deferring cache validity checks as long as possible (i.e. until getWorkspaceId() is called). The cache of an entry is valid if neither the entry itself nor any of its ancestors is marked as invalid. If the cache of an entry is found to be invalid, its path is recalculated, which clears any invalid-cache markers on the path to the root. Note that when the marker of an entry is cleared, all child entries of that entry need to be marked (except for the child entry whose path is being recalculated).
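
The scheme can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: Entry, move(), getPath() and the recalculations counter are hypothetical names, paths are plain strings, and new entries simply start out marked as invalid so that "unmarked" always implies "cached".

```java
import java.util.ArrayList;
import java.util.List;

final class LazyPathCacheSketch {
    // Counts path recalculations, for demonstration only.
    static int recalculations = 0;

    static final class Entry {
        private final String name;
        private Entry parent;
        private final List<Entry> children = new ArrayList<>();
        private String cachedPath;
        private boolean invalid = true; // marker: the cache of this entry and its subtree is stale

        Entry(Entry parent, String name) {
            this.parent = parent;
            this.name = name;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        // Moving only marks the moved entry; its subtree is not touched.
        void move(Entry newParent) {
            parent.children.remove(this);
            parent = newParent;
            newParent.children.add(this);
            invalid = true;
        }

        String getPath() {
            // Deferred validity check: find the topmost marker on the way to the root.
            List<Entry> chain = new ArrayList<>();
            int topmostMarked = -1;
            for (Entry e = this; e != null; e = e.parent) {
                if (e.invalid) {
                    topmostMarked = chain.size();
                }
                chain.add(e);
            }
            // Recalculate from the topmost marked entry down to this entry.
            for (int i = topmostMarked; i >= 0; i--) {
                chain.get(i).recalculate(i > 0 ? chain.get(i - 1) : null);
            }
            return cachedPath;
        }

        private void recalculate(Entry childOnChain) {
            if (parent == null) {
                cachedPath = "/";
            } else if (parent.cachedPath.equals("/")) {
                cachedPath = "/" + name;
            } else {
                cachedPath = parent.cachedPath + "/" + name;
            }
            invalid = false;
            recalculations++;
            // Clearing the marker pushes it down to all children,
            // except the child that is recalculated next.
            for (Entry child : children) {
                if (child != childOnChain) {
                    child.invalid = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        Entry root = new Entry(null, "");
        Entry a = new Entry(root, "a");
        Entry b = new Entry(a, "b");
        Entry d = new Entry(root, "d");
        System.out.println(b.getPath());
        a.move(d);
        System.out.println(b.getPath());
    }
}
```

With this layout a move costs O(1), and a later getPath() call pays only for the entries on its own path to the root plus marking their direct children.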
