You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by "Lukas Eder (JIRA)" <ji...@apache.org> on 2013/04/04 11:41:15 UTC
[jira] [Created] (JCR-3557) Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager

Lukas Eder created JCR-3557:
-------------------------------

             Summary: Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
                 Key: JCR-3557
                 URL: https://issues.apache.org/jira/browse/JCR-3557
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
    Affects Versions: 2.6
            Reporter: Lukas Eder
            Priority: Minor
         Attachments: Yourkit_Screenshot_SQL.png, Yourkit_Screenshot_Stacktrace.png

Deleting a tree of nodes in Jackrabbit results in a prohibitive amount of SQL queries being executed when using the BundleDbPersistenceManager. Please consider the attached Screenshots from Yourkit Profiler, which illustrate how saving the deletion of a single node through the JCR API results in 34 SQL DELETE statements for my test case:

- Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion. Please disregard the absolute values of the time column. It is biased by the profiling session.
- Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by the selected stack subtree

While other operations are probably not very optimal either, this one is a very low hanging fruit, in my opinion. Consider the following piece of code in AbstractBundlePersistenceManager.storeInternal():

{code}
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        NodePropBundle bundle = getBundle((NodeId) state.getId());
        if (bundle == null) {
            throw new NoSuchItemStateException(state.getId().toString());
        }
        deleteBundle(bundle);
        deleted.add(state.getId());
    }
}
{code}

Now, instead of iterating over deletedStates and loading / deleting them one by one, they should be bulk-loaded / bulk-deleted as such:

{code}
List<NodeId> nodeIds = new ArrayList<NodeId>();
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        nodeIds.add((NodeId) state.getId());
    }
}
List<NodePropBundle> bundles = new ArrayList<NodePropBundle>();
bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException
deleteBundles(bundles);
deleted.addAll(nodeIds);
{code}

For backwards-compatibility and convenience, AbstractBundlePersistenceManager would provide trivial default implementations for getBundles() and deleteBundles():

{code}
protected List<NodePropBundle> getBundles(Collection<? extends NodeId> nodeIds) throws ItemStateException {
    List<NodePropBundle> result = new ArrayList<NodePropBundle>();

    for (NodeId nodeId : nodeIds) {
        result.add(getBundle(nodeId));
    }

    return result;
}

protected void deleteBundles(Collection<? extends NodePropBundle> bundles) throws ItemStateException {
    for (NodePropBundle bundle : bundles) {
        deleteBundle(bundle);
    }
}
{code}

But the BundleDbPersistenceManager could override this default behaviour by executing something like

{code}
-- getBundles()
select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...)

-- deleteBundles()
delete from BUNDLE where NODE_ID in (?, ?, ...)
{code}

The same also applies for deleting from the REFS table.

If this seems like a viable improvement and if I didn't overlook something subtle where one-by-one selection / deletion of bundle data is important, I could go ahead and provide a patch for this issue. Once such an improvement is in place, other bundle persistence managers might take advantage of the new algorithm as well. Also, we could review adding / updating bundles in another JIRA ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira