Posted to dev@jackrabbit.apache.org by "Lukas Eder (JIRA)" <ji...@apache.org> on 2013/04/04 11:41:15 UTC
[jira] [Created] (JCR-3557) Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
Lukas Eder created JCR-3557:
-------------------------------
Summary: Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
Key: JCR-3557
URL: https://issues.apache.org/jira/browse/JCR-3557
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Affects Versions: 2.6
Reporter: Lukas Eder
Priority: Minor
Attachments: Yourkit_Screenshot_SQL.png, Yourkit_Screenshot_Stacktrace.png
Deleting a tree of nodes in Jackrabbit results in a prohibitive number of SQL statements being executed when using the BundleDbPersistenceManager. Please consider the attached screenshots from the YourKit profiler, which illustrate how saving the deletion of a single node through the JCR API results in 34 SQL DELETE statements for my test case:
- Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion. Please disregard the absolute values in the time column; they are skewed by the profiling session.
- Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by the selected stack subtree.
While other operations are probably not optimal either, this one is low-hanging fruit, in my opinion. Consider the following piece of code in AbstractBundlePersistenceManager.storeInternal():
{code}
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        NodePropBundle bundle = getBundle((NodeId) state.getId());
        if (bundle == null) {
            throw new NoSuchItemStateException(state.getId().toString());
        }
        deleteBundle(bundle);
        deleted.add(state.getId());
    }
}
{code}
Now, instead of iterating over deletedStates and loading / deleting the bundles one by one, they should be bulk-loaded / bulk-deleted as follows:
{code}
List<NodeId> nodeIds = new ArrayList<NodeId>();
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        nodeIds.add((NodeId) state.getId());
    }
}
List<NodePropBundle> bundles = new ArrayList<NodePropBundle>();
bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException
deleteBundles(bundles);
deleted.addAll(nodeIds);
{code}
For backwards-compatibility and convenience, AbstractBundlePersistenceManager would provide trivial default implementations for getBundles() and deleteBundles():
{code}
protected List<NodePropBundle> getBundles(Collection<? extends NodeId> nodeIds) throws ItemStateException {
    List<NodePropBundle> result = new ArrayList<NodePropBundle>();
    for (NodeId nodeId : nodeIds) {
        NodePropBundle bundle = getBundle(nodeId);
        // Keep the behaviour of the original loop: a missing bundle is an error
        if (bundle == null) {
            throw new NoSuchItemStateException(nodeId.toString());
        }
        result.add(bundle);
    }
    return result;
}
protected void deleteBundles(Collection<? extends NodePropBundle> bundles) throws ItemStateException {
    for (NodePropBundle bundle : bundles) {
        deleteBundle(bundle);
    }
}
{code}
The BundleDbPersistenceManager, however, could override this default behaviour by executing something like:
{code}
-- getBundles()
select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...)
-- deleteBundles()
delete from BUNDLE where NODE_ID in (?, ?, ...)
{code}
The same applies to deleting from the REFS table.
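To make the bulk statement more concrete, here is a minimal sketch of how the IN list could be generated. It is not Jackrabbit code: the class BulkDeleteSketch, the helpers buildDeleteSql() and chunk(), and the limit MAX_IDS_PER_STATEMENT are all hypothetical names chosen for illustration, and the table / column names are simply taken from the SQL above. The chunking step assumes that the database caps the number of bind parameters per statement, which varies by vendor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BulkDeleteSketch {

    // Assumption: an illustrative cap on bind parameters per statement.
    // The real limit depends on the database in use.
    static final int MAX_IDS_PER_STATEMENT = 1000;

    /** Builds "delete from <table> where <column> in (?, ?, ...)" with count placeholders. */
    public static String buildDeleteSql(String table, String column, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        StringBuilder sql = new StringBuilder("delete from ")
                .append(table).append(" where ").append(column).append(" in (");
        for (int i = 0; i < count; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    /** Splits ids into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            chunks.add(new ArrayList<T>(ids.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-ins for node ids; a chunk size of 2 keeps the output short.
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e");
        for (List<String> part : chunk(ids, 2)) {
            System.out.println(buildDeleteSql("BUNDLE", "NODE_ID", part.size()) + " <- " + part);
        }
    }
}
```

With five ids and a chunk size of 2, this issues three statements instead of five. An actual override would bind each chunk's ids to a PreparedStatement, and would have to account for the schema object prefix and the storage model of the node id column, both of which are left out here.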
If this seems like a viable improvement, and if I haven't overlooked something subtle that makes one-by-one selection / deletion of bundle data important, I could go ahead and provide a patch for this issue. Once such an improvement is in place, other bundle persistence managers might take advantage of the new algorithm as well. We could also review adding / updating bundles in a separate JIRA ticket.