Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2016/05/14 01:16:12 UTC
[jira] [Assigned] (OAK-4358) Stale cluster ids can potentially lead to lots of previous docs traversal in NodeDocument.getNewestRevision
[ https://issues.apache.org/jira/browse/OAK-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vikas Saurabh reassigned OAK-4358:
----------------------------------
Assignee: Marcel Reutegger (was: Vikas Saurabh)
[~mreutegg], as discussed, I was trying to implement an optimized iteration in getChanges(prop, minRev) so that the logic in getNewestRevision could then be changed to max(getChanges(_revision), getChanges(_commitRoot)). Here's a test that I was working with (an adaptation of the existing {{getChangesMixedClusterIds}}):
{code}
@Test
public void getChangesMixedClusterIdsTooManyPrevDocsRead() throws Exception {
    final int numChanges = 200;
    // unseeded: each run produces a different interleaving of ns1/ns2 commits and splits
    Random random = new Random();
    // head revisions at which the root document was split
    final List<RevisionVector> splitHeads = Lists.newArrayList();
    final int SPLIT_DOC_IDX = 4; // the test fails for 0 too... I'm just being conservative though!
    // store that fails the test when a previous doc is read although it only
    // contains changes that are not newer than the min revision being read
    MemoryDocumentStore store = new MemoryDocumentStore() {
        @Override
        public <T extends Document> T find(Collection<T> collection,
                                           String key) {
            String path = Utils.getPathFromId(key);
            if (path.startsWith("p")) {
                RevisionVector minRevBeingRead = splitHeads.get(splitHeads.size() - SPLIT_DOC_IDX - 1);
                // expects a root previous doc path of the form p/<maxRev>/<height>:
                // strip the leading "p/" and the trailing "/<height>" (single-digit height)
                String maxRevStrInSplitTree = path.substring(2, path.length() - 2);
                Revision maxRevInSplitTree = Revision.fromString(maxRevStrInSplitTree);
                assertFalse("Previous doc (" + key + ") read for min rev " + minRevBeingRead,
                        !minRevBeingRead.isRevisionNewer(maxRevInSplitTree));
            }
            return super.find(collection, key);
        }
    };
    DocumentNodeStore ns1 = createTestStore(store, 1, 0);
    DocumentNodeStore ns2 = createTestStore(store, 2, 0);
    List<DocumentNodeStore> nodeStores = Lists.newArrayList(ns1, ns2);
    for (int i = 0; i < numChanges; i++) {
        // change property "p" on the root from a randomly chosen cluster node
        DocumentNodeStore ns = nodeStores.get(random.nextInt(nodeStores.size()));
        ns.runBackgroundOperations();
        NodeBuilder builder = ns.getRoot().builder();
        builder.setProperty("p", i);
        merge(ns, builder);
        ns.runBackgroundOperations();
        // split the root document after roughly 20% of the changes
        if (random.nextDouble() < 0.2) {
            RevisionVector splitHead = ns.getHeadRevision();
            splitHeads.add(splitHead);
            for (UpdateOp op : SplitOperations.forDocument(
                    getRootDocument(store), ns, splitHead,
                    Predicates.<String>alwaysFalse(), 2)) {
                store.createOrUpdate(NODES, op);
            }
        }
    }
    NodeDocument doc = getRootDocument(store);
    RevisionVector minRevBeingRead = splitHeads.get(splitHeads.size() - SPLIT_DOC_IDX - 1);
    // iterate all changes of "p" newer than minRevBeingRead; the store above
    // checks which previous docs get read along the way
    Lists.newArrayList(doc.getChanges("p", minRevBeingRead));
    ns1.dispose();
    ns2.dispose();
}
{code}
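The getNewestRevision change proposed above, max(getChanges(_revision), getChanges(_commitRoot)) with each lookup limited by a minimum revision, can be illustrated with a small standalone model. This is a hypothetical sketch, not Oak code: `Rev` reduces a revision to a timestamp and cluster id, the min revision vector is a plain `Map<Integer, Long>` from cluster id to timestamp, and the commit-status and baseRev checks of the real method are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Oak's classes: newest revision =
// max(newest change in _revision, newest change in _commitRoot),
// where both lookups ignore changes that are not newer than minRev.
class NewestRevisionSketch {

    /** A revision reduced to (timestamp, clusterId), ordered by timestamp, then cluster id. */
    static final class Rev implements Comparable<Rev> {
        final long timestamp;
        final int clusterId;

        Rev(long timestamp, int clusterId) {
            this.timestamp = timestamp;
            this.clusterId = clusterId;
        }

        @Override
        public int compareTo(Rev o) {
            int c = Long.compare(timestamp, o.timestamp);
            return c != 0 ? c : Integer.compare(clusterId, o.clusterId);
        }

        @Override
        public String toString() {
            return "r" + timestamp + "-" + clusterId;
        }
    }

    /** True if r is newer than minRev's entry for r's cluster id (no entry = no bound in this model). */
    static boolean isNewer(Map<Integer, Long> minRev, Rev r) {
        Long bound = minRev.get(r.clusterId);
        return bound == null || r.timestamp > bound;
    }

    /** First, and therefore newest, change in a newest-first history that is newer than minRev, or null. */
    static Rev newestChange(List<Rev> newestFirst, Map<Integer, Long> minRev) {
        for (Rev r : newestFirst) {
            if (isNewer(minRev, r)) {
                return r; // history is sorted newest-first, so we can stop here
            }
        }
        return null;
    }

    /** max(newestChange(_revision), newestChange(_commitRoot)), or null if neither has one. */
    static Rev newestRevision(List<Rev> revisions, List<Rev> commitRoot,
                              Map<Integer, Long> minRev) {
        Rev a = newestChange(revisions, minRev);
        Rev b = newestChange(commitRoot, minRev);
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // newest-first histories with changes from cluster ids 1 and 2
        List<Rev> revisions = Arrays.asList(new Rev(50, 2), new Rev(10, 1));
        List<Rev> commitRoot = Arrays.asList(new Rev(70, 1), new Rev(30, 2));
        Map<Integer, Long> minRev = new HashMap<>();
        minRev.put(1, 20L);
        minRev.put(2, 20L);
        System.out.println(newestRevision(revisions, commitRoot, minRev)); // prints r70-1
    }
}
```

Because each history is ordered newest-first in this model, the first entry that passes the per-cluster-id bound is already the newest qualifying change, so each lookup can stop there instead of walking the rest of the history.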
I think the assertion that NO previous doc strictly older than minRevVector should be read turned out to be a little too strict (fixing that seemed to require introducing some notion of a minimum revision when reading {{PropertyHistory}}).
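One possible shape for such a minimum, sketched under simplifying assumptions: decide from a previous doc's revision range, before loading it, whether it can contain anything newer than the min revision for its cluster id. `PrevDocRange` and `docsToRead` are hypothetical names; a range is reduced to a cluster id plus low/high timestamps, and the min revision vector is modelled as a `Map<Integer, Long>`. The predicate mirrors what the test's assertion checks via `RevisionVector.isRevisionNewer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Oak's Range/PropertyHistory classes: filter previous
// docs by their revision range before they are loaded from the store.
class PrevDocPruningSketch {

    /** Revision range covered by one previous doc, for a single cluster id. */
    static final class PrevDocRange {
        final int clusterId;
        final long low;  // oldest change timestamp in the previous doc
        final long high; // newest change timestamp in the previous doc

        PrevDocRange(int clusterId, long low, long high) {
            this.clusterId = clusterId;
            this.low = low;
            this.high = high;
        }

        @Override
        public String toString() {
            return "c" + clusterId + "[" + low + ".." + high + "]";
        }
    }

    /**
     * Returns the previous docs that may contain changes newer than minRev.
     * A doc is skipped when its newest change (high) is not newer than the
     * bound for its cluster id, i.e. it only holds changes the caller excluded.
     */
    static List<PrevDocRange> docsToRead(List<PrevDocRange> ranges, Map<Integer, Long> minRev) {
        List<PrevDocRange> result = new ArrayList<>();
        for (PrevDocRange range : ranges) {
            Long bound = minRev.get(range.clusterId);
            // Assumption of this sketch: a cluster id without an entry is unbounded.
            if (bound == null || range.high > bound) {
                result.add(range);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PrevDocRange> ranges = Arrays.asList(
                new PrevDocRange(1, 90, 100),
                new PrevDocRange(2, 60, 80),
                new PrevDocRange(1, 40, 50),
                new PrevDocRange(2, 10, 30),
                new PrevDocRange(3, 1, 5)); // old changes from a third cluster id
        Map<Integer, Long> minRev = new HashMap<>();
        minRev.put(1, 55L);
        minRev.put(2, 55L);
        minRev.put(3, 55L);
        System.out.println(docsToRead(ranges, minRev)); // prints [c1[90..100], c2[60..80]]
    }
}
```

In this model a cluster id with no entry in the map is treated as unbounded, so removing the entry for cluster id 3 makes the sketch read `c3[1..5]` as well.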
In the end, I think fixing the issue is beyond my comfort level. If the test seems fine, I can post a similar one for getNewestRevision as well. Assigning the issue to you, though (I'll still be trying some things in the meantime).
> Stale cluster ids can potentially lead to lots of previous docs traversal in NodeDocument.getNewestRevision
> -----------------------------------------------------------------------------------------------------------
>
> Key: OAK-4358
> URL: https://issues.apache.org/jira/browse/OAK-4358
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: documentmk
> Reporter: Vikas Saurabh
> Assignee: Marcel Reutegger
>
> When some of the following conditions are met (the actual test case and conditions are still being investigated):
> * There are property value changes from different cluster ids
> * There are very old, stale cluster ids (probably older incarnations of the current node itself)
> * A parallel background split removes all _commitRoot and _revision entries such that the latest one (which is less than baseRev) is very old
> then finding the newest revision traverses a lot of previous docs. Since the root document gets split a lot and is a very common commitRoot (and thus participates in checkConflicts for many commits), the issue can slow down commits considerably.