Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2015/09/09 13:05:46 UTC

[jira] [Updated] (OAK-3380) Property index pruning should happen asynchronously

     [ https://issues.apache.org/jira/browse/OAK-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Saurabh updated OAK-3380:
-------------------------------
    Description: 
Following up on a (relatively old) thread \[1]: we should prune the property index structure asynchronously. The thread never reached a conclusion, but here are a couple of ideas picked from it:
* Move pruning to an async thread
* Throttle pruning i.e. prune only once in a while
** I'm not sure how that would work though -- an unpruned part would remain as is until another index update happens on that path.

Once we can move pruning to some async thread (reducing concurrent updates), OAK-2673 + OAK-2929 can take care of add-add conflicts.
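
A minimal sketch of the first idea, assuming a toy in-memory tree rather than Oak's actual index storage. The class and method names (DeferredPruneSketch, add, remove, prune, countNodes) are hypothetical and not part of Oak; the sketch only shows writers leaving empty nodes in place while a separate pass removes them later. It does not model Oak's merge or conflict handling.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy stand-in for an index content tree (hypothetical, not Oak code). */
class DeferredPruneSketch {

    /** One node of the toy tree: child nodes plus a flag for "an entry ends here". */
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean match;

        boolean isEmpty() {
            return !match && children.isEmpty();
        }
    }

    private final Node root = new Node();

    /** Writer path: add an entry, creating intermediate nodes as needed. */
    synchronized void add(String path) {
        Node n = root;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            n = n.children.computeIfAbsent(seg, k -> new Node());
        }
        n.match = true;
    }

    /** Writer path: clear the entry but leave empty ancestors in place. */
    synchronized void remove(String path) {
        Node n = root;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            n = n.children.get(seg);
            if (n == null) return; // nothing indexed under this path
        }
        n.match = false;
    }

    /** Deferred path: drop empty subtrees; returns the number of nodes removed. */
    synchronized int prune() {
        return pruneBelow(root);
    }

    private static int pruneBelow(Node n) {
        int removed = 0;
        Iterator<Node> it = n.children.values().iterator();
        while (it.hasNext()) {
            Node child = it.next();
            removed += pruneBelow(child); // prune bottom-up
            if (child.isEmpty()) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Number of nodes below the root, including leftover empty ones. */
    synchronized int countNodes() {
        return count(root);
    }

    private static int count(Node n) {
        int c = 0;
        for (Node child : n.children.values()) {
            c += 1 + count(child);
        }
        return c;
    }

    public static void main(String[] args) {
        DeferredPruneSketch idx = new DeferredPruneSketch();
        idx.add("/jobs/bucket1/job1");
        idx.remove("/jobs/bucket1/job1");       // bucket structure stays in place
        System.out.println(idx.countNodes());   // 3 leftover nodes
        idx.add("/jobs/bucket1/job2");          // reuses jobs/bucket1, no re-creation
        System.out.println(idx.prune());        // 1 (only the empty job1 node)
        System.out.println(idx.countNodes());   // 3
    }
}
```

In a real setup prune() would be called periodically from a background thread (for example via java.util.concurrent.ScheduledExecutorService), so that writers never delete shared ancestor nodes themselves.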

----
h6. Why is this an issue despite merge retries taking care of it?
A couple of cases where concurrent updates hit merge conflicts in our product (Adobe AEM):
* Some indexes are very volatile (in the sense that the indexed property switches its values very quickly), e.g. Sling job status, AEM workflow status.
* Multiple threads take care of jobs. Although Sling maintains a bucketed structure for job storage to reduce conflicts, inside the index tree the bucket structure at times gets pruned and needs to be re-created on the next job status change.

Retries do take care of these conflicts a lot of the time, and even when they don't, AEM workflows have their own retry to work around it. But retrying, IMHO, is just a waste of time -- more importantly in paths where the application doesn't really have control.

h6. Would this add to the cost of traversing the index structure?
Yes, there would be some leftover paths in the index structure between asynchronous prunes. But I think the cost of such wasted traversals would be outweighed by the time saved by avoiding the concurrent update conflicts.
----

(cc [~tmueller], [~mreutegg], [~alex.parvulescu], [~chetanm])

\[1]: http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201506.mbox/%3CCADicHF66U2Vh-hLrJUNANsYtXfiDj2mT3vKTr4ybknGpzy9MNw@mail.gmail.com%3E

  was:
Following up on this (a relatively old) thread \[1], we should do pruning of property index structure asynchronously. The thread was never concluded, here are a couple of ideas picked from the thread:
* Move pruning to an async thread
* Throttle pruning i.e. prune only once in a while
** I'm not sure how that would work though -- an unpruned part would remain as is until another index happens on that path.

Once we can move pruning to some async thread (reducing concurrent updates), OAK-2673 + OAK-2929 can take care of add-add conflicts.

(cc [~tmueller], [~mreutegg], [~alex.parvulescu], [~chetanm])

\[1]: http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201506.mbox/%3CCADicHF66U2Vh-hLrJUNANsYtXfiDj2mT3vKTr4ybknGpzy9MNw@mail.gmail.com%3E


> Property index pruning should happen asynchronously
> ---------------------------------------------------
>
>                 Key: OAK-3380
>                 URL: https://issues.apache.org/jira/browse/OAK-3380
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.5
>            Reporter: Vikas Saurabh
>            Priority: Minor
>              Labels: resilience


