Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (Jira)" <ji...@apache.org> on 2022/08/05 13:12:00 UTC

[jira] [Comment Edited] (OAK-9880) Simplify rgc DEFAULT_NO_BRANCH query

    [ https://issues.apache.org/jira/browse/OAK-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575832#comment-17575832 ] 

Marcel Reutegger edited comment on OAK-9880 at 8/5/22 1:11 PM:
---------------------------------------------------------------

bq. Alternative approach could be to use the lowest _sdMaxRevTime (of all sweepRevs, and oldestRevTimeStamp) - with the effect that rgc will have to wait longer until it can clean up garbage

I don't see how this can work when there are inactive clusterIds. Their sweepRev will not be updated and RGC will effectively become stuck.
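The concern can be illustrated with a minimal Java sketch. It assumes the suggested bound is simply the minimum of the per-clusterId sweepRev timestamps and oldestRevTimeStamp, in seconds. The method name lowestBound and all timestamps are hypothetical and not taken from Oak. Once clusterId 2 stops sweeping, its value never moves, so the bound stays the same however far clusterId 1 advances.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class LowestSweepRevBound {

    // Hypothetical: the lowest of all per-clusterId sweepRev timestamps and
    // oldestRevTimeStamp, used as the "_sdMaxRevTime" : { "$lt" : ... } bound.
    static long lowestBound(Map<Integer, Long> sweepRevTimeByClusterId, long oldestRevTimeStamp) {
        long lowestSweep = Collections.min(sweepRevTimeByClusterId.values());
        return Math.min(lowestSweep, oldestRevTimeStamp);
    }

    public static void main(String[] args) {
        long oldestRevTimeStamp = 1_603_030_303L; // hypothetical value

        Map<Integer, Long> sweepRevs = new LinkedHashMap<>();
        sweepRevs.put(1, 1_602_020_202L); // active clusterId
        sweepRevs.put(2, 1_500_000_000L); // inactive clusterId: sweepRev no longer updated

        long before = lowestBound(sweepRevs, oldestRevTimeStamp);

        // clusterId 1 keeps running its sweeper, so its sweepRev moves forward...
        sweepRevs.put(1, 1_604_040_404L);
        long after = lowestBound(sweepRevs, oldestRevTimeStamp);

        // ...but the bound stays pinned by the inactive clusterId.
        System.out.println(before + " -> " + after); // 1500000000 -> 1500000000
    }
}
```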

However, see also my comment in the PR about keeping track of minMaxRevTimeInSecs per clusterId and only attempting to delete garbage when it changes.
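One possible reading of that suggestion, sketched in Java under the assumption that the value is tracked in memory per clusterId: the remove for a clusterId is only attempted when its minMaxRevTimeInSecs differs from the value seen in the previous run, so an inactive clusterId whose value never changes does not trigger the same expensive query over and over. The class and method names are hypothetical and do not correspond to Oak's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the last minMaxRevTimeInSecs seen per clusterId.
class MinMaxRevTimeTracker {

    private final Map<Integer, Long> lastSeen = new HashMap<>();

    // Records the current value for clusterId and returns true only if it differs
    // from the previously recorded value, or if no value was recorded yet.
    boolean shouldAttemptDelete(int clusterId, long minMaxRevTimeInSecs) {
        Long previous = lastSeen.put(clusterId, minMaxRevTimeInSecs);
        return previous == null || previous.longValue() != minMaxRevTimeInSecs;
    }

    public static void main(String[] args) {
        MinMaxRevTimeTracker tracker = new MinMaxRevTimeTracker();
        System.out.println(tracker.shouldAttemptDelete(2, 1_601_010_101L)); // true: first run
        System.out.println(tracker.shouldAttemptDelete(2, 1_601_010_101L)); // false: unchanged, skip
        System.out.println(tracker.shouldAttemptDelete(2, 1_601_020_202L)); // true: value moved
    }
}
```

The first call for a clusterId always returns true, since there is no earlier value to compare against.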


> Simplify rgc DEFAULT_NO_BRANCH query
> ------------------------------------
>
>                 Key: OAK-9880
>                 URL: https://issues.apache.org/jira/browse/OAK-9880
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>
> We have seen a repeat of long-running rgc *remove* operations, similar to what was described in OAK-8351.
> This time it happened with the query generated by [queryForDefaultNoBranch|https://github.com/apache/jackrabbit-oak/blob/99b250a05ffe490f66de67374125fabee17f6fda/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoVersionGCSupport.java#L213-L242], with a query shape similar to this example:
> {noformat}
> {
>     "_sdType" : 70,
>     "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1603030303)
>     },
>     "$or" : [
>         {
>             "$or" : [
>                 {
>                     "_id" : /.*-1\/0/
>                 },
>                 {
>                     "_id" : /[^-]*/,
>                     "_path" : /.*-1\/0/
>                 }
>             ],
>             "_sdMaxRevTime" : {
>                 "$lt" : NumberLong(1602020202)
>             }
>         },
>         {
>             "$or" : [
>                 {
>                     "_id" : /.*-2\/0/
>                 },
>                 {
>                     "_id" : /[^-]*/,
>                     "_path" : /.*-2\/0/
>                 }
>             ],
>             "_sdMaxRevTime" : {
>                 "$lt" : NumberLong(1601010101)
>             }
>         }
>     ]
> }
> {noformat}
> While setting an index filter with the query plan in MongoDB is one option, we could additionally look into splitting the above query into multiple simpler queries, e.g. one query per clusterNodeId, with the {{_sdMaxRevTime}} condition simplified accordingly. The above would then translate into the following two queries (in the hope that MongoDB finds the optimal query plan for each):
> {noformat}
> {
>     "_sdType" : 70,
>     "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1602020202)
>     },
>     "$or" : [
>         {
>             "_id" : /.*-1\/0/
>         },
>         {
>             "_id" : /[^-]*/,
>             "_path" : /.*-1\/0/
>         }
>     ]
> }
> {noformat}
> and
> {noformat}
> {
>     "_sdType" : 70,
>     "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1601010101)
>     },
>     "$or" : [
>         {
>             "_id" : /.*-2\/0/
>         },
>         {
>             "_id" : /[^-]*/,
>             "_path" : /.*-2\/0/
>         }
>     ]
> }
> {noformat}
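The split is possible because, for each clusterNodeId, the outer and the per-clusterId {{_sdMaxRevTime}} conditions collapse into a single {{$lt}} against the smaller of the two bounds, and the union of the per-clusterId results equals the result of the combined query. The following Java sketch checks this on a few documents and prints the per-clusterId query text. It is a simplified in-memory model, not Oak's MongoVersionGCSupport: the Doc class, the method names and the sample _id values are hypothetical, and the regexes are applied unanchored, as MongoDB does for patterns without ^ or $.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class SplitNoBranchQuery {

    static final int SD_TYPE = 70; // "_sdType" : 70, as in the queries above

    // Simplified stand-in for a split document (hypothetical, not Oak's NodeDocument).
    static final class Doc {
        final int sdType;
        final String id;
        final String path; // null when the document has no _path
        final long sdMaxRevTime;

        Doc(int sdType, String id, String path, long sdMaxRevTime) {
            this.sdType = sdType;
            this.id = id;
            this.path = path;
            this.sdMaxRevTime = sdMaxRevTime;
        }
    }

    // Unanchored match, like a MongoDB /regex/ without ^ or $.
    static boolean find(String regex, String value) {
        return value != null && Pattern.compile(regex).matcher(value).find();
    }

    // The per-clusterId "$or" on _id / _path, using the regexes from the queries above.
    static boolean matchesCluster(Doc d, int clusterId) {
        String suffix = ".*-" + clusterId + "/0";
        return find(suffix, d.id) || (find("[^-]*", d.id) && find(suffix, d.path));
    }

    // Combined shape: outer bound AND any(clusterId pattern AND per-clusterId bound).
    static boolean combined(Doc d, long outer, Map<Integer, Long> bounds) {
        if (d.sdType != SD_TYPE || d.sdMaxRevTime >= outer) {
            return false;
        }
        for (Map.Entry<Integer, Long> e : bounds.entrySet()) {
            if (matchesCluster(d, e.getKey()) && d.sdMaxRevTime < e.getValue()) {
                return true;
            }
        }
        return false;
    }

    // Split shape: one query per clusterId, both bounds collapsed into their minimum.
    static boolean anySplit(Doc d, long outer, Map<Integer, Long> bounds) {
        for (Map.Entry<Integer, Long> e : bounds.entrySet()) {
            long bound = Math.min(outer, e.getValue());
            if (d.sdType == SD_TYPE && d.sdMaxRevTime < bound && matchesCluster(d, e.getKey())) {
                return true;
            }
        }
        return false;
    }

    // Query text for one clusterId, in the same shape as the split queries above.
    static String splitQuery(int clusterId, long bound) {
        String re = "/.*-" + clusterId + "\\/0/";
        return "{ \"_sdType\" : " + SD_TYPE + ", \"_sdMaxRevTime\" : { \"$lt\" : NumberLong(" + bound
                + ") }, \"$or\" : [ { \"_id\" : " + re + " }, { \"_id\" : /[^-]*/, \"_path\" : " + re + " } ] }";
    }

    public static void main(String[] args) {
        long outer = 1_603_030_303L;
        Map<Integer, Long> bounds = new LinkedHashMap<>();
        bounds.put(1, 1_602_020_202L);
        bounds.put(2, 1_601_010_101L);

        Doc[] docs = {
            new Doc(70, "3:p/a/r100-0-1/0", null, 1_602_000_000L), // clusterId 1, old enough
            new Doc(70, "3:p/a/r100-0-1/0", null, 1_602_500_000L), // clusterId 1, too recent
            new Doc(70, "3:p/b/r200-0-2/0", null, 1_601_000_000L), // clusterId 2, old enough
            new Doc(70, "3:p/b/r200-0-2/0", null, 1_601_500_000L), // clusterId 2, too recent
            new Doc(50, "3:p/c/r300-0-2/0", null, 1_500_000_000L), // other _sdType
        };
        for (Doc d : docs) {
            if (combined(d, outer, bounds) != anySplit(d, outer, bounds)) {
                throw new IllegalStateException("split queries differ for " + d.id);
            }
        }
        for (Map.Entry<Integer, Long> e : bounds.entrySet()) {
            System.out.println(splitQuery(e.getKey(), Math.min(outer, e.getValue())));
        }
    }
}
```

Taking the minimum of the outer and the per-clusterId bound keeps each split query correct even if a per-clusterId bound were ever larger than the outer one; in the example above the per-clusterId values are already the smaller ones, which is why they appear unchanged in the two split queries.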



--
This message was sent by Atlassian Jira
(v8.20.10#820010)