Posted to oak-issues@jackrabbit.apache.org by "Nitin Gupta (Jira)" <ji...@apache.org> on 2022/12/21 03:40:00 UTC

[jira] [Closed] (OAK-9880) Simplify rgc DEFAULT_NO_BRANCH query

     [ https://issues.apache.org/jira/browse/OAK-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitin Gupta closed OAK-9880.
----------------------------

> Simplify rgc DEFAULT_NO_BRANCH query
> ------------------------------------
>
>                 Key: OAK-9880
>                 URL: https://issues.apache.org/jira/browse/OAK-9880
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>             Fix For: 1.46.0
>
>
> We have again seen long-running rgc *remove* operations, similar to what was described in OAK-8351.
> This time it happened with the query generated by [queryForDefaultNoBranch|https://github.com/apache/jackrabbit-oak/blob/99b250a05ffe490f66de67374125fabee17f6fda/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoVersionGCSupport.java#L213-L242], with a query shape similar to the following example:
> {noformat}
> {
>     "_sdType" : 70,
>     "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1603030303)
>     },
>     "$or" : [
>         {
>             "$or" : [
>                 {
>                     "_id" : /.*-1\/0/
>                 },
>                 {
>                     "_id" : /[^-]*/,
>                     "_path" : /.*-1\/0/
>                 }
>             ],
>             "_sdMaxRevTime" : {
>                 "$lt" : NumberLong(1602020202)
>             }
>         },
>         {
>             "$or" : [
>                 {
>                     "_id" : /.*-2\/0/
>                 },
>                 {
>                     "_id" : /[^-]*/,
>                     "_path" : /.*-2\/0/
>                 }
>             ],
>             "_sdMaxRevTime" : {
>                 "$lt" : NumberLong(1601010101)
>             }
>         }
>     ]
> }
> {noformat}
> While setting an index filter for this query shape in MongoDB is one option, we could additionally look into splitting the above query into multiple simpler queries, e.g. one query per clusterNodeId, and simplifying the {{_sdMaxRevTime}} condition accordingly (two {{$lt}} bounds on the same field reduce to the smaller one). The above would then translate into the following 2 queries, in the hope that MongoDB finds the optimal query plan for each:
> {noformat}
> {
>     "_sdType" : 70,
>     "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1602020202)
>     },
>     "$or" : [
>         {
>             "_id" : /.*-1\/0/
>         },
>         {
>             "_id" : /[^-]*/,
>             "_path" : /.*-1\/0/
>         }
>     ]
> }
> {noformat}
> and
> {noformat}
> {
>     "_sdType" : 70,
>     "_sdMaxRevTime" : {
>         "$lt" : NumberLong(1601010101)
>     },
>     "$or" : [
>         {
>             "_id" : /.*-2\/0/
>         },
>         {
>             "_id" : /[^-]*/,
>             "_path" : /.*-2\/0/
>         }
>     ]
> }
> {noformat}
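
A minimal, driver-free Java sketch of the proposed split follows. It rests on several assumptions. The names SplitGcQuery, Doc, ClusterQuery, split() and combinedMatches() are hypothetical. Documents are reduced to the four fields the queries touch, and the sample ids are made up. This is not the MongoVersionGCSupport implementation. The sketch builds one query per clusterNodeId whose {{_sdMaxRevTime}} bound is the smaller of the global bound and that cluster's bound. It then compares, on a few documents, the union of the split queries against the original combined query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, driver-free model of the proposal: not Oak's MongoVersionGCSupport code.
public class SplitGcQuery {

    // _sdType value used in the issue's queries for DEFAULT_NO_BRANCH split documents.
    static final int SD_TYPE_DEFAULT_NO_BRANCH = 70;

    // The second $or branch in the issue uses "_id" : /[^-]*/ verbatim.
    static final Pattern ANY_ID = Pattern.compile("[^-]*");

    // A document reduced to the fields the queries look at; path may be absent (null).
    record Doc(String id, String path, int sdType, long sdMaxRevTime) {}

    // One per-clusterNodeId query with a single exclusive upper bound on _sdMaxRevTime.
    record ClusterQuery(int clusterId, long maxRevTimeExclusive) {

        // Same shape as the issue's /.*-1\/0/ regex; find() gives MongoDB's unanchored semantics.
        Pattern suffix() {
            return Pattern.compile(".*-" + clusterId + "/0");
        }

        boolean matches(Doc d) {
            if (d.sdType() != SD_TYPE_DEFAULT_NO_BRANCH || d.sdMaxRevTime() >= maxRevTimeExclusive) {
                return false;
            }
            boolean byId = suffix().matcher(d.id()).find();
            // A regex on a missing field does not match in MongoDB, hence the null check.
            boolean byPath = ANY_ID.matcher(d.id()).find()
                    && d.path() != null && suffix().matcher(d.path()).find();
            return byId || byPath;
        }
    }

    // Split step: "$lt" a AND "$lt" b on the same field reduces to "$lt" min(a, b).
    static List<ClusterQuery> split(long globalBound, Map<Integer, Long> clusterBounds) {
        List<ClusterQuery> queries = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : clusterBounds.entrySet()) {
            queries.add(new ClusterQuery(e.getKey(), Math.min(globalBound, e.getValue())));
        }
        return queries;
    }

    // The original combined query, evaluated in memory for comparison.
    static boolean combinedMatches(Doc d, long globalBound, Map<Integer, Long> clusterBounds) {
        if (d.sdMaxRevTime() >= globalBound) {
            return false;
        }
        for (Map.Entry<Integer, Long> e : clusterBounds.entrySet()) {
            if (new ClusterQuery(e.getKey(), e.getValue()).matches(d)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long global = 1603030303L;
        Map<Integer, Long> bounds = new LinkedHashMap<>();
        bounds.put(1, 1602020202L);
        bounds.put(2, 1601010101L);

        List<ClusterQuery> queries = split(global, bounds);
        for (ClusterQuery q : queries) {
            System.out.println("clusterId " + q.clusterId() + ": _sdMaxRevTime < " + q.maxRevTimeExclusive());
        }

        // Made-up sample documents; they only need to exercise the regexes above.
        List<Doc> docs = List.of(
                new Doc("4:p/a/b/r1-0-1/0", null, 70, 1602000000L),
                new Doc("4:p/a/b/r2-0-2/0", null, 70, 1602000000L),
                new Doc("2:h0123abcd", "/a/b/r3-0-2/0", 70, 1600000000L),
                new Doc("4:p/a/b/r4-0-1/0", null, 10, 1500000000L));
        for (Doc d : docs) {
            boolean viaSplit = queries.stream().anyMatch(q -> q.matches(d));
            System.out.println(d.id() + " combined=" + combinedMatches(d, global, bounds)
                    + " split=" + viaSplit);
        }
    }
}
```

The split preserves the result set for the following reason. The combined filter is the global {{_sdMaxRevTime}} condition ANDed with an $or of per-cluster branches. Distributing that AND over the $or yields one branch per clusterNodeId, and the two bounds in each branch collapse into one. Whether MongoDB then chooses a better plan for the smaller queries is the open question in the issue, not something this sketch demonstrates.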



--
This message was sent by Atlassian Jira
(v8.20.10#820010)