Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (Jira)" <ji...@apache.org> on 2022/08/04 14:22:00 UTC

[jira] [Created] (OAK-9880) Simplify rgc query

Stefan Egli created OAK-9880:
--------------------------------

             Summary: Simplify rgc query
                 Key: OAK-9880
                 URL: https://issues.apache.org/jira/browse/OAK-9880
             Project: Jackrabbit Oak
          Issue Type: Task
          Components: mongomk
            Reporter: Stefan Egli
            Assignee: Stefan Egli


We have again seen long-running rgc *remove* operations, similar to what was described in OAK-8351.

This time it happened with the query generated by [queryForDefaultNoBranch|https://github.com/apache/jackrabbit-oak/blob/99b250a05ffe490f66de67374125fabee17f6fda/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoVersionGCSupport.java#L213-L242], which has a shape similar to the following example:
{noformat}
{
    "_sdType" : 70,
    "_sdMaxRevTime" : {
        "$lt" : NumberLong(1603030303)
    },
    "$or" : [
        {
            "$or" : [
                {
                    "_id" : /.*-1\/0/
                },
                {
                    "_id" : /[^-]*/,
                    "_path" : /.*-1\/0/
                }
            ],
            "_sdMaxRevTime" : {
                "$lt" : NumberLong(1602020202)
            }
        },
        {
            "$or" : [
                {
                    "_id" : /.*-2\/0/
                },
                {
                    "_id" : /[^-]*/,
                    "_path" : /.*-2\/0/
                }
            ],
            "_sdMaxRevTime" : {
                "$lt" : NumberLong(1601010101)
            }
        }
    ]
}
{noformat}
While setting an index filter for this query in MongoDB is one option, we could additionally look into simplifying the above query by splitting it into multiple queries, e.g. one query per clusterNodeId, with the {{_sdMaxRevTime}} condition simplified accordingly. The above would then translate into the following 2 queries (in the hope that MongoDB finds the optimal query plan for each):
{noformat}
{
    "_sdType" : 70,
    "_sdMaxRevTime" : {
        "$lt" : NumberLong(1602020202)
    },
    "$or" : [
        {
            "_id" : /.*-1\/0/
        },
        {
            "_id" : /[^-]*/,
            "_path" : /.*-1\/0/
        }
    ]
}
{noformat}
and
{noformat}
{
    "_sdType" : 70,
    "_sdMaxRevTime" : {
        "$lt" : NumberLong(1601010101)
    },
    "$or" : [
        {
            "_id" : /.*-2\/0/
        },
        {
            "_id" : /[^-]*/,
            "_path" : /.*-2\/0/
        }
    ]
}
{noformat}
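
To illustrate the proposed split, here is a minimal sketch in Python that builds one query document per clusterNodeId as plain dicts. It is not the Oak implementation: the function name {{per_cluster_queries}}, the constant {{SD_TYPE}} and the input map are hypothetical, and taking the minimum of the global and the per-clusterNodeId bound is an assumption about what "simplifying the {{_sdMaxRevTime}} accordingly" means. The regular expressions are written in {{$regex}} string form, so the {{\/}} escape of the shell notation becomes a plain {{/}}.

```python
# Hypothetical sketch: one query per clusterNodeId instead of one nested $or.
SD_TYPE = 70  # "_sdType" value copied from the example queries above


def per_cluster_queries(global_max_rev_time, cluster_max_rev_times):
    """Return a list of query documents, one per clusterNodeId.

    global_max_rev_time:   top-level upper bound for _sdMaxRevTime
    cluster_max_rev_times: dict mapping clusterNodeId -> its own upper bound
    """
    queries = []
    for cluster_id in sorted(cluster_max_rev_times):
        # Assumption: both bounds must hold, so the stricter (smaller) one wins.
        bound = min(global_max_rev_time, cluster_max_rev_times[cluster_id])
        suffix = ".*-%d/0" % cluster_id
        queries.append({
            "_sdType": SD_TYPE,
            "_sdMaxRevTime": {"$lt": bound},
            "$or": [
                {"_id": {"$regex": suffix}},
                {"_id": {"$regex": "[^-]*"}, "_path": {"$regex": suffix}},
            ],
        })
    return queries


if __name__ == "__main__":
    # Values taken from the example in this issue.
    for query in per_cluster_queries(1603030303, {1: 1602020202, 2: 1601010101}):
        print(query)
```

Each resulting document has a single flat {{$or}} and a single {{_sdMaxRevTime}} bound, which matches the shape of the two queries shown above. Whether MongoDB actually picks a better plan for this shape would still need to be verified.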


