Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2016/09/16 12:15:22 UTC
[jira] [Updated] (OAK-4816) Property index: cost estimate with path
restriction is too optimistic
[ https://issues.apache.org/jira/browse/OAK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Mueller updated OAK-4816:
--------------------------------
Component/s: query
> Property index: cost estimate with path restriction is too optimistic
> ---------------------------------------------------------------------
>
> Key: OAK-4816
> URL: https://issues.apache.org/jira/browse/OAK-4816
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Fix For: 1.6
>
>
> The property index cost estimate is too optimistic when a query has both a property restriction and a path restriction. The current algorithm, as documented in http://jackrabbit.apache.org/oak/docs/query/property-index.html#Cost_Estimation , assumes that matching entries are evenly distributed over the whole repository. In many cases, they are not. In the extreme case, _all_ entries that match the property restriction are in the subtree that matches the path restriction. Example:
> * 10'000 nodes with property color "red".
> * 1 million nodes in the repository
> * 10'000 nodes in the subtree /content
> * query {{/jcr:root/content//\*[@color = 'red']}}
> Currently, the cost estimate is about 100, because there are about 10'000 entries for "red" and "/content" contains 1% of all nodes (10'000 * 1% = 100). But in reality, all 10'000 entries with color "red" might be in that subtree.
> The cost estimate should take that into account and assume that at least 80% of the matching nodes are in that subtree (if the subtree contains that many nodes).
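
The arithmetic in the example can be sketched as follows. This is only an illustration of one reading of the proposal, not the Oak implementation: the class and method names are hypothetical, and treating "if the subtree contains that many nodes" as a cap at the subtree size is an assumption.

```java
// Hypothetical sketch, not Oak code: compares the uniform-distribution
// estimate with an estimate that assumes 80% of matches are in the subtree.
public class PathRestrictedCostSketch {

    // Current behaviour as described in the issue: matching entries are
    // assumed to be spread evenly, so scale them by the subtree's share.
    static long uniformEstimate(long matchingEntries, long totalNodes, long subtreeNodes) {
        if (totalNodes <= 0) {
            return matchingEntries;
        }
        return matchingEntries * subtreeNodes / totalNodes;
    }

    // Proposed behaviour (assumed reading): at least 80% of the matching
    // entries are in the subtree, but never more than the subtree holds.
    static long pessimisticEstimate(long matchingEntries, long totalNodes, long subtreeNodes) {
        long uniform = uniformEstimate(matchingEntries, totalNodes, subtreeNodes);
        long eightyPercent = matchingEntries * 80 / 100;
        long capped = Math.min(eightyPercent, subtreeNodes);
        return Math.max(uniform, capped);
    }

    public static void main(String[] args) {
        long red = 10_000;        // nodes with color "red"
        long total = 1_000_000;   // nodes in the repository
        long content = 10_000;    // nodes below /content
        System.out.println("uniform:     " + uniformEstimate(red, total, content));
        System.out.println("pessimistic: " + pessimisticEstimate(red, total, content));
    }
}
```

With the numbers from the example, the uniform estimate is 100 and the 80% estimate is 8'000, which is much closer to the worst case of 10'000.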
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)