Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2016/11/08 15:28:58 UTC

[jira] [Updated] (OAK-4323) Query engine: index cost formula incorrect when using "limit"

     [ https://issues.apache.org/jira/browse/OAK-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated OAK-4323:
--------------------------------
    Fix Version/s:     (was: 1.6)

> Query engine: index cost formula incorrect when using "limit"
> -------------------------------------------------------------
>
>                 Key: OAK-4323
>                 URL: https://issues.apache.org/jira/browse/OAK-4323
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>
> As described in OAK-2081, the cost formula currently used in the query engine is not correct if "limit" is used, because it doesn't account for false positives. 
> Example: Let's say there are two indexes:
> * color: 10000 nodes with color=red, but a bit slower (let's say a remote index); cost per entry is 1.5.
> * size: 20000 nodes with size=M, but a bit faster (let's say a local index); cost per entry is 1.
> Without a limit, the index for "color" should be used, because its cost of 10000 * 1.5 = 15000 is lower than the 20000 * 1 = 20000 for "size".
> With limit=100, we could calculate as follows: there are at most 10000 matching entries (according to index "color"), so the false positive rate of the "size" index is at least 50%. The cost of "color" is 100 * 1.5 = 150. The cost of "size" looks like 100 * 1 = 100, but with a false positive rate of 50%, about 200 entries have to be read to get 100 results, so the cost is actually 200. Therefore, the index "color" should still be used.
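
The arithmetic in the example above can be sketched as follows. This is only an illustration of the reasoning in the description, not the formula Oak implements: the function name `limited_cost` and its parameters are hypothetical, and it assumes that the smallest entry count reported by any index is an upper bound on the number of real matches.

```python
def limited_cost(entries, cost_per_entry, limit, max_matches):
    """Estimated cost of reading `limit` real matches from one index.

    Hypothetical helper, not part of Oak. `max_matches` is the assumed
    upper bound on real matches (the smallest entry count of any index).
    """
    if entries == 0 or max_matches == 0:
        return 0.0
    # Fraction of this index's entries that can be real matches (at most 1).
    hit_rate = min(1.0, max_matches / entries)
    # Entries that must be read to collect `limit` real matches,
    # never more than the index contains.
    entries_to_read = min(entries, limit / hit_rate)
    return entries_to_read * cost_per_entry


# The two indexes from the example: name -> (entries, cost per entry).
indexes = {"color": (10000, 1.5), "size": (20000, 1.0)}
max_matches = min(entries for entries, _ in indexes.values())

costs = {
    name: limited_cost(entries, cost, limit=100, max_matches=max_matches)
    for name, (entries, cost) in indexes.items()
}
best = min(costs, key=costs.get)
print(costs, best)
```

With limit=100 this gives 150 for "color" and 200 for "size", so "color" is still chosen, as in the description. When the limit is larger than the index, the estimate falls back to the unlimited cost (entries * cost per entry).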



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)