You are viewing a plain text version of this content. The canonical link for it is here.

Posted to oak-issues@jackrabbit.apache.org by "Davide Giannella (JIRA)" <ji...@apache.org> on 2014/06/26 15:56:25 UTC

[jira] [Commented] (OAK-1907) Better cost estimates for traversal, property, and ordered indexes

    [ https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044666#comment-14044666 ] 

Davide Giannella commented on OAK-1907:
---------------------------------------

Based on my comment above I amended slightly the provided POC by
passing a new Random instance at every insert/delete and got the same
result as before so it seems there's not really any difference and my
concerns above are not valid.

Execution with class-scope Random: real: 3330840 approx: 6144000.

Execution with new instance every run: real: 3329182 approx: 6144000.

Amended version for the records:

{noformat}
$ diff ApproxCount.java ApproxCount_001.java 
19,21c19,20
< //    Random random = new Random(1);
<     Random rnd2 = new Random(1);
<     
---
>     Random random = new Random(1);
> 
39c38
<             if (count == 0 || rnd2.nextInt(3) > 0) {
---
>             if (count == 0 || random.nextInt(3) > 0) {
92c91
<         if (new Random().nextInt(x) == 0) {
---
>         if (random.nextInt(x) == 0) {
104c103
<         if (new Random().nextInt(x) == 0) {
---
>         if (random.nextInt(x) == 0) {
{noformat}


> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
>                 Key: OAK-1907
>                 URL: https://issues.apache.org/jira/browse/OAK-1907
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.1
>
>         Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index don't take the number of nodes into account, if there are more than about 100 nodes. This is problematic because in many cases, the wrong index is used (because of incorrect cost estimate).
> To get a better estimate, a very rough estimate on the number of child nodes below a given path is needed. 
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, randomly named property (for example called ":count-xyz" where xyz is a uuid, value 100'000) to the parents of that node, so that we know there are probably more than 100'000 nodes below a given path. When removing a node, with the same algorithm add a hidden property (":count-xyz", value -100'000). That should result in a slowdown of less than 0.01%, but should allow us much better cost estimates. Those properties could be consolidated asynchronously if needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)