Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2016/03/03 12:44:18 UTC

[jira] [Commented] (OAK-4065) Counter index can get out of sync

    [ https://issues.apache.org/jira/browse/OAK-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177711#comment-15177711 ] 

Thomas Mueller commented on OAK-4065:
-------------------------------------

I didn't find a problem in the code, but maybe the problem is that, over time, if many nodes are added and removed, the count can get out of sync. A test case adds 1 million nodes, then repeats the following 10 times: add 1 million nodes and remove 1 million nodes. After that, the expected value is 1 million. However, the histogram shows the following. The first image uses a cap (the count never goes below zero), while the second image does not use a cap, so values can be below zero.

!probability_1m_10times-add1remove1m.png!

So for the first case, the most likely value is actually 0, which is what I saw. For my taste, too many values are outside of the acceptable window. So maybe this is the reason why the counter got out of sync.
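The add/remove test case can be approximated with a small standalone simulation. This is only a sketch under assumptions: it models a counter where each add or remove changes the estimate with probability 1 in {{resolution}}, using an independent pseudo-random value per operation. The class and method names are hypothetical, the resolution of 10000 is taken from the 1:10000 ratio mentioned in this comment, and this is not the actual counter index code, so it is not expected to reproduce the attached histogram exactly.

```java
import java.util.Random;

// Simplified model of a randomly sampled node counter (an assumption for
// illustration, not the actual Oak counter index implementation).
class SampledCounterSimulation {

    // Applies one add (sign = +1) or remove (sign = -1). Only about 1 in
    // `resolution` operations touches the estimate, and each of those moves
    // it by `resolution`.
    static long apply(long estimate, int sign, int resolution,
                      boolean capAtZero, Random random) {
        if (random.nextInt(resolution) != 0) {
            return estimate; // not sampled: estimate unchanged
        }
        long updated = estimate + (long) sign * resolution;
        return capAtZero ? Math.max(0L, updated) : updated;
    }

    // Adds `initialNodes`, then `rounds` times adds and removes
    // `nodesPerRound` nodes. The true node count at the end is `initialNodes`.
    static long simulate(long initialNodes, int rounds, long nodesPerRound,
                         int resolution, boolean capAtZero, Random random) {
        long estimate = 0;
        for (long i = 0; i < initialNodes; i++) {
            estimate = apply(estimate, +1, resolution, capAtZero, random);
        }
        for (int r = 0; r < rounds; r++) {
            for (long i = 0; i < nodesPerRound; i++) {
                estimate = apply(estimate, +1, resolution, capAtZero, random);
            }
            for (long i = 0; i < nodesPerRound; i++) {
                estimate = apply(estimate, -1, resolution, capAtZero, random);
            }
        }
        return estimate;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed for repeatable runs
        for (boolean capAtZero : new boolean[] {true, false}) {
            for (int trial = 0; trial < 3; trial++) {
                long estimate = simulate(1_000_000, 10, 1_000_000,
                        10_000, capAtZero, random);
                System.out.println("cap=" + capAtZero + " trial=" + trial
                        + " estimate=" + estimate + " (true count: 1000000)");
            }
        }
    }
}
```

Because the add and the remove of a node are sampled independently, the removes in each round do not cancel the adds exactly, and the error accumulates with the number of rounds. Running more trials and bucketing the estimates would give a histogram for this model.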

To get a more accurate count, we could:

* not trust (near) 0 values in the cost estimate
* take into account the approximate count of the children as well, as the average of parent and children is more accurate
* (partially) reindex the counter index from time to time, for example once a week
* use a mechanism that stays accurate over time when nodes are repeatedly added and removed

A more accurate mechanism would be, for example: instead of using an independent pseudo-random value for each operation, calculate the hash code of the node name, and increment (on add) or decrement (on remove) the counter only if that hash code modulo 10000 is 0. That way, counts below 0 are not possible, because each add/remove pair for the same node nets to exactly 0 (most pairs are +0/-0, and about 1 in 10000 is a +1/-1 pair).
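A minimal sketch of this hash-based idea, under assumptions: the class and method names are hypothetical, {{String.hashCode()}} stands in for "the hash code of the node name", and the counter is a single in-memory value rather than anything stored in the index. {{Math.floorMod}} is used because {{hashCode()}} can be negative.

```java
// Sketch of a counter that samples by node name hash instead of by an
// independent random value (hypothetical class, not Oak code).
class HashSampledCounter {

    private final int resolution;
    private long sampledNodes; // sampled nodes currently present

    HashSampledCounter(int resolution) {
        this.resolution = resolution;
    }

    // The same name always gives the same answer, so the add and the
    // remove of a node are either both counted or both ignored.
    private boolean isSampled(String nodeName) {
        return Math.floorMod(nodeName.hashCode(), resolution) == 0;
    }

    void nodeAdded(String nodeName) {
        if (isSampled(nodeName)) {
            sampledNodes++;
        }
    }

    void nodeRemoved(String nodeName) {
        if (isSampled(nodeName)) {
            sampledNodes--;
        }
    }

    // Estimated node count: each sampled node stands for `resolution` nodes.
    long estimate() {
        return sampledNodes * resolution;
    }

    public static void main(String[] args) {
        HashSampledCounter counter = new HashSampledCounter(10_000);
        for (int i = 0; i < 1_000_000; i++) {
            counter.nodeAdded("node-" + i);
        }
        long before = counter.estimate();
        // 10 rounds: add 1 million temporary nodes, then remove them again.
        for (int round = 0; round < 10; round++) {
            for (int i = 0; i < 1_000_000; i++) {
                counter.nodeAdded("tmp-" + round + "-" + i);
            }
            for (int i = 0; i < 1_000_000; i++) {
                counter.nodeRemoved("tmp-" + round + "-" + i);
            }
        }
        System.out.println("estimate before rounds: " + before);
        System.out.println("estimate after rounds:  " + counter.estimate());
    }
}
```

In this sketch the estimate after the rounds equals the estimate before them, because every removed node was sampled (or not) exactly as it was when added. The accuracy of the estimate itself still depends on how evenly the node name hash codes are distributed.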

> Counter index can get out of sync
> ---------------------------------
>
>                 Key: OAK-4065
>                 URL: https://issues.apache.org/jira/browse/OAK-4065
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>         Attachments: probability_1m_10times-add1remove1m.png
>
>
> I don't have a reproducible test case yet, but it looks like some usage pattern (for example creating, deleting, moving many nodes in one transaction) can cause the counter index to get out of sync with the real data. Worst case, the counter index thinks that the repository is empty (the root node has no descendant nodes).
> I want to write test cases to find the problem, or to prove that it doesn't get out of sync.


