Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/07/06 10:25:00 UTC

[jira] [Commented] (OAK-6333) IndexPlanner should use actual entryCount instead of limiting it to 1000

    [ https://issues.apache.org/jira/browse/OAK-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076314#comment-16076314 ] 

Chetan Mehrotra commented on OAK-6333:
--------------------------------------

Implemented the discussed approach with 1801011.

* By default, the logic now uses the actual numDocs
* If entryCount is specified, its value is used and numDocs is ignored
* The behaviour can be reverted by setting the system property {{oak.lucene.useActualEntryCount}} to {{false}}. The old logic (as explained in the description) is then used, and a message like the following is logged:
{noformat}
System property oak.lucene.useActualEntryCount found to be false. IndexPlanner would use a default entryCount of 1000 instead of using the actual entry count
{noformat}
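
As a rough illustration of the three rules above, here is a minimal sketch in Java. It is not the committed IndexPlanner code: the class name, method names, and the NOT_SET sentinel are hypothetical, and only the system property name and the 1000 cap come from this issue. The fallback branch follows the old logic as given in the issue description.

```java
// Hypothetical sketch of the entry count selection described above.
// Not the actual Oak IndexPlanner implementation.
final class EntryCountSketch {
    // System property name taken from this issue.
    static final String USE_ACTUAL_ENTRY_COUNT = "oak.lucene.useActualEntryCount";
    // Cap used by the old logic for pure property indexes.
    static final long LEGACY_CAP = 1000;
    // Assumed sentinel meaning "entryCount is not configured on the index".
    static final long NOT_SET = -1;

    // Reads the flag; it defaults to true when the property is absent.
    static boolean useActualEntryCount() {
        return Boolean.parseBoolean(System.getProperty(USE_ACTUAL_ENTRY_COUNT, "true"));
    }

    static long estimate(long entryCount, long numDocs,
                         boolean fulltextQuery, boolean useActual) {
        boolean defined = entryCount != NOT_SET;
        if (useActual) {
            // New default: a configured entryCount wins and numDocs is ignored;
            // otherwise the actual number of documents is used.
            return defined ? entryCount : numDocs;
        }
        // Old logic, as explained in the issue description.
        if (defined) {
            return Math.min(entryCount, numDocs);
        }
        return fulltextQuery ? numDocs : Math.min(LEGACY_CAP, numDocs);
    }
}
```

Under these assumptions, a property index with 50000 documents and no configured entryCount is estimated at 50000 by default, and at 1000 once the property is set to false.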

When backporting this, we need to:
* Change the default value to false
* Adjust the test setup for the new default

> IndexPlanner should use actual entryCount instead of limiting it to 1000
> ------------------------------------------------------------------------
>
>                 Key: OAK-6333
>                 URL: https://issues.apache.org/jira/browse/OAK-6333
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>              Labels: candidate_oak_1_4, candidate_oak_1_6
>             Fix For: 1.8, 1.7.4
>
>
> Currently IndexPlanner uses the following logic for estimating the entryCount:
> # If the index has fulltext indexing enabled and the query has a fulltext constraint specified
> ## If the {{entryCount}} value is defined, then use min(entryCount, numDocs)
> ## If not, then use {{numDocs}}, i.e. the actual entry count
> # If the index is a pure property index, i.e. none of the property definitions have {{analyzed}} set to true
> ## If the {{entryCount}} value is defined, then use min(entryCount, numDocs)
> ## Else take min(1000, numDocs)
> Revisiting the logic for #2, it appears that in the 1.0.x days (OAK-2200) we capped it to 1000 because cost estimation for property indexes was inaccurate (they used to report low values, causing the Lucene index to lose).
> With support for Counters, the cost estimation for property indexes has improved, so we should now remove this cap and let it use numDocs.
> One area where the cap causes issues is when we have two indexes where one is a superset of the other, e.g. /oak:index/asset and /content/en/ /oak:index/asset, where both have some matching properties. Logically, if the query can be handled by the sub-index then it should get picked, but currently either of them can be picked, making the query plan nondeterministic.
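
To make the tie described in the description concrete, here is a small worked sketch in Java. The class and method names are hypothetical and the document counts used below are invented for illustration. The real planner cost depends on more than the entry count, so this only shows why the cap removes the distinction between a large index and a smaller sub-index.

```java
// Hypothetical sketch: the old cap versus the actual count for a pure
// property index that has no configured entryCount.
final class LegacyCapSketch {
    // Cap applied by the old logic.
    static final long CAP = 1000;

    // Old logic: min(1000, numDocs).
    static long cappedEstimate(long numDocs) {
        return Math.min(CAP, numDocs);
    }

    // New logic: use numDocs as is.
    static long actualEstimate(long numDocs) {
        return numDocs;
    }
}
```

Assuming the superset index holds 200000 documents and the sub-index holds 5000, cappedEstimate gives 1000 for both, so neither is preferred on entry count alone. With actualEstimate the sub-index reports 5000 against 200000 and can be distinguished from the superset index.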


