Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2018/03/27 09:13:01 UTC

[jira] [Created] (OAK-7379) Lucene Index: per-column selectivity, assume 5 unique entries

Thomas Mueller created OAK-7379:
-----------------------------------

             Summary: Lucene Index: per-column selectivity, assume 5 unique entries
                 Key: OAK-7379
                 URL: https://issues.apache.org/jira/browse/OAK-7379
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: lucene, query
            Reporter: Thomas Mueller
            Assignee: Thomas Mueller


Currently, if a query has a property restriction of the form "property = x", and the property is indexed in a Lucene property index, the estimated cost of the index is the number of documents indexed for that property. This is a very conservative estimate: it assumes all documents have the same value. So the cost is relatively high for that index.

In almost all cases, there are many distinct values for a property. Rarely there are few values, or a skewed distribution where one value accounts for most documents. But in almost all cases there are more than 5 distinct values.

I think it makes sense to use 5 as the default number of unique entries per property. It is still conservative (cost of the index is high), but much better than now.
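
To illustrate the arithmetic, here is a minimal sketch and not the actual Oak code. The class name, method name and constant are hypothetical, and the sketch assumes the estimate is simply the number of indexed documents divided by the assumed number of unique entries, rounded up.

```java
public final class SelectivityEstimate {

    // Hypothetical default: assume 5 unique entries per property.
    static final int ASSUMED_UNIQUE_ENTRIES = 5;

    /**
     * Estimate how many documents match "property = x".
     *
     * @param indexedDocs number of documents indexed for the property
     * @param assumedUniqueEntries assumed number of distinct values
     * @return the estimated number of matching documents
     */
    static long estimateEntryCount(long indexedDocs, int assumedUniqueEntries) {
        if (indexedDocs <= 0) {
            return 0;
        }
        // Guard against a zero or negative setting.
        int divisor = Math.max(1, assumedUniqueEntries);
        // Round up, so a non-empty index never gets an estimate of 0.
        return (indexedDocs + divisor - 1) / divisor;
    }

    public static void main(String[] args) {
        long docs = 100_000;
        // Current behavior: all documents are assumed to have the same value.
        System.out.println(estimateEntryCount(docs, 1));
        // Proposed behavior: assume 5 unique entries.
        System.out.println(estimateEntryCount(docs, ASSUMED_UNIQUE_ENTRIES));
    }
}
```

With 100000 indexed documents, the estimate drops from 100000 to 20000, which is still high but makes the index noticeably cheaper for the query planner.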



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)