Posted to dev@calcite.apache.org by "Liya Fan (Jira)" <ji...@apache.org> on 2021/01/12 11:54:00 UTC

[jira] [Created] (CALCITE-4465) Estimate the number of distinct values by filter condition

Liya Fan created CALCITE-4465:
---------------------------------

             Summary: Estimate the number of distinct values by filter condition
                 Key: CALCITE-4465
                 URL: https://issues.apache.org/jira/browse/CALCITE-4465
             Project: Calcite
          Issue Type: Improvement
          Components: core
            Reporter: Liya Fan
            Assignee: Liya Fan


In our current implementation ({{RelMdDistinctRowCount}}), estimating the number of distinct values (NDV) does not make good use of the filter condition. It simply forwards the call to its input operator with the filter condition attached.
In fact, more information can be obtained for some special but commonly used conditions. For example, given the condition {{x = 'a'}}, we can deduce that {{NDV(x) <= 1}}. Given the condition {{x in ('a', 'b')}}, we can deduce that {{NDV(x) <= 2}}.
More generally, given {{x in ('a', 'b') AND y in ('c', 'd', 'e')}}, we have {{NDV(x, y) <= 2 * 3 = 6}}.
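
To make the proposed bound concrete, here is a minimal, language-neutral sketch in Python. It does not use Calcite's API: the function {{ndv_upper_bound}} and its parameters are hypothetical names for illustration, and it assumes the filter has already been decomposed into AND-ed {{=}} / {{IN}} predicates over literals, one list of allowed literals per column.

```python
from math import prod


def ndv_upper_bound(columns, allowed_values, input_ndv):
    """Upper bound on NDV(columns) under a conjunction of = / IN predicates.

    columns        -- the columns whose combined NDV is being estimated
    allowed_values -- hypothetical map: column -> literals permitted by the
                      filter (x = 'a' is represented as ['a'])
    input_ndv      -- the estimate obtained from the input operator
    """
    # If any requested column is unconstrained, the filter gives no bound,
    # so fall back to the input operator's estimate.
    if any(col not in allowed_values for col in columns):
        return input_ndv

    # Each column can take at most as many values as its literal list holds,
    # so the combination is bounded by the product of the list sizes.
    bound = prod(len(set(allowed_values[col])) for col in columns)

    # The filter can only tighten the estimate, never loosen it.
    return min(bound, input_ndv)


print(ndv_upper_bound(["x"], {"x": ["a"]}, 1000))
print(ndv_upper_bound(["x", "y"], {"x": ["a", "b"], "y": ["c", "d", "e"]}, 1000))
```

Taking the minimum with the input estimate keeps the result from exceeding what the input operator already reports, for instance when the literal lists are larger than the column's actual NDV.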

Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)