Posted to issues@calcite.apache.org by "Liya Fan (Jira)" <ji...@apache.org> on 2020/10/23 09:19:00 UTC

[jira] [Commented] (CALCITE-4351) RelMdUtil#numDistinctVals always returns 0 for large inputs

    [ https://issues.apache.org/jira/browse/CALCITE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219564#comment-17219564 ] 

Liya Fan commented on CALCITE-4351:
-----------------------------------

[~TsReaper] Thanks for reporting the problem.

The solution of switching new/old formulas seems to work well for the instance you have given. 

However, I am afraid the old formula is also vulnerable to floating-point errors. For example, when domainSize = 1e20 and numSelected = 1e3, both the new and the old formulas produce 0.
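
To make the cancellation concrete, here is a minimal Java sketch. The two method bodies are my paraphrase of the old and new formulas, not copies of the Calcite source, and the class and method names are made up for illustration.

```java
final class NumDistinctValsCancellation {
  /** Older form: relies on the approximation (1 - 1/d)^n ~= exp(-n/d). */
  static double oldFormula(double dSize, double numSel) {
    // For n/d below about 1e-16, exp(-n/d) rounds to 1.0 and the difference is 0.
    return (1.0 - Math.exp(-numSel / dSize)) * dSize;
  }

  /** Newer form (assumed shape): evaluates (1 - 1/d)^n without the approximation. */
  static double newFormula(double dSize, double numSel) {
    // For d above about 1e16, 1 - 1/d rounds to 1.0, so log(...) is 0 and the result is 0.
    return (1.0 - Math.exp(numSel * Math.log(1.0 - 1.0 / dSize))) * dSize;
  }

  public static void main(String[] args) {
    // Case from the issue description: only the new formula collapses to 0.
    System.out.println(oldFormula(1e18, 1e10));
    System.out.println(newFormula(1e18, 1e10));
    // Case from this comment: both formulas collapse to 0.
    System.out.println(oldFormula(1e20, 1e3));
    System.out.println(newFormula(1e20, 1e3));
  }
}
```

In both failing cases the true answer is close to numSelected, because almost every selection hits a distinct value when the domain is that large.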

So I think what we need is a computationally stable approach. Maybe we can consider expanding the formula as a Taylor series and taking the first few terms?
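
A rough sketch of what the Taylor-series idea could look like in Java, assuming we expand 1 - exp(-x) with x = numSelected / domainSize. The cutoff SMALL_RATIO, the class name, and the method names are hypothetical, and guards for null, infinite, or non-positive inputs are left out. The second method is an alternative I am adding for comparison, not something proposed in the issue: it keeps the exact formula but evaluates it with Math.log1p and Math.expm1, which are designed for arguments near zero.

```java
final class StableNumDistinctVals {
  // Hypothetical cutoff below which the series is used; not taken from Calcite.
  private static final double SMALL_RATIO = 1e-3;

  /** Approximates dSize * (1 - exp(-numSel / dSize)) without subtracting from 1. */
  static double taylor(double dSize, double numSel) {
    double x = numSel / dSize;
    if (x < SMALL_RATIO) {
      // 1 - exp(-x) = x - x^2/2 + x^3/6 - ...
      // Multiplying the first three terms by dSize gives numSel * (1 - x/2 + x^2/6),
      // written here in Horner form.
      return numSel * (1.0 - x / 2.0 * (1.0 - x / 3.0));
    }
    // For larger ratios the direct evaluation does not suffer from cancellation.
    return dSize * (1.0 - Math.exp(-x));
  }

  /** Evaluates dSize * (1 - (1 - 1/dSize)^numSel) via log1p/expm1; assumes dSize > 1. */
  static double viaExpm1(double dSize, double numSel) {
    // log1p(-1/d) stays accurate when 1/d is tiny; expm1(y) stays accurate when y is tiny.
    return -Math.expm1(numSel * Math.log1p(-1.0 / dSize)) * dSize;
  }

  public static void main(String[] args) {
    System.out.println(taylor(1e20, 1e3));
    System.out.println(viaExpm1(1e20, 1e3));
    System.out.println(taylor(1e18, 1e10));
    System.out.println(viaExpm1(1e18, 1e10));
  }
}
```

For domainSize = 1e20 and numSelected = 1e3 both methods return a value very close to 1e3 instead of 0. Whether three terms and this cutoff are accurate enough for the planner's purposes would need to be checked.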

> RelMdUtil#numDistinctVals always returns 0 for large inputs
> -----------------------------------------------------------
>
>                 Key: CALCITE-4351
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4351
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.26.0
>            Reporter: Caizhi Weng
>            Priority: Major
>
> The previous implementation of {{RelMdUtil#numDistinctVals}} used the approximation {{ln(1 + x) ~= x}} when {{x}} is small.
> However, CALCITE-4132 removed this approximation to make the result more accurate. This causes the function to calculate an incorrect result for large inputs (for example, when {{domainSize = 1e18}} and {{numSelected = 1e10}} the result is 0) due to precision problems.
> What I would suggest is to treat small and large inputs in different ways. For small inputs we use the new, more precise function and for large inputs we use the old, approximated function.


