Posted to issues@impala.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2020/02/26 21:39:00 UTC

[jira] [Resolved] (IMPALA-8759) Use double precision for HLL

     [ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenzhe Zhou resolved IMPALA-8759.
---------------------------------
    Resolution: Fixed

The Impala binary was built as a release build. Testing was run against the table tpch.lineitem, loaded at scale factor 150, which contains 900,035,147 rows. Query time was measured from impala-shell for queries like "select ndv(col_name) from tpch150_parquet.lineitem".

> Use double precision for HLL
> ----------------------------
>
>                 Key: IMPALA-8759
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8759
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Peter Ebert
>            Assignee: Wenzhe Zhou
>            Priority: Major
>              Labels: perf, ramp-up
>
> In /be/src/exprs/aggregate-functions-ir.cc, the HLL finalize function uses a float, which has only 6-9 significant decimal digits of precision. More accurate estimates for larger cardinalities (beyond 999,999) should be possible with double precision. Another C++ implementation uses double as well: [https://github.com/dialtr/libcount]
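
To illustrate the precision limit described above, here is a minimal sketch, assuming float is IEEE 754 single precision (24-bit significand) and the default round-to-nearest mode. The helper names are hypothetical and this is not Impala code. It only round-trips a row count through float and double to show where float stops representing integers exactly.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only (not from Impala).
// Each converts a 64-bit count to a floating-point type and back,
// so any loss of precision shows up as a changed value.

// float has a 24-bit significand, so integers above 2^24 (16,777,216)
// are not all exactly representable and may be rounded.
inline int64_t RoundTripThroughFloat(int64_t n) {
  return static_cast<int64_t>(static_cast<float>(n));
}

// double has a 53-bit significand, so every integer up to 2^53 is
// represented exactly, which comfortably covers row counts like these.
inline int64_t RoundTripThroughDouble(int64_t n) {
  return static_cast<int64_t>(static_cast<double>(n));
}
```

Under those assumptions, RoundTripThroughFloat(900035147) yields 900035136, because floats in the range 2^29 to 2^30 are spaced 64 apart, while RoundTripThroughDouble(900035147) returns the input unchanged. The sketch shows only the representation limit of the result type. It does not measure how much the HLL estimate itself changes.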



--
This message was sent by Atlassian Jira
(v8.3.4#803005)