Posted to issues@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2020/07/24 06:40:00 UTC

[jira] [Closed] (IMPALA-9633) Implement ds_hll_union() builtin function

     [ https://issues.apache.org/jira/browse/IMPALA-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab closed IMPALA-9633.
--------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Implement ds_hll_union() builtin function
> -----------------------------------------
>
>                 Key: IMPALA-9633
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9633
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> ds_hll_union() is an aggregate function that accepts sketches and produces a single sketch that is the union of the received sketches.
> Example from Hive:
> {code:sql}
> create temporary table sketch_intermediate (category char(1), sketch binary);
> insert into sketch_intermediate select category, ds_hll_sketch(id) from sketch_input group by category;
> select ds_hll_estimate(ds_hll_union(sketch)) from sketch_intermediate;
> {code}
> Some test data for the example:
> {code:sql}
> create temporary table sketch_input (id int, category char(1));
> insert into table sketch_input values
>   (1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
>   (6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'), (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b');
> {code}
> Approximate result of the final query (the exact number of distinct ids is 15):
> {code:java}
> 15.000000521540663
> {code}
> Hive change that introduced the same function: https://issues.apache.org/jira/browse/HIVE-22940
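The merge behind a union aggregate such as ds_hll_union() can be illustrated with a toy HyperLogLog. This is a minimal Python sketch under stated assumptions. It uses its own register layout, a SHA-1-based hash and a fixed precision of 12. These are hypothetical choices and do not match the serialized Apache DataSketches format that Hive and Impala exchange. The names hll_sketch, hll_union and hll_estimate are illustrative only.

The sketch shows why a union can run as an aggregate. In a textbook HyperLogLog, sketches of equal precision merge by taking the register-wise maximum. As a result, the union of per-category sketches equals the sketch of all rows.

```python
import hashlib
import math
from collections import defaultdict

# Toy HyperLogLog for illustration only; NOT the Apache DataSketches format.
P = 12          # assumed precision: 2**12 = 4096 registers
M = 1 << P


def _hash64(value):
    """Deterministic 64-bit hash of a value (SHA-1 truncated to 8 bytes)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_sketch(values):
    """Aggregate values into M registers (loosely analogous to ds_hll_sketch)."""
    regs = [0] * M
    tail_bits = 64 - P
    for v in values:
        h = _hash64(v)
        idx = h >> tail_bits                      # top P bits choose the register
        tail = h & ((1 << tail_bits) - 1)         # remaining 64 - P bits
        rank = tail_bits - tail.bit_length() + 1  # leading zeros in tail, plus 1
        regs[idx] = max(regs[idx], rank)
    return regs


def hll_union(sketches):
    """Merge equal-precision sketches by register-wise maximum (cf. ds_hll_union)."""
    merged = [0] * M
    for regs in sketches:
        merged = [max(a, b) for a, b in zip(merged, regs)]
    return merged


def hll_estimate(regs):
    """Cardinality estimate with linear-counting small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


if __name__ == "__main__":
    # Same rows as the sketch_input test data in the issue description.
    rows = [(i, "a") for i in range(1, 11)] + [(i, "b") for i in range(6, 16)]
    by_category = defaultdict(list)
    for id_, category in rows:
        by_category[category].append(id_)
    # Like the sketch_intermediate table: one sketch per category.
    intermediate = {c: hll_sketch(ids) for c, ids in by_category.items()}
    # Like ds_hll_estimate(ds_hll_union(sketch)): ids 6..10 are counted once.
    print(hll_estimate(hll_union(intermediate.values())))
```

Maximum is associative and commutative, so partial sketches can be merged in any order or grouping. That property is what lets a union function consume precomputed per-group sketches as an ordinary aggregate. Running the script prints an estimate close to 15.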



--
This message was sent by Atlassian Jira
(v8.3.4#803005)