Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/30 04:33:00 UTC

[jira] [Commented] (IMPALA-10132) Implement ds_hll_estimate_bounds()

    [ https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223404#comment-17223404 ] 

ASF subversion and git services commented on IMPALA-10132:
----------------------------------------------------------

Commit 193c2e773fa9f6772e4a7c30ed3a4f75029863f1 in impala's branch refs/heads/master from Fucun Chu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=193c2e7 ]

IMPALA-10132: Implement ds_hll_estimate_bounds_as_string() function.

This function receives a string that is a serialized Apache DataSketches
HLL sketch and an optional kappa, the number of standard deviations
from the mean: 1, 2 or 3 (default 2). It returns a single string holding
three comma-separated values: the estimate, the lower bound and the
upper bound.

   ds_hll_estimate_bounds_as_string(sketch [, kappa])

Kappa:
 1 represents the 68.3% confidence bounds
 2 represents the 95.4% confidence bounds
 3 represents the 99.7% confidence bounds
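
The percentages listed for kappa match the two-sided coverage of a
normal distribution at kappa standard deviations. A minimal Python
sketch of that relationship, assuming a normal approximation (the helper
name confidence_for_kappa is hypothetical and is not part of Impala or
DataSketches):

```python
import math


def confidence_for_kappa(kappa: int) -> float:
    """Two-sided normal coverage, in percent, for kappa standard deviations."""
    # The commit message only allows kappa values of 1, 2 or 3.
    if kappa not in (1, 2, 3):
        raise ValueError("kappa must be 1, 2 or 3")
    # P(|Z| <= kappa) for a standard normal Z equals erf(kappa / sqrt(2)).
    return 100.0 * math.erf(kappa / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, round(confidence_for_kappa(k), 1))
```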

Note: ds_hll_estimate_bounds() should return an array of doubles as
the result, but that has to wait for complex type support. Until then,
we provide ds_hll_estimate_bounds_as_string(), which can be deprecated
once array support is available. The tracking Jira for returning
complex types from functions is IMPALA-9520.

Example:
select ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) from
functional_parquet.alltypestiny;
+----------------------------------------------------------+
| ds_hll_estimate_bounds_as_string(ds_hll_sketch(int_col)) |
+----------------------------------------------------------+
| 2,2,2.0002                                               |
+----------------------------------------------------------+
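
Because the result is a single comma-separated string, a client has to
split it to get numeric values. A minimal Python sketch of such
client-side parsing, assuming the order estimate, lower bound, upper
bound stated in this commit message (the names HllBounds and
parse_bounds are hypothetical and not part of Impala):

```python
from typing import NamedTuple


class HllBounds(NamedTuple):
    estimate: float
    lower: float
    upper: float


def parse_bounds(result: str) -> HllBounds:
    """Split a 'estimate,lower,upper' string into three floats."""
    parts = result.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated values, got {len(parts)}")
    return HllBounds(*(float(p) for p in parts))


# Value taken from the example output in this commit message.
bounds = parse_bounds("2,2,2.0002")
print(bounds.estimate, bounds.lower, bounds.upper)
```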

Change-Id: I46bf8263e8fd3877a087b9cb6f0d1a2392bb9153
Reviewed-on: http://gerrit.cloudera.org:8080/16626
Reviewed-by: Gabor Kaszab <ga...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Implement ds_hll_estimate_bounds()
> ----------------------------------
>
>                 Key: IMPALA-10132
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10132
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Adam Tamas
>            Assignee: Fucun Chu
>            Priority: Major
>
> In Hive, ds_hll_estimate_bounds() returns an array of doubles.
> An example for a sketch created from a table that contains only a single value:
> {code:java}
> (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
> +-------------------------------+
> |              _c0              |
> +-------------------------------+
> | [1.0,1.0,1.0000998634873453]  |
> +-------------------------------+
> {code}
> The values of the array are probably a lower bound, an estimate and an upper bound of the sketch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)