Posted to dev@datasketches.apache.org by "Lee Rhodes (Jira)" <ji...@apache.org> on 2020/07/24 19:06:00 UTC
[jira] [Closed] (DATASKETCHES-8) HLL doesn't take empty strings as distinct values
[ https://issues.apache.org/jira/browse/DATASKETCHES-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lee Rhodes closed DATASKETCHES-8.
---------------------------------
> HLL doesn't take empty strings as distinct values
> -------------------------------------------------
>
> Key: DATASKETCHES-8
> URL: https://issues.apache.org/jira/browse/DATASKETCHES-8
> Project: Apache Datasketches
> Issue Type: Wish
> Reporter: Adam Tamas
> Assignee: Lee Rhodes
> Priority: Trivial
>
> When using ds_hll, Hive does not count empty strings as distinct values in string and varchar columns.
> Example:
> Given a table t with the following values in columns of type string, char(1), and varchar(1):
> {code:java}
> +------+------+------+
> | t.s | t.c | t.v |
> +------+------+------+
> | | | |
> | a | a | a |
> | | | |
> | a | a | a |
> | s | s | s |
> | d | d | d |
> +------+------+------+
> {code}
> select ds_hll_estimate(ds_hll_sketch(s)), ds_hll_estimate(ds_hll_sketch(c)), ds_hll_estimate(ds_hll_sketch(v)) from t;
> {code:java}
> +--------------------+--------------------+--------------------+
> | _c0 | _c1 | _c2 |
> +--------------------+--------------------+--------------------+
> | 3.000000014901161 | 4.000000029802323 | 3.000000014901161 |
> +--------------------+--------------------+--------------------+
> {code}
> The cause may be here: https://github.com/apache/incubator-datasketches-java/blob/master/src/main/java/org/apache/datasketches/hll/BaseHllSketch.java#L351
> char works because Hive pads char values with spaces up to the declared length, so an empty value is stored as a single space and is not empty.
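To make the reported numbers concrete, the following is a minimal sketch of the behavior described above. It makes two assumptions. First, the sketch's String update path skips null or empty input, as the linked BaseHllSketch code suggests. Second, Hive right-pads char(1) values with spaces. The sketch uses an exact HashSet in place of an HLL sketch. The class and method names (EmptyStringDistinctDemo, countDistinctSkippingEmpty, padChar) are hypothetical and are not part of the DataSketches API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EmptyStringDistinctDemo {

    // Hypothetical model of the update(String) guard: null or empty input is ignored.
    // An exact HashSet stands in for the HLL sketch, so results are exact counts.
    static int countDistinctSkippingEmpty(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (v == null || v.isEmpty()) {
                continue; // assumed behavior: no update is made for ""
            }
            seen.add(v);
        }
        return seen.size();
    }

    // Hypothetical model of Hive char(n): right-pad the value with spaces to length n.
    static String padChar(String v, int n) {
        StringBuilder sb = new StringBuilder(v);
        while (sb.length() < n) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same values as the s, c, v columns of table t in the report.
        List<String> column = Arrays.asList("", "a", "", "a", "s", "d");

        // string and varchar keep "" as-is, so both empty rows are skipped.
        int stringOrVarchar = countDistinctSkippingEmpty(column);

        // char(1) turns "" into " ", which is not empty and is counted.
        List<String> padded = new ArrayList<>();
        for (String v : column) {
            padded.add(padChar(v, 1));
        }
        int charColumn = countDistinctSkippingEmpty(padded);

        System.out.println("string/varchar: " + stringOrVarchar + ", char(1): " + charColumn);
    }
}
```

Running main prints `string/varchar: 3, char(1): 4`. These counts match the approximate estimates of 3 and 4 in the report. The true number of distinct values, including the empty string, is 4 for every column.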