Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2019/09/20 15:32:31 UTC

[GitHub] [incubator-doris] morningman commented on a change in pull request #1845: Reduce size of HyperLogLog struct

morningman commented on a change in pull request #1845: Reduce size of HyperLogLog struct
URL: https://github.com/apache/incubator-doris/pull/1845#discussion_r326677267
 
 

 ##########
 File path: be/src/olap/hll.h
 ##########
 @@ -23,59 +23,68 @@
 #include <set>
 #include <map>
 
-#include "olap/olap_common.h"
+#include "gutil/macros.h"
 
 namespace doris {
 
 const static int HLL_COLUMN_PRECISION = 14;
 const static int HLL_EXPLICLIT_INT64_NUM = 160;
-const static int HLL_REGISTERS_COUNT = 16384;
+const static int HLL_REGISTERS_COUNT = 16 * 1024;
 // maximum size in byte of serialized HLL: type(1) + registers (2^14)
-const static int HLL_COLUMN_DEFAULT_LEN = 16385;
+const static int HLL_COLUMN_DEFAULT_LEN = HLL_REGISTERS_COUNT + 1;
 
 // Hyperloglog distinct estimate algorithm.
 // See these papers for more details.
 // 1) Hyperloglog: The analysis of a near-optimal cardinality estimation
 // algorithm (2007)
 // 2) HyperLogLog in Practice (paper from google with some improvements)
 
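-// Implements the HLL set using varchar-style variable-length encoding.
-// Handles the intermediate computation results of an HLL column.
-// empty: the empty set
-// explicit: a set that stores 64-bit hash values
-// sparse: stores only the non-zero HLL registers
-// full: stores all HLL registers
-// The four types convert in one direction only and never back: empty -> explicit -> sparse -> full
-// The first byte stores the type of the HLL set: 0:empty 1:explicit 2:sparse 3:full
-// which determines how the data that follows is parsed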
+// Each HLL value is a set of values. To save space, Doris store HLL value
+// in different format according to its cardinality.
+//
 
 Review comment:
   Is there a fixed range that determines which type we choose?
   e.g.
   `1~160` using HLL_DATA_EXPLICIT
   `161~4096` using HLL_DATA_SPRASE?
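   
   To make the question concrete, here is a minimal sketch of what such fixed ranges could look like. It assumes a single count `n` drives the choice. `HllFormat`, `choose_format`, `kExplicitMax` and `kSparseMax` are hypothetical names, and the 4096 bound is only the guess from the example above, not something taken from this PR.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; this is not the code in hll.h.
enum class HllFormat { Empty, Explicit, Sparse, Full };

// 160 mirrors HLL_EXPLICLIT_INT64_NUM from the diff.
const int64_t kExplicitMax = 160;
// 4096 is an assumed upper bound for the sparse format (taken from the
// example ranges in this comment, not from the PR).
const int64_t kSparseMax = 4096;

// Pick a storage format from a single count n, using fixed ranges.
inline HllFormat choose_format(int64_t n) {
    if (n <= 0) return HllFormat::Empty;
    if (n <= kExplicitMax) return HllFormat::Explicit;
    if (n <= kSparseMax) return HllFormat::Sparse;
    return HllFormat::Full;
}
```

   If the ranges really are fixed like this, stating them in the new header comment would answer the question.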
   
