Posted to commits@doris.apache.org by mo...@apache.org on 2020/08/01 09:54:28 UTC

[incubator-doris] branch master updated: [SQL][Function] Add approx_count_distinct() function (#4221)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 116d7ff  [SQL][Function] Add approx_count_distinct() function (#4221)
116d7ff is described below

commit 116d7ffa3c1523adaed9a4586c8173df607ea297
Author: HangyuanLiu <46...@qq.com>
AuthorDate: Sat Aug 1 17:54:19 2020 +0800

    [SQL][Function] Add approx_count_distinct() function (#4221)
    
    Add approx_count_distinct() function to replace the ndv() function
---
 docs/.vuepress/sidebar/en.js                                 |  2 +-
 docs/.vuepress/sidebar/zh-CN.js                              |  2 +-
 docs/en/getting-started/hit-the-rollup.md                    |  2 +-
 .../aggregate-functions/{ndv.md => approx_count_distinct.md} | 12 ++++++------
 docs/en/sql-reference/sql-statements/Data Definition/HLL.md  |  2 +-
 docs/zh-CN/getting-started/hit-the-rollup.md                 |  2 +-
 .../aggregate-functions/{ndv.md => approx_count_distinct.md} | 12 ++++++------
 .../sql-reference/sql-statements/Data Definition/HLL.md      |  2 +-
 .../src/main/java/org/apache/doris/catalog/FunctionSet.java  | 11 +++++++++++
 9 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/docs/.vuepress/sidebar/en.js b/docs/.vuepress/sidebar/en.js
index 1b865b8..3fb8930 100644
--- a/docs/.vuepress/sidebar/en.js
+++ b/docs/.vuepress/sidebar/en.js
@@ -267,7 +267,7 @@ module.exports = [
               "hll_union_agg",
               "max",
               "min",
-              "ndv",
+              "approx_count_distinct",
               "percentile_approx",
               "stddev",
               "stddev_samp",
diff --git a/docs/.vuepress/sidebar/zh-CN.js b/docs/.vuepress/sidebar/zh-CN.js
index fcbd17d..da69db3 100644
--- a/docs/.vuepress/sidebar/zh-CN.js
+++ b/docs/.vuepress/sidebar/zh-CN.js
@@ -281,7 +281,7 @@ module.exports = [
               "hll_union_agg",
               "max",
               "min",
-              "ndv",
+              "approx_count_distinct",
               "percentile_approx",
               "stddev",
               "stddev_samp",
diff --git a/docs/en/getting-started/hit-the-rollup.md b/docs/en/getting-started/hit-the-rollup.md
index 23b5642..f8a0fa9 100644
--- a/docs/en/getting-started/hit-the-rollup.md
+++ b/docs/en/getting-started/hit-the-rollup.md
@@ -226,7 +226,7 @@ Of course, the function of aggregated data is indispensable for general polymer
 
 The following are some types of aggregated queries that can hit Rollup.
 
-| Column type Query type |  Sum  | Distinct/Count Distinct |   Min |  Max  |  Ndv  |
+| Column type Query type |  Sum  | Distinct/Count Distinct |   Min |  Max  |  APPROX_COUNT_DISTINCT  |
 |--------------|-------|-------------------------|-------|-------|-------|
 |     Key      | false |           true          |  true |  true | true  |
 |   Value(Sum) |  true |          false          | false | false | false |
diff --git a/docs/en/sql-reference/sql-functions/aggregate-functions/ndv.md b/docs/en/sql-reference/sql-functions/aggregate-functions/approx_count_distinct.md
similarity index 83%
rename from docs/en/sql-reference/sql-functions/aggregate-functions/ndv.md
rename to docs/en/sql-reference/sql-functions/aggregate-functions/approx_count_distinct.md
index e8cdbc2..4682830 100644
--- a/docs/en/sql-reference/sql-functions/aggregate-functions/ndv.md
+++ b/docs/en/sql-reference/sql-functions/aggregate-functions/approx_count_distinct.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "NDV",
+    "title": "APPROX_COUNT_DISTINCT",
     "language": "en"
 }
 ---
@@ -24,11 +24,11 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# NDV
+# APPROX_COUNT_DISTINCT
 ## Description
 ### Syntax
 
-`NDV (expr)`
+`APPROX_COUNT_DISTINCT (expr)`
 
 
 Returns an approximate aggregation function similar to the result of COUNT (DISTINCT col).
@@ -37,12 +37,12 @@ It combines COUNT and DISTINCT faster and uses fixed-size memory, so less memory
 
 ## example
 ```
-MySQL > select ndv(query_id) from log_statis group by datetime;
+MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
-| ndv(`query_id`) |
+| approx_count_distinct(`query_id`) |
 +-----------------+
 | 17721           |
 +-----------------+
 ```
 ##keyword
-NDV
+APPROX_COUNT_DISTINCT
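The description above says the function returns an approximation of COUNT(DISTINCT col) while using fixed-size memory. The FunctionSet.java change in this commit binds it to HllFunctions symbols, so the underlying technique is HyperLogLog. Below is a minimal Java sketch of that technique. It assumes a 2^14-register layout and a SplitMix64-style hash, which are illustrative choices and not Doris's BE parameters. It walks through the update, merge, serialize and estimate steps and shows why the state size does not grow with the input.

```java
// Illustrative HyperLogLog sketch; register count and hash are assumptions,
// not the values used by Doris's HllFunctions.
final class HllSketch {
    private static final int P = 14;              // assumed precision: 2^14 = 16384 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M]; // fixed-size state, whatever the input size

    // SplitMix64 finalizer: spreads consecutive ids across all 64 bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // update: the top P bits choose a register, which keeps the largest
    // "position of the first 1-bit" seen among the remaining bits.
    void update(long value) {
        long h = mix(value);
        int idx = (int) (h >>> (64 - P));
        int rank = Math.min(Long.numberOfLeadingZeros(h << P) + 1, 64 - P + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // merge: register-wise max, so partial sketches built on different
    // tablets or nodes combine into the sketch of their union.
    void merge(HllSketch other) {
        for (int i = 0; i < M; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    // serialize: the register array is the entire intermediate state.
    byte[] serialize() {
        return registers.clone();
    }

    // estimate (the "finalize" step): bias-corrected harmonic mean,
    // falling back to linear counting while many registers are still empty.
    long estimate() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double e = alpha * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) {
            e = M * Math.log((double) M / zeros);
        }
        return Math.round(e);
    }
}

public class ApproxCountDistinctDemo {
    public static void main(String[] args) {
        HllSketch left = new HllSketch();
        HllSketch right = new HllSketch();
        for (long id = 0; id < 60_000; id++) left.update(id);        // ids 0..59999
        for (long id = 40_000; id < 100_000; id++) right.update(id); // overlaps on 40000..59999
        left.merge(right);                                            // union: 100000 distinct ids
        System.out.println("approx = " + left.estimate() + ", exact = 100000");
        System.out.println("state size = " + left.serialize().length + " bytes");
    }
}
```

However many distinct ids are fed in, the serialized state in this sketch stays at 16384 bytes. For this register count, the typical relative error of HyperLogLog is about 1.04/√16384 ≈ 0.8%. That trade of exactness for bounded memory is what the documentation means by "fixed-size memory".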
diff --git a/docs/en/sql-reference/sql-statements/Data Definition/HLL.md b/docs/en/sql-reference/sql-statements/Data Definition/HLL.md
index f8d61a9..4499b88 100644
--- a/docs/en/sql-reference/sql-statements/Data Definition/HLL.md	
+++ b/docs/en/sql-reference/sql-statements/Data Definition/HLL.md	
@@ -72,7 +72,7 @@ distributed by hash(id) buckets 32;
         curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)"
             http://host/api/test_db/test/_stream_load
 
-3. There are three common ways of aggregating data: (without aggregating the base table directly, the speed may be similar to that of using NDV directly)
+3. There are three common ways of aggregating data: (without aggregating the base table directly, the speed may be similar to that of using APPROX_COUNT_DISTINCT directly)
 
 A. Create a rollup that allows HLL columns to generate aggregation.
 alter table test add rollup test_rollup(dt, set1);
diff --git a/docs/zh-CN/getting-started/hit-the-rollup.md b/docs/zh-CN/getting-started/hit-the-rollup.md
index 6f3dd6e..9ab8d5b 100644
--- a/docs/zh-CN/getting-started/hit-the-rollup.md
+++ b/docs/zh-CN/getting-started/hit-the-rollup.md
@@ -226,7 +226,7 @@ rollup_index4(k4, k6, k5, k1, k2, k3, k7)
 
 以下是可以命中Rollup的一些聚合查询的种类,
 
-| 列类型 查询类型 |  Sum  | Distinct/Count Distinct |   Min |  Max  |  Ndv  |
+| 列类型 查询类型 |  Sum  | Distinct/Count Distinct |   Min |  Max  |  APPROX_COUNT_DISTINCT  |
 |--------------|-------|-------------------------|-------|-------|-------|
 |     Key      | false |           true          |  true |  true | true  |
 |   Value(Sum) |  true |          false          | false | false | false |
diff --git a/docs/zh-CN/sql-reference/sql-functions/aggregate-functions/ndv.md b/docs/zh-CN/sql-reference/sql-functions/aggregate-functions/approx_count_distinct.md
similarity index 83%
rename from docs/zh-CN/sql-reference/sql-functions/aggregate-functions/ndv.md
rename to docs/zh-CN/sql-reference/sql-functions/aggregate-functions/approx_count_distinct.md
index c2e857e..572e58e 100644
--- a/docs/zh-CN/sql-reference/sql-functions/aggregate-functions/ndv.md
+++ b/docs/zh-CN/sql-reference/sql-functions/aggregate-functions/approx_count_distinct.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "NDV",
+    "title": "APPROX_COUNT_DISTINCT",
     "language": "zh-CN"
 }
 ---
@@ -24,11 +24,11 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# NDV
+# APPROX_COUNT_DISTINCT
 ## description
 ### Syntax
 
-`NDV(expr)`
+`APPROX_COUNT_DISTINCT(expr)`
 
 
 返回类似于 COUNT(DISTINCT col) 结果的近似值聚合函数。
@@ -37,12 +37,12 @@ under the License.
 
 ## example
 ```
-MySQL > select ndv(query_id) from log_statis group by datetime;
+MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
-| ndv(`query_id`) |
+| approx_count_distinct(`query_id`) |
 +-----------------+
 | 17721           |
 +-----------------+
 ```
 ##keyword
-NDV
+APPROX_COUNT_DISTINCT
diff --git a/docs/zh-CN/sql-reference/sql-statements/Data Definition/HLL.md b/docs/zh-CN/sql-reference/sql-statements/Data Definition/HLL.md
index dd1a108..1d3f65b 100644
--- a/docs/zh-CN/sql-reference/sql-statements/Data Definition/HLL.md	
+++ b/docs/zh-CN/sql-reference/sql-statements/Data Definition/HLL.md	
@@ -69,7 +69,7 @@ under the License.
         curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)"
             http://host/api/test_db/test/_stream_load
 
-    3. 聚合数据,常用方式3种:(如果不聚合直接对base表查询,速度可能跟直接使用ndv速度差不多)
+    3. 聚合数据,常用方式3种:(如果不聚合直接对base表查询,速度可能跟直接使用approx_count_distinct速度差不多)
 
       a. 创建一个rollup,让hll列产生聚合,
         alter table test add rollup test_rollup(dt, set1);
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java
index d581ba4..67436cf 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java
@@ -997,6 +997,17 @@ public class FunctionSet {
                     "_ZN5doris12HllFunctions12hll_finalizeEPN9doris_udf15FunctionContextERKNS1_9StringValE",
                     true, false, true));
 
+            //APPROX_COUNT_DISTINCT
+            //alias of ndv, compute approx count distinct use HyperLogLog
+            addBuiltin(AggregateFunction.createBuiltin("approx_count_distinct",
+                    Lists.newArrayList(t), Type.BIGINT, Type.VARCHAR,
+                    "_ZN5doris12HllFunctions8hll_initEPN9doris_udf15FunctionContextEPNS1_9StringValE",
+                    "_ZN5doris12HllFunctions" + HLL_UPDATE_SYMBOL.get(t),
+                    "_ZN5doris12HllFunctions9hll_mergeEPN9doris_udf15FunctionContextERKNS1_9StringValEPS4_",
+                    "_ZN5doris12HllFunctions13hll_serializeEPN9doris_udf15FunctionContextERKNS1_9StringValE",
+                    "_ZN5doris12HllFunctions12hll_finalizeEPN9doris_udf15FunctionContextERKNS1_9StringValE",
+                    true, false, true));
+
             // BITMAP_UNION_INT
             addBuiltin(AggregateFunction.createBuiltin(BITMAP_UNION_INT,
                     Lists.newArrayList(t), Type.BIGINT, Type.VARCHAR,

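The FunctionSet.java hunk registers approx_count_distinct with the same kind of HllFunctions init/update/merge/serialize/finalize symbols used by the HLL-based builtin just before it, and its comment calls it an "alias of ndv". The diffstat shows only insertions in that file, so the existing registration is left in place. The Java below is a hypothetical toy registry, not Doris's FunctionSet or AggregateFunction API. It illustrates the idea that two names bound to one symbol set resolve to the same code path. The symbol strings are simplified stand-ins for the real mangled C++ names. The case-insensitive lookup is an assumption of this sketch. The record requires Java 16 or later.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class AliasRegistryDemo {
    // Hypothetical stand-in for the five backend entry points an aggregate
    // is bound to: init, update, merge, serialize, finalize.
    record AggSymbols(String init, String update, String merge,
                      String serialize, String finalizeFn) {}

    private final Map<String, AggSymbols> builtins = new HashMap<>();

    void addBuiltin(String name, AggSymbols symbols) {
        builtins.put(name.toLowerCase(Locale.ROOT), symbols);
    }

    // Names are resolved case-insensitively (an assumption of this sketch).
    AggSymbols lookup(String name) {
        return builtins.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        AliasRegistryDemo registry = new AliasRegistryDemo();
        // Simplified symbol names; the real ones are mangled C++ names in HllFunctions.
        AggSymbols hll = new AggSymbols("hll_init", "hll_update", "hll_merge",
                                        "hll_serialize", "hll_finalize");
        registry.addBuiltin("ndv", hll);
        registry.addBuiltin("approx_count_distinct", hll); // alias: new name, same code path
        System.out.println(registry.lookup("NDV") == registry.lookup("approx_count_distinct")); // true
    }
}
```

Registering an alias this way adds no new backend code. Queries written with either name run the same HyperLogLog update and finalize logic and therefore return the same result.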
