You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by ya...@apache.org on 2020/10/14 01:35:08 UTC
[incubator-doris] branch master updated: [Docs] update data types
doc and fix some typo (#4712)
This is an automated email from the ASF dual-hosted git repository.
yangzhg pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push:
new a605b31 [Docs] update data types doc and fix some typo (#4712)
a605b31 is described below
commit a605b3160f27b2afac75ae9aa887f770a5ce7618
Author: xueyan.li <as...@163.com>
AuthorDate: Wed Oct 14 09:34:58 2020 +0800
[Docs] update data types doc and fix some typo (#4712)
* update data types doc and fix some typo
* update data types doc and fix some typo
Co-authored-by: lixueyan07 <li...@meituan.com>
---
docs/.vuepress/sidebar/en.js | 3 +-
docs/.vuepress/sidebar/zh-CN.js | 1 +
.../sql-statements/Data Types/BITMAP.md | 48 +++++++++++++++++++++
.../sql-statements/Data Types/HLL(HyperLogLog).md | 35 ----------------
.../sql-reference/sql-statements/Data Types/HLL.md | 49 ++++++++++++++++++++++
.../sql-statements/Data Types/VARCHAR.md | 4 +-
.../Data Types/{HLL.md => BITMAP.md} | 26 ++++++++----
.../sql-reference/sql-statements/Data Types/HLL.md | 20 +++++++--
.../sql-statements/Data Types/VARCHAR.md | 4 +-
9 files changed, 141 insertions(+), 49 deletions(-)
diff --git a/docs/.vuepress/sidebar/en.js b/docs/.vuepress/sidebar/en.js
index 8e27111..d6541cd 100644
--- a/docs/.vuepress/sidebar/en.js
+++ b/docs/.vuepress/sidebar/en.js
@@ -487,6 +487,7 @@ module.exports = [
directoryPath: "Data Types/",
children: [
"BIGINT",
+ "BITMAP",
"BOOLEAN",
"CHAR",
"DATE",
@@ -494,7 +495,7 @@ module.exports = [
"DECIMAL",
"DOUBLE",
"FLOAT",
- "HLL(HyperLogLog)",
+ "HLL",
"INT",
"SMALLINT",
"TINYINT",
diff --git a/docs/.vuepress/sidebar/zh-CN.js b/docs/.vuepress/sidebar/zh-CN.js
index c513e8d..b148691 100644
--- a/docs/.vuepress/sidebar/zh-CN.js
+++ b/docs/.vuepress/sidebar/zh-CN.js
@@ -492,6 +492,7 @@ module.exports = [
directoryPath: "Data Types/",
children: [
"BIGINT",
+ "BITMAP",
"BOOLEAN",
"CHAR",
"DATE",
diff --git a/docs/en/sql-reference/sql-statements/Data Types/BITMAP.md b/docs/en/sql-reference/sql-statements/Data Types/BITMAP.md
new file mode 100644
index 0000000..29d8a75
--- /dev/null
+++ b/docs/en/sql-reference/sql-statements/Data Types/BITMAP.md
@@ -0,0 +1,48 @@
+---
+{
+ "title": "BITMAP",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+#BITMAP
+## Description
+BITMAP
+
+BITMAP cannot be used as a key column, and the aggregation type is BITMAP_UNION when building the table.
+The user does not need to specify the length and default value. The length is controlled within the system according to the degree of data aggregation.
+And the BITMAP column can only be queried or used by supporting functions such as bitmap_union_count, bitmap_union, and bitmap_hash.
+
+The use of BITMAP in offline scenarios will affect the import speed. In the case of a large amount of data, the query speed will be slower than HLL and better than Count Distinct.
+Note: If BITMAP does not use a global dictionary in real-time scenarios, using bitmap_hash() may cause an error of about one-thousandth.
+
+## example
+
+ select hour, BITMAP_UNION_COUNT(pv) over(order by hour) uv from(
+ select hour, BITMAP_UNION(device_id) as pv
+ from metric_table -- Query the accumulated UV per hour
+ where datekey=20200922
+ group by hour order by 1
+ ) final;
+
+## keyword
+BITMAP
diff --git a/docs/en/sql-reference/sql-statements/Data Types/HLL(HyperLogLog).md b/docs/en/sql-reference/sql-statements/Data Types/HLL(HyperLogLog).md
deleted file mode 100644
index 7a511e2..0000000
--- a/docs/en/sql-reference/sql-statements/Data Types/HLL(HyperLogLog).md
+++ /dev/null
@@ -1,35 +0,0 @@
----
-{
- "title": "HLL (Hyloglog)",
- "language": "en"
-}
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-#HLL (Hyloglog)
-## Description
-MARKETING (M)
-A variable length string, M represents the length of a variable length string. The range of M is 1-16385.
-Users do not need to specify length and default values. Length is controlled within the system according to the aggregation degree of data
-And HLL columns can only be queried or used by matching hll_union_agg, hll_raw_agg, hll_cardinality, hll_hash.
-
-## keyword
-High loglog, hll, hyloglog
diff --git a/docs/en/sql-reference/sql-statements/Data Types/HLL.md b/docs/en/sql-reference/sql-statements/Data Types/HLL.md
new file mode 100644
index 0000000..999a897
--- /dev/null
+++ b/docs/en/sql-reference/sql-statements/Data Types/HLL.md
@@ -0,0 +1,49 @@
+---
+{
+ "title": "HLL (HyperLogLog)",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+#HLL (HyperLogLog)
+## Description
+HLL
+
+HLL cannot be used as a key column, and the aggregation type is HLL_UNION when create table.
+The user does not need to specify the length and default value.
+The length is controlled within the system according to the degree of data aggregation.
+And HLL columns can only be queried or used through the matching hll_union_agg, hll_raw_agg, hll_cardinality, and hll_hash.
+
+HLL is approximate count of distinct elements, and its performance is better than Count Distinct when the amount of data is large.
+The error of HLL is usually around 1%, sometimes up to 2%.
+
+## example
+
+ select hour, HLL_UNION_AGG(pv) over(order by hour) uv from(
+ select hour, HLL_RAW_AGG(device_id) as pv
+ from metric_table -- Query the accumulated UV per hour
+ where datekey=20200922
+ group by hour order by 1
+ ) final;
+
+## keyword
+HLL,HYPERLOGLOG
diff --git a/docs/en/sql-reference/sql-statements/Data Types/VARCHAR.md b/docs/en/sql-reference/sql-statements/Data Types/VARCHAR.md
index 268a562..ae7fa85 100644
--- a/docs/en/sql-reference/sql-statements/Data Types/VARCHAR.md
+++ b/docs/en/sql-reference/sql-statements/Data Types/VARCHAR.md
@@ -27,7 +27,9 @@ under the License.
# VARCHAR
## Description
MARKETING (M)
-A variable length string, M represents the length of a variable length string. The range of M is 1-65535.
+A variable length string, M represents the length of a variable length string. The range of M is 1-65533.
+
+Note: Variable length strings are stored in UTF-8 encoding, so usually English characters occupies 1 byte, and Chinese characters occupies 3 bytes.
## keyword
VARCHAR
diff --git a/docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md b/docs/zh-CN/sql-reference/sql-statements/Data Types/BITMAP.md
similarity index 51%
copy from docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md
copy to docs/zh-CN/sql-reference/sql-statements/Data Types/BITMAP.md
index 7357f4e..c92e20b 100644
--- a/docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md
+++ b/docs/zh-CN/sql-reference/sql-statements/Data Types/BITMAP.md
@@ -1,6 +1,6 @@
---
{
- "title": "HLL(HyperLogLog)",
+ "title": "BITMAP",
"language": "zh-CN"
}
---
@@ -24,13 +24,25 @@ specific language governing permissions and limitations
under the License.
-->
-# HLL(HyperLogLog)
+# BITMAP
## description
- VARCHAR(M)
- 变长字符串,M代表的是变长字符串的长度。M的范围是1-16385
- 用户不需要指定长度和默认值。长度根据数据的聚合程度系统内控制
- 并且HLL列只能通过配套的hll_union_agg、hll_raw_agg、hll_cardinality、hll_hash进行查询或使用
+ BITMAP
+ BITMAP不能作为key列使用,建表时配合聚合类型为BITMAP_UNION。
+ 用户不需要指定长度和默认值。长度根据数据的聚合程度系统内控制。
+ 并且BITMAP列只能通过配套的bitmap_union_count、bitmap_union、bitmap_hash等函数进行查询或使用。
+
+ 离线场景下使用BITMAP会影响导入速度,在数据量大的情况下查询速度会慢于HLL,并优于Count Distinct。
+ 注意:实时场景下BITMAP如果不使用全局字典,使用了bitmap_hash()可能会导致有千分之一左右的误差。
+
+## example
+
+ select hour, BITMAP_UNION_COUNT(pv) over(order by hour) uv from(
+ select hour, BITMAP_UNION(device_id) as pv
+ from metric_table -- 查询每小时的累计UV
+ where datekey=20200622
+ group by hour order by 1
+ ) final;
## keyword
- HLL,HYPERLOGLOG
+ BITMAP
diff --git a/docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md b/docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md
index 7357f4e..b261495 100644
--- a/docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md
+++ b/docs/zh-CN/sql-reference/sql-statements/Data Types/HLL.md
@@ -26,10 +26,22 @@ under the License.
# HLL(HyperLogLog)
## description
- VARCHAR(M)
- 变长字符串,M代表的是变长字符串的长度。M的范围是1-16385
- 用户不需要指定长度和默认值。长度根据数据的聚合程度系统内控制
- 并且HLL列只能通过配套的hll_union_agg、hll_raw_agg、hll_cardinality、hll_hash进行查询或使用
+ HLL
+ HLL不能作为key列使用,建表时配合聚合类型为HLL_UNION。
+ 用户不需要指定长度和默认值。长度根据数据的聚合程度系统内控制。
+ 并且HLL列只能通过配套的hll_union_agg、hll_raw_agg、hll_cardinality、hll_hash进行查询或使用。
+
+ HLL是模糊去重,在数据量大的情况性能优于Count Distinct。
+ HLL的误差通常在1%左右,有时会达到2%。
+
+## example
+
+ select hour, HLL_UNION_AGG(pv) over(order by hour) uv from(
+ select hour, HLL_RAW_AGG(device_id) as pv
+ from metric_table -- 查询每小时的累计UV
+ where datekey=20200622
+ group by hour order by 1
+ ) final;
## keyword
diff --git a/docs/zh-CN/sql-reference/sql-statements/Data Types/VARCHAR.md b/docs/zh-CN/sql-reference/sql-statements/Data Types/VARCHAR.md
index 5941678..178b56f 100644
--- a/docs/zh-CN/sql-reference/sql-statements/Data Types/VARCHAR.md
+++ b/docs/zh-CN/sql-reference/sql-statements/Data Types/VARCHAR.md
@@ -27,7 +27,9 @@ under the License.
# VARCHAR
## description
VARCHAR(M)
- 变长字符串,M代表的是变长字符串的长度。M的范围是1-65535
+ 变长字符串,M代表的是变长字符串的长度。M的范围是1-65533。
+
+ 注意:变长字符串是以UTF-8编码存储的,因此通常英文字符占1个字节,中文字符占3个字节。
## keyword
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org