You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by ji...@apache.org on 2022/07/29 05:05:35 UTC
[doris] branch master updated: [doc]Added auto_broadcast_join_threshold variable description (#11323)
This is an automated email from the ASF dual-hosted git repository.
jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 6d0c59d4f0 [doc]Added auto_broadcast_join_threshold variable description (#11323)
6d0c59d4f0 is described below
commit 6d0c59d4f07be4eb0fa82d056c2ac5b6c0679ec7
Author: jiafeng.zhang <zh...@gmail.com>
AuthorDate: Fri Jul 29 13:05:30 2022 +0800
[doc]Added auto_broadcast_join_threshold variable description (#11323)
Add auto_broadcast_join_threshold variable description
---
docs/en/docs/advanced/variables.md | 21 +++++++++++++++++++++
docs/zh-CN/docs/advanced/variables.md | 20 ++++++++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/docs/en/docs/advanced/variables.md b/docs/en/docs/advanced/variables.md
index a8960c4150..016ab933c1 100644
--- a/docs/en/docs/advanced/variables.md
+++ b/docs/en/docs/advanced/variables.md
@@ -114,6 +114,27 @@ Note that the comment must start with /*+ and can only follow the SELECT.
Used for compatibility with MySQL clients. No practical effect.
+* `auto_broadcast_join_threshold`
+
+ The maximum size in bytes of the table that will be broadcast to all nodes when a join is performed, broadcast can be disabled by setting this value to -1.
+
+ The system provides two join implementation methods, `broadcast join` and `shuffle join`.
+
+ `broadcast join` means that after conditional filtering the small table, broadcast it to each node where the large table is located to form an in-memory Hash table, and then stream the data of the large table for Hash Join.
+
+ `shuffle join` refers to hashing both small and large tables according to the join key, and then performing distributed join.
+
+ `broadcast join` has better performance when the data volume of the small table is small. On the contrary, shuffle join has better performance.
+
+ The system will automatically try to perform a Broadcast Join, or you can explicitly specify the implementation of each join operator. The system provides a configurable parameter `auto_broadcast_join_threshold`, which specifies the upper limit of the memory used by the hash table to the overall execution memory when `broadcast join` is used. The value ranges from 0 to 1, and the default value is 0.8. When the memory used by the system to calculate the hash table exceeds this limit, [...]
+
+ The overall execution memory here is: a fraction of what the query optimizer estimates
+
+ > Note:
+ >
+ > It is not recommended to use this parameter to adjust, if you must use a certain join, it is recommended to use hint, such as join[shuffle]
+
+
* `batch_size`
Used to specify the number of rows of a single packet transmitted by each node during query execution. By default, the number of rows of a packet is 1024 rows. That is, after the source node generates 1024 rows of data, it is packaged and sent to the destination node.
diff --git a/docs/zh-CN/docs/advanced/variables.md b/docs/zh-CN/docs/advanced/variables.md
index 5670c43120..3b938c42a2 100644
--- a/docs/zh-CN/docs/advanced/variables.md
+++ b/docs/zh-CN/docs/advanced/variables.md
@@ -113,6 +113,26 @@ SELECT /*+ SET_VAR(query_timeout = 1, enable_partition_cache=true) */ sleep(3);
用于兼容 MySQL 客户端。无实际作用。
+- `auto_broadcast_join_threshold`
+
+ 执行连接时将向所有节点广播的表的最大字节大小,通过将此值设置为 -1 可以禁用广播。
+
+ 系统提供了两种 Join 的实现方式,`broadcast join` 和 `shuffle join`。
+
+ `broadcast join` 是指将小表进行条件过滤后,将其广播到大表所在的各个节点上,形成一个内存 Hash 表,然后流式读出大表的数据进行 Hash Join。
+
+ `shuffle join` 是指将小表和大表都按照 Join 的 key 进行 Hash,然后进行分布式的 Join。
+
+ 当小表的数据量较小时,`broadcast join` 拥有更好的性能。反之,则shuffle join拥有更好的性能。
+
+ 系统会自动尝试进行 Broadcast Join,也可以显式指定每个join算子的实现方式。系统提供了可配置的参数 `auto_broadcast_join_threshold`,指定使用 `broadcast join` 时,hash table 使用的内存占整体执行内存比例的上限,取值范围为0到1,默认值为0.8。当系统计算hash table使用的内存会超过此限制时,会自动转换为使用 `shuffle join`
+
+ 这里的整体执行内存是:查询优化器做估算的一个比例
+
+ >注意:
+ >
+ >不建议用这个参数来调整,如果必须要使用某一种join,建议使用hint,比如 join[shuffle]
+
- `batch_size`
用于指定在查询执行过程中,各个节点传输的单个数据包的行数。默认一个数据包的行数为 1024 行,即源端节点每产生 1024 行数据后,打包发给目的节点。
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org