You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/03/04 09:23:26 UTC

[GitHub] [incubator-doris] qidaye commented on a change in pull request #5459: [Bucket Shuffle Join] Support the some featrue of Bucket Shuffle Join

qidaye commented on a change in pull request #5459:
URL: https://github.com/apache/incubator-doris/pull/5459#discussion_r587297331



##########
File path: docs/en/administrator-guide/bucket-shuffle-join.md
##########
@@ -0,0 +1,106 @@
+```
+---
+{
+    "title": "Bucket Shuffle Join",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+```
+# Bucket Shuffle Join
+
+Bucket Shuffle Join is a new function officially added in Doris 0.14. The purpose is to provide local optimization for some join queries to reduce the time-consuming of data transmission between nodes and speed up the query.
+
+It's design, implementation can be referred to [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394)。
+
+## Noun Interpretation
+
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, Doris's back-end node. Responsible for query execution and data storage.
+* Left table: the left table in join query. Perform probe expr. The order can be adjusted by join reorder.
+* Right table: the right table in join query. Perform build expr The order can be adjusted by join reorder.
+
+## Principle
+In addition to bucket shuffle join, Doris supports three types of join: `Shuffle Join, Broadcast Join, Colocate Join`.  Except `colorate join`, other types of join will lead to some network overhead.
+
+For example, there are join queries for table A and table B. the join method is hashjoin. The cost of different join types is as follows:
+* **Broadcast Join**: If table a has three executing hashjoinnodes according to the data distribution, table B needs to be sent to the three HashJoinNode. Its network overhead is `3B `, and its memory overhead is `3B`. 
+* **Shuffle Join**: Shuffle join will distribute the data of tables A and B to the nodes of the cluster according to hash calculation, so its network overhead is `A + B` and memory overhead is `B`.
+
+The data distribution information of each Doris table is saved in FE. If the join statement hits the data distribution column of the left table, we should use the data distribution information to reduce the network and memory overhead of the join query. This is the source of the idea of bucket shuffle join.
+
+![image.png](https://upload-images.jianshu.io/upload_images/8552201-c383fe84aeee13bc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

Review comment:
       This image should be included in the Doris repository and should not rely on a third-party link.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org