You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/24 13:33:33 UTC
[GitHub] [incubator-doris] dataalive commented on a diff in pull request #9753: [doc]Add Doris join optimization documentation

dataalive commented on code in PR #9753:
URL: https://github.com/apache/incubator-doris/pull/9753#discussion_r880512484


##########
docs/en/advanced/join-optimization/doris-join-optimization.md:
##########
@@ -0,0 +1,222 @@
+---
+{
+    "title": "Doris Join optimization principle",
+    "language": "en"
+}
+
+
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Doris Join optimization principle
+
+Doris supports two physical operators, one is **Hash Join**, and the other is **Nest Loop Join**.
+
+- Hash Join: Create a hash table on the right table based on the equivalent join column, and the left table uses the hash table to perform join calculations in a streaming manner. Its limitation is that it can only be applied to equivalent joins.
+- Nest Loop Join: With two for loops, it is very intuitive. Then it is applicable to unequal-valued joins, such as: greater than or less than or the need to find a Cartesian product. It is a general join operator, but has poor performance.
+
+As a distributed MPP database, data shuffle needs to be performed during the Join process. Data needs to be split and scheduled to ensure that the final Join result is correct. As a simple example, assume that the relationship S and R are joined, and N represents the number of nodes participating in the join calculation; T represents the number of tuples in the relationship.
+
+
+
+## Doris Shuffle way
+
+1. Doris supports 4 Shuffle methods
+
+   1. BroadCast Join
+
+       It requires the full data of the right table to be sent to the left table, that is, each node participating in Join has the full data of the right table, that is, T(R).
+
+       Its applicable scenarios are more general, and it can support Hash Join and Nest loop Join at the same time, and its network overhead is N * T(R).
+
+   ![image-20220523152004731](/images/join/image-20220523152004731.png)
+
+   The data in the left table is not moved, and the data in the right table is sent to the scanning node of the data in the left table.
+
+2. Shuffle Join
+
+   When Hash Join is performed, the corresponding Hash value can be calculated through the Join column, and Hash bucketing can be performed.
+
+   Its network overhead is: T(R) + T(N), but it can only support Hash Join, because it also calculates buckets according to the conditions of Join.
+
+   ![image-20220523151902368](/images/join/image-20220523151902368.png)
+
+   The left and right table data are sent to different partition nodes according to the partition, and the calculated demerits are sent.
+
+3. Bucket Shuffle Join
+
+   Doris's table data itself is bucketed by Hash calculation, so you can use the properties of the bucketed columns of the table itself to shuffle the Join data. If two tables need to be joined, and the Join column is the bucket column of the left table, then the data in the left table can actually be calculated by sending the data into the buckets of the left table without moving the data in the right table.
+
+   Its network overhead is: T(R) is equivalent to only Shuffle the data in the right table.
+
+   ![image-20220523151653562](/images/join/image-20220523151653562.png)
+
+   The data in the left table does not move, and the data in the right table is sent to the node that scans the table in the left table according to the result of the partition calculation.
+
+4. Colocation 
+
+   It is similar to Bucket Shuffle Join, which means that the data has been shuffled according to the preset Join column scenario when data is imported. Then the join calculation can be performed directly without considering the Shuffle problem of the data during the actual query.
+
+   ![image-20220523151619754](/images/join/image-20220523151619754.png)
+
+   The data has been pre-partitioned, and the Join calculation is performed directly locally
+
+### Comparison of four Shuffle methods
+
+| Shuffle Mode | Network Overhead | Physical Operators | Applicable Scenarios |
+| -------------- | ------------- | ------------ ---- | --------------------------------------------- --------------- |
+| BroadCast | N * T(R) | Hash Join / Nest Loop Join | Universal |
+| Shuffle | T(S) + T(R) | Hash Join | General |
+| Bucket Shuffle | T(R) | Hash Join | There are distributed columns in the left table in the join condition, and the left table is executed as a single partition |
+| Colocate | 0 | Hash Join | There are distributed columns in the left table in the join condition, and the left and right tables belong to the same Colocate Group |
+
+N : The number of Instances participating in the Join calculation
+
+T(relation) : Tuple number of relation
+
+The flexibility of the above four methods is from high to low, and its requirements for this data distribution are becoming more and more strict, but the performance of Join calculation is also getting better and better.
+
+## Runtime Filter Join optimization
+
+Doris will build a hash table in the right table when performing Hash Join calculation, and the left table will stream through the hash table of the right table to obtain the join result. The RuntimeFilter makes full use of the Hash table of the right table. When the right table generates a hash table, a filter condition based on the hash table data is generated at the same time, and then pushed down to the data scanning node of the left table. In this way, Doris can perform data filtering at runtime.
+
+If the left table is a large table and the right table is a small table, then using the filter conditions generated by the left table, most of the data to be filtered in the Join layer can be filtered in advance when the data is read, so that a large amount of data can be filtered. Improve the performance of join queries.
+
+Currently Doris supports three types of RuntimeFilter
+
+- One is IN-IN, which is well understood, and pushes a hashset down to the data scanning node.
+- The second is BloomFilter, which uses the data of the hash table to construct a BloomFilter, and then pushes the BloomFilter down to the scanning node that queries the data. .
+- The last one is MinMax, which is a Range range. After the Range range is determined by the data in the right table, it is pushed down to the data scanning node.
+
+There are two requirements for the applicable scenarios of Runtime Filter:
+
+- The first requirement is that the right table is large and the left table is small, because building a Runtime Filter needs to bear the computational cost, including some memory overhead.
+- The second requirement is that there are few results from the join of the left and right tables, indicating that this join can filter out most of the data in the left table.
+
+When the above two conditions are met, turning on the Runtime Filter can achieve better results
+
+When the Join column is the Key column of the left table, the RuntimeFilter will be pushed down to the storage engine. Doris itself supports delayed materialization,
+
+Delayed materialization is simply like this: if you need to scan three columns A, B, and C, there is a filter condition on column A: A is equal to 2, if you want to scan 100 rows, you can scan 100 rows of column A first, Then filter through the filter condition A = 2. After filtering the results, read columns B and C, which can greatly reduce the data read IO. Therefore, if the Runtime Filter is generated on the Key column, and the delayed materialization of Doris itself is used to further improve the performance of the query.
+
+### Runtime Filter Type
+
+- Doris provides three different Runtime Filter types:
+  - The advantage of **IN** is that the effect filtering effect is obvious and fast. Its shortcomings are: First, it only applies to BroadCast. Second, when the right table exceeds a certain amount of data, it will fail. The current Doris configuration is 1024, that is, if the right table is larger than 1024, the Runtime Filter of IN will directly failed.
+  - The advantage of **MinMax** is that the overhead is relatively small. Its disadvantage is that it has a relatively good effect on numeric columns, but basically no effect on non-numeric columns.
+  - The feature of **Bloom Filter** is that it is universal, suitable for various types, and the effect is better. The disadvantage is that its configuration is more complicated and the calculation is high.
+
+## Join Reader
+
+Once the database involves multi-table Join, the order of Join has a great impact on the performance of the entire Join query. Assuming that there are three tables to join, refer to the following picture, the left is the a table and the b table to do the join first, the intermediate result has 2000 rows, and then the c table is joined.
+
+Next, look at the picture on the right and adjust the order of Join. Join the a table with the c table first, the intermediate result generated is only 100, and then finally join with the b table for calculation. The final join result is the same, but the intermediate result it generates has a 20x difference, which results in a big performance diff.
+
+![image-20220523152639123](/images/join/image-20220523152639123.png)
+
+- Doris currently supports the rule-based Join Reorder algorithm. Its logic is:
+  - Make joins with large tables and small tables as much as possible, and the intermediate results it generates are as small as possible.
+  - Put the conditional join table forward, that is to say, try to filter the conditional join table
+  - Hash Join has higher priority than Nest Loop Join, because Hash Join itself is much faster than Nest Loop Join.
+
+## Doris Join optimization method
+
+Doris Join tuning method:
+
+- Use the Profile provided by Doris itself to locate the bottleneck of the query. Profile records various information in Doris' entire query, which is first-hand information for performance tuning. .
+- Understand the Join mechanism of Doris, which is also the content shared with you in the second part. Only by knowing why and understanding its mechanism can we analyze why it is slow.
+- Use Session variables to change some behaviors of Join, so as to realize the tuning of Join.
+- Check the Query Plan to analyze whether this tuning is effective.
+
+The above 4 steps basically complete a standard Join tuning process, and then it is to actually query and verify it to see what the effect is.
+
+If the first 4 methods are connected in series, it still does not work. At this time, it may be necessary to rewrite the Join statement, or to adjust the data distribution. It is necessary to recheck whether the entire data distribution is reasonable, including querying the Join statement, and some manual adjustments may be required. Of course, this method has a relatively high mental cost, which means that further analysis is required only when the previous method does not work.
+
+## Optimization case practice
+
+### Case one

Review Comment:
   ```suggestion
   ### Case one -> Case 1 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org