You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/07/05 08:12:42 UTC

[GitHub] [incubator-doris] xinyiZzz opened a new pull request #6154: [DOC] Add docs of Runtime Filter

xinyiZzz opened a new pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154


   ## Proposed changes
   
   Add docs of Runtime Filter #6116
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [x] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [ ] I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [ ] If these changes need document changes, I have updated the document
   - [ ] Any dependent changes have been merged
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666649268



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比loom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。

Review comment:
       loom Filter拼写错误




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666701593



##########
File path: docs/en/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,225 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed up the query.
+
+It's design, implementation and effects, please refer to [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116).
+
+## Noun Interpretation
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, the back-end node of Doris. Responsible for query execution and data storage.
+* Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
+* Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join Reorder.
+* Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the FE.
+* Join on clause: `Aa=Bb` in `A join B on Aa=Bb`, based on this to generate join conjuncts during query planning, including expr used by join Build and Probe, where Build expr is called in Runtime Filter src expr, Probe expr are called target expr in Runtime Filter.
+
+## Principle
+Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
+
+For example, there is currently a Join query between the T1 table and the T2 table. Its Join mode is HashJoin. T1 is a fact table with 100,000 rows of data. T2 is a dimension table with 100 rows of data. Doris join The actual situation is:
+```

Review comment:
       Modified, thks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] hf200012 commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

hf200012 commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r663733418



##########
File path: docs/en/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed up the query.
+
+It's design, implementation and effects, please refer to [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116).
+
+## Noun Interpretation
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, the back-end node of Doris. Responsible for query execution and data storage.
+* Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
+* Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join Reorder.
+* Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the FE.
+* Join on clause: `Aa=Bb` in `A join B on Aa=Bb`, based on this to generate join conjuncts during query planning, including expr used by join Build and Probe, where Build expr is called in Runtime Filter src expr, Probe expr are called target expr in Runtime Filter.
+
+## Principle
+Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
+
+For example, there is currently a Join query between the T1 table and the T2 table. Its Join mode is HashJoin. T1 is a fact table with 100,000 rows of data. T2 is a dimension table with 100 rows of data. Doris join The actual situation is:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+Obviously, scanning data for T2 is much faster than T1. If we take the initiative to wait for a while and then scan T1, after T2 sends the scanned data record to HashJoinNode, HashJoinNode calculates a filter condition based on the data of T2, such as the maximum value of T2 data And the minimum value, or build a Bloom Filter, and then send this filter condition to ScanNode waiting to scan T1, the latter applies this filter condition and delivers the filtered data to HashJoinNode, thereby reducing the number of probe hash tables and network overhead. This filter condition is Runtime Filter, and the effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+If the filter condition (Runtime Filter) can be pushed down to the storage engine, in some cases, the index can be used to directly reduce the amount of scanned data, thereby greatly reducing the scanning time. The effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+It can be seen that, unlike predicate push-down and partition cutting, Runtime Filter is a filter condition dynamically generated at runtime, that is, when the query is run, the join on clause is parsed to determine the filter expression, and the expression is broadcast to ScanNode that is reading the left table , Thereby reducing the amount of scanned data, thereby reducing the number of probe hash table, avoiding unnecessary I/O and network transmission.
+
+Runtime Filter is mainly used to optimize joins for large tables. If the amount of data in the left table is too small, or the amount of data in the right table is too large, the Runtime Filter may not achieve the expected effect.
+
+## Usage
+
+### Set session variable
+In addition to `runtime_filter_type`, other query options are used to adjust the Runtime Filter to achieve the best performance in a specific scenario, usually only after the performance test, to optimize the query that is resource-intensive, takes a long enough time to run, and has a high enough frequency .
+
+#### 1.runtime_filter_type
+The types of Runtime Filter used include Bloom Filter, MinMax Filter, and IN predicate. When multiple types are used, they are separated by commas. Note that you need to add quotation marks. For example:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+By default, only IN predicate is used conservatively. In some cases, the performance is higher when Bloom Filter, MinMax Filter, and IN predicate are used at the same time.
+
+- **Bloom Filter**: There is a certain misjudgment rate, resulting in filtered data less than expected, but it will not lead to inaccurate final results. In most cases, Bloom Filter can improve performance or have no significant impact on performance , But in some cases it will cause performance degradation.
+    - Bloom Filter construction and application overhead is high, so when the filtering rate is low, or the amount of data in the left table is small, Bloom Filter may cause performance degradation.
+    - At present, only the Key column of the left table can be pushed down to the storage engine if the Bloom Filter is applied, and the test results show that the performance of the Bloom Filter is not pushed down to the storage engine.
+    - Currently Bloom Filter only has short-circuit logic when using expression filtering on ScanNode, that is, when the false positive rate is too high, the Bloom Filter will not continue to be used, but there is no short-circuit logic when the Bloom Filter is pushed down to the storage engine , So when the filtration rate is low, it may cause performance degradation.
+- **MinMax Filter**: Contains the maximum value and the minimum value, thereby filtering data smaller than the minimum value and greater than the maximum value. The filtering effect of the MinMax Filter is related to the type of the Key column in the join on clause and the data distribution of the left and right tables.
+    - When the type of the Key column in the join on clause is int/bigint/double, etc., in extreme cases, if the maximum and minimum values of the left and right tables are the same, there is no effect, otherwise the maximum value of the right table is less than the minimum value of the left table, or the minimum of the right table The value is greater than the maximum value in the left table, the effect is best.
+    - When the type of the Key column in the join on clause is varchar, etc., applying the MinMax Filter will often cause performance degradation.
+- **IN predicate**: Construct the IN predicate based on all the values of the Key listed in the join on clause on the right table, and use the constructed IN predicate to filter on the left table. Compared with the loom filter, the cost of construction and application is lower. The table on the right tends to have higher performance when the amount of data is small.
+    - By default, only the number of data rows in the right table is less than 1024 will be pushed down (can be adjusted by the `runtime_filter_max_in_num` in the session variable).
+    - Currently IN predicate does not implement a merge method, that is, it cannot be pushed down across Fragments, so currently when it is necessary to push down to the ScanNode of the left table of shuffle join, if Bloom Filter is not generated, then we will convert IN predicate to Bloom Filter for Process pushdown across Fragments, so even if the type only selects IN predicate, Bloom Filter may actually be applied;
+
+#### 2.runtime_filter_mode
+It is used to adjust the push-down strategy of Runtime Filter, including two strategies, LOCAL and GLOBAL, and the default setting is GLOBAL strategy.
+
+LOCAL: Relatively conservative, the constructed Runtime Filter can only be used in the same Fragment on the same instance (the smallest unit of query execution), that is, the Runtime Filter producer (the HashJoinNode that constructs the Filter) and the consumer (the ScanNode that uses the RuntimeFilter) The same Fragment, such as the general scene of broadcast join;
+
+GLOBAL: Relatively radical. In addition to satisfying the scenario of the LOCAL strategy, the Runtime Filter can also be combined and transmitted to different Fragments on different instances via the network. For example, Runtime Filter producers and consumers are in different Fragments, such as shuffle join. The query can be optimized in a wider range of scenarios.
+
+When building and applying Runtime Filters on different Fragments, the reasons and strategies for merging Runtime Filters can be found in [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+After the Runtime Filter is turned on, the ScanNode that joins the left table will wait for a period of time for each Runtime Filter assigned to itself and then scan the data. The default wait is 1s (1000ms, in milliseconds), that is, if the ScanNode is assigned 3 Runtime Filters , Then it will wait at most 3s.
+
+Because it takes time to build and merge the Runtime Filter, ScanNode will try to push down the Runtime Filter that arrives within the waiting time to the storage engine. If the waiting time is exceeded, ScanNode will directly start scanning data using the Runtime Filter that has arrived.
+
+If the Runtime Filter arrives after ScanNode starts scanning, ScanNode will not push the Runtime Filter down to the storage engine. Instead, it will use expression filtering on ScanNode based on the Runtime Filter for the data that has been scanned from the storage engine. The scanned data will not apply the Runtime Filter, so the intermediate data size obtained will be larger than the optimal solution, but serious cracking can be avoided.
+
+If the cluster is busy and there are many resource-intensive or long-time-consuming queries on the cluster, consider increasing the waiting time to avoid missing optimization opportunities for complex queries. If the cluster load is light, and there are many small queries on the cluster that only take a few seconds, you can consider reducing the waiting time to avoid an increase of 1s for each query.
+
+#### 4.runtime_filters_max_num
+The maximum number of Bloom Filters in the Runtime Filter that can be applied to each query, the default is 10. Because Bloom Filter construction and application costs are high, so if the generated Bloom Filter exceeds the maximum allowed number, the Bloom Filter with large selectivity is retained. Currently, only the number of Bloom Filters is limited, because compared to MinMax Filter and IN predicate It is more expensive to build and apply.
+```
+Selectivity = (HashJoinNode Cardinality / HashJoinNode left child Cardinality)
+```
+Because the cardinality of FE is currently inaccurate, the selectivity of Bloom Filter calculation here is inaccurate, so in the end it may only randomly reserve part of Bloom Filter.
+
+#### 5. Bloom Filter related parameters
+- `runtime_bloom_filter_min_size`: the minimum length of Bloom Filter in Runtime Filter (in bytes), the default is 1048576 (1M);
+
+- `runtime_bloom_filter_max_size`: the maximum length of Bloom Filter in Runtime Filter (in bytes), the default is 16777216 (16M);
+
+- `runtime_bloom_filter_size`: The default length of Bloom Filter in Runtime Filter (in bytes), the default is 2097152 (2M);
+
+Because it is necessary to ensure that the length of the Bloom Filter constructed by each HashJoinNode is the same to be merged, the length of the Bloom Filter is currently calculated in the FE query planning.
+
+If you can get the number of data rows (Cardinality) in the statistical information of the join table on the right, you will try to use Cardinality as NDV, the default false detection rate fpp is 0.05, and the Bloom that contains NDV unique elements and the false detection rate is lower than fpp is calculated by the formula The minimum number of bytes required by the filter, rounded to the nearest power of 2 (log value with 2 as the base), and limits the upper and lower limits of the length of the final Bloom Filter.
+
+If the Cardinality of the join right table is not available, the default Bloom Filter length will be used.
+
+#### 6.runtime_filter_max_in_num
+If the number of rows in the right table of the join is greater than this value, we will not generate an IN predicate, and the default is 1024;
+
+### View Runtime Filter generated by query
+The query plan that can be displayed by the `explain` command includes the join on clause information used by each Fragment, as well as comments on the generation and use of the Runtime Filter by the Fragment, so as to confirm whether the Runtime Filter is applied to the desired join on clause.
+- The comment contained in the Fragment that generates the Runtime Filter, such as `runtime filters: filter_id[type] <- table.column`.
+- Use the comment contained in the fragment of Runtime Filter such as `runtime filters: filter_id[type] -> table.column`.
+
+The query in the following example uses a Runtime Filter with ID RF000.
+```
+CREATE TABLE test (t1 INT) DISTRIBUTED BY HASH (t1) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test VALUES (1), (2), (3), (4);
+
+CREATE TABLE test2 (t2 INT) DISTRIBUTED BY HASH (t2) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test2 VALUES (3), (4), (5);
+
+EXPLAIN SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2;
++-------------------------------------------------------------------+
+| Explain String                                                    |
++-------------------------------------------------------------------+
+| PLAN FRAGMENT 0                                                   |
+|  OUTPUT EXPRS:`t1`                                                |
+|                                                                   |
+|   4:EXCHANGE                                                      |
+|                                                                   |
+| PLAN FRAGMENT 1                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test`.`t1`  |
+|                                                                   |
+|   2:HASH JOIN                                                     |
+|   |  join op: INNER JOIN (BUCKET_SHUFFLE)                         |
+|   |  equal join conjunct: `test`.`t1` = `test2`.`t2`              |
+|   |  runtime filters: RF000[in] <- `test2`.`t2`                   |
+|   |                                                               |
+|   |----3:EXCHANGE                                                 |
+|   |                                                               |
+|   0:OlapScanNode                                                  |
+|      TABLE: test                                                  |
+|      runtime filters: RF000[in] -> `test`.`t1`                    |
+|                                                                   |
+| PLAN FRAGMENT 2                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test2`.`t2` |
+|                                                                   |
+|   1:OlapScanNode                                                  |
+|      TABLE: test2                                                 |
++-------------------------------------------------------------------+
+-- 上面`runtime filters`的行显示了`PLAN FRAGMENT 1`的`2:HASH JOIN`生成了ID为RF000的IN predicate，
+-- 其中`test2`.`t2`的key values仅在运行时可知，
+-- 在`0:OlapScanNode`使用了该IN predicate用于在读取`test`.`t1`时过滤不必要的数据。
+
+SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2; 
+-- 返回2行结果[3, 4];
+
+-- 通过query的profile（set is_report_success=true;）可以查看查询内部工作的详细信息，
+-- 包括每个Runtime Filter是否下推、等待耗时、以及OLAP_SCAN_NODE从prepare到接收到Runtime Filter的总时长。
+RuntimeFilter:in:
+    -  HasPushDownToEngine:  true
+    -  AWaitTimeCost:  0ns
+    -  EffectTimeCost:  2.76ms
+
+-- 此外，在profile的OLAP_SCAN_NODE中还可以查看Runtime Filter下推后的过滤效果和耗时。
+    -  RowsVectorPredFiltered:  9.320008M  (9320008)
+    -  VectorPredEvalTime:  364.39ms
+```
+

Review comment:
       It should be English here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666702514



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量

Review comment:
       Modified, thks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r663752066



##########
File path: docs/en/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed up the query.
+
+It's design, implementation and effects, please refer to [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116).
+
+## Noun Interpretation
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, the back-end node of Doris. Responsible for query execution and data storage.
+* Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
+* Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join Reorder.
+* Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the FE.
+* Join on clause: `Aa=Bb` in `A join B on Aa=Bb`, based on this to generate join conjuncts during query planning, including expr used by join Build and Probe, where Build expr is called in Runtime Filter src expr, Probe expr are called target expr in Runtime Filter.
+
+## Principle
+Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
+
+For example, there is currently a Join query between the T1 table and the T2 table. Its Join mode is HashJoin. T1 is a fact table with 100,000 rows of data. T2 is a dimension table with 100 rows of data. Doris join The actual situation is:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+Obviously, scanning data for T2 is much faster than T1. If we take the initiative to wait for a while and then scan T1, after T2 sends the scanned data record to HashJoinNode, HashJoinNode calculates a filter condition based on the data of T2, such as the maximum value of T2 data And the minimum value, or build a Bloom Filter, and then send this filter condition to ScanNode waiting to scan T1, the latter applies this filter condition and delivers the filtered data to HashJoinNode, thereby reducing the number of probe hash tables and network overhead. This filter condition is Runtime Filter, and the effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+If the filter condition (Runtime Filter) can be pushed down to the storage engine, in some cases, the index can be used to directly reduce the amount of scanned data, thereby greatly reducing the scanning time. The effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+It can be seen that, unlike predicate push-down and partition cutting, Runtime Filter is a filter condition dynamically generated at runtime, that is, when the query is run, the join on clause is parsed to determine the filter expression, and the expression is broadcast to ScanNode that is reading the left table , Thereby reducing the amount of scanned data, thereby reducing the number of probe hash table, avoiding unnecessary I/O and network transmission.
+
+Runtime Filter is mainly used to optimize joins for large tables. If the amount of data in the left table is too small, or the amount of data in the right table is too large, the Runtime Filter may not achieve the expected effect.
+
+## Usage
+
+### Set session variable
+In addition to `runtime_filter_type`, other query options are used to adjust the Runtime Filter to achieve the best performance in a specific scenario, usually only after the performance test, to optimize the query that is resource-intensive, takes a long enough time to run, and has a high enough frequency .
+
+#### 1.runtime_filter_type
+The types of Runtime Filter used include Bloom Filter, MinMax Filter, and IN predicate. When multiple types are used, they are separated by commas. Note that you need to add quotation marks. For example:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+By default, only IN predicate is used conservatively. In some cases, the performance is higher when Bloom Filter, MinMax Filter, and IN predicate are used at the same time.
+
+- **Bloom Filter**: There is a certain misjudgment rate, resulting in filtered data less than expected, but it will not lead to inaccurate final results. In most cases, Bloom Filter can improve performance or have no significant impact on performance , But in some cases it will cause performance degradation.
+    - Bloom Filter construction and application overhead is high, so when the filtering rate is low, or the amount of data in the left table is small, Bloom Filter may cause performance degradation.
+    - At present, only the Key column of the left table can be pushed down to the storage engine if the Bloom Filter is applied, and the test results show that the performance of the Bloom Filter is not pushed down to the storage engine.
+    - Currently Bloom Filter only has short-circuit logic when using expression filtering on ScanNode, that is, when the false positive rate is too high, the Bloom Filter will not continue to be used, but there is no short-circuit logic when the Bloom Filter is pushed down to the storage engine , So when the filtration rate is low, it may cause performance degradation.
+- **MinMax Filter**: Contains the maximum value and the minimum value, thereby filtering data smaller than the minimum value and greater than the maximum value. The filtering effect of the MinMax Filter is related to the type of the Key column in the join on clause and the data distribution of the left and right tables.
+    - When the type of the Key column in the join on clause is int/bigint/double, etc., in extreme cases, if the maximum and minimum values of the left and right tables are the same, there is no effect, otherwise the maximum value of the right table is less than the minimum value of the left table, or the minimum of the right table The value is greater than the maximum value in the left table, the effect is best.
+    - When the type of the Key column in the join on clause is varchar, etc., applying the MinMax Filter will often cause performance degradation.
+- **IN predicate**: Construct the IN predicate based on all the values of the Key listed in the join on clause on the right table, and use the constructed IN predicate to filter on the left table. Compared with the loom filter, the cost of construction and application is lower. The table on the right tends to have higher performance when the amount of data is small.
+    - By default, only the number of data rows in the right table is less than 1024 will be pushed down (can be adjusted by the `runtime_filter_max_in_num` in the session variable).
+    - Currently IN predicate does not implement a merge method, that is, it cannot be pushed down across Fragments, so currently when it is necessary to push down to the ScanNode of the left table of shuffle join, if Bloom Filter is not generated, then we will convert IN predicate to Bloom Filter for Process pushdown across Fragments, so even if the type only selects IN predicate, Bloom Filter may actually be applied;
+
+#### 2.runtime_filter_mode
+It is used to adjust the push-down strategy of Runtime Filter, including two strategies, LOCAL and GLOBAL, and the default setting is GLOBAL strategy.
+
+LOCAL: Relatively conservative, the constructed Runtime Filter can only be used in the same Fragment on the same instance (the smallest unit of query execution), that is, the Runtime Filter producer (the HashJoinNode that constructs the Filter) and the consumer (the ScanNode that uses the RuntimeFilter) The same Fragment, such as the general scene of broadcast join;
+
+GLOBAL: Relatively radical. In addition to satisfying the scenario of the LOCAL strategy, the Runtime Filter can also be combined and transmitted to different Fragments on different instances via the network. For example, Runtime Filter producers and consumers are in different Fragments, such as shuffle join. The query can be optimized in a wider range of scenarios.
+
+When building and applying Runtime Filters on different Fragments, the reasons and strategies for merging Runtime Filters can be found in [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+After the Runtime Filter is turned on, the ScanNode that joins the left table will wait for a period of time for each Runtime Filter assigned to itself and then scan the data. The default wait is 1s (1000ms, in milliseconds), that is, if the ScanNode is assigned 3 Runtime Filters , Then it will wait at most 3s.
+
+Because it takes time to build and merge the Runtime Filter, ScanNode will try to push down the Runtime Filter that arrives within the waiting time to the storage engine. If the waiting time is exceeded, ScanNode will directly start scanning data using the Runtime Filter that has arrived.
+
+If the Runtime Filter arrives after ScanNode starts scanning, ScanNode will not push the Runtime Filter down to the storage engine. Instead, it will use expression filtering on ScanNode based on the Runtime Filter for the data that has been scanned from the storage engine. The scanned data will not apply the Runtime Filter, so the intermediate data size obtained will be larger than the optimal solution, but serious cracking can be avoided.
+
+If the cluster is busy and there are many resource-intensive or long-time-consuming queries on the cluster, consider increasing the waiting time to avoid missing optimization opportunities for complex queries. If the cluster load is light, and there are many small queries on the cluster that only take a few seconds, you can consider reducing the waiting time to avoid an increase of 1s for each query.
+
+#### 4.runtime_filters_max_num
+The maximum number of Bloom Filters in the Runtime Filter that can be applied to each query, the default is 10. Because Bloom Filter construction and application costs are high, so if the generated Bloom Filter exceeds the maximum allowed number, the Bloom Filter with large selectivity is retained. Currently, only the number of Bloom Filters is limited, because compared to MinMax Filter and IN predicate It is more expensive to build and apply.
+```
+Selectivity = (HashJoinNode Cardinality / HashJoinNode left child Cardinality)
+```
+Because the cardinality of FE is currently inaccurate, the selectivity of Bloom Filter calculation here is inaccurate, so in the end it may only randomly reserve part of Bloom Filter.
+
+#### 5. Bloom Filter related parameters
+- `runtime_bloom_filter_min_size`: the minimum length of Bloom Filter in Runtime Filter (in bytes), the default is 1048576 (1M);
+
+- `runtime_bloom_filter_max_size`: the maximum length of Bloom Filter in Runtime Filter (in bytes), the default is 16777216 (16M);
+
+- `runtime_bloom_filter_size`: The default length of Bloom Filter in Runtime Filter (in bytes), the default is 2097152 (2M);
+
+Because it is necessary to ensure that the length of the Bloom Filter constructed by each HashJoinNode is the same to be merged, the length of the Bloom Filter is currently calculated in the FE query planning.
+
+If you can get the number of data rows (Cardinality) in the statistical information of the join table on the right, you will try to use Cardinality as NDV, the default false detection rate fpp is 0.05, and the Bloom that contains NDV unique elements and the false detection rate is lower than fpp is calculated by the formula The minimum number of bytes required by the filter, rounded to the nearest power of 2 (log value with 2 as the base), and limits the upper and lower limits of the length of the final Bloom Filter.
+
+If the Cardinality of the join right table is not available, the default Bloom Filter length will be used.
+
+#### 6.runtime_filter_max_in_num
+If the number of rows in the right table of the join is greater than this value, we will not generate an IN predicate, and the default is 1024;
+
+### View Runtime Filter generated by query
+The query plan that can be displayed by the `explain` command includes the join on clause information used by each Fragment, as well as comments on the generation and use of the Runtime Filter by the Fragment, so as to confirm whether the Runtime Filter is applied to the desired join on clause.
+- The comment contained in the Fragment that generates the Runtime Filter, such as `runtime filters: filter_id[type] <- table.column`.
+- Use the comment contained in the fragment of Runtime Filter such as `runtime filters: filter_id[type] -> table.column`.
+
+The query in the following example uses a Runtime Filter with ID RF000.
+```
+CREATE TABLE test (t1 INT) DISTRIBUTED BY HASH (t1) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test VALUES (1), (2), (3), (4);
+
+CREATE TABLE test2 (t2 INT) DISTRIBUTED BY HASH (t2) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test2 VALUES (3), (4), (5);
+
+EXPLAIN SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2;
++-------------------------------------------------------------------+
+| Explain String                                                    |
++-------------------------------------------------------------------+
+| PLAN FRAGMENT 0                                                   |
+|  OUTPUT EXPRS:`t1`                                                |
+|                                                                   |
+|   4:EXCHANGE                                                      |
+|                                                                   |
+| PLAN FRAGMENT 1                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test`.`t1`  |
+|                                                                   |
+|   2:HASH JOIN                                                     |
+|   |  join op: INNER JOIN (BUCKET_SHUFFLE)                         |
+|   |  equal join conjunct: `test`.`t1` = `test2`.`t2`              |
+|   |  runtime filters: RF000[in] <- `test2`.`t2`                   |
+|   |                                                               |
+|   |----3:EXCHANGE                                                 |
+|   |                                                               |
+|   0:OlapScanNode                                                  |
+|      TABLE: test                                                  |
+|      runtime filters: RF000[in] -> `test`.`t1`                    |
+|                                                                   |
+| PLAN FRAGMENT 2                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test2`.`t2` |
+|                                                                   |
+|   1:OlapScanNode                                                  |
+|      TABLE: test2                                                 |
++-------------------------------------------------------------------+
+-- 上面`runtime filters`的行显示了`PLAN FRAGMENT 1`的`2:HASH JOIN`生成了ID为RF000的IN predicate，
+-- 其中`test2`.`t2`的key values仅在运行时可知，
+-- 在`0:OlapScanNode`使用了该IN predicate用于在读取`test`.`t1`时过滤不必要的数据。
+
+SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2; 
+-- 返回2行结果[3, 4];
+
+-- 通过query的profile（set is_report_success=true;）可以查看查询内部工作的详细信息，
+-- 包括每个Runtime Filter是否下推、等待耗时、以及OLAP_SCAN_NODE从prepare到接收到Runtime Filter的总时长。
+RuntimeFilter:in:
+    -  HasPushDownToEngine:  true
+    -  AWaitTimeCost:  0ns
+    -  EffectTimeCost:  2.76ms
+
+-- 此外，在profile的OLAP_SCAN_NODE中还可以查看Runtime Filter下推后的过滤效果和耗时。
+    -  RowsVectorPredFiltered:  9.320008M  (9320008)
+    -  VectorPredEvalTime:  364.39ms
+```
+

Review comment:
       Modified, thks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r667313990



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+

Review comment:
       Bloom Filter的概念和原理感觉没必要在这里解释了，解释起来繁琐，这应该算基础知识…




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] hf200012 commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

hf200012 commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r663733418



##########
File path: docs/en/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed up the query.
+
+It's design, implementation and effects, please refer to [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116).
+
+## Noun Interpretation
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, the back-end node of Doris. Responsible for query execution and data storage.
+* Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
+* Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join Reorder.
+* Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the FE.
+* Join on clause: `Aa=Bb` in `A join B on Aa=Bb`, based on this to generate join conjuncts during query planning, including expr used by join Build and Probe, where Build expr is called in Runtime Filter src expr, Probe expr are called target expr in Runtime Filter.
+
+## Principle
+Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
+
+For example, there is currently a Join query between the T1 table and the T2 table. Its Join mode is HashJoin. T1 is a fact table with 100,000 rows of data. T2 is a dimension table with 100 rows of data. Doris join The actual situation is:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+Obviously, scanning data for T2 is much faster than T1. If we take the initiative to wait for a while and then scan T1, after T2 sends the scanned data record to HashJoinNode, HashJoinNode calculates a filter condition based on the data of T2, such as the maximum value of T2 data And the minimum value, or build a Bloom Filter, and then send this filter condition to ScanNode waiting to scan T1, the latter applies this filter condition and delivers the filtered data to HashJoinNode, thereby reducing the number of probe hash tables and network overhead. This filter condition is Runtime Filter, and the effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+If the filter condition (Runtime Filter) can be pushed down to the storage engine, in some cases, the index can be used to directly reduce the amount of scanned data, thereby greatly reducing the scanning time. The effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+It can be seen that, unlike predicate push-down and partition cutting, Runtime Filter is a filter condition dynamically generated at runtime, that is, when the query is run, the join on clause is parsed to determine the filter expression, and the expression is broadcast to ScanNode that is reading the left table , Thereby reducing the amount of scanned data, thereby reducing the number of probe hash table, avoiding unnecessary I/O and network transmission.
+
+Runtime Filter is mainly used to optimize joins for large tables. If the amount of data in the left table is too small, or the amount of data in the right table is too large, the Runtime Filter may not achieve the expected effect.
+
+## Usage
+
+### Set session variable
+In addition to `runtime_filter_type`, other query options are used to adjust the Runtime Filter to achieve the best performance in a specific scenario, usually only after the performance test, to optimize the query that is resource-intensive, takes a long enough time to run, and has a high enough frequency .
+
+#### 1.runtime_filter_type
+The types of Runtime Filter used include Bloom Filter, MinMax Filter, and IN predicate. When multiple types are used, they are separated by commas. Note that you need to add quotation marks. For example:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+By default, only IN predicate is used conservatively. In some cases, the performance is higher when Bloom Filter, MinMax Filter, and IN predicate are used at the same time.
+
+- **Bloom Filter**: There is a certain misjudgment rate, resulting in filtered data less than expected, but it will not lead to inaccurate final results. In most cases, Bloom Filter can improve performance or have no significant impact on performance , But in some cases it will cause performance degradation.
+    - Bloom Filter construction and application overhead is high, so when the filtering rate is low, or the amount of data in the left table is small, Bloom Filter may cause performance degradation.
+    - At present, only the Key column of the left table can be pushed down to the storage engine if the Bloom Filter is applied, and the test results show that the performance of the Bloom Filter is not pushed down to the storage engine.
+    - Currently Bloom Filter only has short-circuit logic when using expression filtering on ScanNode, that is, when the false positive rate is too high, the Bloom Filter will not continue to be used, but there is no short-circuit logic when the Bloom Filter is pushed down to the storage engine , So when the filtration rate is low, it may cause performance degradation.
+- **MinMax Filter**: Contains the maximum value and the minimum value, thereby filtering data smaller than the minimum value and greater than the maximum value. The filtering effect of the MinMax Filter is related to the type of the Key column in the join on clause and the data distribution of the left and right tables.
+    - When the type of the Key column in the join on clause is int/bigint/double, etc., in extreme cases, if the maximum and minimum values of the left and right tables are the same, there is no effect, otherwise the maximum value of the right table is less than the minimum value of the left table, or the minimum of the right table The value is greater than the maximum value in the left table, the effect is best.
+    - When the type of the Key column in the join on clause is varchar, etc., applying the MinMax Filter will often cause performance degradation.
+- **IN predicate**: Construct the IN predicate based on all the values of the Key listed in the join on clause on the right table, and use the constructed IN predicate to filter on the left table. Compared with the loom filter, the cost of construction and application is lower. The table on the right tends to have higher performance when the amount of data is small.
+    - By default, only the number of data rows in the right table is less than 1024 will be pushed down (can be adjusted by the `runtime_filter_max_in_num` in the session variable).
+    - Currently IN predicate does not implement a merge method, that is, it cannot be pushed down across Fragments, so currently when it is necessary to push down to the ScanNode of the left table of shuffle join, if Bloom Filter is not generated, then we will convert IN predicate to Bloom Filter for Process pushdown across Fragments, so even if the type only selects IN predicate, Bloom Filter may actually be applied;
+
+#### 2.runtime_filter_mode
+It is used to adjust the push-down strategy of Runtime Filter, including two strategies, LOCAL and GLOBAL, and the default setting is GLOBAL strategy.
+
+LOCAL: Relatively conservative, the constructed Runtime Filter can only be used in the same Fragment on the same instance (the smallest unit of query execution), that is, the Runtime Filter producer (the HashJoinNode that constructs the Filter) and the consumer (the ScanNode that uses the RuntimeFilter) The same Fragment, such as the general scene of broadcast join;
+
+GLOBAL: Relatively radical. In addition to satisfying the scenario of the LOCAL strategy, the Runtime Filter can also be combined and transmitted to different Fragments on different instances via the network. For example, Runtime Filter producers and consumers are in different Fragments, such as shuffle join. The query can be optimized in a wider range of scenarios.
+
+When building and applying Runtime Filters on different Fragments, the reasons and strategies for merging Runtime Filters can be found in [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+After the Runtime Filter is turned on, the ScanNode that joins the left table will wait for a period of time for each Runtime Filter assigned to itself and then scan the data. The default wait is 1s (1000ms, in milliseconds), that is, if the ScanNode is assigned 3 Runtime Filters , Then it will wait at most 3s.
+
+Because it takes time to build and merge the Runtime Filter, ScanNode will try to push down the Runtime Filter that arrives within the waiting time to the storage engine. If the waiting time is exceeded, ScanNode will directly start scanning data using the Runtime Filter that has arrived.
+
+If the Runtime Filter arrives after ScanNode starts scanning, ScanNode will not push the Runtime Filter down to the storage engine. Instead, it will use expression filtering on ScanNode based on the Runtime Filter for the data that has been scanned from the storage engine. The scanned data will not apply the Runtime Filter, so the intermediate data size obtained will be larger than the optimal solution, but serious cracking can be avoided.
+
+If the cluster is busy and there are many resource-intensive or long-time-consuming queries on the cluster, consider increasing the waiting time to avoid missing optimization opportunities for complex queries. If the cluster load is light, and there are many small queries on the cluster that only take a few seconds, you can consider reducing the waiting time to avoid an increase of 1s for each query.
+
+#### 4.runtime_filters_max_num
+The maximum number of Bloom Filters in the Runtime Filter that can be applied to each query, the default is 10. Because Bloom Filter construction and application costs are high, so if the generated Bloom Filter exceeds the maximum allowed number, the Bloom Filter with large selectivity is retained. Currently, only the number of Bloom Filters is limited, because compared to MinMax Filter and IN predicate It is more expensive to build and apply.
+```
+Selectivity = (HashJoinNode Cardinality / HashJoinNode left child Cardinality)
+```
+Because the cardinality of FE is currently inaccurate, the selectivity of Bloom Filter calculation here is inaccurate, so in the end it may only randomly reserve part of Bloom Filter.
+
+#### 5. Bloom Filter related parameters
+- `runtime_bloom_filter_min_size`: the minimum length of Bloom Filter in Runtime Filter (in bytes), the default is 1048576 (1M);
+
+- `runtime_bloom_filter_max_size`: the maximum length of Bloom Filter in Runtime Filter (in bytes), the default is 16777216 (16M);
+
+- `runtime_bloom_filter_size`: The default length of Bloom Filter in Runtime Filter (in bytes), the default is 2097152 (2M);
+
+Because it is necessary to ensure that the length of the Bloom Filter constructed by each HashJoinNode is the same to be merged, the length of the Bloom Filter is currently calculated in the FE query planning.
+
+If you can get the number of data rows (Cardinality) in the statistical information of the join table on the right, you will try to use Cardinality as NDV, the default false detection rate fpp is 0.05, and the Bloom that contains NDV unique elements and the false detection rate is lower than fpp is calculated by the formula The minimum number of bytes required by the filter, rounded to the nearest power of 2 (log value with 2 as the base), and limits the upper and lower limits of the length of the final Bloom Filter.
+
+If the Cardinality of the join right table is not available, the default Bloom Filter length will be used.
+
+#### 6.runtime_filter_max_in_num
+If the number of rows in the right table of the join is greater than this value, we will not generate an IN predicate, and the default is 1024;
+
+### View Runtime Filter generated by query
+The query plan that can be displayed by the `explain` command includes the join on clause information used by each Fragment, as well as comments on the generation and use of the Runtime Filter by the Fragment, so as to confirm whether the Runtime Filter is applied to the desired join on clause.
+- The comment contained in the Fragment that generates the Runtime Filter, such as `runtime filters: filter_id[type] <- table.column`.
+- Use the comment contained in the fragment of Runtime Filter such as `runtime filters: filter_id[type] -> table.column`.
+
+The query in the following example uses a Runtime Filter with ID RF000.
+```
+CREATE TABLE test (t1 INT) DISTRIBUTED BY HASH (t1) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test VALUES (1), (2), (3), (4);
+
+CREATE TABLE test2 (t2 INT) DISTRIBUTED BY HASH (t2) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test2 VALUES (3), (4), (5);
+
+EXPLAIN SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2;
++-------------------------------------------------------------------+
+| Explain String                                                    |
++-------------------------------------------------------------------+
+| PLAN FRAGMENT 0                                                   |
+|  OUTPUT EXPRS:`t1`                                                |
+|                                                                   |
+|   4:EXCHANGE                                                      |
+|                                                                   |
+| PLAN FRAGMENT 1                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test`.`t1`  |
+|                                                                   |
+|   2:HASH JOIN                                                     |
+|   |  join op: INNER JOIN (BUCKET_SHUFFLE)                         |
+|   |  equal join conjunct: `test`.`t1` = `test2`.`t2`              |
+|   |  runtime filters: RF000[in] <- `test2`.`t2`                   |
+|   |                                                               |
+|   |----3:EXCHANGE                                                 |
+|   |                                                               |
+|   0:OlapScanNode                                                  |
+|      TABLE: test                                                  |
+|      runtime filters: RF000[in] -> `test`.`t1`                    |
+|                                                                   |
+| PLAN FRAGMENT 2                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test2`.`t2` |
+|                                                                   |
+|   1:OlapScanNode                                                  |
+|      TABLE: test2                                                 |
++-------------------------------------------------------------------+
+-- 上面`runtime filters`的行显示了`PLAN FRAGMENT 1`的`2:HASH JOIN`生成了ID为RF000的IN predicate，
+-- 其中`test2`.`t2`的key values仅在运行时可知，
+-- 在`0:OlapScanNode`使用了该IN predicate用于在读取`test`.`t1`时过滤不必要的数据。
+
+SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2; 
+-- 返回2行结果[3, 4];
+
+-- 通过query的profile（set is_report_success=true;）可以查看查询内部工作的详细信息，
+-- 包括每个Runtime Filter是否下推、等待耗时、以及OLAP_SCAN_NODE从prepare到接收到Runtime Filter的总时长。
+RuntimeFilter:in:
+    -  HasPushDownToEngine:  true
+    -  AWaitTimeCost:  0ns
+    -  EffectTimeCost:  2.76ms
+
+-- 此外，在profile的OLAP_SCAN_NODE中还可以查看Runtime Filter下推后的过滤效果和耗时。
+    -  RowsVectorPredFiltered:  9.320008M  (9320008)
+    -  VectorPredEvalTime:  364.39ms
+```
+

Review comment:
       It should be English here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666701641



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:

Review comment:
       Modified, thks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666643868



##########
File path: docs/en/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,225 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed up the query.
+
+It's design, implementation and effects, please refer to [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116).
+
+## Noun Interpretation
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, the back-end node of Doris. Responsible for query execution and data storage.
+* Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
+* Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join Reorder.
+* Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the FE.
+* Join on clause: `Aa=Bb` in `A join B on Aa=Bb`, based on this to generate join conjuncts during query planning, including expr used by join Build and Probe, where Build expr is called in Runtime Filter src expr, Probe expr are called target expr in Runtime Filter.
+
+## Principle
+Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
+
+For example, there is currently a Join query between the T1 table and the T2 table. Its Join mode is HashJoin. T1 is a fact table with 100,000 rows of data. T2 is a dimension table with 100 rows of data. Doris join The actual situation is:
+```

Review comment:
       这里写100 rows，下面图里为什么是2000?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666702210



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比loom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。

Review comment:
       Modified, thks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666715065



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+

Review comment:
       Bloom Filter如何过滤的，没有说明白。




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666711570



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,279 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为2000，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### Runtime Filter查询选项
+
+与Runtime Filter相关的查询选项信息，请参阅以下部分:
+
+- 第一个查询选项是调整使用的Runtime Filter类型，大多数情况下，您只需要调整这一个选项，其他选项保持默认即可。
+
+  - `runtime_filter_type`: 包括Bloom Filter、MinMax Filter、IN predicate，默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- 其他查询选项通常仅在某些特定场景下，才需进一步调整以达到最优效果。通常只在性能测试后，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+  - `runtime_filter_mode`: 用于调整Runtime Filter的下推策略，包括OFF、LOCAL、GLOBAL三种策略，默认设置为GLOBAL策略
+
+  - `runtime_filter_wait_time_ms`: 左表的ScanNode等待每个Runtime Filter的时间，默认1000ms
+
+  - `runtime_filters_max_num`: 每个查询可应用的Runtime Filter中Bloom Filter的最大数量，默认10
+
+  - `runtime_bloom_filter_min_size`: Runtime Filter中Bloom Filter的最小长度，默认1048576（1M）
+
+  - `runtime_bloom_filter_max_size`: Runtime Filter中Bloom Filter的最大长度，默认16777216（16M）
+
+  - `runtime_bloom_filter_size`: Runtime Filter中Bloom Filter的默认长度，默认2097152（2M）
+
+  - `runtime_filter_max_in_num`: 如果join右表数据行数大于这个值，我们将不生成IN predicate，默认1024
+
+下面对查询选项做进一步说明。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型。
+
+**类型**: 数字(1, 2, 4)或者相对应的助记符字符串(IN, BLOOM_FILTER, MIN_MAX)，默认1(IN predicate)，使用多个时用逗号分隔，注意需要加引号，或者将任意多个类型的数字相加，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+等价于:
+```
+set runtime_filter_type=7;
+```
+
+**使用注意事项**
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比Bloom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。
+    - 默认只有右表数据行数少于1024才会下推（可通过session变量中的`runtime_filter_max_in_num`调整）。
+    - 目前IN predicate没有实现合并方法，即无法跨Fragment下推，所以目前当需要下推给shuffle join左表的ScanNode时，如果没有生成Bloom Filter，那么我们会将IN predicate转为Bloom Filter，用于处理跨Fragment下推，所以即使类型只选择了IN predicate，实际也可能应用了Bloom Filter；
+
+#### 2.runtime_filter_mode
+用于控制Runtime Filter在instance之间传输的范围。
+
+**类型**: 数字(0, 1, 2)或者相对应的助记符字符串(OFF, LOCAL, GLOBAL)，默认2(GLOBAL)。
+
+**使用注意事项**
+
+LOCAL：相对保守，构建的Runtime Filter只能在同一个instance（查询执行的最小单元）上同一个Fragment中使用，即Runtime Filter生产者（构建Filter的HashJoinNode）和消费者（使用RuntimeFilter的ScanNode）在同一个Fragment，比如broadcast join的一般场景；
+
+GLOBAL：相对激进，除满足LOCAL策略的场景外，还可以将Runtime Filter合并后通过网络传输到不同instance上的不同Fragment中使用，比如Runtime Filter生产者和消费者在不同Fragment，比如shuffle join。
+
+大多数情况下GLOBAL策略可以在更广泛的场景对查询进行优化，但在有些shuffle join中生成和合并Runtime Filter的开销超过给查询带来的性能优势，可以考虑更改为LOCAL策略。
+
+如果集群中涉及的join查询不会因为Runtime Filter而提高性能，您可以将设置更改为OFF，从而完全关闭该功能。
+
+在不同Fragment上构建和应用Runtime Filter时，需要合并Runtime Filter的原因和策略可参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+Runtime Filter的等待耗时。
+
+**类型**: 整数，默认1000，单位ms
+
+**使用注意事项**
+
+在开启Runtime Filter后，左表的ScanNode会为每一个分配给自己的Runtime Filter等待一段时间再扫描数据，即如果ScanNode被分配了3个Runtime Filter，那么它最多会等待3000ms。
+
+因为Runtime Filter的构建和合并均需要时间，ScanNode会尝试将等待时间内到达的Runtime Filter下推到存储引擎，如果超过等待时间后，ScanNode会使用已经到达的Runtime Filter直接开始扫描数据。
+
+如果Runtime Filter在ScanNode开始扫描之后到达，则ScanNode不会将该Runtime Filter下推到存储引擎，而是对已经从存储引擎扫描上来的数据，在ScanNode上基于该Runtime Filter使用表达式过滤，之前已经扫描的数据则不会应用该Runtime Filter，这样得到的中间数据规模会大于最优解，但可以避免严重的裂化。
+
+如果集群比较繁忙，并且集群上有许多资源密集型或长耗时的查询，可以考虑增加等待时间，以避免复杂查询错过优化机会。如果集群负载较轻，并且集群上有许多只需要几秒的小查询，可以考虑减少等待时间，以避免每个查询增加1s的延迟。
+

Review comment:
       错过优化机会是什么意思？如何错过的？如何才能避免错过？

##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,279 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为2000，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### Runtime Filter查询选项
+
+与Runtime Filter相关的查询选项信息，请参阅以下部分:
+
+- 第一个查询选项是调整使用的Runtime Filter类型，大多数情况下，您只需要调整这一个选项，其他选项保持默认即可。
+
+  - `runtime_filter_type`: 包括Bloom Filter、MinMax Filter、IN predicate，默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- 其他查询选项通常仅在某些特定场景下，才需进一步调整以达到最优效果。通常只在性能测试后，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+  - `runtime_filter_mode`: 用于调整Runtime Filter的下推策略，包括OFF、LOCAL、GLOBAL三种策略，默认设置为GLOBAL策略
+
+  - `runtime_filter_wait_time_ms`: 左表的ScanNode等待每个Runtime Filter的时间，默认1000ms
+
+  - `runtime_filters_max_num`: 每个查询可应用的Runtime Filter中Bloom Filter的最大数量，默认10
+
+  - `runtime_bloom_filter_min_size`: Runtime Filter中Bloom Filter的最小长度，默认1048576（1M）
+
+  - `runtime_bloom_filter_max_size`: Runtime Filter中Bloom Filter的最大长度，默认16777216（16M）
+
+  - `runtime_bloom_filter_size`: Runtime Filter中Bloom Filter的默认长度，默认2097152（2M）
+
+  - `runtime_filter_max_in_num`: 如果join右表数据行数大于这个值，我们将不生成IN predicate，默认1024
+
+下面对查询选项做进一步说明。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型。
+
+**类型**: 数字(1, 2, 4)或者相对应的助记符字符串(IN, BLOOM_FILTER, MIN_MAX)，默认1(IN predicate)，使用多个时用逗号分隔，注意需要加引号，或者将任意多个类型的数字相加，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+等价于:
+```
+set runtime_filter_type=7;
+```
+
+**使用注意事项**
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比Bloom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。
+    - 默认只有右表数据行数少于1024才会下推（可通过session变量中的`runtime_filter_max_in_num`调整）。
+    - 目前IN predicate没有实现合并方法，即无法跨Fragment下推，所以目前当需要下推给shuffle join左表的ScanNode时，如果没有生成Bloom Filter，那么我们会将IN predicate转为Bloom Filter，用于处理跨Fragment下推，所以即使类型只选择了IN predicate，实际也可能应用了Bloom Filter；
+
+#### 2.runtime_filter_mode
+用于控制Runtime Filter在instance之间传输的范围。
+
+**类型**: 数字(0, 1, 2)或者相对应的助记符字符串(OFF, LOCAL, GLOBAL)，默认2(GLOBAL)。
+
+**使用注意事项**
+
+LOCAL：相对保守，构建的Runtime Filter只能在同一个instance（查询执行的最小单元）上同一个Fragment中使用，即Runtime Filter生产者（构建Filter的HashJoinNode）和消费者（使用RuntimeFilter的ScanNode）在同一个Fragment，比如broadcast join的一般场景；
+
+GLOBAL：相对激进，除满足LOCAL策略的场景外，还可以将Runtime Filter合并后通过网络传输到不同instance上的不同Fragment中使用，比如Runtime Filter生产者和消费者在不同Fragment，比如shuffle join。
+
+大多数情况下GLOBAL策略可以在更广泛的场景对查询进行优化，但在有些shuffle join中生成和合并Runtime Filter的开销超过给查询带来的性能优势，可以考虑更改为LOCAL策略。
+
+如果集群中涉及的join查询不会因为Runtime Filter而提高性能，您可以将设置更改为OFF，从而完全关闭该功能。
+
+在不同Fragment上构建和应用Runtime Filter时，需要合并Runtime Filter的原因和策略可参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+Runtime Filter的等待耗时。
+
+**类型**: 整数，默认1000，单位ms
+
+**使用注意事项**
+
+在开启Runtime Filter后，左表的ScanNode会为每一个分配给自己的Runtime Filter等待一段时间再扫描数据，即如果ScanNode被分配了3个Runtime Filter，那么它最多会等待3000ms。
+

Review comment:
       为什么是3000ms？不是并行的么？




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666700186



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比loom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。
+    - 默认只有右表数据行数少于1024才会下推（可通过session变量中的`runtime_filter_max_in_num`调整）。
+    - 目前IN predicate没有实现合并方法，即无法跨Fragment下推，所以目前当需要下推给shuffle join左表的ScanNode时，如果没有生成Bloom Filter，那么我们会将IN predicate转为Bloom Filter，用于处理跨Fragment下推，所以即使类型只选择了IN predicate，实际也可能应用了Bloom Filter；
+
+#### 2.runtime_filter_mode
+用于调整Runtime Filter的下推策略，包括LOCAL和GLOBAL两种策略，默认设置为GLOBAL策略。
+
+LOCAL：相对保守，构建的Runtime Filter只能在同一个instance（查询执行的最小单元）上同一个Fragment中使用，即Runtime Filter生产者（构建Filter的HashJoinNode）和消费者（使用RuntimeFilter的ScanNode）在同一个Fragment，比如broadcast join的一般场景；
+
+GLOBAL：相对激进，除满足LOCAL策略的场景外，还可以将Runtime Filter合并后通过网络传输到不同instance上的不同Fragment中使用，比如Runtime Filter生产者和消费者在不同Fragment，比如shuffle join，这可以在更广泛的场景对查询进行优化。
+
+在不同Fragment上构建和应用Runtime Filter时，需要合并Runtime Filter的原因和策略可参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+在开启Runtime Filter后，join左表的ScanNode会为每一个分配给自己的Runtime Filter等待一段时间再扫描数据，默认等待1s（1000ms，以毫秒为单位），即如果ScanNode被分配了3个Runtime Filter，那么它最多会等待3s。

Review comment:
       为什么是3秒？不是并行的么？




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666649137



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+

Review comment:
       分别解释一下Bloom Filter   MinMax Filter   IN predicate的具体算法吧。




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666701920



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+

Review comment:
       在下面有对这三个filter说明




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r667314273



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,279 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为2000，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### Runtime Filter查询选项
+
+与Runtime Filter相关的查询选项信息，请参阅以下部分:
+
+- 第一个查询选项是调整使用的Runtime Filter类型，大多数情况下，您只需要调整这一个选项，其他选项保持默认即可。
+
+  - `runtime_filter_type`: 包括Bloom Filter、MinMax Filter、IN predicate，默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- 其他查询选项通常仅在某些特定场景下，才需进一步调整以达到最优效果。通常只在性能测试后，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+  - `runtime_filter_mode`: 用于调整Runtime Filter的下推策略，包括OFF、LOCAL、GLOBAL三种策略，默认设置为GLOBAL策略
+
+  - `runtime_filter_wait_time_ms`: 左表的ScanNode等待每个Runtime Filter的时间，默认1000ms
+
+  - `runtime_filters_max_num`: 每个查询可应用的Runtime Filter中Bloom Filter的最大数量，默认10
+
+  - `runtime_bloom_filter_min_size`: Runtime Filter中Bloom Filter的最小长度，默认1048576（1M）
+
+  - `runtime_bloom_filter_max_size`: Runtime Filter中Bloom Filter的最大长度，默认16777216（16M）
+
+  - `runtime_bloom_filter_size`: Runtime Filter中Bloom Filter的默认长度，默认2097152（2M）
+
+  - `runtime_filter_max_in_num`: 如果join右表数据行数大于这个值，我们将不生成IN predicate，默认1024
+
+下面对查询选项做进一步说明。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型。
+
+**类型**: 数字(1, 2, 4)或者相对应的助记符字符串(IN, BLOOM_FILTER, MIN_MAX)，默认1(IN predicate)，使用多个时用逗号分隔，注意需要加引号，或者将任意多个类型的数字相加，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+等价于:
+```
+set runtime_filter_type=7;
+```
+
+**使用注意事项**
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比Bloom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。
+    - 默认只有右表数据行数少于1024才会下推（可通过session变量中的`runtime_filter_max_in_num`调整）。
+    - 目前IN predicate没有实现合并方法，即无法跨Fragment下推，所以目前当需要下推给shuffle join左表的ScanNode时，如果没有生成Bloom Filter，那么我们会将IN predicate转为Bloom Filter，用于处理跨Fragment下推，所以即使类型只选择了IN predicate，实际也可能应用了Bloom Filter；
+
+#### 2.runtime_filter_mode
+用于控制Runtime Filter在instance之间传输的范围。
+
+**类型**: 数字(0, 1, 2)或者相对应的助记符字符串(OFF, LOCAL, GLOBAL)，默认2(GLOBAL)。
+
+**使用注意事项**
+
+LOCAL：相对保守，构建的Runtime Filter只能在同一个instance（查询执行的最小单元）上同一个Fragment中使用，即Runtime Filter生产者（构建Filter的HashJoinNode）和消费者（使用RuntimeFilter的ScanNode）在同一个Fragment，比如broadcast join的一般场景；
+
+GLOBAL：相对激进，除满足LOCAL策略的场景外，还可以将Runtime Filter合并后通过网络传输到不同instance上的不同Fragment中使用，比如Runtime Filter生产者和消费者在不同Fragment，比如shuffle join。
+
+大多数情况下GLOBAL策略可以在更广泛的场景对查询进行优化，但在有些shuffle join中生成和合并Runtime Filter的开销超过给查询带来的性能优势，可以考虑更改为LOCAL策略。
+
+如果集群中涉及的join查询不会因为Runtime Filter而提高性能，您可以将设置更改为OFF，从而完全关闭该功能。
+
+在不同Fragment上构建和应用Runtime Filter时，需要合并Runtime Filter的原因和策略可参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+Runtime Filter的等待耗时。
+
+**类型**: 整数，默认1000，单位ms
+
+**使用注意事项**
+
+在开启Runtime Filter后，左表的ScanNode会为每一个分配给自己的Runtime Filter等待一段时间再扫描数据，即如果ScanNode被分配了3个Runtime Filter，那么它最多会等待3000ms。
+
+因为Runtime Filter的构建和合并均需要时间，ScanNode会尝试将等待时间内到达的Runtime Filter下推到存储引擎，如果超过等待时间后，ScanNode会使用已经到达的Runtime Filter直接开始扫描数据。
+
+如果Runtime Filter在ScanNode开始扫描之后到达，则ScanNode不会将该Runtime Filter下推到存储引擎，而是对已经从存储引擎扫描上来的数据，在ScanNode上基于该Runtime Filter使用表达式过滤，之前已经扫描的数据则不会应用该Runtime Filter，这样得到的中间数据规模会大于最优解，但可以避免严重的裂化。
+
+如果集群比较繁忙，并且集群上有许多资源密集型或长耗时的查询，可以考虑增加等待时间，以避免复杂查询错过优化机会。如果集群负载较轻，并且集群上有许多只需要几秒的小查询，可以考虑减少等待时间，以避免每个查询增加1s的延迟。
+

Review comment:
       错过：如果没有在规定时间内等到Runtime filter到达，会不使用Runtime FIlter直接开始扫描，即没用上runtime filter
   避免：可以考虑增加等待时间




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] pengxiangyu commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

pengxiangyu commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666647298



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:

Review comment:
       是100还是2000？




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] morningman commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

morningman commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666615732



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量

Review comment:
       建议在这里先把所有runtime filter相关参数罗列一些，然后说明，在绝大多数情况下，只需要修改 runtime_filter_type，其他参数保持默认即可。在某些特定场景下，才需要进一步的调整其他参数已达到最优效果




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r666702437



##########
File path: docs/zh-CN/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter 是在 Doris 0.15 版本中正式加入的新功能。旨在为某些 Join 查询在运行时动态生成过滤条件，来减少扫描的数据量，避免不必要的I/O和网络传输，从而加速查询。
+
+它的设计、实现和效果可以参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)。
+
+## 名词解释
+* FE：Frontend，Doris 的前端节点。负责元数据管理和请求接入。
+* BE：Backend，Doris 的后端节点。负责查询执行和数据存储。
+* 左表：Join查询时，左边的表。进行Probe操作。可被Join Reorder调整顺序。
+* 右表：Join查询时，右边的表。进行Build操作。可被Join Reorder调整顺序。
+* Fragment：FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment，并将结果汇聚返回给FE。
+* Join on clause: `A join B on A.a=B.b`中的`A.a=B.b`，在查询规划时基于此生成join conjuncts，包含join Build和Probe使用的expr，其中Build expr在Runtime Filter中称为src expr，Probe expr在Runtime Filter中称为target expr。
+
+## 原理
+Runtime Filter在查询规划时生成，在HashJoinNode中构建，在ScanNode中应用。
+
+举个例子，当前存在T1表与T2表的Join查询，它的Join方式为HashJoin，T1是一张事实表，数据行数为100000，T2是一张维度表，数据行数为100，Doris join的实际情况是:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+显而易见对T2扫描数据要远远快于T1，如果我们主动等待一段时间再扫描T1，等T2将扫描的数据记录交给HashJoinNode后，HashJoinNode根据T2的数据计算出一个过滤条件，比如T2数据的最大和最小值，或者构建一个Bloom Filter，接着将这个过滤条件发给等待扫描T1的ScanNode，后者应用这个过滤条件，将过滤后的数据交给HashJoinNode，从而减少probe hash table的次数和网络开销，这个过滤条件就是Runtime Filter，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+如果能将过滤条件（Runtime Filter）下推到存储引擎，则某些情况下可以利用索引来直接减少扫描的数据量，从而大大减少扫描耗时，效果如下:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+可见，和谓词下推、分区裁剪不同，Runtime Filter是在运行时动态生成的过滤条件，即在查询运行时解析join on clause确定过滤表达式，并将表达式广播给正在读取左表的ScanNode，从而减少扫描的数据量，进而减少probe hash table的次数，避免不必要的I/O和网络传输。
+
+Runtime Filter主要用于优化针对大表的join，如果左表的数据量太小，或者右表的数据量太大，则Runtime Filter可能不会取得预期效果。
+
+## 使用方式
+
+### 设置Session变量
+除`runtime_filter_type`外，其他查询选项用于调整Runtime Filter在特定场景下达到最佳性能，通常只在性能测试后进行调整，针对资源密集型、运行耗时足够长且频率足够高的查询进行优化。
+
+#### 1.runtime_filter_type
+使用的Runtime Filter类型，包括Bloom Filter、MinMax Filter、IN predicate三种类型，使用多个时用逗号分隔，注意需要加引号，例如:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+默认会保守的只使用IN predicate，部分情况下同时使用Bloom Filter、MinMax Filter、IN predicate时性能更高。
+
+- **Bloom Filter**: 有一定的误判率，导致过滤的数据比预期少一点，但不会导致最终结果不准确，在大部分情况下Bloom Filter都可以提升性能或对性能没有显著影响，但在部分情况下会导致性能降低。
+    - Bloom Filter构建和应用的开销较高，所以当过滤率较低时，或者左表数据量较少时，Bloom Filter可能会导致性能降低。
+    - 目前只有左表的Key列应用Bloom Filter才能下推到存储引擎，而测试结果显示Bloom Filter不下推到存储引擎时往往会导致性能降低。
+    - 目前Bloom Filter仅在ScanNode上使用表达式过滤时有短路(short-circuit)逻辑，即当假阳性率过高时，不继续使用Bloom Filter，但当Bloom Filter下推到存储引擎后没有短路逻辑，所以当过滤率较低时可能导致性能降低。
+- **MinMax Filter**: 包含最大值和最小值，从而过滤小于最小值和大于最大值的数据，MinMax Filter的过滤效果与join on clause中Key列的类型和左右表数据分布有关。
+    - 当join on clause中Key列的类型为int/bigint/double等时，极端情况下，如果左右表的最大最小值相同则没有效果，反之右表最大值小于左表最小值，或右表最小值大于左表最大值，则效果最好。
+    - 当join on clause中Key列的类型为varchar等时，应用MinMax Filter往往会导致性能降低。
+- **IN predicate**: 根据join on clause中Key列在右表上的所有值构建IN predicate，使用构建的IN predicate在左表上过滤，相比loom Filter构建和应用的开销更低，在右表数据量较少时往往性能更高。
+    - 默认只有右表数据行数少于1024才会下推（可通过session变量中的`runtime_filter_max_in_num`调整）。
+    - 目前IN predicate没有实现合并方法，即无法跨Fragment下推，所以目前当需要下推给shuffle join左表的ScanNode时，如果没有生成Bloom Filter，那么我们会将IN predicate转为Bloom Filter，用于处理跨Fragment下推，所以即使类型只选择了IN predicate，实际也可能应用了Bloom Filter；
+
+#### 2.runtime_filter_mode
+用于调整Runtime Filter的下推策略，包括LOCAL和GLOBAL两种策略，默认设置为GLOBAL策略。
+
+LOCAL：相对保守，构建的Runtime Filter只能在同一个instance（查询执行的最小单元）上同一个Fragment中使用，即Runtime Filter生产者（构建Filter的HashJoinNode）和消费者（使用RuntimeFilter的ScanNode）在同一个Fragment，比如broadcast join的一般场景；
+
+GLOBAL：相对激进，除满足LOCAL策略的场景外，还可以将Runtime Filter合并后通过网络传输到不同instance上的不同Fragment中使用，比如Runtime Filter生产者和消费者在不同Fragment，比如shuffle join，这可以在更广泛的场景对查询进行优化。
+
+在不同Fragment上构建和应用Runtime Filter时，需要合并Runtime Filter的原因和策略可参阅 [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+在开启Runtime Filter后，join左表的ScanNode会为每一个分配给自己的Runtime Filter等待一段时间再扫描数据，默认等待1s（1000ms，以毫秒为单位），即如果ScanNode被分配了3个Runtime Filter，那么它最多会等待3s。

Review comment:
       不是，目前是在一个循环里面，一个一个等待的




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

xinyiZzz commented on a change in pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154#discussion_r663752066



##########
File path: docs/en/administrator-guide/runtime-filter.md
##########
@@ -0,0 +1,223 @@
+---
+{
+    "title": "Runtime Filter",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Runtime Filter
+Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed up the query.
+
+It's design, implementation and effects, please refer to [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116).
+
+## Noun Interpretation
+* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
+* BE: Backend, the back-end node of Doris. Responsible for query execution and data storage.
+* Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
+* Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join Reorder.
+* Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the FE.
+* Join on clause: `Aa=Bb` in `A join B on Aa=Bb`, based on this to generate join conjuncts during query planning, including expr used by join Build and Probe, where Build expr is called in Runtime Filter src expr, Probe expr are called target expr in Runtime Filter.
+
+## Principle
+Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
+
+For example, there is currently a Join query between the T1 table and the T2 table. Its Join mode is HashJoin. T1 is a fact table with 100,000 rows of data. T2 is a dimension table with 100 rows of data. Doris join The actual situation is:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 100000                  | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+Obviously, scanning data for T2 is much faster than T1. If we take the initiative to wait for a while and then scan T1, after T2 sends the scanned data record to HashJoinNode, HashJoinNode calculates a filter condition based on the data of T2, such as the maximum value of T2 data And the minimum value, or build a Bloom Filter, and then send this filter condition to ScanNode waiting to scan T1, the latter applies this filter condition and delivers the filtered data to HashJoinNode, thereby reducing the number of probe hash tables and network overhead. This filter condition is Runtime Filter, and the effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 100000                  | 2000
+|        T1                        T2
+|
+```
+If the filter condition (Runtime Filter) can be pushed down to the storage engine, in some cases, the index can be used to directly reduce the amount of scanned data, thereby greatly reducing the scanning time. The effect is as follows:
+```
+|          >      HashJoinNode     <
+|         |                         |
+|         | 6000                    | 2000
+|         |                         |
+|   OlapScanNode              OlapScanNode
+|         ^                         ^   
+|         | 6000                    | 2000
+|        T1                        T2
+|
+```
+It can be seen that, unlike predicate push-down and partition cutting, Runtime Filter is a filter condition dynamically generated at runtime, that is, when the query is run, the join on clause is parsed to determine the filter expression, and the expression is broadcast to ScanNode that is reading the left table , Thereby reducing the amount of scanned data, thereby reducing the number of probe hash table, avoiding unnecessary I/O and network transmission.
+
+Runtime Filter is mainly used to optimize joins for large tables. If the amount of data in the left table is too small, or the amount of data in the right table is too large, the Runtime Filter may not achieve the expected effect.
+
+## Usage
+
+### Set session variable
+In addition to `runtime_filter_type`, other query options are used to adjust the Runtime Filter to achieve the best performance in a specific scenario, usually only after the performance test, to optimize the query that is resource-intensive, takes a long enough time to run, and has a high enough frequency .
+
+#### 1.runtime_filter_type
+The types of Runtime Filter used include Bloom Filter, MinMax Filter, and IN predicate. When multiple types are used, they are separated by commas. Note that you need to add quotation marks. For example:
+```
+set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
+```
+By default, only IN predicate is used conservatively. In some cases, the performance is higher when Bloom Filter, MinMax Filter, and IN predicate are used at the same time.
+
+- **Bloom Filter**: There is a certain misjudgment rate, resulting in filtered data less than expected, but it will not lead to inaccurate final results. In most cases, Bloom Filter can improve performance or have no significant impact on performance , But in some cases it will cause performance degradation.
+    - Bloom Filter construction and application overhead is high, so when the filtering rate is low, or the amount of data in the left table is small, Bloom Filter may cause performance degradation.
+    - At present, only the Key column of the left table can be pushed down to the storage engine if the Bloom Filter is applied, and the test results show that the performance of the Bloom Filter is not pushed down to the storage engine.
+    - Currently Bloom Filter only has short-circuit logic when using expression filtering on ScanNode, that is, when the false positive rate is too high, the Bloom Filter will not continue to be used, but there is no short-circuit logic when the Bloom Filter is pushed down to the storage engine , So when the filtration rate is low, it may cause performance degradation.
+- **MinMax Filter**: Contains the maximum value and the minimum value, thereby filtering data smaller than the minimum value and greater than the maximum value. The filtering effect of the MinMax Filter is related to the type of the Key column in the join on clause and the data distribution of the left and right tables.
+    - When the type of the Key column in the join on clause is int/bigint/double, etc., in extreme cases, if the maximum and minimum values of the left and right tables are the same, there is no effect, otherwise the maximum value of the right table is less than the minimum value of the left table, or the minimum of the right table The value is greater than the maximum value in the left table, the effect is best.
+    - When the type of the Key column in the join on clause is varchar, etc., applying the MinMax Filter will often cause performance degradation.
+- **IN predicate**: Construct the IN predicate based on all the values of the Key listed in the join on clause on the right table, and use the constructed IN predicate to filter on the left table. Compared with the loom filter, the cost of construction and application is lower. The table on the right tends to have higher performance when the amount of data is small.
+    - By default, only the number of data rows in the right table is less than 1024 will be pushed down (can be adjusted by the `runtime_filter_max_in_num` in the session variable).
+    - Currently IN predicate does not implement a merge method, that is, it cannot be pushed down across Fragments, so currently when it is necessary to push down to the ScanNode of the left table of shuffle join, if Bloom Filter is not generated, then we will convert IN predicate to Bloom Filter for Process pushdown across Fragments, so even if the type only selects IN predicate, Bloom Filter may actually be applied;
+
+#### 2.runtime_filter_mode
+It is used to adjust the push-down strategy of Runtime Filter, including two strategies, LOCAL and GLOBAL, and the default setting is GLOBAL strategy.
+
+LOCAL: Relatively conservative, the constructed Runtime Filter can only be used in the same Fragment on the same instance (the smallest unit of query execution), that is, the Runtime Filter producer (the HashJoinNode that constructs the Filter) and the consumer (the ScanNode that uses the RuntimeFilter) The same Fragment, such as the general scene of broadcast join;
+
+GLOBAL: Relatively radical. In addition to satisfying the scenario of the LOCAL strategy, the Runtime Filter can also be combined and transmitted to different Fragments on different instances via the network. For example, Runtime Filter producers and consumers are in different Fragments, such as shuffle join. The query can be optimized in a wider range of scenarios.
+
+When building and applying Runtime Filters on different Fragments, the reasons and strategies for merging Runtime Filters can be found in [ISSUE 6116](https://github.com/apache/incubator-doris/issues/6116)
+
+#### 3.runtime_filter_wait_time_ms
+After the Runtime Filter is turned on, the ScanNode that joins the left table will wait for a period of time for each Runtime Filter assigned to itself and then scan the data. The default wait is 1s (1000ms, in milliseconds), that is, if the ScanNode is assigned 3 Runtime Filters , Then it will wait at most 3s.
+
+Because it takes time to build and merge the Runtime Filter, ScanNode will try to push down the Runtime Filter that arrives within the waiting time to the storage engine. If the waiting time is exceeded, ScanNode will directly start scanning data using the Runtime Filter that has arrived.
+
+If the Runtime Filter arrives after ScanNode starts scanning, ScanNode will not push the Runtime Filter down to the storage engine. Instead, it will use expression filtering on ScanNode based on the Runtime Filter for the data that has been scanned from the storage engine. The scanned data will not apply the Runtime Filter, so the intermediate data size obtained will be larger than the optimal solution, but serious cracking can be avoided.
+
+If the cluster is busy and there are many resource-intensive or long-time-consuming queries on the cluster, consider increasing the waiting time to avoid missing optimization opportunities for complex queries. If the cluster load is light, and there are many small queries on the cluster that only take a few seconds, you can consider reducing the waiting time to avoid an increase of 1s for each query.
+
+#### 4.runtime_filters_max_num
+The maximum number of Bloom Filters in the Runtime Filter that can be applied to each query, the default is 10. Because Bloom Filter construction and application costs are high, so if the generated Bloom Filter exceeds the maximum allowed number, the Bloom Filter with large selectivity is retained. Currently, only the number of Bloom Filters is limited, because compared to MinMax Filter and IN predicate It is more expensive to build and apply.
+```
+Selectivity = (HashJoinNode Cardinality / HashJoinNode left child Cardinality)
+```
+Because the cardinality of FE is currently inaccurate, the selectivity of Bloom Filter calculation here is inaccurate, so in the end it may only randomly reserve part of Bloom Filter.
+
+#### 5. Bloom Filter related parameters
+- `runtime_bloom_filter_min_size`: the minimum length of Bloom Filter in Runtime Filter (in bytes), the default is 1048576 (1M);
+
+- `runtime_bloom_filter_max_size`: the maximum length of Bloom Filter in Runtime Filter (in bytes), the default is 16777216 (16M);
+
+- `runtime_bloom_filter_size`: The default length of Bloom Filter in Runtime Filter (in bytes), the default is 2097152 (2M);
+
+Because it is necessary to ensure that the length of the Bloom Filter constructed by each HashJoinNode is the same to be merged, the length of the Bloom Filter is currently calculated in the FE query planning.
+
+If you can get the number of data rows (Cardinality) in the statistical information of the join table on the right, you will try to use Cardinality as NDV, the default false detection rate fpp is 0.05, and the Bloom that contains NDV unique elements and the false detection rate is lower than fpp is calculated by the formula The minimum number of bytes required by the filter, rounded to the nearest power of 2 (log value with 2 as the base), and limits the upper and lower limits of the length of the final Bloom Filter.
+
+If the Cardinality of the join right table is not available, the default Bloom Filter length will be used.
+
+#### 6.runtime_filter_max_in_num
+If the number of rows in the right table of the join is greater than this value, we will not generate an IN predicate, and the default is 1024;
+
+### View Runtime Filter generated by query
+The query plan that can be displayed by the `explain` command includes the join on clause information used by each Fragment, as well as comments on the generation and use of the Runtime Filter by the Fragment, so as to confirm whether the Runtime Filter is applied to the desired join on clause.
+- The comment contained in the Fragment that generates the Runtime Filter, such as `runtime filters: filter_id[type] <- table.column`.
+- Use the comment contained in the fragment of Runtime Filter such as `runtime filters: filter_id[type] -> table.column`.
+
+The query in the following example uses a Runtime Filter with ID RF000.
+```
+CREATE TABLE test (t1 INT) DISTRIBUTED BY HASH (t1) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test VALUES (1), (2), (3), (4);
+
+CREATE TABLE test2 (t2 INT) DISTRIBUTED BY HASH (t2) BUCKETS 2 PROPERTIES("replication_num" = "1");
+INSERT INTO test2 VALUES (3), (4), (5);
+
+EXPLAIN SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2;
++-------------------------------------------------------------------+
+| Explain String                                                    |
++-------------------------------------------------------------------+
+| PLAN FRAGMENT 0                                                   |
+|  OUTPUT EXPRS:`t1`                                                |
+|                                                                   |
+|   4:EXCHANGE                                                      |
+|                                                                   |
+| PLAN FRAGMENT 1                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test`.`t1`  |
+|                                                                   |
+|   2:HASH JOIN                                                     |
+|   |  join op: INNER JOIN (BUCKET_SHUFFLE)                         |
+|   |  equal join conjunct: `test`.`t1` = `test2`.`t2`              |
+|   |  runtime filters: RF000[in] <- `test2`.`t2`                   |
+|   |                                                               |
+|   |----3:EXCHANGE                                                 |
+|   |                                                               |
+|   0:OlapScanNode                                                  |
+|      TABLE: test                                                  |
+|      runtime filters: RF000[in] -> `test`.`t1`                    |
+|                                                                   |
+| PLAN FRAGMENT 2                                                   |
+|  OUTPUT EXPRS:                                                    |
+|   PARTITION: HASH_PARTITIONED: `default_cluster:ssb`.`test2`.`t2` |
+|                                                                   |
+|   1:OlapScanNode                                                  |
+|      TABLE: test2                                                 |
++-------------------------------------------------------------------+
+-- 上面`runtime filters`的行显示了`PLAN FRAGMENT 1`的`2:HASH JOIN`生成了ID为RF000的IN predicate，
+-- 其中`test2`.`t2`的key values仅在运行时可知，
+-- 在`0:OlapScanNode`使用了该IN predicate用于在读取`test`.`t1`时过滤不必要的数据。
+
+SELECT t1 FROM test JOIN test2 where test.t1 = test2.t2; 
+-- 返回2行结果[3, 4];
+
+-- 通过query的profile（set is_report_success=true;）可以查看查询内部工作的详细信息，
+-- 包括每个Runtime Filter是否下推、等待耗时、以及OLAP_SCAN_NODE从prepare到接收到Runtime Filter的总时长。
+RuntimeFilter:in:
+    -  HasPushDownToEngine:  true
+    -  AWaitTimeCost:  0ns
+    -  EffectTimeCost:  2.76ms
+
+-- 此外，在profile的OLAP_SCAN_NODE中还可以查看Runtime Filter下推后的过滤效果和耗时。
+    -  RowsVectorPredFiltered:  9.320008M  (9320008)
+    -  VectorPredEvalTime:  364.39ms
+```
+

Review comment:
       Modified, thks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] morningman merged pull request #6154: [DOC] Add docs of Runtime Filter

Posted by GitBox <gi...@apache.org>.

morningman merged pull request #6154:
URL: https://github.com/apache/incubator-doris/pull/6154


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org