You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/07 09:12:45 UTC

[GitHub] [incubator-doris] starocean999 opened a new issue, #8894: [Enhancement] Push predicate to subquery

starocean999 opened a new issue, #8894:
URL: https://github.com/apache/incubator-doris/issues/8894

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   
   TPCH q17:
   select
       sum(l_extendedprice) / 7.0 as avg_yearly
   from
       lineitem,
       part
   where
       p_partkey = l_partkey
       and p_brand = 'Brand#23'
       and p_container = 'MED BOX'
       and l_quantity < (
           select
               0.2 * avg(l_quantity)
           from
               lineitem
           where
               l_partkey = p_partkey
       );
   
   If we rewrite the query as follows, the scan rows of lineitem in subquery will much less than old q17
   as well as the less row count for group node. 
   
   TPCH Q17(rewrite):
   select
       sum(l_extendedprice) / 7.0 as avg_yearly
   from
       lineitem,
       part p1
   where
       p_partkey = l_partkey
       and p_brand = 'Brand#23'
       and p_container = 'MED BOX'
       and l_quantity < (
           select
               0.2 * avg(l_quantity)
           from
               lineitem,
               part p2
           where
               l_partkey = p1.p_partkey
               and  l_partkey = p2.p_partkey 
               and p2.p_brand = 'Brand#23'
               and p2.p_container = 'MED BOX'
       );
   
   Although we might push some predicate to subquery to minimize the original scan row count, it will also introduce a new table and a new join node, which results more rows to scan, filter and join. In TPCH Q2 and Q17, the pushed table is relatively small, so get better performance after pushing. But normally, a CBO optimizer is needed to decide if the pushing is worth. At last, if table spool optimizer is implemented, we can always push the predicates.
   
   ### Solution
   
   1. Push the predicate to subquery in some case.
   2. Add a session variable to control if enable this behavior.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org