You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/21 04:47:07 UTC

[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #3861: Fix 3635 redundant projections

waynexia commented on code in PR #3861:
URL: https://github.com/apache/arrow-datafusion/pull/3861#discussion_r1001359583


##########
benchmarks/expected-plans/q21.txt:
##########
@@ -7,15 +7,18 @@ Sort: numwait DESC NULLS FIRST, supplier.s_name ASC NULLS LAST
             Inner Join: l1.l_orderkey = orders.o_orderkey
               Inner Join: supplier.s_suppkey = l1.l_suppkey
                 TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]
-                Filter: l1.l_receiptdate > l1.l_commitdate
-                  SubqueryAlias: l1
-                    TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate]
-              Filter: orders.o_orderstatus = Utf8("F")
-                TableScan: orders projection=[o_orderkey, o_orderstatus]
-            Filter: nation.n_name = Utf8("SAUDI ARABIA")
-              TableScan: nation projection=[n_nationkey, n_name]
+                Filter: l1.l_receiptdate > l1.l_commitdatel1.l_commitdatel1.l_receiptdate
+                  Projection: l1.l_receiptdate > l1.l_commitdate AS l1.l_receiptdate > l1.l_commitdatel1.l_commitdatel1.l_receiptdate, l1.l_orderkey, l1.l_suppkey
+                    SubqueryAlias: l1
+                      TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate]
+              Filter: orders.o_orderstatus = Utf8("F")Utf8("F")orders.o_orderstatus
+                Projection: orders.o_orderstatus = Utf8("F") AS orders.o_orderstatus = Utf8("F")Utf8("F")orders.o_orderstatus, orders.o_orderkey
+                  TableScan: orders projection=[o_orderkey, o_orderstatus]
+            Filter: nation.n_name = Utf8("SAUDI ARABIA")Utf8("SAUDI ARABIA")nation.n_name
+              Projection: nation.n_name = Utf8("SAUDI ARABIA") AS nation.n_name = Utf8("SAUDI ARABIA")Utf8("SAUDI ARABIA")nation.n_name, nation.n_nationkey

Review Comment:
   I try `EXPLAIN VERBOSE` and it looks like the `FilterPushdown` rule generates a redundant expr. Here is the log:
   ```plaintext
   | logical_plan after reduce_cross_join                       | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after common_sub_expression_eliminate         | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after eliminate_limit                         | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after projection_push_down                    | Sort: numwait DESC NULLS FIRST, supplier.s_name ASC NULLS LAST                                                                                                                                                                                                                      |
   |                                                            |   Projection: supplier.s_name, COUNT(UInt8(1)) AS numwait                                                                                                                                                                                                                           |
   |                                                            |     Aggregate: groupBy=[[supplier.s_name]], aggr=[[COUNT(UInt8(1))]]                                                                                                                                                                                                                |
   |                                                            |       Filter: orders.o_orderstatus = Utf8("F") AND l1.l_receiptdate > l1.l_commitdate AND nation.n_name = Utf8("SAUDI ARABIA")                                                                                                                                                      |
   |                                                            |         Anti Join: l1.l_orderkey = l3.l_orderkey Filter: l3.l_suppkey != l1.l_suppkey                                                                                                                                                                                               |
   |                                                            |           Filter: orders.o_orderstatus = Utf8("F") AND l1.l_receiptdate > l1.l_commitdate AND nation.n_name = Utf8("SAUDI ARABIA")                                                                                                                                                  |
   |                                                            |             Semi Join: l1.l_orderkey = l2.l_orderkey Filter: l2.l_suppkey != l1.l_suppkey                                                                                                                                                                                           |
   |                                                            |               Inner Join: supplier.s_nationkey = nation.n_nationkey                                                                                                                                                                                                                 |
   |                                                            |                 Inner Join: l1.l_orderkey = orders.o_orderkey                                                                                                                                                                                                                       |
   |                                                            |                   Inner Join: supplier.s_suppkey = l1.l_suppkey                                                                                                                                                                                                                     |
   |                                                            |                     TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]                                                                                                                                                                                                 |
   |                                                            |                     SubqueryAlias: l1                                                                                                                                                                                                                                               |
   |                                                            |                       TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate]                                                                                                                                                                           |
   |                                                            |                   TableScan: orders projection=[o_orderkey, o_orderstatus]                                                                                                                                                                                                          |
   |                                                            |                 TableScan: nation projection=[n_nationkey, n_name]                                                                                                                                                                                                                  |
   |                                                            |               SubqueryAlias: l2                                                                                                                                                                                                                                                     |
   |                                                            |                 TableScan: lineitem projection=[l_orderkey, l_suppkey]                                                                                                                                                                                                              |
   |                                                            |           Filter: l3.l_receiptdate > l3.l_commitdate                                                                                                                                                                                                                                |
   |                                                            |             SubqueryAlias: l3                                                                                                                                                                                                                                                       |
   |                                                            |               TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate]                                                                                                                                                                                   |
   | logical_plan after rewrite_disjunctive_predicate           | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after reduce_outer_join                       | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after filter_push_down                        | Sort: numwait DESC NULLS FIRST, supplier.s_name ASC NULLS LAST                                                                                                                                                                                                                      |
   |                                                            |   Projection: supplier.s_name, COUNT(UInt8(1)) AS numwait                                                                                                                                                                                                                           |
   |                                                            |     Aggregate: groupBy=[[supplier.s_name]], aggr=[[COUNT(UInt8(1))]]                                                                                                                                                                                                                |
   |                                                            |       Anti Join: l1.l_orderkey = l3.l_orderkey Filter: l3.l_suppkey != l1.l_suppkey                                                                                                                                                                                                 |
   |                                                            |         Semi Join: l1.l_orderkey = l2.l_orderkey Filter: l2.l_suppkey != l1.l_suppkey                                                                                                                                                                                               |
   |                                                            |           Inner Join: supplier.s_nationkey = nation.n_nationkey                                                                                                                                                                                                                     |
   |                                                            |             Inner Join: l1.l_orderkey = orders.o_orderkey                                                                                                                                                                                                                           |
   |                                                            |               Inner Join: supplier.s_suppkey = l1.l_suppkey                                                                                                                                                                                                                         |
   |                                                            |                 TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]                                                                                                                                                                                                     |
   |                                                            |                 Filter: l1.l_receiptdate > l1.l_commitdate AND l1.l_receiptdate > l1.l_commitdate                                                                                                                                                                                   |
   |                                                            |                   SubqueryAlias: l1                                                                                                                                                                                                                                                 |
   |                                                            |                     TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate]                                                                                                                                                                             |
   |                                                            |               Filter: orders.o_orderstatus = Utf8("F") AND orders.o_orderstatus = Utf8("F")                                                                                                                                                                                         |
   |                                                            |                 TableScan: orders projection=[o_orderkey, o_orderstatus]                                                                                                                                                                                                            |
   |                                                            |             Filter: nation.n_name = Utf8("SAUDI ARABIA") AND nation.n_name = Utf8("SAUDI ARABIA")                                                                                                                                                                                   |
   |                                                            |               TableScan: nation projection=[n_nationkey, n_name]                                                                                                                                                                                                                    |
   |                                                            |           SubqueryAlias: l2                                                                                                                                                                                                                                                         |
   |                                                            |             TableScan: lineitem projection=[l_orderkey, l_suppkey]                                                                                                                                                                                                                  |
   |                                                            |         Filter: l3.l_receiptdate > l3.l_commitdate                                                                                                                                                                                                                                  |
   |                                                            |           SubqueryAlias: l3                                                                                                                                                                                                                                                         |
   |                                                            |             TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate, l_receiptdate]                                                                                                                                                                                     |
   | logical_plan after limit_push_down                         | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after single_distinct_aggregation_to_group_by | SAME TEXT AS ABOVE                                                                                                                                                                                                                                                                  |
   | logical_plan after type_coercion                           | SAME TEXT AS ABOVE             
   ```
   
   And the redundant `AND` in filter is eliminated in the following `simplify_expressions` rule, but the projection introduced by `CommonSubexpressionEliminate` left.
   
   I haven't dug into it. But this is not in the scope of this PR I suppose. I'll open an issue to track it. (the causality among rules are really interesting😋)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org