You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/21 09:23:09 UTC

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request, #2580: Support limit pushdown through left right outer join

Ted-Jiang opened a new pull request, #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   run
   ```
    explain select * from order left  join item  on  order.o_orderkey = item.l_orderkey  limit 1;
   ```
   
   before
   ```
   | logical_plan  | Limit: 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
   |               |   Projection: #order.o_orderkey, #order.o_custkey, #order.o_orderstatus, #order.o_totalprice, #order.o_orderdate, #order.o_orderpriority, #order.o_clerk, #order.o_shippriority, #order.o_comment, #item.l_orderkey, #item.l_partkey, #item.l_suppkey, #item.l_linenumber, #item.l_quantity, #item.l_extendedprice, #item.l_discount, #item.l_tax, #item.l_returnflag, #item.l_linestatus, #item.l_shipdate, #item.l_commitdate, #item.l_receiptdate, #item.l_shipinstruct, #item.l_shipmode, #item.l_comment                                                                                                                                                                                                                                                                                                      |
   |               |     Left Join: #order.o_orderkey = #item.l_orderkey                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
   |               |       TableScan: order projection=Some([0, 1, 2, 3, 4, 5, 6, 7, 8])                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
   |               |       TableScan: item projection=Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
   ```
   
   after:
   ```
   | logical_plan  | Limit: 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
   |               |   Projection: #order.o_orderkey, #order.o_custkey, #order.o_orderstatus, #order.o_totalprice, #order.o_orderdate, #order.o_orderpriority, #order.o_clerk, #order.o_shippriority, #order.o_comment, #item.l_orderkey, #item.l_partkey, #item.l_suppkey, #item.l_linenumber, #item.l_quantity, #item.l_extendedprice, #item.l_discount, #item.l_tax, #item.l_returnflag, #item.l_linestatus, #item.l_shipdate, #item.l_commitdate, #item.l_receiptdate, #item.l_shipinstruct, #item.l_shipmode, #item.l_comment                                                                                                                                                                                                                                                                                                      |
   |               |     Left Join: #order.o_orderkey = #item.l_orderkey                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
   |               |       TableScan: order projection=Some([0, 1, 2, 3, 4, 5, 6, 7, 8]), limit=1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
   |               |       TableScan: item projection=Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
   
   ```
   Closes #2579.
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133628760

   > Yes, you are right @Ted-Jiang -- I was confused about the side of the join 🤦
   > 
   > Upon more thought I think this is correct.
   > 
   > Thank you again.
   > 
   > I'll leave it open for another day or so before merging in case anyone else has thoughts
   
   Thanks @alamb ❤️.
   I find this rule in spark
   https://github.com/apache/spark/blob/efe43306fcab18f076f755c81c0406ebc1a5fee9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L707


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133953414

   And now there is a conflicts 🤦 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang closed pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
Ted-Jiang closed pull request #2580: Support limit pushdown through left right outer join
URL: https://github.com/apache/arrow-datafusion/pull/2580


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1135013241

   ![Screen Shot 2022-05-23 at 2 38 36 PM](https://user-images.githubusercontent.com/490673/169885111-87efdc66-6f1d-4b15-aca6-6d7049c516a0.png)
   
   Strangely github won't let this PR be reopened 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
jackwener commented on code in PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#discussion_r878828246


##########
datafusion/core/src/optimizer/limit_push_down.rs:
##########
@@ -157,25 +158,99 @@ fn limit_push_down(
                 )?),
             }))
         }
+        (LogicalPlan::Join(Join { join_type, .. }), upper_limit) => match join_type {
+            JoinType::Left => {
+                //if LeftOuter join push limit to left
+                generate_push_down_join(
+                    _optimizer,
+                    _execution_props,
+                    plan,
+                    upper_limit,
+                    None,
+                )
+            }
+            JoinType::Right =>
+            //if RightOuter join  push limit to right

Review Comment:
   Minor advice
   ```suggestion
               // If RightOuter join  push limit to right
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang closed pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
Ted-Jiang closed pull request #2580: Support limit pushdown through left right outer join
URL: https://github.com/apache/arrow-datafusion/pull/2580


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1135014883

   https://github.com/apache/arrow-datafusion/pull/2596 is now merged 👏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1135012881

   reopened as part of https://github.com/apache/arrow-datafusion/pull/2596


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133902905

   it seems as if there is a clippy error now 😢 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133626760

   > One question is df LEFT JOIN is equal to LEFT OUTER JOIN?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on PR #2580:
URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133607491

   > Thank you for the contribution @Ted-Jiang , howe I don't think this is a valid optimization.
   > 
   > Specifically, because a join can filter out rows, if you limit the input you may actually end up with fewer output rows than the limit.
   > 
   > Consider this input:
   > 
   > `left`:
   > 
   > l
   > 1
   > 2
   > ...
   > 100
   > `right`:
   > 
   > r
   > 99
   > 100
   > The output of `select * from left LEFT JOIN right ON (l = r)` should be:
   > 
   > l	r
   > 99	99
   > 100	100
   > However, if you push the limit down to the scan on `left` it would only send this into the JOIN
   > 
   > `left`:
   > 
   > l
   > 1
   > 2
   > ```
   > 
   > And thus would produce no output. 
   > 
   > If we want to optimize limits in Joins, I think it would have to be done in the Join Operator itself (to stop producing rows once the limit is hit). However, as long as the Join operator is producing rows in batches, the effect of implementing an internal limit will likely be small (because it would save only a part of one output batch, when the limit on the output of a join is hit)
   > ```
   
   I think this situation is `select * from left LEFT JOIN right ON (l = r)` without limit. There will no limit in left table_scan
   will still produce
   
   > `left`:
   > 
   > l
   > 1
   > 
   > ...
   > 100
   
   But this rule will apply `select * from left LEFT JOIN right ON (l = r) limit 2`
   which left table will get the limit  will send two values to `join`, i think this is right
   and the result will be
   
   ```
   l	r
   1       Null
   2	Null
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang closed pull request #2580: Support limit pushdown through left right outer join

Posted by GitBox <gi...@apache.org>.
Ted-Jiang closed pull request #2580: Support limit pushdown through left right outer join
URL: https://github.com/apache/arrow-datafusion/pull/2580


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org