You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/24 08:16:53 UTC

[GitHub] [spark] leesf opened a new pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

leesf opened a new pull request #35648:
URL: https://github.com/apache/spark/pull/35648


   ### What changes were proposed in this pull request?
   
   Remove deprecated logic in SortMergeJoinExec in full outer sort merge, while the [PR](https://github.com/apache/spark/pull/34612) handles compare result separately, so the original code to handle `<` logic would be removed.
   
   
   ### Why are the changes needed?
   Make code cleaner
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] leesf commented on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1049599897


   @c21 @cloud-fan would you please help to review the PR when you have time? Thanks in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1051320521


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] leesf commented on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1053787319


   > > However in which case will it exists unmatched rows?
   > 
   > @leesf - Beside the equi join condition, we also have non-equi condition. e.g.
   > 
   > ```
   > SELECT *
   > FROM t1
   > JOIN t2
   > ON 
   >   t1.k = t2.k
   >   AND t1.v != t2.v     <- this is the non-equi condition
   > ```
   > 
   > So rows in left, right buffers have same values for join keys (i.e. `k` column here), but not necessarily pass the non-equi join condition (i.e. `v` column here). `SortMergeJoinExec.condition` is the non-equi condition (i.e. `t1.v != t2.v` here).
   
   Thanks for you explanation and example, you are right and i going to close the PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1050490313


   There are test failures, looks like this change breaks some queries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] leesf edited a comment on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
leesf edited a comment on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1053787319


   > > However in which case will it exists unmatched rows?
   > 
   > @leesf - Beside the equi join condition, we also have non-equi condition. e.g.
   > 
   > ```
   > SELECT *
   > FROM t1
   > JOIN t2
   > ON 
   >   t1.k = t2.k
   >   AND t1.v != t2.v     <- this is the non-equi condition
   > ```
   > 
   > So rows in left, right buffers have same values for join keys (i.e. `k` column here), but not necessarily pass the non-equi join condition (i.e. `v` column here). `SortMergeJoinExec.condition` is the non-equi condition (i.e. `t1.v != t2.v` here).
   
   Thanks for you explanation and example, you are right and i am going to close the PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] c21 commented on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1050503495


   > However in which case will it exists unmatched rows?
   
   @leesf - Beside the equi join condition, we also have non-equi condition. e.g.
   
   ```
   SELECT *
   FROM t1
   JOIN t2
   ON 
     t1.k = t2.k
     AND t1.v != t2.v     <- this is the non-equi condition
   ```
   
   So rows in left, right buffers have same values for join keys (i.e. `k` column here), but not necessarily pass the non-equi join condition (i.e. `v` column here). `SortMergeJoinExec.condition` is the non-equi condition (i.e. `t1.v != t2.v` here).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] leesf closed pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
leesf closed pull request #35648:
URL: https://github.com/apache/spark/pull/35648


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] leesf commented on pull request #35648: [SPARK-38313] Remove deprecated logic in SortMergeJoinExec

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #35648:
URL: https://github.com/apache/spark/pull/35648#issuecomment-1050434148


   > join still needs to output one row
   
   @c21 Thanks for the comment. However in which case will it exists unmatched rows? rows in `leftMatches`/`rightMatches` are the same, so it should be all matched. And also would you please explain more about the `but may still not match because the join condition needs to be passed.`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org