You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rdblue <gi...@git.apache.org> on 2017/10/24 21:55:57 UTC

[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/19568

    SPARK-22345: Fix sort-merge joins with conditions and codegen.

    ## What changes were proposed in this pull request?
    
    This adds a joined row to sort-merge join codegen. That joined row is used to generate code for filter expressions, which may fall back to using the result row. Previously, the right side of the join was used, which is incorrect (the non-codegen implementations use a joined row).
    
    ## How was this patch tested?
    
    Current tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-22345-fix-sort-merge-codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19568
    
----
commit 4afb088a4fa2127cab7467cc56f58cd77bd8c251
Author: Ryan Blue <bl...@apache.org>
Date:   2017-10-24T20:21:50Z

    SPARK-22345: Fix sort-merge joins with conditions and codegen.
    
    Code for the condition was generated to depend on the right row instead
    of the joined row.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146974005
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -585,21 +585,26 @@ case class SortMergeJoinExec(
     
         val iterator = ctx.freshName("iterator")
         val numOutput = metricTerm(ctx, "numOutputRows")
    +    val joinedRow = ctx.freshName("joined")
    --- End diff --
    
    It ended up being a bit more complicated. There are two problems. The first is what this fixes, which is that the INPUT_ROW in the codegen context points to the wrong row. This is fixed and now has a test that fails if you uncomment the line that sets INPUT_ROW.
    
    The second problem is in the check for `CodegenFallback` fails to check whether the condition supports codegen in some plans. To get the test to fail, I had to add a projection to exercise the [path where this happens](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L524). I'll add a second commit for this problem.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146928894
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -585,21 +585,26 @@ case class SortMergeJoinExec(
     
         val iterator = ctx.freshName("iterator")
         val numOutput = metricTerm(ctx, "numOutputRows")
    +    val joinedRow = ctx.freshName("joined")
    --- End diff --
    
    The joined row should always be used for correctness. We don't know what code the expression will generate, so we should plan on always passing the correct input row. Setting left and right on a joined row is a cheap operation, so I'd rather do it correctly than rely on something brittle like `isInstanceOf[CodegenFallback]`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
GitHub user rdblue reopened a pull request:

    https://github.com/apache/spark/pull/19568

    SPARK-22345: Fix sort-merge joins with conditions and codegen.

    ## What changes were proposed in this pull request?
    
    This adds a joined row to sort-merge join codegen. That joined row is used to generate code for filter expressions, which may fall back to using the result row. Previously, the right side of the join was used, which is incorrect (the non-codegen implementations use a joined row).
    
    ## How was this patch tested?
    
    Current tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-22345-fix-sort-merge-codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19568
    
----

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    @DonnyZone, I don't know of any cases that use codgen after the fix for `CodegenFallback`, but I think this is still a good idea.
    
    If Spark is going to generate code, it should generate correct code. That means either we remove the codegen implementation from `CodegenFallback`, or we fix the row passed in by sort-merge join. I'd rather fix sort-merge join because we may want to change the behavior to codegen as much of the condition as possible later on.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83090/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146970055
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -615,6 +620,7 @@ case class SortMergeJoinExec(
         }
     
         s"""
    +       |$joinedRow = new JoinedRow();
    --- End diff --
    
    Fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    **[Test build #83056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83056/testReport)** for PR 19568 at commit [`3431778`](https://github.com/apache/spark/commit/343177838e4382fee2b9b102b1ead335a3fb24cb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    **[Test build #83023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83023/testReport)** for PR 19568 at commit [`4afb088`](https://github.com/apache/spark/commit/4afb088a4fa2127cab7467cc56f58cd77bd8c251).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Could you please change title from `SPARK-22345` to `[SPARK-22345]`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146743704
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -615,6 +620,7 @@ case class SortMergeJoinExec(
         }
     
         s"""
    +       |$joinedRow = new JoinedRow();
    --- End diff --
    
    Since we generate Java program, is it necessary to declare `JoinedRow` type for `$joinedRow`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    **[Test build #83023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83023/testReport)** for PR 19568 at commit [`4afb088`](https://github.com/apache/spark/commit/4afb088a4fa2127cab7467cc56f58cd77bd8c251).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146975643
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -585,21 +585,26 @@ case class SortMergeJoinExec(
     
         val iterator = ctx.freshName("iterator")
         val numOutput = metricTerm(ctx, "numOutputRows")
    +    val joinedRow = ctx.freshName("joined")
    --- End diff --
    
    The second problem was fixed in this commit: https://github.com/apache/spark/commit/6b6dd682e84d3b03d0b15fbd81a0d16729e521d2
    
    I still think that the codegen problem should be fixed. Detecting `CodgenFallback` is imperfect, but will still generate code and run it. I think we should either remove codegen from `CodegenFallback` or add this fix to ensure that code works, even if we don't expect to run it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    @dongjoon-hyun, yes, I'm currently working on it. I just wanted to get the rest up.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146720363
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -615,6 +620,7 @@ case class SortMergeJoinExec(
         }
     
         s"""
    +       |$joinedRow = new JoinedRow();
    --- End diff --
    
    Hi, @rdblue .
    Since this is a correctness issue, can we have a test case including the incorrect result you met?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    @gatorsmile, I think it would be better to fix codegen than to prevent it from happening with an assertion. If `CodegenFallback` can produce fallback code, why not allow it to when necessary?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83023/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue closed the pull request at:

    https://github.com/apache/spark/pull/19568


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by DonnyZone <gi...@git.apache.org>.
Github user DonnyZone commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    @rdblue Could you find any cases that can trigger the problem of wrong INPUT_ROW in SortMergeJoinExec after the fix (https://github.com/apache/spark/pull/18656) for CollapseCodegenStages rule? I tried to do it before.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    **[Test build #83090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83090/testReport)** for PR 19568 at commit [`146d791`](https://github.com/apache/spark/commit/146d7918b38415980370617a9a97a5aaf657d2e8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r147470383
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala ---
    @@ -228,6 +230,27 @@ class InnerJoinSuite extends SparkPlanTest with SharedSQLContext {
         )
       )
     
    +  testInnerJoin(
    --- End diff --
    
    It still can pass without the changes in this PR. What is the purpose of this test case?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146755690
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -615,6 +620,7 @@ case class SortMergeJoinExec(
         }
     
         s"""
    +       |$joinedRow = new JoinedRow();
    --- End diff --
    
    Yea, I think we need.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r147471530
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala ---
    @@ -124,7 +125,8 @@ class InnerJoinSuite extends SparkPlanTest with SharedSQLContext {
             rightPlan: SparkPlan) = {
           val sortMergeJoin = joins.SortMergeJoinExec(leftKeys, rightKeys, Inner, boundCondition,
             leftPlan, rightPlan)
    -      EnsureRequirements(spark.sessionState.conf).apply(sortMergeJoin)
    +      EnsureRequirements(spark.sessionState.conf)
    +          .apply(ProjectExec(sortMergeJoin.output, sortMergeJoin))
    --- End diff --
    
    In 2.1.1, an extra project causes `WholeStageCodegenExec` to not detect that the expression contains `CodegenFallback`. This is no longer the case. Like I said, there is no longer a good way to test what happens when `CodegenFallback` generates code. If there were, I'd use that here to test the case.
    
    I guess I could add a testing case to `WholeStageCodegenExec` to make sure the code is generated correctly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83056/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146928318
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -615,6 +620,7 @@ case class SortMergeJoinExec(
         }
     
         s"""
    +       |$joinedRow = new JoinedRow();
    --- End diff --
    
    Yeah, that's causing the test failures. This is a typo from some restructuring I did to get this upstream.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    **[Test build #83090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83090/testReport)** for PR 19568 at commit [`146d791`](https://github.com/apache/spark/commit/146d7918b38415980370617a9a97a5aaf657d2e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by DonnyZone <gi...@git.apache.org>.
Github user DonnyZone commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    This PR is similar to the initial commit when I try to fix SPARK-21441 in  https://github.com/apache/spark/pull/18656 (https://github.com/apache/spark/pull/18656/commits/92dc106aec59a0f2755d7621d2d03831250cccc0). 
    (1) The INPUT_ROW in SortMergeJoinExec's codegen context points to the wrong row.  In general, it works well. However, this behavior potentially causes wrong result or even NPE in `bindReference`. I think it is necessary to fix it.
    (2) The `CollapseCodegenStages` rule before 2.1.1 has an issue which may lead to code generation even the SortMergeJoinExec contains CodegenFallback expressions, when it has an umbrella SparkPlan (e.g., ProjectExec). Consequently, it triggers the above potential issue. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r147470946
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala ---
    @@ -124,7 +125,8 @@ class InnerJoinSuite extends SparkPlanTest with SharedSQLContext {
             rightPlan: SparkPlan) = {
           val sortMergeJoin = joins.SortMergeJoinExec(leftKeys, rightKeys, Inner, boundCondition,
             leftPlan, rightPlan)
    -      EnsureRequirements(spark.sessionState.conf).apply(sortMergeJoin)
    +      EnsureRequirements(spark.sessionState.conf)
    +          .apply(ProjectExec(sortMergeJoin.output, sortMergeJoin))
    --- End diff --
    
    Why we need to change this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r147471054
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala ---
    @@ -228,6 +230,27 @@ class InnerJoinSuite extends SparkPlanTest with SharedSQLContext {
         )
       )
     
    +  testInnerJoin(
    --- End diff --
    
    This test fails in 2.1.1 and versions before https://github.com/apache/spark/commit/6b6dd682e84d3b03d0b15fbd81a0d16729e521d2. I'm not sure how to exercise the code generated by CodegenFallback with that fix, but this test is valid for the 2.1.1 branch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    **[Test build #83056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83056/testReport)** for PR 19568 at commit [`3431778`](https://github.com/apache/spark/commit/343177838e4382fee2b9b102b1ead335a3fb24cb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19568
  
    @rdblue Yes, the current implementation implicitly assumes the rule `CollapseCodegenStages ` excludes all the illegal cases. How about adding an `assert` to do the check that the condition of `SortMergeJoinExec` does not have `CodegenFallback ` expressions? Also write a code comment to explain `CollapseCodegenStages ` guarantees the assumption?
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146757237
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -585,21 +585,26 @@ case class SortMergeJoinExec(
     
         val iterator = ctx.freshName("iterator")
         val numOutput = metricTerm(ctx, "numOutputRows")
    +    val joinedRow = ctx.freshName("joined")
         val (beforeLoop, condCheck) = if (condition.isDefined) {
           // Split the code of creating variables based on whether it's used by condition or not.
           val loaded = ctx.freshName("loaded")
           val (leftBefore, leftAfter) = splitVarsByCondition(left.output, leftVars)
           val (rightBefore, rightAfter) = splitVarsByCondition(right.output, rightVars)
    +
           // Generate code for condition
    +      ctx.INPUT_ROW = joinedRow
    --- End diff --
    
    We should also leave a comment explaining why add a JoinedRow as INPUT_ROW.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r146756914
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -585,21 +585,26 @@ case class SortMergeJoinExec(
     
         val iterator = ctx.freshName("iterator")
         val numOutput = metricTerm(ctx, "numOutputRows")
    +    val joinedRow = ctx.freshName("joined")
    --- End diff --
    
    I think we only need to do this when there is `CodegenFallback` in the condition expressions.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19568: SPARK-22345: Fix sort-merge joins with conditions...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19568#discussion_r147222233
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -585,21 +585,26 @@ case class SortMergeJoinExec(
     
         val iterator = ctx.freshName("iterator")
         val numOutput = metricTerm(ctx, "numOutputRows")
    +    val joinedRow = ctx.freshName("joined")
         val (beforeLoop, condCheck) = if (condition.isDefined) {
           // Split the code of creating variables based on whether it's used by condition or not.
           val loaded = ctx.freshName("loaded")
           val (leftBefore, leftAfter) = splitVarsByCondition(left.output, leftVars)
           val (rightBefore, rightAfter) = splitVarsByCondition(right.output, rightVars)
    +
           // Generate code for condition
    +      ctx.INPUT_ROW = joinedRow
    --- End diff --
    
    Fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org