You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2017/11/19 00:27:39 UTC

[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/19781

    [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCopyResult in Sort, HashAggregate, and  BroadcastHashJoin

    ## What changes were proposed in this pull request?
    I found #19656 causes some bugs, for example, it changed the result set of `q6` in tpcds:
    - w/o pr19658
    ```
    +-----+---+
    |state|cnt|
    +-----+---+
    |   MA| 10|
    |   AK| 10|
    |   AZ| 11|
    |   ME| 13|
    |   VT| 14|
    |   NV| 15|
    |   NH| 16|
    |   UT| 17|
    |   NJ| 21|
    |   MD| 22|
    |   WY| 25|
    |   NM| 26|
    |   OR| 31|
    |   WA| 36|
    |   ND| 38|
    |   ID| 39|
    |   SC| 45|
    |   WV| 50|
    |   FL| 51|
    |   OK| 53|
    |   MT| 53|
    |   CO| 57|
    |   AR| 58|
    |   NY| 58|
    |   PA| 62|
    |   AL| 63|
    |   LA| 63|
    |   SD| 70|
    |   WI| 80|
    | null| 81|
    |   MI| 82|
    |   NC| 82|
    |   MS| 83|
    |   CA| 84|
    |   MN| 85|
    |   MO| 88|
    |   IL| 95|
    |   IA|102|
    |   TN|102|
    |   IN|103|
    |   KY|104|
    |   NE|113|
    |   OH|114|
    |   VA|130|
    |   KS|139|
    |   GA|168|
    |   TX|216|
    +-----+---+
    ```
    - w/   pr19658
    ```
    +-----+---+
    |state|cnt|
    +-----+---+
    |   RI| 14|
    |   AK| 16|
    |   FL| 20|
    |   NJ| 21|
    |   NM| 21|
    |   NV| 22|
    |   MA| 22|
    |   MD| 22|
    |   UT| 22|
    |   AZ| 25|
    |   SC| 28|
    |   AL| 36|
    |   MT| 36|
    |   WA| 39|
    |   ND| 41|
    |   MI| 44|
    |   AR| 45|
    |   OR| 47|
    |   OK| 52|
    |   PA| 53|
    |   LA| 55|
    |   CO| 55|
    |   NY| 64|
    |   WV| 66|
    |   SD| 72|
    |   MS| 73|
    |   NC| 79|
    |   IN| 82|
    | null| 85|
    |   ID| 88|
    |   MN| 91|
    |   WI| 95|
    |   IL| 96|
    |   MO| 97|
    |   CA|109|
    |   CA|109|
    |   TN|114|
    |   NE|115|
    |   KY|128|
    |   OH|131|
    |   IA|156|
    |   TX|160|
    |   VA|182|
    |   KS|211|
    |   GA|230|
    +-----+---+
    ```
    This pr is to keep the original logic of `CodegenContext.copyResult` in some plans.
    
    ## How was this patch tested?
    Existing tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-22445-bugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19781.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19781
    
----
commit 9797041aa9138386f26d1f6c259da302f918ab5d
Author: Takeshi Yamamuro <ya...@apache.org>
Date:   2017-11-19T00:12:46Z

    bugfix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151963572
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -76,7 +76,7 @@ case class BroadcastHashJoinExec(
         streamedPlan.asInstanceOf[CodegenSupport].inputRDDs()
       }
     
    -  override def needCopyResult: Boolean = joinType match {
    +  private def checkNeedCopyResultFromJoinType: Boolean = joinType match {
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84024/testReport)** for PR 19781 at commit [`c06228a`](https://github.com/apache/spark/commit/c06228a1bc7780045d6e7366d4b9e97e24456727).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #83990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83990/testReport)** for PR 19781 at commit [`9797041`](https://github.com/apache/spark/commit/9797041aa9138386f26d1f6c259da302f918ab5d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151961655
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -90,6 +90,10 @@ case class BroadcastHashJoinExec(
         case _ => false
    --- End diff --
    
    oh, ya. you're right. I'll drop the build side. thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84028/testReport)** for PR 19781 at commit [`d65e5db`](https://github.com/apache/spark/commit/d65e5db00559af45f729cb6c2cc67a843aba4525).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84025/testReport)** for PR 19781 at commit [`da45016`](https://github.com/apache/spark/commit/da4501652c8cc0795e2ec587c7efa7312f20c514).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect stream-side child'...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    thanks ,merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84028/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84028/testReport)** for PR 19781 at commit [`d65e5db`](https://github.com/apache/spark/commit/d65e5db00559af45f729cb6c2cc67a843aba4525).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    LGTM pending jenkins


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83990/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151967403
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -83,11 +83,12 @@ case class BroadcastHashJoinExec(
           // them in a buffer, because these result rows share one UnsafeRow instance. Note that here
           // we wait for the broadcast to be finished, which is a no-op because it's already finished
           // when we wait it in `doProduce`.
    -      !buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
    +      streamedPlan.asInstanceOf[CodegenSupport].needCopyResult ||
    +        !buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
     
         // Other joins types(semi, anti, existence) can at most produce one result row for one input
         // row from the streamed side, so no need to copy the result rows.
    --- End diff --
    
    and the 2 comments becomes:
    ```
    case _: InnerLike | LeftOuter | RightOuter =>
      // For inner and outer joins, one row from the streamed side may produce multiple result rows,
      // if the build side has duplicated keys. Note that here ...
    
    case _ => Other joins types(semi, anti, existence) can at most produce one result row for one input row from the streamed side.
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect stream-side...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r152455868
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -857,4 +857,29 @@ class JoinSuite extends QueryTest with SharedSQLContext {
     
         joinQueries.foreach(assertJoinOrdering)
       }
    +
    +  test("SPARK-22445 Respect stream-side child's needCopyResult in BroadcastHashJoin") {
    +    val df1 = Seq((2, 3), (2, 5), (2, 2), (3, 8), (2, 1)).toDF("k", "v1")
    +    val df2 = Seq((2, 8), (3, 7), (3, 4), (1, 2)).toDF("k", "v2")
    +    val df3 = Seq((1, 1), (3, 2), (4, 3), (5, 1)).toDF("k", "v3")
    +
    +    withSQLConf(
    +        SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",
    +        SQLConf.JOIN_REORDER_ENABLED.key -> "false") {
    +      val df = df1.join(df2, "k").join(functions.broadcast(df3), "k")
    +      val plan = df.queryExecution.sparkPlan
    +
    +      // Check if `needCopyResult` in `BroadcastHashJoin` is correct when smj->bhj
    --- End diff --
    
    `q6` also failed when smj->bhj


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84091/testReport)** for PR 19781 at commit [`d2b149b`](https://github.com/apache/spark/commit/d2b149bb392e7fd38b734b9ae120c92b9f0ece48).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect stream-side child'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84091/testReport)** for PR 19781 at commit [`d2b149b`](https://github.com/apache/spark/commit/d2b149bb392e7fd38b734b9ae120c92b9f0ece48).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84024/testReport)** for PR 19781 at commit [`c06228a`](https://github.com/apache/spark/commit/c06228a1bc7780045d6e7366d4b9e97e24456727).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84027/testReport)** for PR 19781 at commit [`d65e5db`](https://github.com/apache/spark/commit/d65e5db00559af45f729cb6c2cc67a843aba4525).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    I tried though, I couldn't easily make the failed (and simple) test cases. But, I feel we should do. plz give me hours to try again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r152168668
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -76,20 +76,23 @@ case class BroadcastHashJoinExec(
         streamedPlan.asInstanceOf[CodegenSupport].inputRDDs()
       }
     
    -  override def needCopyResult: Boolean = joinType match {
    +  private def multipleOutputForOneInput: Boolean = joinType match {
         case _: InnerLike | LeftOuter | RightOuter =>
           // For inner and outer joins, one row from the streamed side may produce multiple result rows,
    -      // if the build side has duplicated keys. Then we need to copy the result rows before putting
    -      // them in a buffer, because these result rows share one UnsafeRow instance. Note that here
    -      // we wait for the broadcast to be finished, which is a no-op because it's already finished
    -      // when we wait it in `doProduce`.
    +      // if the build side has duplicated keys. Note that here we wait for the broadcast to be
    +      // finished, which is a no-op because it's already finished when we wait it in `doProduce`.
           !buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
     
         // Other joins types(semi, anti, existence) can at most produce one result row for one input
    -    // row from the streamed side, so no need to copy the result rows.
    +    // row from the streamed side.
         case _ => false
       }
     
    +  // If the streaming side needs to copy result, this join plan needs to copy too. Otherwise,
    +  // this join plan only needs to copy result if it may output multiple rows for one input.
    +  override def needCopyResult: Boolean =
    +    streamedPlan.asInstanceOf[CodegenSupport].needCopyResult || multipleOutputForOneInput
    --- End diff --
    
    For the regression test, I think we can have a streamedPlan which needs copying result (Sample with replacement or Expand, etc.), and performs a broadcast join which either an inner join where the build side has unique keys, or a semi join.
    
    So before this, `needCopyResult` is false, but after this, it is true.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151924692
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
    @@ -124,9 +124,11 @@ case class SortExec(
       // Name of sorter variable used in codegen.
       private var sorterVariable: String = _
     
    -  // The result rows come from the sort buffer, so this operator doesn't need to copy its result
    -  // even if its child does.
    -  override def needCopyResult: Boolean = false
    +  private var _needCopyResult: Option[Boolean] = None
    --- End diff --
    
    do you mean `doProduce` may not be called?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84024/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84027/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    @cloud-fan WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151966937
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -83,11 +83,12 @@ case class BroadcastHashJoinExec(
           // them in a buffer, because these result rows share one UnsafeRow instance. Note that here
           // we wait for the broadcast to be finished, which is a no-op because it's already finished
           // when we wait it in `doProduce`.
    -      !buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
    +      streamedPlan.asInstanceOf[CodegenSupport].needCopyResult ||
    +        !buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
     
         // Other joins types(semi, anti, existence) can at most produce one result row for one input
         // row from the streamed side, so no need to copy the result rows.
    --- End diff --
    
    sorry I realized then we need to update the comment too... the final proposal:
    ```
    def multipleOutputForOneInput: Boolean = ...
    // If the streaming side needs copy result, this join plan needs to copy too. Otherwise, this join plan only needs to copy result if it may output multiple rows for one input.
    override def needCopyResult: Boolean = streamedPlan.asInstanceOf[CodegenSupport].needCopyResult || multipleOutputForOneInput
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84025/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84026/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect stream-side...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19781


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    I finally found the failure case by a simple query. I'll update soon.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect stream-side child'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84091/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151959078
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -90,6 +90,10 @@ case class BroadcastHashJoinExec(
         case _ => false
    --- End diff --
    
    ah i see where is the problem. Even the join produce one row for one input, we still need to copy the result if child needs it. Since it's a broadcast join, we only need to check the streaming side?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    BTW can we add a regression test?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84027/testReport)** for PR 19781 at commit [`d65e5db`](https://github.com/apache/spark/commit/d65e5db00559af45f729cb6c2cc67a843aba4525).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151953815
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
    @@ -124,9 +124,11 @@ case class SortExec(
       // Name of sorter variable used in codegen.
       private var sorterVariable: String = _
     
    -  // The result rows come from the sort buffer, so this operator doesn't need to copy its result
    -  // even if its child does.
    -  override def needCopyResult: Boolean = false
    +  private var _needCopyResult: Option[Boolean] = None
    --- End diff --
    
    my bad, I rechecked again and I found we didn't need the change in `SortExec` and `HashAggregateExec`. I'll update soon.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84025/testReport)** for PR 19781 at commit [`da45016`](https://github.com/apache/spark/commit/da4501652c8cc0795e2ec587c7efa7312f20c514).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #83990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83990/testReport)** for PR 19781 at commit [`9797041`](https://github.com/apache/spark/commit/9797041aa9138386f26d1f6c259da302f918ab5d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19781#discussion_r151963108
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -76,7 +76,7 @@ case class BroadcastHashJoinExec(
         streamedPlan.asInstanceOf[CodegenSupport].inputRDDs()
       }
     
    -  override def needCopyResult: Boolean = joinType match {
    +  private def checkNeedCopyResultFromJoinType: Boolean = joinType match {
    --- End diff --
    
    I think we can merge these 2 methods:
    ```
    case _: InnerLike | LeftOuter | RightOuter =>
      streamedPlan.asInstanceOf[CodegenSupport].needCopyResult || !buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
    
    case _ => streamedPlan.asInstanceOf[CodegenSupport].needCopyResult
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84026/testReport)** for PR 19781 at commit [`aabbb9a`](https://github.com/apache/spark/commit/aabbb9ac09c96fb883ec88cdbdc1ad80b46c3328).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    **[Test build #84026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84026/testReport)** for PR 19781 at commit [`aabbb9a`](https://github.com/apache/spark/commit/aabbb9ac09c96fb883ec88cdbdc1ad80b46c3328).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect stream-side child'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19781
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org