You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by BoleynSu <gi...@git.apache.org> on 2017/08/03 17:57:20 UTC

[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala

GitHub user BoleynSu opened a pull request:

    https://github.com/apache/spark/pull/18836

    Update SortMergeJoinExec.scala

    fix a bug in outputOrdering
    
    ## What changes were proposed in this pull request?
    
    Change `case Inner` to `case _: InnerLike` so that Cross will be handled properly.
    
    ## How was this patch tested?
    
    No unit tests are needed.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BoleynSu/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18836.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18836
    
----
commit 54ba6bedb36e57698404dbaccb7d5639bf64770c
Author: Boleyn Su <bo...@gmail.com>
Date:   2017-08-03T17:55:26Z

    Update SortMergeJoinExec.scala
    
    fix a bug in outputOrdering

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18836#discussion_r131596479
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -82,7 +82,7 @@ case class SortMergeJoinExec(
     
       override def outputOrdering: Seq[SortOrder] = joinType match {
         // For inner join, orders of both sides keys should be kept.
    -    case Inner =>
    +    case _: InnerLike =>
    --- End diff --
    
    Yeah that makes sense


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18836#discussion_r131239029
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -82,7 +82,7 @@ case class SortMergeJoinExec(
     
       override def outputOrdering: Seq[SortOrder] = joinType match {
         // For inner join, orders of both sides keys should be kept.
    -    case Inner =>
    +    case _: InnerLike =>
    --- End diff --
    
    Can someone explain to me what is being fixed here? The other `InnerLike` variant, `Cross`, does not get planned using a `SortMergeJoin`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala

Posted by BoleynSu <gi...@git.apache.org>.
Github user BoleynSu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18836#discussion_r131316095
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -82,7 +82,7 @@ case class SortMergeJoinExec(
     
       override def outputOrdering: Seq[SortOrder] = joinType match {
         // For inner join, orders of both sides keys should be kept.
    -    case Inner =>
    +    case _: InnerLike =>
    --- End diff --
    
    I think we can get a SortMergeJoin paln with Cross, e.g.  `select distinct a.i + 1,a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by BoleynSu <gi...@git.apache.org>.
Github user BoleynSu commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    @gatorsmile I am not familiar with the PR process, it is great that you can take it over. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18836#discussion_r131557287
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -82,7 +82,7 @@ case class SortMergeJoinExec(
     
       override def outputOrdering: Seq[SortOrder] = joinType match {
         // For inner join, orders of both sides keys should be kept.
    -    case Inner =>
    +    case _: InnerLike =>
    --- End diff --
    
    Even worse, this could cause an exception
    ```Scala
    val df = Seq((1, 1)).toDF("i", "j")
    df.createOrReplaceTempView("T")
    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
      sql("select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
        "cross join T t2 where t2.i = t1.i").explain(true)
    }
    ```
    It will return the following error:
    ```
    SortMergeJoinExec should not take Cross as the JoinType
    java.lang.IllegalArgumentException: SortMergeJoinExec should not take Cross as the JoinType
    	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputOrdering(SortMergeJoinExec.scala:100)
    	at org.apache.spark.sql.execution.ProjectExec
    ```
    
    We need to backport it to 2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by BoleynSu <gi...@git.apache.org>.
Github user BoleynSu commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    A test case to make the existing code fail.
    @srowen I am sorry that this pull request is not well formatted but I just want to help.
    ```scala
    import org.apache.spark.sql.SparkSession
    
    object Test extends App {
      val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
      import spark.sqlContext.implicits._
      case class T(i: Int)
      spark.sparkContext.parallelize(List(T(1), T(3), T(3))).toDF.createOrReplaceTempView("T")
      val in = "select distinct a.i + 1,a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2"
      val sql = spark.sql(in)
      sql.queryExecution.executedPlan.children.map { x =>
        x.children.map { x =>
          x.children.map { x =>
            x.children.map { x =>
              x.children.map { x =>
                x.children.map { x =>
                  println(x.outputOrdering)
                }
              }
            }
          }
        }
      }
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    Thanks for fixing this. Please follow the contribution guideline. 
    
    Also, you need to add a test case. You can follow what we did in this PR: https://github.com/apache/spark/pull/17339


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    You didn't read the link above, I take it?
     http://spark.apache.org/contributing.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    @BoleynSu Do you want to continue the PR? or you want us to take it over?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    @BoleynSu Sure, I can do it. Will give all the credits to you. Please continue to help us report new issues or fixes. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18836


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18836
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org