Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/02/07 02:52:20 UTC

[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/11106

    [SPARK-13225] [SQL] Support Intersect All/Distinct [WIP]

    In SQL:2003 syntax, `INTERSECT` accepts both `ALL` and `DISTINCT`:
    ```
    <SELECT statement1>
    INTERSECT [ALL | DISTINCT]
    <SELECT statement2>
    ```
    
    This PR adds support for both:
       - [x] Accept the optional keywords `ALL` and `DISTINCT` when parsing `INTERSECT` in the SQL parser.
       - [x] Add a new option, `INTERSECT ALL`, which skips the extra `DISTINCT` placed above the `Left-semi JOIN` when an Intersect is converted to a `Left-semi JOIN`.
       - [ ] Add the corresponding DataFrame and Dataset APIs.
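
The conversion described in the list above can be mimicked on plain Scala collections. This is only a rough sketch, not Spark code: `SemiJoinSketch` and `intersectViaSemiJoin` are hypothetical names, rows are modelled as plain values, and whole-row equality stands in for the null-safe join condition.

```scala
object SemiJoinSketch {
  // Keep every left row that has at least one equal row on the right
  // (a left semi join on whole-row equality), then optionally de-duplicate.
  def intersectViaSemiJoin[A](left: Seq[A], right: Seq[A], distinct: Boolean): Seq[A] = {
    val rightRows = right.toSet
    val joined = left.filter(rightRows.contains)
    if (distinct) joined.distinct else joined
  }
}
```

With `distinct = false`, a left row that appears twice is kept twice as long as it matches at least one right row, regardless of how often it appears on the right.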

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark intersectDistinct

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11106
    
----
commit 95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-07T00:52:48Z

    "intersect all"

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181436898
  
    retest this please




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181489271
  
    **[Test build #50924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50924/consoleFull)** for PR 11106 at commit [`796e725`](https://github.com/apache/spark/commit/796e725955e95505a5c2108ee3691be8beecd8a7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180994216
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180994217
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50896/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182650276
  
    Sure, let me close it first. Will continue to work on it. Thanks!




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181908431
  
    **[Test build #50977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50977/consoleFull)** for PR 11106 at commit [`8a0a0b2`](https://github.com/apache/spark/commit/8a0a0b23ad1b553ad558adf61c3d73774b89e7db).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52114891
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -594,8 +594,9 @@ class Dataset[T] private[sql](
        * and thus is not affected by a custom `equals` function defined on `T`.
        * @since 1.6.0
        */
    -  def intersect(other: Dataset[T]): Dataset[T] = withPlan[T](other)(Intersect)
    -
    +  def intersect(other: Dataset[T]): Dataset[T] = withPlan[T](other){ (left, right) =>
    --- End diff --
    
    Nit: there is no space between `)` and `{`; please add one.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181164211
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180928225
  
    **[Test build #50885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50885/consoleFull)** for PR 11106 at commit [`95afce5`](https://github.com/apache/spark/commit/95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Intersect(`




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181227825
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52272979
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -223,16 +222,22 @@ object HiveTypeCoercion {
         def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
           case p if p.analyzed => p
     
    -      case s @ SetOperation(left, right) if s.childrenResolved &&
    +      case s @ Except(left, right) if s.childrenResolved &&
               left.output.length == right.output.length && !s.resolved =>
             val newChildren: Seq[LogicalPlan] = buildNewChildrenWithWiderTypes(left :: right :: Nil)
             assert(newChildren.length == 2)
    -        s.makeCopy(Array(newChildren.head, newChildren.last))
    +        Except(newChildren.head, newChildren.last)
    +
    +      case s @ Intersect(left, right, distinct) if s.childrenResolved &&
    +          left.output.length == right.output.length && !s.resolved =>
    +        val newChildren: Seq[LogicalPlan] = buildNewChildrenWithWiderTypes(left :: right :: Nil)
    --- End diff --
    
    Minor: there must be exactly two children in the result. We could just do a pattern match on the sequence and be done:
    
        val Seq(newLeft, newRight) = buildNewChildrenWithWiderTypes(left :: right :: Nil)
        Intersect(newLeft, newRight, distinct)
    
    The same applies to the `Except` case.
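
The pattern-match suggestion above can be tried outside Spark. In this minimal sketch, `widenAll` is a hypothetical stand-in for `buildNewChildrenWithWiderTypes`, and plans are modelled as strings purely for illustration.

```scala
object SeqPatternSketch {
  // Hypothetical stand-in for buildNewChildrenWithWiderTypes:
  // returns exactly one output per input.
  def widenAll(children: Seq[String]): Seq[String] = children.map(_.toUpperCase)

  def rebuild(left: String, right: String): (String, String) = {
    // Binds both elements at once; a scala.MatchError is thrown at runtime
    // if the sequence does not contain exactly two elements.
    val Seq(newLeft, newRight) = widenAll(left :: right :: Nil)
    (newLeft, newRight)
  }
}
```

The destructuring replaces both the separate `assert(newChildren.length == 2)` and the `head`/`last` accesses, since a length mismatch fails the match itself.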




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181164076
  
    **[Test build #50906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50906/consoleFull)** for PR 11106 at commit [`796e725`](https://github.com/apache/spark/commit/796e725955e95505a5c2108ee3691be8beecd8a7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181150749
  
    **[Test build #50906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50906/consoleFull)** for PR 11106 at commit [`796e725`](https://github.com/apache/spark/commit/796e725955e95505a5c2108ee3691be8beecd8a7).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181227828
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50911/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180982746
  
    retest this please




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180983890
  
    **[Test build #50896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50896/consoleFull)** for PR 11106 at commit [`95afce5`](https://github.com/apache/spark/commit/95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52114881
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1059,19 +1059,24 @@ object ReplaceDistinctWithAggregate extends Rule[LogicalPlan] {
      * {{{
      *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
      *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
    + *   SELECT a1, a2 FROM Tab1 INTERSECT ALL SELECT b1, b2 FROM Tab2
    + *   ==>  SELECT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
      * }}}
      *
    - * Note:
    - * 1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL.
    - * 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated
    + * Note: This rule has to be done after de-duplicating the attributes; otherwise, the generated
      *    join conditions will be incorrect.
      */
     object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case Intersect(left, right) =>
    +    case Intersect(left, right, distinct) =>
           assert(left.output.size == right.output.size)
           val joinCond = left.output.zip(right.output).map { case (l, r) => EqualNullSafe(l, r) }
    --- End diff --
    
    I think you can actually use `EqualNullSafe.tupled` here and map over that.
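
The `EqualNullSafe.tupled` suggestion above relies on a case class companion being usable as a function of its constructor arguments. A minimal sketch, assuming Scala 2.x and using a hypothetical `EqNullSafe` case class over strings in place of the Catalyst expression:

```scala
object TupledSketch {
  // Hypothetical stand-in for Catalyst's EqualNullSafe(left, right).
  case class EqNullSafe(left: String, right: String)

  val leftOutput: Seq[String] = Seq("a1", "a2")
  val rightOutput: Seq[String] = Seq("b1", "b2")

  // Explicit form, as in the original line of the diff.
  val explicit: Seq[EqNullSafe] =
    leftOutput.zip(rightOutput).map { case (l, r) => EqNullSafe(l, r) }

  // Equivalent form: `tupled` turns the two-argument constructor function
  // into a function that takes a single pair.
  val viaTupled: Seq[EqNullSafe] =
    leftOutput.zip(rightOutput).map(EqNullSafe.tupled)
}
```

Both values hold the same pairs; the second form only saves the explicit `case (l, r) =>` destructuring.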




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52272371
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1059,19 +1059,24 @@ object ReplaceDistinctWithAggregate extends Rule[LogicalPlan] {
      * {{{
      *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
      *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
    + *   SELECT a1, a2 FROM Tab1 INTERSECT ALL SELECT b1, b2 FROM Tab2
    + *   ==>  SELECT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
      * }}}
      *
    - * Note:
    - * 1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL.
    - * 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated
    + * Note: This rule has to be done after de-duplicating the attributes; otherwise, the generated
      *    join conditions will be incorrect.
      */
     object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case Intersect(left, right) =>
    +    case Intersect(left, right, distinct) =>
           assert(left.output.size == right.output.size)
    -      val joinCond = left.output.zip(right.output).map { case (l, r) => EqualNullSafe(l, r) }
    -      Distinct(Join(left, right, LeftSemi, joinCond.reduceLeftOption(And)))
    +      val joinCond = left.output.zip(right.output).map(EqualNullSafe.tupled)
    +      if (distinct) {
    +        Distinct(Join(left, right, LeftSemi, joinCond.reduceLeftOption(And)))
    --- End diff --
    
    Minor: you can move the `joinCond.reduceLeftOption(And)` into the construction of the `joinCond`.
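
One way to read the suggestion above is to build the optional join condition in a single expression, so both branches of the `if` can reuse it. The sketch below uses hypothetical stand-in types (`Expr`, `EqNullSafe`, `And`) rather than the Catalyst classes.

```scala
object JoinCondSketch {
  // Hypothetical stand-ins for Catalyst expressions.
  sealed trait Expr
  case class EqNullSafe(left: String, right: String) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Pair up the columns, build one equality per pair, and fold the
  // equalities into a single conjunction. The result is None when
  // there are no columns to compare.
  def joinCond(left: Seq[String], right: Seq[String]): Option[Expr] =
    left.zip(right)
      .map { case (l, r) => EqNullSafe(l, r): Expr }
      .reduceLeftOption(And(_, _)) // the diff passes the companion `And` directly
}
```

`reduceLeftOption` folds from the left, so three columns would produce `And(And(c1, c2), c3)`.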




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181945468
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182603275
  
    I haven't actually looked at your pull request, but I'm fairly sure the implementation is wrong given the number of lines involved. The actual change is probably much larger to implement intersect all.
    
    Intersect all is actually not just a join. It is multiset intersection, e.g.
    
    [1, 2, 2] intersect [1, 2] == [1, 2]
    [1, 2, 2] intersect_all [1, 2] == [1, 2]
    [1, 2, 2] intersect_all [1, 2, 2] == [1, 2, 2]
    
    i.e. in order to support intersect all, we'd need to count the number of times each row appears.
    
    The same applies to except all.
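
The counting requirement above can be illustrated on plain Scala collections. This is a sketch of multiset intersection semantics only, not a Spark implementation; `MultisetIntersectSketch` and `intersectAll` are hypothetical names.

```scala
object MultisetIntersectSketch {
  // Multiset intersection: each row is emitted min(countLeft, countRight) times.
  def intersectAll[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    // Number of occurrences of each row on the right side.
    val rightCounts: Map[A, Int] =
      right.groupBy(identity).map { case (row, rows) => row -> rows.size }

    // Visit each distinct left row once, in order of first appearance.
    left.distinct.flatMap { row =>
      val n = math.min(left.count(_ == row), rightCounts.getOrElse(row, 0))
      Seq.fill(n)(row)
    }
  }
}
```

For the second example in the comment, a left semi join alone would keep both copies of `2` from the left side, whereas counting caps the output at the single copy present on the right.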
    
    
    





[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180974351
  
    **[Test build #50891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50891/consoleFull)** for PR 11106 at commit [`95afce5`](https://github.com/apache/spark/commit/95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181540969
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50930/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/11106




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181489822
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180972560
  
    retest this please




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180982584
  
    **[Test build #50891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50891/consoleFull)** for PR 11106 at commit [`95afce5`](https://github.com/apache/spark/commit/95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Intersect(`




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181227734
  
    **[Test build #50911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50911/consoleFull)** for PR 11106 at commit [`796e725`](https://github.com/apache/spark/commit/796e725955e95505a5c2108ee3691be8beecd8a7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181945076
  
    **[Test build #50977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50977/consoleFull)** for PR 11106 at commit [`8a0a0b2`](https://github.com/apache/spark/commit/8a0a0b23ad1b553ad558adf61c3d73774b89e7db).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181204620
  
    **[Test build #50911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50911/consoleFull)** for PR 11106 at commit [`796e725`](https://github.com/apache/spark/commit/796e725955e95505a5c2108ee3691be8beecd8a7).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52121128
  
    --- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g ---
    @@ -2222,7 +2223,9 @@ setOperator
         : KW_UNION KW_ALL -> ^(TOK_UNIONALL)
         | KW_UNION KW_DISTINCT? -> ^(TOK_UNIONDISTINCT)
         | KW_EXCEPT -> ^(TOK_EXCEPT)
    -    | KW_INTERSECT -> ^(TOK_INTERSECT)
    +    | KW_INTERSECT (all=KW_ALL | distinct=KW_DISTINCT)?
    +    -> {$all == null}? ^(TOK_INTERSECTDISTINCT)
    +    ->                 ^(TOK_INTERSECTALL)
    --- End diff --
    
    Yeah, the default behavior of `INTERSECT` is `INTERSECT DISTINCT`. 
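
The rule quoted above emits `TOK_INTERSECTDISTINCT` whenever `ALL` is absent, so a bare `INTERSECT` and `INTERSECT DISTINCT` end up on the same branch. A minimal sketch of that mapping in plain Scala follows; `SetQuantifier` and `isDistinct` are hypothetical names used only for illustration and are not part of the parser.

```scala
// Sketch only: SetQuantifier and isDistinct are hypothetical names, not Spark APIs.
object SetQuantifier {
  // Maps the optional keyword after INTERSECT to a "distinct" flag.
  // No keyword means DISTINCT, mirroring the `{$all == null}?` branch in the grammar.
  def isDistinct(keyword: Option[String]): Boolean =
    keyword.map(_.toUpperCase) match {
      case None | Some("DISTINCT") => true
      case Some("ALL")             => false
      case Some(other) =>
        throw new IllegalArgumentException(s"Unexpected set quantifier: $other")
    }
}
```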




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52316778
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -223,16 +222,22 @@ object HiveTypeCoercion {
         def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
           case p if p.analyzed => p
     
    -      case s @ SetOperation(left, right) if s.childrenResolved &&
    +      case s @ Except(left, right) if s.childrenResolved &&
               left.output.length == right.output.length && !s.resolved =>
             val newChildren: Seq[LogicalPlan] = buildNewChildrenWithWiderTypes(left :: right :: Nil)
             assert(newChildren.length == 2)
    -        s.makeCopy(Array(newChildren.head, newChildren.last))
    +        Except(newChildren.head, newChildren.last)
    +
    +      case s @ Intersect(left, right, distinct) if s.childrenResolved &&
    +          left.output.length == right.output.length && !s.resolved =>
    +        val newChildren: Seq[LogicalPlan] = buildNewChildrenWithWiderTypes(left :: right :: Nil)
    --- End diff --
    
    Thanks! Will do the changes. : )




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181512242
  
    **[Test build #50930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50930/consoleFull)** for PR 11106 at commit [`afd1725`](https://github.com/apache/spark/commit/afd1725e815087244c753abe4198fa1746b00f9f).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181540611
  
    **[Test build #50930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50930/consoleFull)** for PR 11106 at commit [`afd1725`](https://github.com/apache/spark/commit/afd1725e815087244c753abe4198fa1746b00f9f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180982794
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181540966
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #11106: [SPARK-13225] [SQL] Support Intersect All/Distinct [WIP]

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/spark/pull/11106
  
    Another possible way to implement INTERSECT ALL:
    https://issues.apache.org/jira/browse/SPARK-21274




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181150882
  
    @marmbrus @rxin I would like your opinions before adding the corresponding DataFrame and Dataset APIs.
    
    `unionAll` has now been changed to `union`, whose default behavior is `UNION ALL`, whereas the default behavior of `intersect` is `INTERSECT DISTINCT`. I think we have three options:
      - Add a new API named `intersectAll`.
      - Add an optional boolean parameter `distinct` to the existing `intersect` API, similar to the APIs `sample` and `repartition`.
      - Do not add any API.
     
    Thank you!
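
To make the first two options concrete, here is a rough sketch of the two API shapes. `Rows` is a hypothetical stand-in for a DataFrame, and `intersectWithFlag` is named differently from `intersect` only to keep the sketch unambiguous. The `ALL` variant follows the semi-join rewrite drafted in this PR, which is an assumption of the sketch rather than settled behavior.

```scala
// Hypothetical stand-in for a DataFrame; none of this is Spark's actual API.
final case class Rows(values: Seq[Int]) {
  // Keep each row of this relation that also appears in `other`.
  private def semiJoin(other: Rows): Seq[Int] = {
    val keys = other.values.toSet
    values.filter(keys.contains)
  }

  // Option 1: a separate method next to the existing intersect.
  def intersect(other: Rows): Rows = Rows(semiJoin(other).distinct)
  def intersectAll(other: Rows): Rows = Rows(semiJoin(other))

  // Option 2: one method with a boolean flag that defaults to DISTINCT.
  def intersectWithFlag(other: Rows, distinct: Boolean = true): Rows =
    if (distinct) intersect(other) else intersectAll(other)
}
```

With option 2, existing callers keep the `DISTINCT` default and opt in with `distinct = false`. With option 1, the choice is visible in the method name at every call site.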





[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181945469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50977/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181199189
  
    retest this please




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180913304
  
    **[Test build #50885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50885/consoleFull)** for PR 11106 at commit [`95afce5`](https://github.com/apache/spark/commit/95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181164212
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50906/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182633769
  
    Thank you! Will do it. 




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182631369
  
    Uh, you are right. : ) Will follow your suggestions. Thank you!




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52121131
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1059,19 +1059,24 @@ object ReplaceDistinctWithAggregate extends Rule[LogicalPlan] {
      * {{{
      *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
      *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
    + *   SELECT a1, a2 FROM Tab1 INTERSECT ALL SELECT b1, b2 FROM Tab2
    + *   ==>  SELECT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
      * }}}
      *
    - * Note:
    - * 1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL.
    - * 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated
    + * Note: This rule has to be done after de-duplicating the attributes; otherwise, the generated
      *    join conditions will be incorrect.
      */
     object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case Intersect(left, right) =>
    +    case Intersect(left, right, distinct) =>
           assert(left.output.size == right.output.size)
           val joinCond = left.output.zip(right.output).map { case (l, r) => EqualNullSafe(l, r) }
    --- End diff --
    
    Sure, will change it. Thanks!
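
The join condition in this diff uses `EqualNullSafe` (`<=>`) rather than plain equality because set operations treat two NULLs as the same value, while `=` yields unknown whenever either side is NULL. A small plain-Scala illustration, with `None` standing in for NULL; the object and method names are hypothetical and only model the two comparison semantics.

```scala
// Sketch only: models SQL comparison semantics with Option; not Catalyst code.
object NullSafeEquality {
  // Plain `=`: comparing with NULL gives unknown, modelled here as None.
  def sqlEquals(l: Option[Int], r: Option[Int]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // `<=>`: never unknown; two NULLs compare equal, NULL vs. a value is false.
  def nullSafeEquals(l: Option[Int], r: Option[Int]): Boolean = l == r
}
```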




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52316817
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1059,19 +1059,24 @@ object ReplaceDistinctWithAggregate extends Rule[LogicalPlan] {
      * {{{
      *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
      *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
    + *   SELECT a1, a2 FROM Tab1 INTERSECT ALL SELECT b1, b2 FROM Tab2
    + *   ==>  SELECT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
      * }}}
      *
    - * Note:
    - * 1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL.
    - * 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated
    + * Note: This rule has to be done after de-duplicating the attributes; otherwise, the generated
      *    join conditions will be incorrect.
      */
     object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case Intersect(left, right) =>
    +    case Intersect(left, right, distinct) =>
           assert(left.output.size == right.output.size)
    -      val joinCond = left.output.zip(right.output).map { case (l, r) => EqualNullSafe(l, r) }
    -      Distinct(Join(left, right, LeftSemi, joinCond.reduceLeftOption(And)))
    +      val joinCond = left.output.zip(right.output).map(EqualNullSafe.tupled)
    +      if (distinct) {
    +        Distinct(Join(left, right, LeftSemi, joinCond.reduceLeftOption(And)))
    --- End diff --
    
    Yeah, let me change it now.
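
For reference, a plain-Scala sketch of what the rewrite in this diff computes, using `Seq` in place of relations (the object and method names are hypothetical, not Spark code). It also includes, for comparison, the multiset definition of `INTERSECT ALL` from the SQL standard, which keeps min(m, n) copies of a row occurring m times on the left and n times on the right. A semi join without `Distinct` keeps every matching left row, so the two can differ when the left side holds more duplicates than the right.

```scala
// Sketch only: Seq stands in for a relation; names are hypothetical.
object IntersectSemantics {
  // Left semi join on whole-row equality: keep each left row that has a match on the right.
  def leftSemi[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.filter(rightSet.contains)
  }

  // The rewrite in this diff with distinct = true.
  def intersectDistinct[A](left: Seq[A], right: Seq[A]): Seq[A] =
    leftSemi(left, right).distinct

  // The rewrite in this diff with distinct = false.
  def intersectAllViaSemiJoin[A](left: Seq[A], right: Seq[A]): Seq[A] =
    leftSemi(left, right)

  // Multiset INTERSECT ALL: Seq.intersect keeps min(count left, count right) copies.
  def intersectAllMultiset[A](left: Seq[A], right: Seq[A]): Seq[A] =
    left.intersect(right)
}
```

For `left = Seq(1, 1, 1, 2)` and `right = Seq(1, 1, 3)`, the distinct form gives `Seq(1)`, the semi-join form gives `Seq(1, 1, 1)`, and the multiset form gives `Seq(1, 1)`.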


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180982798
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50891/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181489828
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50924/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182631745
  
    In terms of API, I think we should just add intersectAll and exceptAll functions to it. 
    
    For union, we should keep the existing behavior, and if users want to do union distinct, they can just do union().distinct().
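
The composition suggested for union can be illustrated with ordinary Scala collections, where `++` keeps duplicates like `UNION ALL` and `.distinct` removes them afterwards. `UnionExample` and its methods are hypothetical names; `Seq` is only a stand-in for a DataFrame here.

```scala
// Sketch only: Seq stands in for a DataFrame; names are hypothetical.
object UnionExample {
  // Concatenation keeps duplicates, like UNION ALL.
  def unionAll[A](left: Seq[A], right: Seq[A]): Seq[A] = left ++ right

  // UNION DISTINCT composed from the two existing operations.
  def unionDistinct[A](left: Seq[A], right: Seq[A]): Seq[A] =
    unionAll(left, right).distinct
}
```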
    





[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-181443249
  
    **[Test build #50924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50924/consoleFull)** for PR 11106 at commit [`796e725`](https://github.com/apache/spark/commit/796e725955e95505a5c2108ee3691be8beecd8a7).




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180994189
  
    **[Test build #50896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50896/consoleFull)** for PR 11106 at commit [`95afce5`](https://github.com/apache/spark/commit/95afce5f02f86c1aa3cf22f81c79d8d9e46c1e0a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Intersect(`




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11106#discussion_r52114742
  
    --- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g ---
    @@ -2222,7 +2223,9 @@ setOperator
         : KW_UNION KW_ALL -> ^(TOK_UNIONALL)
         | KW_UNION KW_DISTINCT? -> ^(TOK_UNIONDISTINCT)
         | KW_EXCEPT -> ^(TOK_EXCEPT)
    -    | KW_INTERSECT -> ^(TOK_INTERSECT)
    +    | KW_INTERSECT (all=KW_ALL | distinct=KW_DISTINCT)?
    +    -> {$all == null}? ^(TOK_INTERSECTDISTINCT)
    +    ->                 ^(TOK_INTERSECTALL)
    --- End diff --
    
    So distinct is optional?




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182634587
  
    And can we close this PR and only open it when you have a new version? Thanks.





[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180928249
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50885/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-180928248
  
    Merged build finished. Test FAILed.

