You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/08/24 22:35:19 UTC

[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/2109

    [WIP][SPARK-3194][SQL] Add AttributeSet to fix bugs with invalid comparisons of AttributeReferences

    It is common to want to describe sets of attributes that are in various parts of a query plan.  However, the semantics of putting `AttributeReference` objects into a standard Scala `Set` result in subtle bugs when references differ cosmetically.  For example, with case insensitive resolution it is possible to have two references to the same attribute whose names are not equal.  
    
    In this PR I introduce a new abstraction, an `AttributeSet`, which performs all comparisons using the globally unique `ExpressionId` instead of case class equality.  (There is already a related class, `AttributeMap`)  This new type of set is used to fix a bug in the optimizer where needed attributes were getting projected away underneath join operators.
    
    I also took this opportunity to refactor the expression and query plan base classes.  In all but one instance the logic for computing the `references` of an `Expression` were the same.  Thus, I moved this logic into the base class.
    
    For query plans the semantics of  the `references` method were ill defined (is the the references output, or used by expression evaluation, or what?).  As a result, this method wasn't really used very much.  So, I removed it.
    
    TODO:
     - [ ] Finish scala doc for `AttributeSet`
     - [ ] Scan the code for other instances of `Set[Attribute]` and refactor them.
     - [ ] Finish removing `references` from `QueryPlan`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark attributeSets

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2109
    
----
commit 94e2cfc0a89a249dc5660e28e4f6a3c53ddbd83a
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-14T04:42:27Z

    WIP

commit c41916565ed681dad4d11a4f5e50c458adc89078
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-14T17:29:46Z

    WIP

commit 7a09400b6fd6f23edc30ee040a380e5e18399dc3
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-14T19:50:28Z

    WIP

commit c27f33f133c5fe199307d891f3811599a555b4d3
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-24T19:43:05Z

    Merge remote-tracking branch 'origin/master' into attributeSets

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53209466
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19128/consoleFull) for   PR 2109 at commit [`d6e16be`](https://github.com/apache/spark/commit/d6e16bebd519a9ebeb045292ede936e4782a3df9).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53467696
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19223/consoleFull) for   PR 2109 at commit [`40ce7f6`](https://github.com/apache/spark/commit/40ce7f66e806c6621ebb69269caaae4b38d5fb50).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53206312
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19127/consoleFull) for   PR 2109 at commit [`fc26b49`](https://github.com/apache/spark/commit/fc26b49f1f46561a16effc6f3b6334d1024bb644).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53507631
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19244/consoleFull) for   PR 2109 at commit [`1c0dae5`](https://github.com/apache/spark/commit/1c0dae57e29a9723d7614e98a1b5084460117a6b).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `protected class AttributeEquals(val a: Attribute) `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53374032
  
    **Tests timed out** after a configured wait of `120m`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53209493
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19128/consoleFull) for   PR 2109 at commit [`d6e16be`](https://github.com/apache/spark/commit/d6e16bebd519a9ebeb045292ede936e4782a3df9).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class AttributeEquals(val a: Attribute) `
      * `class AttributeSet(val baseSet: Set[AttributeEquals]) extends Traversable[Attribute] `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2109#discussion_r16740921
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -0,0 +1,96 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +protected class AttributeEquals(val a: Attribute) {
    +  override def hashCode() = a.exprId.hashCode()
    +  override def equals(other: Any) = other match {
    +    case otherReference: AttributeEquals => a.exprId == otherReference.a.exprId
    +    case otherAttribute => false
    +  }
    +}
    +
    +object AttributeSet {
    +  /** Constructs a new [[AttributeSet]] given a sequence of [[Attribute Attributes]]. */
    +  def apply(baseSet: Seq[Attribute]) = {
    +    new AttributeSet(baseSet.map(new AttributeEquals(_)).toSet)
    +  }
    +}
    +
    +/**
    + * A Set designed to hold [[AttributeReference]] objects, that performs equality checking using
    + * expression id instead of standard java equality.  Using expression id means that these
    + * sets will correctly test for membership, even when the AttributeReferences in question differ
    + * cosmetically (e.g., the names have different capitalizations).
    + */
    +class AttributeSet protected (val baseSet: Set[AttributeEquals]) extends Traversable[Attribute] {
    --- End diff --
    
    in that case i think private works too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53206395
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19127/consoleFull) for   PR 2109 at commit [`fc26b49`](https://github.com/apache/spark/commit/fc26b49f1f46561a16effc6f3b6334d1024bb644).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$`
      * `    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$`
      * `class AttributeEquals(val a: Attribute) `
      * `class AttributeSet(val baseSet: Set[AttributeEquals]) extends Traversable[Attribute] `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53500213
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19244/consoleFull) for   PR 2109 at commit [`1c0dae5`](https://github.com/apache/spark/commit/1c0dae57e29a9723d7614e98a1b5084460117a6b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2109#discussion_r16693854
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
    @@ -39,6 +39,7 @@ case class UnresolvedRelation(
         alias: Option[String] = None) extends LeafNode {
       override def output = Nil
       override lazy val resolved = false
    +  def reference = Set.empty
    --- End diff --
    
    references?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2109


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53508041
  
    Merging in master & branch-1.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53467319
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53472603
  
    LGTM. Please add the answers to my questions directly into the code documentation :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53476891
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19223/consoleFull) for   PR 2109 at commit [`40ce7f6`](https://github.com/apache/spark/commit/40ce7f66e806c6621ebb69269caaae4b38d5fb50).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `protected class AttributeEquals(val a: Attribute) `
      * `class AttributeSet protected (val baseSet: Set[AttributeEquals]) extends Traversable[Attribute] `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53467276
  
    > What if we just override equals in Attribute?
    
    This is actually how things were implemented originally.  However, its really weird when `AttributeReference("a"...) == AttrributeReference("b", ...)`.  It breaks tests, and also makes doing transformations hard (we always try keep older trees instead of new ones when the transformation was a no-op).
    
    > Does AttributeSet provide anything that Scala's Set doesn't provide?
    
    No, it is actually a subset, this class is just encapsulating a very common pattern where you create sets that contain expression ids.  Furthermore, since an AttributeSet is not a Set[Attribute] you get nice type errors in many cases where you are trying to do semantically invalid things.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53366907
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2109#discussion_r16740664
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -0,0 +1,96 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +protected class AttributeEquals(val a: Attribute) {
    +  override def hashCode() = a.exprId.hashCode()
    +  override def equals(other: Any) = other match {
    +    case otherReference: AttributeEquals => a.exprId == otherReference.a.exprId
    +    case otherAttribute => false
    +  }
    +}
    +
    +object AttributeSet {
    +  /** Constructs a new [[AttributeSet]] given a sequence of [[Attribute Attributes]]. */
    +  def apply(baseSet: Seq[Attribute]) = {
    +    new AttributeSet(baseSet.map(new AttributeEquals(_)).toSet)
    +  }
    +}
    +
    +/**
    + * A Set designed to hold [[AttributeReference]] objects, that performs equality checking using
    + * expression id instead of standard java equality.  Using expression id means that these
    + * sets will correctly test for membership, even when the AttributeReferences in question differ
    + * cosmetically (e.g., the names have different capitalizations).
    + */
    +class AttributeSet protected (val baseSet: Set[AttributeEquals]) extends Traversable[Attribute] {
    --- End diff --
    
    There is really no reason for users to be exposed to AttributeEquals.  They should instead use the factory method so we can change the implementation if we want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

Posted by chuxi <gi...@git.apache.org>.
Github user chuxi commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53225175
  
    I think references is used for transformExpression in QueryPlan to make sure the same attribute just traversed once when Analyzer is explaining the Attributes in a TreeNode (especially start to recursive from root treenode). 
    
    Failed two times = = ......momo



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2109#discussion_r16740691
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
    @@ -39,6 +39,7 @@ case class UnresolvedRelation(
         alias: Option[String] = None) extends LeafNode {
       override def output = Nil
       override lazy val resolved = false
    +  def reference = Set.empty
    --- End diff --
    
    Good catch, this should just be removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2109#discussion_r16693867
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
    @@ -0,0 +1,96 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +protected class AttributeEquals(val a: Attribute) {
    +  override def hashCode() = a.exprId.hashCode()
    +  override def equals(other: Any) = other match {
    +    case otherReference: AttributeEquals => a.exprId == otherReference.a.exprId
    +    case otherAttribute => false
    +  }
    +}
    +
    +object AttributeSet {
    +  /** Constructs a new [[AttributeSet]] given a sequence of [[Attribute Attributes]]. */
    +  def apply(baseSet: Seq[Attribute]) = {
    +    new AttributeSet(baseSet.map(new AttributeEquals(_)).toSet)
    +  }
    +}
    +
    +/**
    + * A Set designed to hold [[AttributeReference]] objects, that performs equality checking using
    + * expression id instead of standard java equality.  Using expression id means that these
    + * sets will correctly test for membership, even when the AttributeReferences in question differ
    + * cosmetically (e.g., the names have different capitalizations).
    + */
    +class AttributeSet protected (val baseSet: Set[AttributeEquals]) extends Traversable[Attribute] {
    --- End diff --
    
    why protected?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53367367
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19189/consoleFull) for   PR 2109 at commit [`40ce7f6`](https://github.com/apache/spark/commit/40ce7f66e806c6621ebb69269caaae4b38d5fb50).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3194][SQL] Add AttributeSet to fix bugs...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2109#issuecomment-53370607
  
    Hey @marmbrus,
    
    Two questions:
    
    1. What if we just override equals in Attribute?
    
    2. Does AttributeSet provide anything that Scala's Set doesn't provide?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org