You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/08/27 07:56:34 UTC

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/2155

    [SPARK-3198] [SQL] Remove the TreeNode.id

    Thus id property of the TreeNode API does save time in a faster way to compare 2 TreeNodes, it is kind of performance bottleneck during the expression object creation in a multi-threading env (because of the memory barrier). 
    Fortunately, the tree node comparison only happen once in master, so even we remove it, the entire performance will not be affected.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark treenode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2155.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2155
    
----
commit bec42d23fa6d3c3f956a1fd291d76e0919daf7d7
Author: Cheng Hao <ha...@intel.com>
Date:   2014-08-27T05:39:03Z

    Remove the TreeNode.id

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877707
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -34,28 +29,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
       def children: Seq[BaseType]
     
       /**
    -   * A globally unique id for this specific instance. Not preserved across copies.
    -   * Unlike `equals`, `id` can be used to differentiate distinct but structurally
    -   * identical branches of a tree.
    -   */
    -  val id = TreeNode.nextId()
    -
    -  /**
    -   * Returns true if other is the same [[catalyst.trees.TreeNode TreeNode]] instance.  Unlike
    -   * `equals` this function will return false for different instances of structurally identical
    -   * trees.
    -   */
    -  def sameInstance(other: TreeNode[_]): Boolean = {
    -    this.id == other.id
    -  }
    -
    -  /**
        * Faster version of equality which short-circuits when two treeNodes are the same instance.
        * We don't just override Object.Equals, as doing so prevents the scala compiler from from
        * generating case class `equals` methods
        */
       def fastEquals(other: TreeNode[_]): Boolean = {
    -    sameInstance(other) || this == other
    +    // TODO we need to figure how to do that in a faster way.
    --- End diff --
    
    Yes, exactly. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16801142
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUdfs.scala ---
    @@ -98,7 +98,7 @@ private[spark] object ExtractPythonUdfs extends Rule[LogicalPlan] {
             logical.Project(
               l.output,
               l.transformExpressions {
    -            case p: PythonUDF if p.id == udf.id => evaluation.resultAttribute
    +            case p: PythonUDF if p == udf => evaluation.resultAttribute
    --- End diff --
    
    This change is probably actually okay.  If there are multiple copies of the exact same UDF we can just compute the answer once.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877653
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -34,28 +29,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
       def children: Seq[BaseType]
     
       /**
    -   * A globally unique id for this specific instance. Not preserved across copies.
    -   * Unlike `equals`, `id` can be used to differentiate distinct but structurally
    -   * identical branches of a tree.
    -   */
    -  val id = TreeNode.nextId()
    -
    -  /**
    -   * Returns true if other is the same [[catalyst.trees.TreeNode TreeNode]] instance.  Unlike
    -   * `equals` this function will return false for different instances of structurally identical
    -   * trees.
    -   */
    -  def sameInstance(other: TreeNode[_]): Boolean = {
    -    this.id == other.id
    -  }
    -
    -  /**
        * Faster version of equality which short-circuits when two treeNodes are the same instance.
        * We don't just override Object.Equals, as doing so prevents the scala compiler from from
        * generating case class `equals` methods
        */
       def fastEquals(other: TreeNode[_]): Boolean = {
    -    sameInstance(other) || this == other
    +    // TODO we need to figure how to do that in a faster way.
    --- End diff --
    
    We have a faster way, use `eq`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53529888
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19292/consoleFull) for   PR 2155 at commit [`bec42d2`](https://github.com/apache/spark/commit/bec42d23fa6d3c3f956a1fd291d76e0919daf7d7).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877346
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -82,7 +82,7 @@ abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanTy
     
         @inline def transformExpressionUp(e: Expression) = {
           val newE = e.transformUp(rule)
    -      if (newE.id != e.id && newE != e) {
    +      if (newE != e) {
    --- End diff --
    
    same


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877967
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala ---
    @@ -51,7 +51,6 @@ class TreeNodeSuite extends FunSuite {
         val after = before transform { case Literal(5, _) => Literal(1)}
     
         assert(before === after)
    -    assert(before.map(_.id) === after.map(_.id))
    --- End diff --
    
    Cool.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53688875
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19390/consoleFull) for   PR 2155 at commit [`5873415`](https://github.com/apache/spark/commit/5873415ec1e0cd670adf144144eaf8060e412503).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53828497
  
    Thank you @marmbrus I've updated the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53595667
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19312/consoleFull) for   PR 2155 at commit [`bec42d2`](https://github.com/apache/spark/commit/bec42d23fa6d3c3f956a1fd291d76e0919daf7d7).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877725
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUdfs.scala ---
    @@ -98,7 +98,7 @@ private[spark] object ExtractPythonUdfs extends Rule[LogicalPlan] {
             logical.Project(
               l.output,
               l.transformExpressions {
    -            case p: PythonUDF if p.id == udf.id => evaluation.resultAttribute
    +            case p: PythonUDF if p.eq(udf) => evaluation.resultAttribute
    --- End diff --
    
    fastEquals?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53832709
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19445/consoleFull) for   PR 2155 at commit [`7cf2cd2`](https://github.com/apache/spark/commit/7cf2cd2fff0ad1f63a9100b0fe8c319077a4467a).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class TreeNodeRef(val obj: TreeNode[_]) `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53700780
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19390/consoleFull) for   PR 2155 at commit [`5873415`](https://github.com/apache/spark/commit/5873415ec1e0cd670adf144144eaf8060e412503).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  implicit class TreeNodeRef(val obj: TreeNode[_]) `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53534171
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19292/consoleFull) for   PR 2155 at commit [`bec42d2`](https://github.com/apache/spark/commit/bec42d23fa6d3c3f956a1fd291d76e0919daf7d7).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`
      * `"$FWDIR"/bin/spark-submit --class $CLASS "$`
      * `class ExternalSorter(object):`
      * `"$FWDIR"/bin/spark-submit --class $CLASS "$`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877359
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -34,28 +29,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
       def children: Seq[BaseType]
     
       /**
    -   * A globally unique id for this specific instance. Not preserved across copies.
    -   * Unlike `equals`, `id` can be used to differentiate distinct but structurally
    -   * identical branches of a tree.
    -   */
    -  val id = TreeNode.nextId()
    -
    -  /**
    -   * Returns true if other is the same [[catalyst.trees.TreeNode TreeNode]] instance.  Unlike
    -   * `equals` this function will return false for different instances of structurally identical
    -   * trees.
    -   */
    -  def sameInstance(other: TreeNode[_]): Boolean = {
    -    this.id == other.id
    -  }
    -
    -  /**
        * Faster version of equality which short-circuits when two treeNodes are the same instance.
        * We don't just override Object.Equals, as doing so prevents the scala compiler from from
        * generating case class `equals` methods
        */
       def fastEquals(other: TreeNode[_]): Boolean = {
    --- End diff --
    
    We can just get rid of this it is the same as `eq`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877343
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -50,7 +50,7 @@ abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanTy
     
         @inline def transformExpressionDown(e: Expression) = {
           val newE = e.transformDown(rule)
    -      if (newE.id != e.id && newE != e) {
    +      if (newE != e) {
    --- End diff --
    
    Add back in an `eq` check to avoid the expensive equality checking when the same object is returned.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16857315
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/package.scala ---
    @@ -37,4 +37,15 @@ package object trees extends Logging {
       // Since we want tree nodes to be lightweight, we create one logger for all treenode instances.
       protected override def logName = "catalyst.trees"
     
    +  /**
    +   * A [[TreeNode]] companion for reference equality for Hash based Collection.
    +   */
    +  implicit class TreeNodeRef(val obj: TreeNode[_]) {
    --- End diff --
    
    Why is this an implicit class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53632983
  
    Thanks for working on this.  I'm concerned that in many cases we are actually changing the semantics here.  Instead of replacing places where we use `id` to just use `==` I think we actually want to be using `eq` and `ne` which are the scala operations for reference equality (i.e. are these two objects actually the same object, stored at the same place in memory).
    
    For example:
    ```scala
    scala> case class A(x: Int)
    defined class A
    
    scala> val a1 = A(1)
    a1: A = A(1)
    
    scala> val a2 = A(1)
    a2: A = A(1)
    
    scala> a1 == a2
    res0: Boolean = true
    
    scala> a1 eq a2
    res1: Boolean = false
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877748
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUdfs.scala ---
    @@ -98,7 +98,7 @@ private[spark] object ExtractPythonUdfs extends Rule[LogicalPlan] {
             logical.Project(
               l.output,
               l.transformExpressions {
    -            case p: PythonUDF if p.id == udf.id => evaluation.resultAttribute
    +            case p: PythonUDF if p.eq(udf) => evaluation.resultAttribute
    --- End diff --
    
    fastEquals would also be okay, thought this is not a critical path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53580777
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53688906
  
    Thank you @marmbrus , you're right. I've updated the code by providing a new class called `TreeNodeRef` which is a wrapper simply re-implement the `equals` and `hashCode` methods for `TreeNode`, and it aims to replace the `id` where be used as the key in HashSet/HashMap.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53581468
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19312/consoleFull) for   PR 2155 at commit [`bec42d2`](https://github.com/apache/spark/commit/bec42d23fa6d3c3f956a1fd291d76e0919daf7d7).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877237
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala ---
    @@ -51,7 +51,6 @@ class TreeNodeSuite extends FunSuite {
         val after = before transform { case Literal(5, _) => Literal(1)}
     
         assert(before === after)
    -    assert(before.map(_.id) === after.map(_.id))
    --- End diff --
    
    Instead of deleting this maybe something like: 
    
    ```scala
    before.map(identity[Expression]).zip(after.map(identity[Expression])).foreach { case (b, a) => assert(b eq a) }
    ```
    
    Perhaps with a comment: `// Ensure that the object after are the same objects before the transformation`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53529874
  
    This is a follow up of #2114 after discussing with @marmbrus


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16801212
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala ---
    @@ -43,10 +43,10 @@ package object debug {
       implicit class DebugQuery(query: SchemaRDD) {
         def debug(): Unit = {
           val plan = query.queryExecution.executedPlan
    -      val visited = new collection.mutable.HashSet[Long]()
    +      val visited = new collection.mutable.HashSet[SparkPlan]()
           val debugPlan = plan transform {
    -        case s: SparkPlan if !visited.contains(s.id) =>
    -          visited += s.id
    +        case s: SparkPlan if !visited.contains(s) =>
    --- End diff --
    
    Here is an example problem: If we have a tree with a join where the two sides of the join are structurally identical then we are never going to traverse one of the sides.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2155


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53828732
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19445/consoleFull) for   PR 2155 at commit [`7cf2cd2`](https://github.com/apache/spark/commit/7cf2cd2fff0ad1f63a9100b0fe8c319077a4467a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877263
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/package.scala ---
    @@ -37,4 +37,15 @@ package object trees extends Logging {
       // Since we want tree nodes to be lightweight, we create one logger for all treenode instances.
       protected override def logName = "catalyst.trees"
     
    +  /**
    +   * A [[TreeNode]] companion for reference equality for Hash based Collection.
    +   */
    +  implicit class TreeNodeRef(val obj: TreeNode[_]) {
    --- End diff --
    
    Hm, I see.  I think in this case I'd prefer to be a little more verbose.  A general rule is that you should only use implicit conversions to extend functionality, but never to perform subtle type conversions.  These can lead to really confusing errors when things go wrong.  You might be safe here, but I've used implicits many times counter to this rule and I almost always end up regretting it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877382
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUdfs.scala ---
    @@ -98,7 +98,7 @@ private[spark] object ExtractPythonUdfs extends Rule[LogicalPlan] {
             logical.Project(
               l.output,
               l.transformExpressions {
    -            case p: PythonUDF if p.id == udf.id => evaluation.resultAttribute
    +            case p: PythonUDF if p.eq(udf) => evaluation.resultAttribute
    --- End diff --
    
    This one actually can just be ==.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2155#issuecomment-53937341
  
    Thanks! I've merged this to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2155#discussion_r16877080
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/package.scala ---
    @@ -37,4 +37,15 @@ package object trees extends Logging {
       // Since we want tree nodes to be lightweight, we create one logger for all treenode instances.
       protected override def logName = "catalyst.trees"
     
    +  /**
    +   * A [[TreeNode]] companion for reference equality for Hash based Collection.
    +   */
    +  implicit class TreeNodeRef(val obj: TreeNode[_]) {
    --- End diff --
    
    Just for save some code for like
    ```
    val resultMap: Map[TreeNodeRef, Expression] = ...
    ...
    resultMap.contains(e)  // e is the expression, but the implicit class will convert it into TreeNodeRef
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org