You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/08/25 10:36:56 UTC

[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/2114

    [SPARK-3198] [SQL] Generate the expression id when necessary

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2114
    
----
commit 83531bcd8bfb408ad650d6e7789a345f8af405e8
Author: Cheng Hao <ha...@intel.com>
Date:   2014-08-25T08:35:56Z

    Generate the expression id when necessary

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2114#issuecomment-53247383
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19137/consoleFull) for   PR 2114 at commit [`83531bc`](https://github.com/apache/spark/commit/83531bcd8bfb408ad650d6e7789a345f8af405e8).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$`
      * `    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2114#discussion_r16689350
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -38,7 +38,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
        * Unlike `equals`, `id` can be used to differentiate distinct but structurally
        * identical branches of a tree.
        */
    -  val id = TreeNode.nextId()
    +  @transient lazy val id = TreeNode.nextId()
    --- End diff --
    
    Marking this as transient and lazy means that this could change on the slaves which is a little weird.  Honestly I think the correct change here is to get rid of this id entirely and change usages of it to do reference equality instead (`eq`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2114#issuecomment-53240426
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19137/consoleFull) for   PR 2114 at commit [`83531bc`](https://github.com/apache/spark/commit/83531bcd8bfb408ad650d6e7789a345f8af405e8).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2114#issuecomment-53517245
  
    OK, I will try that. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2114#discussion_r16722823
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -38,7 +38,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
        * Unlike `equals`, `id` can be used to differentiate distinct but structurally
        * identical branches of a tree.
        */
    -  val id = TreeNode.nextId()
    +  @transient lazy val id = TreeNode.nextId()
    --- End diff --
    
    Yes, you're right, we need to think about the id usage, but currently it is the workaround for performance. I noticed that the aggregation performance is not so good because large number of `AggregateFunction` objects were created during the execution on slave, you know the `AggregationFunction` is a sub class of `TreeNode`, actually the id generation here is the bottleneck in multithreading env(because of the memory barrier in a multi-core system).
    
    On the other hand, I don't think we have the logic to call the `eq` of an expression object during the execution time on slaves directly or indirectly. Those calls are supposed to be done in master(for example in logical plan analysis & optimization).
    
    So I am OK to merge this PR for the quick fix or rewrite the code for getting ride of the id entirely.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel closed the pull request at:

    https://github.com/apache/spark/pull/2114


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2114#issuecomment-53514183
  
    Yeah, if you want to take a crack at removing it that would cool.  If you run into problems let me know.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2114#issuecomment-53513975
  
    Thank you @marmbrus , I will close this PR. Let me know if I can help on the TreeNode id stuff.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2114#discussion_r16733513
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -38,7 +38,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
        * Unlike `equals`, `id` can be used to differentiate distinct but structurally
        * identical branches of a tree.
        */
    -  val id = TreeNode.nextId()
    +  @transient lazy val id = TreeNode.nextId()
    --- End diff --
    
    I would consider generating new TreeNodes in the critical path a bug, and any places where we do that should be removed (as you have done in other PRs).
    
    As far as I know, we only use this `id` in a few places and all of these could be replaced with `eq`.  I just try removing it and see what breaks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org