Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2016/04/27 18:58:53 UTC

[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/12737

    [SPARK-14960][WIP] treeAggregate should fall back to regular aggregate in local mode

    I don't think that `treeAggregate` helps performance in local mode, and measurements of some unit tests suggest that it severely harms performance in certain cases. Therefore, I think that `treeAggregate` should fall back to plain `aggregate` when running in `local` mode.
    
    This is WIP because I think we'll need to refactor the tests to make sure that this patch doesn't lead to a loss of test coverage for `treeAggregate`. I'm opening it now in order to run tests and measure performance.
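
For readers unfamiliar with the two strategies, the fallback described above can be sketched without Spark at all. This is a plain-Python illustration, not the code from the patch: partitions are modeled as nested lists, `fan_in` is a made-up parameter standing in for the tree's branching factor, and the helper names (`is_local_master`, `flat_aggregate`, `tree_aggregate`, `aggregate_with_fallback`) are hypothetical. The rule used to detect local mode (`local` or `local[...]`) is an assumption about how the master URL would be checked.

```python
from functools import reduce


def is_local_master(master):
    """Assumed rule: 'local' or 'local[...]' means there is no cluster."""
    return master == "local" or master.startswith("local[")


def flat_aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition, then merge all partial results in a single step."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Fold each partition, then merge partials in rounds of `fan_in` at a time."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    partials = [reduce(seq_op, part, zero) for part in partitions]
    while len(partials) > fan_in:
        # Each round stands in for an extra intermediate stage on a real cluster.
        partials = [
            reduce(comb_op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    return reduce(comb_op, partials, zero)


def aggregate_with_fallback(master, partitions, zero, seq_op, comb_op):
    """Use the flat merge in local mode, the tree merge otherwise."""
    if is_local_master(master):
        return flat_aggregate(partitions, zero, seq_op, comb_op)
    return tree_aggregate(partitions, zero, seq_op, comb_op)
```

For an associative `comb_op` and a neutral `zero`, both paths return the same value; only the number of merge rounds differs. The extra rounds are meant to spread merge work across a cluster, which is the overhead the description argues is not worth paying in local mode.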

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-14960-skip-tree-agg-in-local-mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12737
    
----
commit bfdc3f3957ffeba974d7dc8922cecfcb9a954345
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-04-27T16:56:08Z

    Try disabling treeAggregate in local mode.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12737#issuecomment-215185901
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57142/


[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12737#issuecomment-215149083
  
    **[Test build #57142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57142/consoleFull)** for PR 12737 at commit [`bfdc3f3`](https://github.com/apache/spark/commit/bfdc3f3957ffeba974d7dc8922cecfcb9a954345).


[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12737#issuecomment-215185896
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

Posted by JoshRosen <gi...@git.apache.org>.
GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/12737#issuecomment-215164982
  
    One test that this patch speeds up significantly is `JsonHadoopFsRelationSuite`'s "SPARK-8406: Avoids name collision while writing files" test, which improved from ~40 seconds to ~8 seconds. It's probably worth investigating separately why the improvement was so large here, but irrespective of the underlying cause, I don't see a good reason to use `treeAggregate` in `local` mode.


[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

Posted by JoshRosen <gi...@git.apache.org>.
GitHub user JoshRosen closed the pull request at:

    https://github.com/apache/spark/pull/12737


[GitHub] spark pull request: [SPARK-14960][WIP] treeAggregate should fall b...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12737#issuecomment-215185684
  
    **[Test build #57142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57142/consoleFull)** for PR 12737 at commit [`bfdc3f3`](https://github.com/apache/spark/commit/bfdc3f3957ffeba974d7dc8922cecfcb9a954345).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

