Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2016/04/14 03:18:15 UTC

[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/12378

    [SPARK-14619] Track internal accumulators (metrics) by stage attempt

    ## What changes were proposed in this pull request?
    Working on the description right now ...
    
    
    ## How was this patch tested?
    Covered by existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-14619

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12378
    
----
commit 48f248a3e4eaa7843734cbc4f03fdb67dce81bf6
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-04-14T01:17:34Z

    [SPARK-14619] Track internal accumulators (metrics) by stage attempt rather than stage

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12378#issuecomment-209740374
  
    **[Test build #55776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55776/consoleFull)** for PR 12378 at commit [`48f248a`](https://github.com/apache/spark/commit/48f248a3e4eaa7843734cbc4f03fdb67dce81bf6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12378#issuecomment-209713666
  
    **[Test build #55776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55776/consoleFull)** for PR 12378 at commit [`48f248a`](https://github.com/apache/spark/commit/48f248a3e4eaa7843734cbc4f03fdb67dce81bf6).




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12378#discussion_r59675345
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala ---
    @@ -75,22 +75,6 @@ private[scheduler] abstract class Stage(
       val name: String = callSite.shortForm
       val details: String = callSite.longForm
     
    -  private var _internalAccumulators: Seq[Accumulator[_]] = Seq.empty
    -
    -  /** Internal accumulators shared across all tasks in this stage. */
    -  def internalAccumulators: Seq[Accumulator[_]] = _internalAccumulators
    -
    -  /**
    -   * Re-initialize the internal accumulators associated with this stage.
    -   *
    -   * This is called every time the stage is submitted, *except* when a subset of tasks
    -   * belonging to this stage has already finished. Otherwise, reinitializing the internal
    -   * accumulators here again will override partial values from the finished tasks.
    --- End diff --
    
    The whole point of this change is to split them into separate accumulators so we don't need to worry about overrides.
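
The idea in the comment above can be illustrated with a minimal sketch. This is not code from the pull request: `Metric`, `StageAttempt`, `StageModel`, and the metric names are hypothetical stand-ins assumed purely for illustration, and Spark's real scheduler classes are considerably more involved. The sketch only shows why giving each stage attempt its own set of accumulators removes the need to decide when a shared set may safely be re-initialized.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an internal accumulator (not Spark's Accumulator class).
final class Metric(val name: String) {
  private var total: Long = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

// Hypothetical model of one stage attempt. Every attempt creates its own
// metrics, so values from an earlier attempt are never reset or overwritten.
final class StageAttempt(val stageId: Int, val attemptId: Int) {
  val metrics: Map[String, Metric] =
    Seq("recordsRead", "bytesWritten").map(n => n -> new Metric(n)).toMap
}

// Hypothetical model of a stage that is only a container for its attempts.
final class StageModel(val stageId: Int) {
  private val attempts = mutable.ArrayBuffer.empty[StageAttempt]

  // Resubmitting the stage creates a new attempt with fresh metrics.
  def newAttempt(): StageAttempt = {
    val attempt = new StageAttempt(stageId, attempts.size)
    attempts += attempt
    attempt
  }

  def allAttempts: Seq[StageAttempt] = attempts.toSeq
}

object AttemptMetricsDemo {
  def main(args: Array[String]): Unit = {
    val stage = new StageModel(0)

    // First attempt: some tasks finish and report partial values.
    val first = stage.newAttempt()
    first.metrics("recordsRead").add(100L)

    // Resubmission: the second attempt starts from zero on its own metrics.
    val second = stage.newAttempt()
    second.metrics("recordsRead").add(40L)

    stage.allAttempts.foreach { a =>
      println(s"stage ${a.stageId} attempt ${a.attemptId}: " +
        s"recordsRead=${a.metrics("recordsRead").value}")
    }
  }
}
```

In this model there is no method that re-initializes metrics at all, so the "except when a subset of tasks has already finished" condition from the removed comment has nothing to guard.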




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12378#discussion_r59674773
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala ---
    @@ -75,22 +75,6 @@ private[scheduler] abstract class Stage(
       val name: String = callSite.shortForm
       val details: String = callSite.longForm
     
    -  private var _internalAccumulators: Seq[Accumulator[_]] = Seq.empty
    -
    -  /** Internal accumulators shared across all tasks in this stage. */
    -  def internalAccumulators: Seq[Accumulator[_]] = _internalAccumulators
    -
    -  /**
    -   * Re-initialize the internal accumulators associated with this stage.
    -   *
    -   * This is called every time the stage is submitted, *except* when a subset of tasks
    -   * belonging to this stage has already finished. Otherwise, reinitializing the internal
    -   * accumulators here again will override partial values from the finished tasks.
    --- End diff --
    
    No, with this change we no longer need to.




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12378#issuecomment-210076280
  
    LGTM, merging into master.




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12378#issuecomment-209740489
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12378




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12378#discussion_r59674310
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala ---
    @@ -75,22 +75,6 @@ private[scheduler] abstract class Stage(
       val name: String = callSite.shortForm
       val details: String = callSite.longForm
     
    -  private var _internalAccumulators: Seq[Accumulator[_]] = Seq.empty
    -
    -  /** Internal accumulators shared across all tasks in this stage. */
    -  def internalAccumulators: Seq[Accumulator[_]] = _internalAccumulators
    -
    -  /**
    -   * Re-initialize the internal accumulators associated with this stage.
    -   *
    -   * This is called every time the stage is submitted, *except* when a subset of tasks
    -   * belonging to this stage has already finished. Otherwise, reinitializing the internal
    -   * accumulators here again will override partial values from the finished tasks.
    --- End diff --
    
    Don't we need to care about the overriding problem as mentioned here?




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12378#issuecomment-209740490
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55776/




[GitHub] spark pull request: [SPARK-14619] Track internal accumulators (met...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12378#issuecomment-209753807
  
    cc @andrewor14 

