Posted to reviews@spark.apache.org by sryza <gi...@git.apache.org> on 2014/09/23 08:04:14 UTC

[GitHub] spark pull request: SPARK-3172 and SPARK-3577

GitHub user sryza opened a pull request:

    https://github.com/apache/spark/pull/2504

    SPARK-3172 and SPARK-3577

    The posted patch addresses both SPARK-3172 and SPARK-3577.  It renames ShuffleWriteMetrics to WriteMetrics and uses it for tracking all three of shuffle write, spilling on the fetch side, and spilling on the write side (which only occurs during sort-based shuffle).
    
    I'll fix and add tests if people think restructuring the metrics in this way makes sense.
    
    I'm a little unsure about the name shuffleReadSpillMetrics, as spilling happens during aggregation, not read, but I had trouble coming up with something better.
    
    I'm also unsure about which columns would be most useful to display in the UI - I remember some pushback on adding new columns.  Ultimately these metrics will be most helpful if they can inform users whether and how much they need to increase the number of partitions / increase spark.shuffle.memoryFraction.  Reporting spill time tells users whether spilling is significantly impacting performance.  Reporting memory size can help with understanding how much needs to be done to avoid spilling.
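    To make the proposal concrete, here's a rough sketch of the shape I have in mind - class, field, and method names below are illustrative, not necessarily what the final patch uses:

```scala
// Rough sketch only - names and fields are illustrative, not the actual patch.
// A single reusable WriteMetrics tracks bytes and time for any disk write.
class WriteMetrics extends Serializable {
  private var _bytesWritten: Long = 0L
  private var _writeTime: Long = 0L // nanoseconds

  def bytesWritten: Long = _bytesWritten
  def writeTime: Long = _writeTime
  def incBytesWritten(n: Long): Unit = { _bytesWritten += n }
  def incWriteTime(ns: Long): Unit = { _writeTime += ns }
}

// TaskMetrics holds one WriteMetrics per kind of disk write a task can do:
// final shuffle output, spills on the write side, and spills on the fetch side.
class TaskMetrics extends Serializable {
  var shuffleWriteMetrics: Option[WriteMetrics] = None
  var shuffleWriteSpillMetrics: Option[WriteMetrics] = None
  var shuffleReadSpillMetrics: Option[WriteMetrics] = None

  def getOrCreateShuffleReadSpillMetrics(): WriteMetrics = {
    if (shuffleReadSpillMetrics.isEmpty) {
      shuffleReadSpillMetrics = Some(new WriteMetrics())
    }
    shuffleReadSpillMetrics.get
  }
}
```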
    
    @pwendell any thoughts on this?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sryza/spark sandy-spark-3172

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2504.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2504
    
----
commit c854514d81b4830ce1f1109662a713c51e6c8023
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-09-23T05:58:18Z

    SPARK-3172 and SPARK-3577

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-74035966
  
    @sryza I'm wondering about SPARK-3172 -- is that a common source of confusion / debugging difficulty? Just wondering whether it's worthwhile to fix that, since this patch would be simpler if you could just have a single set of spill metrics for the task.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-74140271
  
    Ok that makes sense...now my concern is whether it's possible to do that without changing the developer API (see my in-line comment).
    
    A few other things:
    -Can you fix the import ordering? :)
    -It looks like, in external sorter, right now you just keep track of the spilled bytes.  Can you add the spill time there too?



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-75010587
  
      [Test build #27713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27713/consoleFull) for   PR 2504 at commit [`b38fe51`](https://github.com/apache/spark/commit/b38fe51c6eccc4c9dd7886153ab9acef23263272).
     * This patch merges cleanly.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-75010658
  
    I rewrote this patch with more of an aim to maintain compatibility than the last version.  This is still not completely done and likely will not pass tests.  I'm hoping to get validation on the general approach before polishing.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-62475273
  
    @sryza @andrewor14, @shivaram just sent me a pointer to this -- just wanted to comment that this should be totally OK to add to the UI now that we have functionality to show/hide columns (#2867), because these could be hidden by default and included in the "Show Additional Metrics" list.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56868893
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20815/consoleFull) for   PR 2504 at commit [`c854514`](https://github.com/apache/spark/commit/c854514d81b4830ce1f1109662a713c51e6c8023).
     * This patch merges cleanly.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56869377
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20815/



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56479669
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20689/consoleFull) for   PR 2504 at commit [`c854514`](https://github.com/apache/spark/commit/c854514d81b4830ce1f1109662a713c51e6c8023).
     * This patch merges cleanly.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56868000
  
    retest this please



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2504#discussion_r24613511
  
    --- Diff: core/src/main/scala/org/apache/spark/Aggregator.scala ---
    @@ -54,14 +55,13 @@ case class Aggregator[K, V, C] (
           }
           combiners.iterator
         } else {
    -      val combiners = new ExternalAppendOnlyMap[K, V, C](createCombiner, mergeValue, mergeCombiners)
    -      combiners.insertAll(iter)
    -      // Update task metrics if context is not null
           // TODO: Make context non optional in a future release
    -      Option(context).foreach { c =>
    -        c.taskMetrics.memoryBytesSpilled += combiners.memoryBytesSpilled
    -        c.taskMetrics.diskBytesSpilled += combiners.diskBytesSpilled
    -      }
    +      val spillMetrics = Option(context).map(
    +        _.taskMetrics.getOrCreateShuffleReadSpillMetrics()).getOrElse(new WriteMetrics())
    --- End diff --
    
    If we're doing map-side combining, this should be for the shuffle write, right?



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56869375
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20815/consoleFull) for   PR 2504 at commit [`c854514`](https://github.com/apache/spark/commit/c854514d81b4830ce1f1109662a713c51e6c8023).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-74037430
  
    I think SPARK-3172 is probably the more useful portion of this patch.  Ultimately, users want to be able to trace their shuffle resource consumption back to the transformations they used.  A single shuffle spill metric makes it difficult to determine which of the stage boundaries bordering a task (and from there, the transformation that triggered the stage boundary) is the culprit. 



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-75015873
  
      [Test build #27713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27713/consoleFull) for   PR 2504 at commit [`b38fe51`](https://github.com/apache/spark/commit/b38fe51c6eccc4c9dd7886153ab9acef23263272).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56879452
  
    I considered that approach as well, but found that this one sat more elegantly with the metrics collecting code.
    
    DiskObjectWriter, which is used both when spilling and when writing out final shuffle data, accepts a WriteMetrics (née ShuffleWriteMetrics) object and increments it as it writes.  Having a more complex ShuffleWriteMetrics object would require pushing knowledge about the purpose of the write down into DiskObjectWriter so it could increment the appropriate fields.  Alternatively, we could distinguish between the metric-collecting objects that DiskObjectWriter takes and the ShuffleWriteMetrics/ShuffleReadMetrics that go into the TaskMetrics, but that seems to me like a layer of complexity worth avoiding if we can.
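    Roughly, the relationship I mean is something like this (heavily simplified sketch; the real DiskObjectWriter does buffered serialization, and names here are illustrative):

```scala
// Simplified stand-in for the metrics object the writer increments.
class WriteMetrics {
  var bytesWritten: Long = 0L
  var writeTime: Long = 0L // nanoseconds
}

// The writer is handed a metrics object by its caller and increments it
// blindly; it never needs to know whether a write is a spill or final shuffle
// output - the caller decided that by choosing which metrics object to pass.
class DiskObjectWriter(metrics: WriteMetrics) {
  def write(obj: Any): Unit = {
    val start = System.nanoTime()
    val bytes = obj.toString.getBytes("UTF-8") // stand-in for real serialization
    metrics.bytesWritten += bytes.length
    metrics.writeTime += System.nanoTime() - start
  }
}
```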



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56870531
  
    Hey @sryza I haven't looked in detail at the changes but I have a few high-level comments. We only ever spill during a shuffle read or shuffle write, so I think it actually makes sense to just add a field in each of `ShuffleWriteMetrics` and `ShuffleReadMetrics` that reports the number of bytes spilled ("Bytes Spilled") and the time spent spilling ("Spill Time" or something). The existing code groups the bytes spilled from both shuffle read and shuffle write into `ShuffleWriteMetrics`, which I think is slightly incorrect.
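    Something like this is what I'm picturing (sketch only - field names are illustrative, not actual Spark API):

```scala
// Illustrative sketch of the alternative: spill accounting lives directly on
// each side's existing metrics class instead of in a separate shared object.
class ShuffleReadMetrics extends Serializable {
  var remoteBytesRead: Long = 0L // stand-in for existing fields
  var bytesSpilled: Long = 0L    // proposed: bytes spilled during aggregation
  var spillTime: Long = 0L       // proposed: time spent spilling, nanoseconds
}

class ShuffleWriteMetrics extends Serializable {
  var shuffleBytesWritten: Long = 0L // stand-in for existing fields
  var bytesSpilled: Long = 0L        // proposed: sort-based shuffle spills
  var spillTime: Long = 0L           // proposed
}
```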
    
    As for the columns, I think it's OK to add more for these particular metrics. Eventually we'll have a mechanism for the user to toggle which columns he/she is interested in (through a series of checkboxes or something), and the default set could be a subset of all the columns. If you're worried about having too many columns, we could group the corresponding columns in `ShuffleReadMetrics` and `ShuffleWriteMetrics` and separate them with a slash or something.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by hammer <gi...@git.apache.org>.
Github user hammer commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-62474114
  
    @andrewor14 @sryza what happened with this patch? these metrics would be nice to have.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-76054035
  
    Minor side comment: Can we update the title of this PR to something like:
    
    ```
    [SPARK-3172] [SPARK-3577] Concise description of changes goes here
    ```
    
    Right now this PR shows up weird on the [PR dashboard](https://spark-prs.appspot.com/).



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56479827
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20689/consoleFull) for   PR 2504 at commit [`c854514`](https://github.com/apache/spark/commit/c854514d81b4830ce1f1109662a713c51e6c8023).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class WriteMetrics extends Serializable `




[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2504#discussion_r24645659
  
    --- Diff: core/src/main/scala/org/apache/spark/Aggregator.scala ---
    @@ -54,14 +55,13 @@ case class Aggregator[K, V, C] (
           }
           combiners.iterator
         } else {
    -      val combiners = new ExternalAppendOnlyMap[K, V, C](createCombiner, mergeValue, mergeCombiners)
    -      combiners.insertAll(iter)
    -      // Update task metrics if context is not null
           // TODO: Make context non optional in a future release
    -      Option(context).foreach { c =>
    -        c.taskMetrics.memoryBytesSpilled += combiners.memoryBytesSpilled
    -        c.taskMetrics.diskBytesSpilled += combiners.diskBytesSpilled
    -      }
    +      val spillMetrics = Option(context).map(
    +        _.taskMetrics.getOrCreateShuffleReadSpillMetrics()).getOrElse(new WriteMetrics())
    --- End diff --
    
    Ah yeah, looks like this is a copy/paste mistake.  In this path, I think we're always on the map side, so the call should just be `getOrCreateShuffleWriteSpillMetrics`.



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-56479829
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20689/



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-75506483
  
    Both approaches are fine with me.  I've had some discussions with @pwendell recently where he's stressed the importance of not breaking developer APIs.  Patrick, care to weigh in?



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-75015886
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27713/



[GitHub] spark pull request: SPARK-3172 and SPARK-3577

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/2504#issuecomment-75100303
  
    @sryza is the idea here to maintain backward compatibility by re-using the ShuffleWriteMetrics class for spill metrics?
    
    I preferred the old approach, where you defined a DiskWriteMetrics class, and then ShuffleReadMetrics includes "val spillMetrics = new DiskWriteMetrics", and shuffle write metrics includes two DiskWriteMetrics (one for spill and one for normal write).  To make everything backward compatible, I might just write the JsonMetrics in the same way as before for ShuffleWriteMetrics.  I'm just concerned this approach is pretty hard to understand for a new person looking at this.  Happy for others to weigh in though.
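    In other words, roughly this structure (sketch only; everything beyond the names mentioned above is an assumption):

```scala
// Sketch of the earlier design described above: a small DiskWriteMetrics
// class, embedded once in ShuffleReadMetrics and twice in ShuffleWriteMetrics.
class DiskWriteMetrics extends Serializable {
  var bytesWritten: Long = 0L
  var writeTime: Long = 0L // nanoseconds
}

class ShuffleReadMetrics extends Serializable {
  val spillMetrics = new DiskWriteMetrics // spills during aggregation
}

class ShuffleWriteMetrics extends Serializable {
  val writeMetrics = new DiskWriteMetrics // final shuffle output
  val spillMetrics = new DiskWriteMetrics // sort-based shuffle spills
}
```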

