You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kayousterhout <gi...@git.apache.org> on 2015/02/10 21:05:52 UTC

[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

GitHub user kayousterhout opened a pull request:

    https://github.com/apache/spark/pull/4510

    [SPARK-5645] Added local read bytes/time to task metrics

    @ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned to you but I'd already done this for my own uses a while ago so thought I could help save you the work of doing it!  Hopefully this doesn't duplicate any work you've already done.
     
    Here's a screenshot of what the UI looks like:
    ![image](https://cloud.githubusercontent.com/assets/1108612/6135352/c03e7276-b11c-11e4-8f11-c6aefe1f35b9.png)
    Based on a discussion with @pwendell, I put the data read remotely in as an additional metric rather than showing it in brackets as you'd suggested, Kostas.  The assumption here is that the average user doesn't care about the differentiation between local / remote data, so it's better not to pollute the UI.
    
    I also added data about the local read time, which I've found very helpful for debugging, but I didn't put it in the UI because I think it's probably something not a ton of people will need to use.
    
    With this change, the total read time and total write time shown in the UI should now add up properly:
    ![image](https://cloud.githubusercontent.com/assets/1108612/6135399/25f14490-b11d-11e4-8086-20be5f4002e6.png)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kayousterhout/spark-1 SPARK-5645

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4510.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4510
    
----
commit 347e2cd440c13811f3c2d795ca195ec157eaa125
Author: Kay Ousterhout <ka...@gmail.com>
Date:   2014-09-08T23:45:52Z

    [SPARK-5645] Added local read bytes/time to task metrics

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73783515
  
      [Test build #27226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27226/consoleFull) for   PR 4510 at commit [`33d2e2d`](https://github.com/apache/spark/commit/33d2e2dd7fecf92b24f4322460bfe6a1b1d8f094).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by ksakellis <gi...@git.apache.org>.
Github user ksakellis commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73812744
  
    Thanks for doing this! Thats one less thing for me to do. Overall this LGTM aside from the small little comments I made. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4510#discussion_r24627204
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -454,11 +487,17 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage("stage") {
           val shuffleReadBlockedTimeReadable =
             maybeShuffleRead.map(ms => UIUtils.formatDuration(ms.fetchWaitTime)).getOrElse("")
     
    -      val shuffleReadSortable = maybeShuffleRead.map(_.remoteBytesRead.toString).getOrElse("")
    -      val shuffleReadReadable = maybeShuffleRead
    -        .map(m => s"${Utils.bytesToString(m.remoteBytesRead)}").getOrElse("")
    +      val totalShuffleBytes = maybeShuffleRead.map(_.totalBytesRead)
    +      val shuffleReadSortable = totalShuffleBytes.map(_.toString).getOrElse("")
    +      val shuffleReadReadable = totalShuffleBytes
    +        .map(bytes => s"${Utils.bytesToString(bytes)}").getOrElse("")
           val shuffleReadRecords = maybeShuffleRead.map(_.recordsRead.toString).getOrElse("")
     
    +      val remoteShuffleBytes = maybeShuffleRead.map(_.remoteBytesRead)
    +      val shuffleReadRemoteSortable = remoteShuffleBytes.map(_.toString).getOrElse("")
    +      val shuffleReadRemoteReadable = remoteShuffleBytes
    +        .map(bytes => s"${Utils.bytesToString(bytes)}").getOrElse("")
    --- End diff --
    
    why not just `.map(Utils.bytesToString).getOrElse("")`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73799467
  
      [Test build #27236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27236/consoleFull) for   PR 4510 at commit [`ba05149`](https://github.com/apache/spark/commit/ba05149f24b78d70bb9f3c52e44d1adfe2cb682d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by ksakellis <gi...@git.apache.org>.
Github user ksakellis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4510#discussion_r24464381
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -344,6 +346,20 @@ class ShuffleReadMetrics extends Serializable {
       private[spark] def decRemoteBytesRead(value: Long) = _remoteBytesRead -= value
     
       /**
    +   * Time the task spent (in milliseconds) reading local shuffle blocks (from the local disk).
    +   */
    +  private var _localReadTime: Long = _
    +  def localReadTime = _localReadTime
    +  private[spark] def incLocalReadTime(value: Long) = _localReadTime += value
    +
    +  /**
    +   * Shuffle data that was read from the local disk (as opposed to from a remote executor).
    +   */
    +  private var _localBytesRead: Long = _
    +  def localBytesRead = _localBytesRead
    +  private[spark] def incLocalBytesRead(value: Long) = _localBytesRead += value
    +
    +  /**
        * Number of blocks fetched in this shuffle by this task (remote or local)
        */
       def totalBlocksFetched = _remoteBlocksFetched + _localBlocksFetched
    --- End diff --
    
    To stay consistent, can you add def totalBytesRead = _remoteBytesRead + _localBytesRead



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74011838
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27319/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73773623
  
      [Test build #27225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27225/consoleFull) for   PR 4510 at commit [`347e2cd`](https://github.com/apache/spark/commit/347e2cd440c13811f3c2d795ca195ec157eaa125).
     * This patch **does not merge cleanly**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by ksakellis <gi...@git.apache.org>.
Github user ksakellis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4510#discussion_r24464185
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
    @@ -181,7 +188,8 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage("stage") {
             {if (stageData.hasOutput) Seq(("Output Size / Records", "")) else Nil} ++
             {if (stageData.hasShuffleRead) {
               Seq(("Shuffle Read Blocked Time", TaskDetailsClassNames.SHUFFLE_READ_BLOCKED_TIME),
    -            ("Shuffle Read Size / Records", ""))
    +            ("Shuffle Read Size / Records", ""),
    +            ("Shuffle Data Read Remotely", TaskDetailsClassNames.SHUFFLE_READ_REMOTE_SIZE))
    --- End diff --
    
    Nit: Maybe a shorter string can be "Shuffle Remote Reads"? It'll make the column thinner. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73783525
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27226/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73810289
  
      [Test build #27236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27236/consoleFull) for   PR 4510 at commit [`ba05149`](https://github.com/apache/spark/commit/ba05149f24b78d70bb9f3c52e44d1adfe2cb682d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74012183
  
      [Test build #27320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27320/consoleFull) for   PR 4510 at commit [`5f5da1b`](https://github.com/apache/spark/commit/5f5da1b33e0ddffe9d93e3514d3874200a1cbd61).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74168655
  
    Thanks Andrew!
    
    On Thu, Feb 12, 2015 at 2:32 PM, andrewor14 <no...@github.com>
    wrote:
    
    > Alright LGTM I'll fix the comments myself when I merge this into master
    > and 1.3 thanks.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/4510#issuecomment-74167787>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74017678
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27321/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by ksakellis <gi...@git.apache.org>.
Github user ksakellis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4510#discussion_r24450007
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -189,6 +189,7 @@ final class ShuffleBlockFetcherIterator(
             // Filter out zero-sized blocks
             localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1)
             numBlocksToFetch += localBlocks.size
    +        shuffleMetrics.incLocalBytesRead(blockInfos.map(_._2).sum)
    --- End diff --
    
    Is there a reason you are not incrementing the localBytesRead in fetchLocalBlocks right after the localBlocks are incremented?
    Looking at the initialization code, i don't really think there is much difference in functionality but it might be easier to understand to have both:
    shuffleMetrics.incLocalBlocksFetched(1)
    shuffleMetrics.incLocalBytesRead(buf.size)
    be together. Also, if they are together there is less chance of them later being calculated at different times. (if we want to add more parallelism for example). 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74011712
  
      [Test build #27319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27319/consoleFull) for   PR 4510 at commit [`5da04cf`](https://github.com/apache/spark/commit/5da04cf2dee630416218e5bf18adc8083ef01607).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74167787
  
    Alright LGTM I'll fix the comments myself when I merge this into master and 1.3 thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73810297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27236/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73787570
  
      [Test build #27225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27225/consoleFull) for   PR 4510 at commit [`347e2cd`](https://github.com/apache/spark/commit/347e2cd440c13811f3c2d795ca195ec157eaa125).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74017676
  
      [Test build #27321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27321/consoleFull) for   PR 4510 at commit [`4a0182c`](https://github.com/apache/spark/commit/4a0182c294d8d96eba87992250cc61cdbb0d997d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4510#discussion_r24457817
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -189,6 +189,7 @@ final class ShuffleBlockFetcherIterator(
             // Filter out zero-sized blocks
             localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1)
             numBlocksToFetch += localBlocks.size
    +        shuffleMetrics.incLocalBytesRead(blockInfos.map(_._2).sum)
    --- End diff --
    
    Good point -- fixed!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73773975
  
    cc @andrewor14


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4510


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74012381
  
      [Test build #27321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27321/consoleFull) for   PR 4510 at commit [`4a0182c`](https://github.com/apache/spark/commit/4a0182c294d8d96eba87992250cc61cdbb0d997d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74012184
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27320/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73787582
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27225/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-73774514
  
      [Test build #27226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27226/consoleFull) for   PR 4510 at commit [`33d2e2d`](https://github.com/apache/spark/commit/33d2e2dd7fecf92b24f4322460bfe6a1b1d8f094).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74011837
  
      [Test build #27319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27319/consoleFull) for   PR 4510 at commit [`5da04cf`](https://github.com/apache/spark/commit/5da04cf2dee630416218e5bf18adc8083ef01607).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74012021
  
      [Test build #27320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27320/consoleFull) for   PR 4510 at commit [`5f5da1b`](https://github.com/apache/spark/commit/5f5da1b33e0ddffe9d93e3514d3874200a1cbd61).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4510#discussion_r24627100
  
    --- Diff: core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala ---
    @@ -290,8 +291,11 @@ class JobProgressListenerSuite extends FunSuite with LocalSparkContext with Matc
     
         stage0Data = listener.stageIdToData.get((0, 0)).get
         stage1Data = listener.stageIdToData.get((1, 0)).get
    -    assert(stage0Data.shuffleReadBytes == 402)
    -    assert(stage1Data.shuffleReadBytes == 602)
    +    // Task 1235 contributed (100+1)+(100+9) = 210 shuffle bytes, and task 1234 contributed
    +    // (300+1)+(300+9) = 610 total shuffle bytes, so the total for the stage is 820.
    +    assert(stage0Data.shuffleReadTotalBytes == 820)
    +    // Task 1236 contributed 410 shuffle bytes, and task 1237 contributed 810 shuffle bytes.
    +    assert(stage1Data.shuffleReadTotalBytes == 1220)
    --- End diff --
    
    hm these should all have `===`, but we can fix this outside of this patch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5645] Added local read bytes/time to ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/4510#issuecomment-74011746
  
    Thanks for taking a look at this @ksakellis! I've fixed the issues you pointed out.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org