You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/23 23:54:42 UTC

[GitHub] [spark] robreeves opened a new pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

robreeves opened a new pull request #35637:
URL: https://github.com/apache/spark/pull/35637


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shahidki31 commented on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
shahidki31 commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1061256121


   Ok. We will be converting to the actual shuffle metrics value here. https://github.com/apache/spark/blob/7715538a8743817f110dbaac8aa010dee6bd730b/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L270-L276
   
   But this method is used only for computing the summary metrics. LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on a change in pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves commented on a change in pull request #35637:
URL: https://github.com/apache/spark/pull/35637#discussion_r816913951



##########
File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
##########
@@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSuite {
     liveTask.write(store.asInstanceOf[ElementTrackingStore], 1L)
   }
 
-  private def getTaskMetrics(i: Int): TaskMetrics = {
+  /**
+   * Creates fake task metrics
+   * @param seed The random seed. The output will be reproducible for a given seed.
+   * @return The test metrics object with fake data
+   */

Review comment:
       The previous version of the test set every metric to the same value. This left a gap in test coverage where the test would pass if the code accidentally used the wrong metric (an easy mistake to make). This improves the test coverage be eliminating this gap because every metric has unique values. This is important because I expanded the test to check every metric, instead of just `executorRunTime` like before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #35637:
URL: https://github.com/apache/spark/pull/35637#discussion_r816431350



##########
File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
##########
@@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSuite {
     liveTask.write(store.asInstanceOf[ElementTrackingStore], 1L)
   }
 
-  private def getTaskMetrics(i: Int): TaskMetrics = {
+  /**
+   * Creates fake task metrics
+   * @param seed The random seed. The output will be reproducible for a given seed.
+   * @return The test metrics object with fake data
+   */

Review comment:
       Why are we changing this (use of random) ? What behavior/change are we testing here ?

##########
File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
##########
@@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSuite {
     liveTask.write(store.asInstanceOf[ElementTrackingStore], 1L)
   }
 
-  private def getTaskMetrics(i: Int): TaskMetrics = {
+  /**
+   * Creates fake task metrics
+   * @param seed The random seed. The output will be reproducible for a given seed.
+   * @return The test metrics object with fake data
+   */
+  private def getTaskMetrics(seed: Int): TaskMetrics = {
+    val random = new Random(seed)
+    val randomMax = 1000
+    def nextInt(): Int = random.nextInt(randomMax)

Review comment:
       Note: This can end up becoming zero.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1049529081


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on a change in pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves commented on a change in pull request #35637:
URL: https://github.com/apache/spark/pull/35637#discussion_r816915243



##########
File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
##########
@@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSuite {
     liveTask.write(store.asInstanceOf[ElementTrackingStore], 1L)
   }
 
-  private def getTaskMetrics(i: Int): TaskMetrics = {
+  /**
+   * Creates fake task metrics
+   * @param seed The random seed. The output will be reproducible for a given seed.
+   * @return The test metrics object with fake data
+   */
+  private def getTaskMetrics(seed: Int): TaskMetrics = {
+    val random = new Random(seed)
+    val randomMax = 1000
+    def nextInt(): Int = random.nextInt(randomMax)

Review comment:
       This should have no impact on the validity of the tests here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1054487295


   @srowen if @shahidki31 doesn't answer is there anyone else we can ping to verify the changes? Let me know if I can help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves edited a comment on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves edited a comment on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1055756392


   > Can we add a specific unit test to validate that the sign is not flipping ?
   
   The test `SPARK-26260: summary should contain only successful tasks' metrics (store = disk leveldb)` in `AppStatusStoreSuite` would catch this because it would mean a task is either included or excluded when it shouldn't be. This test is more like a mini-integration test.
   
   After researching if we could write a lower level test directly for the index value I am leaning towards the above test being the appropriate level to test at, but am open to changing this if you disagree.
   
   - `TaskDataWrapper` is only a data class so we could pass in whatever field values we want for the test, including making the sign flip. A test here would not really test any meaningful behavior. Additionally, we'd be making the fields `shuffleTotalReads` and `shuffleTotalBlocks` public just for testing. Not a huge deal and I am not dogmatic on this, but I generally consider exposing data/logic just for testing an anti-pattern and feel unit tests should focus on what a class, or classes, needs to expose publicly.
   - `LiveTask` is also private. I could change its access level to `LiveTask` and `TaskDataWrapper` to test it, but it is well covered in the `AppStatusStoreSuite` test. Since the behavior in `AppStatusStore` is what we really are concerned about and these are more helper classes I felt this is acceptable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shahidki31 edited a comment on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
shahidki31 edited a comment on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1061256121


   Ok. We will be converting to the actual shuffle metrics value here. https://github.com/apache/spark/blob/7715538a8743817f110dbaac8aa010dee6bd730b/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L270-L276
   
   But this method is used only for computing the summary metrics.
   https://github.com/apache/spark/blob/7715538a8743817f110dbaac8aa010dee6bd730b/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L344-L351
   
    LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
robreeves commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1061178727


   > @robreeves The PR #26508 was originally intended to compute summary metrics for only successful tasks. That is why made all the non successful tasks' metrics negative, initially. So do you see ShuffleTotalReads and ShuffleTotalBlocks negative even for successful tasks?
   
   @shahidki31 The problem for ShuffleTotalReads and ShuffleTotalBlocks is that the index values are not ever negative because `getMetricValue` is called within the index method. Because of this non successful tasks are included in the percentile computations for these metrics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1062093790


   Merged to master/branch-3.2/branch-3.1
   
   Thanks for working on this @robreeves !
   Thanks for the reviews @dongjoon-hyun, @shahidki31 :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shahidki31 commented on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
shahidki31 commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1060366939


   @robreeves The PR https://github.com/apache/spark/pull/26508 was originally intended to compute summary metrics for only successful tasks. That is why made all the non successful tasks metrics negative, initially. So do you see ShuffleTotalReads and ShuffleTotalBlocks negative even for successful tasks?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on a change in pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves commented on a change in pull request #35637:
URL: https://github.com/apache/spark/pull/35637#discussion_r816922878



##########
File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
##########
@@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSuite {
     liveTask.write(store.asInstanceOf[ElementTrackingStore], 1L)
   }
 
-  private def getTaskMetrics(i: Int): TaskMetrics = {
+  /**
+   * Creates fake task metrics
+   * @param seed The random seed. The output will be reproducible for a given seed.
+   * @return The test metrics object with fake data
+   */
+  private def getTaskMetrics(seed: Int): TaskMetrics = {
+    val random = new Random(seed)
+    val randomMax = 1000
+    def nextInt(): Int = random.nextInt(randomMax)

Review comment:
       I think you might have mentioned this due to the metric value being made negative for tasks that aren't successful. Zero is accounted for [here](https://github.com/apache/spark/pull/26508/files#diff-141a7567a788693eb50e629e135b0d4656fa58409e1402508d7aa241f3e43de4R731).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1049912279


   Just out of curiosity, can you explain again the idea that negative = negative + negative? why add the same thing to itself? it looks funny but I'm sure there's a reason it works - may be worth a comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1058847727


   Since the existing test was not catching this issue, I want to make sure that we are testing for this behavior.
   Given that we have exhaustively tested the current metrics - including the one which had an issue, this looks fine to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1050088375


   Oh disregard, I read the new code as adding the same variable to itself -- they're local vs remote values. I got it
   I don't know quite enough to say I can evaluate the change, but looks reasonable and sounds like you've researched this.
   @shahidki31 WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #35637:
URL: https://github.com/apache/spark/pull/35637#discussion_r819277689



##########
File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
##########
@@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSuite {
     liveTask.write(store.asInstanceOf[ElementTrackingStore], 1L)
   }
 
-  private def getTaskMetrics(i: Int): TaskMetrics = {
+  /**
+   * Creates fake task metrics
+   * @param seed The random seed. The output will be reproducible for a given seed.
+   * @return The test metrics object with fake data
+   */

Review comment:
       Sounds good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #35637:
URL: https://github.com/apache/spark/pull/35637


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1055756392


   > Can we add a specific unit test to validate that the sign is not flipping ?
   
   The test `SPARK-26260: summary should contain only successful tasks' metrics (store = disk leveldb)` in `AppStatusStoreSuite` would catch this because it would mean a task is either included or excluded when it shouldn't be. This test is more like a mini-integration test.
   
   After looking into see if we could write a lower level test directly for the index value I am leaning towards the above test being the appropriate level to test at, but am open to changing this if you disagree.
   
   - `TaskDataWrapper` is only a data class so we could pass in whatever field values we want for the test, including making the sign flip. A test here would not really test any meaningful behavior. Additionally, we'd be making the fields `shuffleTotalReads` and `shuffleTotalBlocks` public just for testing. Not a huge deal and I am not dogmatic on this, but I generally consider exposing data/logic just for testing an anti-pattern and feel unit tests should focus on what a class, or classes, needs to expose publicly.
   - `LiveTask` is also private. I could change its access level to `LiveTask` and `TaskDataWrapper` to test it, but it is well covered in the `AppStatusStoreSuite` test. Since the behavior in `AppStatusStore` is what we really are concerned about and these are more helper classes I felt this is acceptable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1060099666


   Backport to 3.2/3.1 I presume?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shahidki31 edited a comment on pull request #35637: [SPARK-38309][CORE] Fix SHS `shuffleTotalReads` and `shuffleTotalBlocks` percentile metrics

Posted by GitBox <gi...@apache.org>.
shahidki31 edited a comment on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1060366939


   @robreeves The PR https://github.com/apache/spark/pull/26508 was originally intended to compute summary metrics for only successful tasks. That is why made all the non successful tasks' metrics negative, initially. So do you see ShuffleTotalReads and ShuffleTotalBlocks negative even for successful tasks?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] robreeves commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

Posted by GitBox <gi...@apache.org>.
robreeves commented on pull request #35637:
URL: https://github.com/apache/spark/pull/35637#issuecomment-1050061063


   > Just out of curiosity, can you explain again the idea that negative = negative + negative? why add the same thing to itself? it looks funny but I'm sure there's a reason it works - may be worth a comment
   
   This doesn't add the same number together twice. This refers to the sign (positive or negative) for the composite indices modified in this PR, `TaskDataWrapper.shuffleTotalReads` and `TaskDataWrapper.shuffleTotalBlocks`. The sign is important because tasks that aren't successful should have a negative sign (implemented in PR #26508).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org