Posted to reviews@spark.apache.org by "dzypersonal (via GitHub)" <gi...@apache.org> on 2023/08/21 02:26:58 UTC

[GitHub] [spark] dzypersonal commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

dzypersonal commented on PR #36162:
URL: https://github.com/apache/spark/pull/36162#issuecomment-1685522302

   > It helps in two cases @weixiuli - the example you gave (generated input (like range()), etc where there is no input metrics). It also helps when reading shuffle input where there is a sort - the entire shuffle input will get consumed at beginning of the task, but the output rate would be impacted by the subsequent computation/skew/etc in the task (or even output writes from the stage).
   
   That makes sense. Here is a data-skew task I ran into:
   ![WeCom screenshot_80ae77cb-4647-49ce-a3e5-6c6c78104d09](https://github.com/apache/spark/assets/39691337/309028a1-1e33-404a-80b0-186a8aafc5b1)
   ![WeCom screenshot_91a84b56-66c3-4a52-8bdd-bbc779a743a0](https://github.com/apache/spark/assets/39691337/a0325a5f-eab3-4e85-a415-7b69ece9528e)
   
   The median shuffle-read record process rate is roughly 25507 / 5 = 5101.4, and the median shuffle-write record process rate is roughly 399365 / 5 = 79873.
   The skewed task (index 42) is marked as speculatable because its shuffle-read record process rate is only about 15606048 / 5400 ≈ 2890, well below the median, so it looks inefficient. However, its shuffle-write record process rate is about 499186709 / 5400 ≈ 92441.983, which is higher than the median of 79873. Judged by its output, the task is not actually slow.
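The comparison above can be reproduced with a few lines of Python. This is a minimal sketch, not the PR's implementation: `process_rate`, `is_inefficient` and the `multiplier` threshold of 0.75 are hypothetical names and values chosen for illustration, and the durations (5 and 5400) are assumed to be in the same unit, as read off the screenshots.

```python
# Minimal sketch: compare a task's records-per-duration rate with the median
# task's rate, using the numbers quoted in the comment.
# NOTE: process_rate, is_inefficient and multiplier are hypothetical,
# not Spark's scheduler API.


def process_rate(records: int, duration: float) -> float:
    """Records processed per unit of task duration."""
    return records / duration


def is_inefficient(rate: float, median_rate: float, multiplier: float = 0.75) -> bool:
    """Flag a task whose rate is below `multiplier` times the median rate."""
    return rate < multiplier * median_rate


# Median task: 25507 shuffle-read and 399365 shuffle-write records in 5.
median_read = process_rate(25507, 5)      # 5101.4
median_write = process_rate(399365, 5)    # 79873.0

# Skewed task index 42: 15606048 read and 499186709 written records in 5400.
task42_read = process_rate(15606048, 5400)     # about 2890.0
task42_write = process_rate(499186709, 5400)   # about 92441.98

print("by shuffle read :", is_inefficient(task42_read, median_read))    # True
print("by shuffle write:", is_inefficient(task42_write, median_write))  # False
```

Any multiplier between about 0.57 and 1.15 gives the same two verdicts, so the conclusion that the read rate and the write rate disagree about task 42 does not hinge on the exact threshold.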


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: reviews-help@spark.apache.org