Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/29 21:22:24 UTC

[GitHub] [spark] squito opened a new pull request #24245: [SPARK-13704][CORE][YARN] Reduce rack resolution time

squito opened a new pull request #24245: [SPARK-13704][CORE][YARN] Reduce rack resolution time
URL: https://github.com/apache/spark/pull/24245
 
 
   ## What changes were proposed in this pull request?
   
   When a stage with a large number of tasks is submitted, rack resolution takes a long time during TaskSetManager initialization, because the rack resolving script is executed repeatedly in a loop.
   With the current implementation, it takes 30~40 seconds to resolve the racks in our 5000-node cluster. After applying the patch, this decreased to less than 15 seconds.
   
   YARN-9332 added an interface that handles multiple hosts in one invocation to save time. Until we upgrade to the newest Hadoop, we can construct the same tool in Spark to resolve this issue.
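
The batching idea can be sketched roughly as follows. This is an illustration only, not the code in this patch: `BatchRackResolver`, `batchLookup`, and `invocations` are hypothetical names, and the lookup function stands in for whatever actually maps a list of hosts to a list of racks (for example, a topology script). It is assumed to return one rack per host, in the same order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: resolve many hosts with one lookup call and cache the results.
final class BatchRackResolver {
    // Assumed contract: returns one rack per input host, in the same order.
    private final Function<List<String>, List<String>> batchLookup;
    private final Map<String, String> cache = new HashMap<>();
    private int invocations = 0;

    BatchRackResolver(Function<List<String>, List<String>> batchLookup) {
        this.batchLookup = batchLookup;
    }

    // Resolves all hosts, calling the lookup at most once per request.
    Map<String, String> resolve(List<String> hosts) {
        // Collect distinct hosts that are not cached yet, preserving order.
        Set<String> missing = new LinkedHashSet<>();
        for (String host : hosts) {
            if (!cache.containsKey(host)) {
                missing.add(host);
            }
        }
        if (!missing.isEmpty()) {
            List<String> batch = new ArrayList<>(missing);
            List<String> racks = batchLookup.apply(batch);
            invocations++;
            if (racks.size() != batch.size()) {
                throw new IllegalStateException("lookup returned " + racks.size()
                        + " racks for " + batch.size() + " hosts");
            }
            for (int i = 0; i < batch.size(); i++) {
                cache.put(batch.get(i), racks.get(i));
            }
        }
        // Build the answer from the cache so duplicates and known hosts are free.
        Map<String, String> result = new HashMap<>();
        for (String host : hosts) {
            result.put(host, cache.get(host));
        }
        return result;
    }

    // Number of times the underlying lookup was actually called.
    int invocations() {
        return invocations;
    }
}
```

Compared with resolving each host separately, the cost of starting the lookup is paid once per batch rather than once per host, and hosts that were already resolved are served from the cache.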
   
   ## How was this patch tested?
   
   Unit tests and manual testing on a 5000-node cluster.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
