Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/29 13:42:55 UTC

[GitHub] [spark] tgravescs commented on issue #25962: [SPARK-29285][Shuffle] Temporary shuffle files should be able to handle disk failures

URL: https://github.com/apache/spark/pull/25962#issuecomment-547425618
 
 
   > In our 2000 nodes Hadoop cluster, which with 12 disks/node, this approach reduce the number of that exception a lot.
   
   So the only time Hadoop should show you this bad disk is if YARN doesn't detect it or if it goes bad while the container is running.  YARN has a specific feature to detect bad disks and will not give them to the container if they are bad.  So in your case, are your executors very long running?  Are you using the YARN feature?  
   I'm not necessarily against this idea, as disks can go bad while executors are running, but I just want to check how often this is really happening.  What happens when we go to rename/merge the temp file to its final location?  The shuffle file name is static, so it should hash to the same dir every time unless we are adding a different dir.  I can't remember that code off the top of my head.  With the external shuffle service, the application registers which directories it's using so that the external shuffle service can use them to find the files again.  I'm wondering if the temp files might work but then fail later on the static names.
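
To make the concern about static names concrete, here is a minimal sketch. It assumes a file's directory is chosen only by hashing its name modulo the number of local dirs. It is not Spark's actual code: `pick_dir`, the crc32 hash, the directory paths, and the file names are all illustrative assumptions.

```python
import uuid
import zlib


def pick_dir(filename, local_dirs):
    """Choose a local directory purely from the file name."""
    # Assumption: crc32 stands in for whatever hash the real code uses;
    # it is chosen here only because it is stable across runs.
    return local_dirs[zlib.crc32(filename.encode("utf-8")) % len(local_dirs)]


# Hypothetical layout: 12 local dirs, one per disk.
local_dirs = [f"/disk{i}/spark-local" for i in range(12)]

# A static name maps to the same directory on every call.
final_name = "shuffle_0_1_0.data"  # illustrative name
final_dir = pick_dir(final_name, local_dirs)

# Temp names carry a random component, so retries can land elsewhere.
temp_dirs = {
    pick_dir(f"temp_shuffle_{uuid.uuid4()}", local_dirs) for _ in range(200)
}

# Suppose the disk behind final_dir has failed.
bad_dirs = {final_dir}
healthy_temp_dirs = temp_dirs - bad_dirs

print("final file always goes to:", final_dir)
print("temp files can avoid the bad disk:", len(healthy_temp_dirs) > 0)
print("final file can avoid the bad disk:", final_dir not in bad_dirs)
```

Under these assumptions, retrying with a fresh temp name can move the temp file off a bad disk, but the final file still resolves to the one directory its static name hashes to, so the rename/merge step would fail there.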
