Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/16 12:14:35 UTC

[GitHub] [spark] xiaoyang1129 commented on pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk block manager files following executor exits on a Standalone cluster

xiaoyang1129 commented on pull request #21390:
URL: https://github.com/apache/spark/pull/21390#issuecomment-693365678


   > Yeah, this is only concerned with non-shuffle files which are located in the block manager temp directories (e.g. large sorter spill files).
   > 
   > There is a related issue where shuffle files can be leaked indefinitely following executor death because the external shuffle service is never directly told that shuffles are safe to remove (the context cleaner sends RPCs to executors and executors clean up their own shuffle files). That issue is substantially harder to fix, though, since it likely requires protocol changes to the shuffle service or an inversion-of-control where the shuffle service can periodically ask the driver "do any of these shuffle IDs correspond to cleaned shuffles?". As a result, I think the strategy here is to decompose that disk leak solution into two separate sets of fixes, where this patch is concerned with the simpler case of non-shuffle files (we'll defer the more complex case to a separate PR because it requires a lot more design).
   
   Is there any news on cleaning up shuffle files when an executor terminates abnormally? We have run into this disk leak with a long-lived driver.
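
To make the inversion-of-control idea from the quoted comment concrete, here is a minimal sketch of the shuffle service periodically asking the driver "do any of these shuffle IDs correspond to cleaned shuffles?". Everything in it is an assumption for illustration: `Driver`, `ShuffleService`, `cleaned_among`, and `poll_and_clean` are hypothetical names, not Spark APIs, and the file paths are made up. It only models the bookkeeping in memory and does not touch the disk.

```python
# Hypothetical sketch: none of these classes or methods exist in Spark.
from typing import Dict, Iterable, List, Set


class Driver:
    """Stand-in for the driver, which knows which shuffles were cleaned."""

    def __init__(self) -> None:
        self._cleaned: Set[int] = set()

    def mark_cleaned(self, shuffle_id: int) -> None:
        # In the real system the context cleaner would decide this.
        self._cleaned.add(shuffle_id)

    def cleaned_among(self, shuffle_ids: Iterable[int]) -> Set[int]:
        # Answers: "do any of these shuffle IDs correspond to cleaned shuffles?"
        return self._cleaned.intersection(shuffle_ids)


class ShuffleService:
    """Stand-in for the external shuffle service, which outlives executors."""

    def __init__(self, driver: Driver) -> None:
        self._driver = driver
        self._files: Dict[int, List[str]] = {}

    def register(self, shuffle_id: int, path: str) -> None:
        # Record a shuffle file written by some (possibly now dead) executor.
        self._files.setdefault(shuffle_id, []).append(path)

    def poll_and_clean(self) -> List[str]:
        """Run one polling round and return the paths that would be deleted."""
        removable = self._driver.cleaned_among(list(self._files))
        deleted: List[str] = []
        for shuffle_id in sorted(removable):
            deleted.extend(self._files.pop(shuffle_id))
        return deleted

    def tracked_shuffles(self) -> List[int]:
        return sorted(self._files)


driver = Driver()
service = ShuffleService(driver)
service.register(0, "/tmp/shuffle_0_0.data")  # made-up path
service.register(1, "/tmp/shuffle_1_0.data")  # made-up path
driver.mark_cleaned(0)
deleted = service.poll_and_clean()
```

In this model the service, not the executor, owns the cleanup decision, so files registered by an executor that has since died are still removed on the next polling round once the driver reports the shuffle as cleaned.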

