Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/17 18:56:35 UTC

[GitHub] [spark] JoshRosen commented on pull request #37522: [SPARK-40083][SHUFFLE] Add shuffle index cache timebased expire policy

JoshRosen commented on PR #37522:
URL: https://github.com/apache/spark/pull/37522#issuecomment-1218380759

   If the primary intent is to clean up cache entries associated with finished Spark applications, is there a way that we could do this more directly? 
   
   In `ExternalShuffleBlockResolver.applicationRemoved` it looks like we have some logic that can start an asynchronous background task to clean up the shuffle files. Maybe we could remove the cache entries in a similar manner, e.g. either by invalidating entries during the file deletion itself (taking advantage of the fact that the cache keys are filenames), or by iterating over the cache's keys and removing the entries for files under the deleted application's directory (we'd have to check the iteration semantics to make sure this is safe).
   
   Doing this cleanup at application-removal time would avoid the need for a time-based config (which could be hard to tune appropriately).
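
   The key-iteration idea above could be sketched roughly like this. This is a minimal illustration, not Spark's actual code: a plain `ConcurrentHashMap` stands in for the Guava `LoadingCache` that `ExternalShuffleBlockResolver` uses, and the helper name `invalidateCacheForApp` and path layout are hypothetical.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   public class ShuffleIndexCacheCleanup {
       // Stand-in for the shuffle index cache. In Spark this is a Guava
       // LoadingCache keyed by the index file, so each key already encodes
       // the owning application's local directory.
       static final Map<String, String> indexCache = new ConcurrentHashMap<>();

       // Hypothetical helper: drop every cache entry whose key (a file path)
       // lives under the removed application's directory.
       static void invalidateCacheForApp(String appDir) {
           // ConcurrentHashMap's key-set view supports removal during
           // iteration (weakly consistent iterators), which is the
           // "iteration semantics" question raised in the review.
           indexCache.keySet().removeIf(path -> path.startsWith(appDir + "/"));
       }

       public static void main(String[] args) {
           indexCache.put("/local/app-1/shuffle_0_0_0.index", "entry");
           indexCache.put("/local/app-1/shuffle_0_1_0.index", "entry");
           indexCache.put("/local/app-2/shuffle_0_0_0.index", "entry");

           invalidateCacheForApp("/local/app-1");

           // Only app-2's entry should remain.
           System.out.println(indexCache.size());
       }
   }
   ```

   With a real Guava cache the equivalent operation would be `cache.invalidate(key)` per matching key; either way the work could ride along with the asynchronous file-deletion task so no separate expiry timer is needed.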


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

