Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/09 03:50:20 UTC

[GitHub] [arrow-datafusion] Ted-Jiang commented on a change in pull request #1783: Enable periodic cleanup of work_dir directories in ballista executor

Ted-Jiang commented on a change in pull request #1783:
URL: https://github.com/apache/arrow-datafusion/pull/1783#discussion_r802247304



##########
File path: ballista/rust/executor/src/main.rs
##########
@@ -112,6 +116,21 @@ async fn main() -> Result<()> {
         .context("Could not connect to scheduler")?;
 
     let scheduler_policy = opt.task_scheduling_policy;
+    let cleanup_ttl = opt.executor_cleanup_ttl;

Review comment:
       @alamb Thanks for your advice 😊!
   IMHO, if a job has 3 stages and stage 2 reads stage 1's output and then deletes the files, a failed stage 2 task would force the scheduler in Ballista to start a task to regenerate stage 1's output. I think using `NamedTempFile` would cause some trouble and complexity here.
   We need to keep the files for task recovery and stage retries (as Spark does). So my approach is: if none of the files under `job_dir` have been modified within the TTL, we can safely delete the directory.
   If I am wrong, please correct me 🙈
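
   The policy described above can be sketched roughly as follows. This is a minimal illustration, not the actual PR implementation; the helper name `job_dir_expired` is hypothetical, it only inspects the top level of the directory, and the real executor logic (see `ballista/rust/executor/src/main.rs`) handles recursion, scheduling, and error reporting differently:

   ```rust
   use std::fs;
   use std::path::Path;
   use std::time::{Duration, SystemTime};

   // Hypothetical helper: returns true only if every entry directly under
   // `job_dir` has been unmodified for at least `ttl`, i.e. the directory
   // is considered safe to delete under the TTL-based cleanup policy.
   fn job_dir_expired(job_dir: &Path, ttl: Duration) -> std::io::Result<bool> {
       let now = SystemTime::now();
       for entry in fs::read_dir(job_dir)? {
           let modified = entry?.metadata()?.modified()?;
           // If any file was touched within the TTL window, keep the
           // whole job directory: a stage retry may still need its input.
           match now.duration_since(modified) {
               Ok(age) if age >= ttl => continue,
               _ => return Ok(false),
           }
       }
       Ok(true)
   }

   fn main() -> std::io::Result<()> {
       let dir = std::env::temp_dir().join("job_dir_demo");
       fs::create_dir_all(&dir)?;
       fs::write(dir.join("stage1.out"), b"shuffle data")?;
       // A freshly written file is not expired under a 60s TTL.
       assert!(!job_dir_expired(&dir, Duration::from_secs(60))?);
       // With a zero TTL, everything counts as expired.
       assert!(job_dir_expired(&dir, Duration::from_secs(0))?);
       fs::remove_dir_all(&dir)?;
       Ok(())
   }
   ```

   The key design point is that expiry is decided per job directory, not per file: deleting individual files as soon as they are read would break the stage-retry path described above.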




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org