Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 22:09:40 UTC

[GitHub] [beam] kennknowles opened a new issue, #19076: Less wasteful ArtifactStagingService

kennknowles opened a new issue, #19076:
URL: https://github.com/apache/beam/issues/19076

   [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java](https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java) is the main implementation of ArtifactStagingService.
   
   It stages artifacts into a directory, and in practice the staging session token passed to it gives every job a different directory. This leads to two issues:
    * The staging directory is not cleaned up when the job finishes, or even when the JobService shuts down, so disk space leaks when running many jobs (e.g. a suite of ValidatesRunner tests).
    * The same artifacts are re-staged repeatedly. Ideally, we should identify artifacts that do not need to be staged again, based on their known md5. The artifact staging protocol has rudimentary support for this but may need to be modified.
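
To make the two points above concrete, here is a minimal sketch of one possible direction. It is not Beam's implementation and does not use the artifact staging protocol. The class `DedupingStager`, its methods, and the `shared/` and `sessions/` directory layout are all hypothetical names chosen for illustration. It assumes a local filesystem, a single staging process, and a session token that is a plain directory name. Artifacts are stored once under their md5, and per-job state lives in a session directory that can be deleted when the job finishes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical sketch, not part of Beam: md5-keyed staging plus per-session cleanup. */
public class DedupingStager {
  private final Path root;

  public DedupingStager(Path root) {
    this.root = root;
  }

  /** Returns the hex-encoded md5 of the artifact bytes. */
  public static String md5Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /**
   * Stages the artifact in a directory shared by all jobs, named by its md5.
   * Returns true if bytes were written, false if an identical artifact was already staged.
   */
  public boolean stageIfAbsent(byte[] data) throws IOException {
    Path shared = Files.createDirectories(root.resolve("shared"));
    Path target = shared.resolve(md5Hex(data));
    if (Files.exists(target)) {
      return false; // Same content already staged by an earlier job; skip the upload.
    }
    // Write to a temp file first so readers never observe a partial artifact.
    Path tmp = Files.createTempFile(shared, "upload", ".tmp");
    Files.write(tmp, data);
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    return true;
  }

  /** Creates (if needed) the directory for job-specific files such as a manifest. */
  public Path sessionDir(String sessionToken) throws IOException {
    return Files.createDirectories(root.resolve("sessions").resolve(sessionToken));
  }

  /** Deletes the session directory and everything in it; call when the job finishes. */
  public void cleanupSession(String sessionToken) throws IOException {
    Path dir = root.resolve("sessions").resolve(sessionToken);
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

In this sketch the shared directory still grows over time, so a real service would also need some eviction policy for artifacts that no live job references. That part is left out here.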
   
   CC: [~angoenka]
   
   Imported from Jira [BEAM-4778](https://issues.apache.org/jira/browse/BEAM-4778). Original Jira may contain additional context.
   Reported by: jkff.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org