Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/01 01:21:45 UTC

[GitHub] [beam] y1chi commented on pull request #13399: [BEAM-11312] Log cloud build url and enable kaniko cache in sdk_conta…

y1chi commented on pull request #13399:
URL: https://github.com/apache/beam/pull/13399#issuecomment-736154645


   
   
   
   > Thanks. It seems that caching may improve the startup time, and be useful for users who frequently launch the same pipeline. However I think caching may result in a difference in behavior. Questions:
   > 
   > 1. Is it possible that caching will result in a stale image that users perceive as undesirable, with behavior that is difficult for users or support folks to debug? For example, suppose a user pipeline depends on the latest version of a dependency X on PyPI, perhaps a dependency they control. The pipeline has a setup.py with an open install_requires bound dep>=1.0.0 < 2. They run the pipeline, then push a new version of the dependency to PyPI and run the pipeline again, expecting a change in behavior. Kaniko will not rebuild the image in this case, right? What are your thoughts on that?
   
   I think the kaniko cache works the same way as the Docker layer cache. That is, if the locally downloaded artifacts change (or requirements.txt or setup.py changes), the input to the COPY step in the prebuilding workflow changes. No cached layer is valid from the artifact COPY step onward, so a new image is built. (I also verified this through my own experiment with changing requirements.txt.)
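
A rough way to picture the layer-cache behavior described above is the toy model below. It is only a sketch under simplifying assumptions, not kaniko's or Docker's actual key computation: each layer key hashes the parent key, the instruction text, and (for COPY) the bytes of the copied file. The instruction list, the `base-image` name, and the helper names `layer_keys` and `first_miss` are all hypothetical.

```python
import hashlib


def layer_keys(instructions, files):
    """Return one cache key per Dockerfile-style instruction (toy model).

    Each key hashes the parent key, the instruction text and, for COPY,
    the contents of the copied file, so a changed file invalidates the
    COPY layer and every layer after it.
    """
    keys = []
    parent = ""
    for instr in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instr.encode())
        parts = instr.split()
        if parts[0] == "COPY":
            # Assumption: the second token names a file in `files`.
            h.update(files[parts[1]])
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def first_miss(old_keys, new_keys):
    """Index of the first layer whose key differs, or None if all match."""
    for i, (old, new) in enumerate(zip(old_keys, new_keys)):
        if old != new:
            return i
    return None


# Hypothetical prebuild steps, for illustration only.
INSTRUCTIONS = [
    "FROM base-image",
    "COPY requirements.txt /tmp/requirements.txt",
    "RUN pip install -r /tmp/requirements.txt",
]

before = layer_keys(INSTRUCTIONS, {"requirements.txt": b"dep==1.0.0\n"})
after = layer_keys(INSTRUCTIONS, {"requirements.txt": b"dep==1.1.0\n"})
```

In this model, changing requirements.txt leaves the FROM layer reusable but invalidates the COPY layer and the RUN layer that follows it. Identical inputs produce identical keys, so nothing is rebuilt.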
   
   > 2. During runtime with prebuilding workflow enabled, how visible is it to the user that the cached layers are reused and not rebuilt?
   
   When a layer is not served from the cache, the Cloud Build log contains entries such as "No cached layer found for cmd ...".
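
As a small illustration of how one might check this, the sketch below scans build log text for that message. It assumes the log has already been fetched as a string. The sample log lines and the helper name `cache_misses` are hypothetical; only the marker text comes from the comment above.

```python
# Marker text quoted in the comment above.
CACHE_MISS_MARKER = "No cached layer found for cmd"


def cache_misses(log_text):
    """Return the log lines that report a cache miss."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if CACHE_MISS_MARKER in line
    ]


# Hypothetical log excerpt, for illustration only.
sample_log = """\
INFO[0001] Resolved base image
INFO[0002] No cached layer found for cmd RUN pip install -r /tmp/requirements.txt
INFO[0003] Taking snapshot of full filesystem
"""

misses = cache_misses(sample_log)
```

An empty result would suggest that every cacheable step was reused.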
   
   > 3. I think we should document the prebuilding feature in Beam docs, and reflect the caching behavior and associated TTLs. What is a plan for that?
   
   I believe Emily will be documenting this as part of the custom container work next quarter, and I can also help.
   
   > 4. Would customizing the TTL or adding a no-cache option make sense? We are using default 2 weeks TTL, right? See: https://cloud.google.com/cloud-build/docs/kaniko-cache#configuring_the_cache_expiration_time.
   
   I think the default value makes sense. I didn't want to expose too many knobs to users, since they may be confusing or rarely used, but we can always add flags later for more advanced users to control it.
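
If such knobs were added later, one possible shape is sketched below. This is not the code in this PR: the function name `kaniko_step` and its parameters are hypothetical, and the destination is a placeholder. The `--cache`, `--cache-ttl`, and `--destination` arguments are kaniko executor flags, and the dict mirrors a Cloud Build step with `name` and `args` fields.

```python
def kaniko_step(destination, use_cache=True, cache_ttl_hours=None):
    """Build a Cloud Build step dict that runs the kaniko executor.

    destination: image reference to push to (placeholder in the example).
    use_cache: when False, no cache flags are passed (a "no-cache" option).
    cache_ttl_hours: optional TTL override; None keeps the default TTL.
    """
    args = [f"--destination={destination}"]
    if use_cache:
        args.append("--cache=true")
        if cache_ttl_hours is not None:
            args.append(f"--cache-ttl={cache_ttl_hours}h")
    return {
        "name": "gcr.io/kaniko-project/executor:latest",
        "args": args,
    }


# Hypothetical usage: cache enabled with a 6 hour TTL.
step = kaniko_step("gcr.io/my-project/prebuilt-sdk", cache_ttl_hours=6)
```

Leaving `cache_ttl_hours` unset keeps the default behavior discussed above, so a pipeline that never touches the knob is unaffected.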
   

