Posted to reviews@spark.apache.org by "Yikun (via GitHub)" <gi...@apache.org> on 2023/09/07 01:36:22 UTC

[GitHub] [spark] Yikun commented on pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Yikun commented on PR #42842:
URL: https://github.com/apache/spark/pull/42842#issuecomment-1709336687

   LGTM, but I think you should know (or maybe already know):
   
   > Always combine RUN apt-get update with apt-get install in the same RUN statement.
   
   This best practice helps avoid a redundant layer between `update` and `install`, but for Spark infra there is a tradeoff between the `image size` download cost and the `image rebuild` cost.
   
   Moving frequently modified dependencies to the front means that once those dependencies near the head of the Dockerfile are modified, everything after them is also rebuilt (because the infra cache is invalidated). As a result, CI takes more time (about 1h) for PRs that `change Dockerfile head lines` or `haven't rebased in time`.
   
   TL;DR: move stable dependencies to the top, and keep frequently modified dependencies at the end.
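   
   To illustrate the ordering, here is a minimal sketch. The base image, package names, and pinned versions are placeholders chosen for the example, not the actual contents of Spark's infra Dockerfile:
   
   ```dockerfile
   # Illustrative sketch only: base image, packages, and versions are assumptions.
   FROM ubuntu:22.04
   
   # Stable system dependencies first. This layer rarely changes, so it stays
   # cached. update and install share one RUN statement, per the quoted advice.
   RUN apt-get update && \
       apt-get install -y --no-install-recommends \
           build-essential ca-certificates curl git python3 python3-pip && \
       rm -rf /var/lib/apt/lists/*
   
   # Frequently modified dependencies last. Editing this instruction invalidates
   # only this layer and anything after it, not the apt layer above.
   RUN python3 -m pip install --no-cache-dir 'numpy==1.25.2' 'pandas==2.0.3'
   ```
   
   With this layout, bumping a pinned Python package should reuse the cached apt layer. If the two `RUN` instructions were swapped, the same bump would force the apt layer to rebuild as well.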


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

