You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/09/07 00:50:53 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

zhengruifeng opened a new pull request, #42842:
URL: https://github.com/apache/spark/pull/42842

   ### What changes were proposed in this pull request?
   follow the [Best practices for writing Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get) :
   
   > Always combine RUN apt-get update with apt-get install in the same RUN statement.
   
   
   ### Why are the changes needed?
   1, to address https://github.com/apache/spark/pull/42253#discussion_r1280479837
   2, when I attempted to change the apt-get install in https://github.com/apache/spark/pull/41918, the behavior was confusing. By following the best practices, further changes should work immediately.
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO, dev-only
   
   
   ### How was this patch tested?
   CI
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   NO


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile
URL: https://github.com/apache/spark/pull/42842


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42842:
URL: https://github.com/apache/spark/pull/42842#issuecomment-1709349788

   > LGTM, but I think you should know (or maybe already know):
   > 
   > > Always combine RUN apt-get update with apt-get install in the same RUN statement.
   > 
   > This best practices is for to help reduce redundat layer between `update` and `install`, but for spark infra, there are a tradeoff between `image size` download cost and `image rebuild` cost.
   > 
   > Move frequently modified dependencies to front means: once the Dockerfile head front dependencies are modified, the following content will also be rebuild (due to infra cache invalidated). Finally, it impacts PRs which `change Dockerfile head lines` or `haven't rebase in time` CI will cost more time (about 1h).
   > 
   > TL;DR: move stable dependencies to top, keep frequently modified dependencies to end.
   
   thanks for the detailed explaination, since we rarely modify the apt-install (we frequently changes the pip-install), so I think it would be fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42842:
URL: https://github.com/apache/spark/pull/42842#issuecomment-1709308415

   CI link: https://github.com/zhengruifeng/spark/actions/runs/6093404505/job/16533661403


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on PR #42842:
URL: https://github.com/apache/spark/pull/42842#issuecomment-1709336687

   LGTM, but I think you should know (or maybe already know):
   
   > Always combine RUN apt-get update with apt-get install in the same RUN statement.
   
   This best practices is for to help reduce redundat layer between `update` and `install`, but for spark infra, there are a tradeoff between `image size` download cost and `image rebuild` cost.
   
   Move frequently modified dependencies to front means: once the Dockerfile head front dependencies are modified, the following content will also be rebuild (due to infra cache invalidated). Finally, it impacts PRs which `change Dockerfile head lines` or `haven't rebase in time` CI will cost more time (about 1h).
   
   TL;DR: move stable dependencies to top, keep frequently modified dependencies to end.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42842:
URL: https://github.com/apache/spark/pull/42842#issuecomment-1709308053

   cc @Yikun @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42842:
URL: https://github.com/apache/spark/pull/42842#issuecomment-1716836895

   thanks @Yikun and @HyukjinKwon , merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #42842:
URL: https://github.com/apache/spark/pull/42842#discussion_r1317993579


##########
dev/infra/Dockerfile:
##########
@@ -45,13 +62,11 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 
-RUN $APT_INSTALL gnupg ca-certificates pandoc
 RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list
 RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9
 RUN gpg -a --export E084DAB9 | apt-key add -
 RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
-RUN apt update
-RUN $APT_INSTALL r-base libcurl4-openssl-dev qpdf libssl-dev zlib1g-dev
+
 RUN Rscript -e "install.packages(c('knitr', 'markdown', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2', 'xml2'), repos='https://cloud.r-project.org/')"
 
 # See more in SPARK-39959, roxygen2 < 7.2.1

Review Comment:
   not, should be also moved in L27, let me update



##########
dev/infra/Dockerfile:
##########
@@ -45,13 +62,11 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 
-RUN $APT_INSTALL gnupg ca-certificates pandoc
 RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list
 RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9
 RUN gpg -a --export E084DAB9 | apt-key add -
 RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
-RUN apt update
-RUN $APT_INSTALL r-base libcurl4-openssl-dev qpdf libssl-dev zlib1g-dev
+
 RUN Rscript -e "install.packages(c('knitr', 'markdown', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2', 'xml2'), repos='https://cloud.r-project.org/')"
 
 # See more in SPARK-39959, roxygen2 < 7.2.1

Review Comment:
   no, should be also moved in L27, let me update



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on a diff in pull request #42842: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #42842:
URL: https://github.com/apache/spark/pull/42842#discussion_r1317987964


##########
dev/infra/Dockerfile:
##########
@@ -45,13 +62,11 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 
-RUN $APT_INSTALL gnupg ca-certificates pandoc
 RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list
 RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9
 RUN gpg -a --export E084DAB9 | apt-key add -
 RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
-RUN apt update
-RUN $APT_INSTALL r-base libcurl4-openssl-dev qpdf libssl-dev zlib1g-dev
+
 RUN Rscript -e "install.packages(c('knitr', 'markdown', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2', 'xml2'), repos='https://cloud.r-project.org/')"
 
 # See more in SPARK-39959, roxygen2 < 7.2.1

Review Comment:
   So L58 not change is intentional, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org