Posted to reviews@spark.apache.org by "Yikun (via GitHub)" <gi...@apache.org> on 2023/05/04 07:21:11 UTC

[GitHub] [spark-docker] Yikun opened a new pull request, #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Yikun opened a new pull request, #36:
URL: https://github.com/apache/spark-docker/pull/36

   ### What changes were proposed in this pull request?
   This PR refactors the Dockerfiles and the workflow so that images are built on top of a base image. This saves space because the derived images share the base image's layers.
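   A minimal sketch of the layering idea follows. It assumes a hypothetical base tag `spark:3.4.0-scala2.12-java11-ubuntu` and Ubuntu's `python3`/`python3-pip` packages, neither of which is taken from this PR. A derived PySpark image starts `FROM` the base image and adds only its own layer. The JRE and Spark layers are therefore stored once and shared rather than rebuilt per image:
   
   ```dockerfile
   # Hypothetical tag for the shared base image (Spark + Scala 2.12 + Java 11 on Ubuntu).
   FROM spark:3.4.0-scala2.12-java11-ubuntu
   
   # Package installation needs root; this is harmless if the base image already runs as root.
   USER root
   
   # Only the Python interpreter is added here. The Spark distribution, including
   # /opt/spark/python, comes unchanged from the base image's layers.
   RUN set -ex; \
       apt-get update; \
       apt-get install -y python3 python3-pip; \
       rm -rf /var/lib/apt/lists/*
   ```
   
   An R image would follow the same pattern, with an R installation step in place of the Python one. Whether to switch back to a non-root user afterwards depends on how the base image is defined, which is an assumption here.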
   
   [1] https://github.com/docker-library/official-images/pull/13089?notification_referrer_id=NT_kwDOABp-orI0MzIwMzMwNzY5OjE3MzYzNTQ#issuecomment-1533540388
   
   ### Why are the changes needed?
   To address the Docker Official Images (DOI) review comments [1], and to save space by sharing layers between images that are built from one another.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   CI passed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark-docker] zhengruifeng commented on pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #36:
URL: https://github.com/apache/spark-docker/pull/36#issuecomment-1536963419

   Late LGTM


[GitHub] [spark-docker] Yikun commented on pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on PR #36:
URL: https://github.com/apache/spark-docker/pull/36#issuecomment-1536958797

   @HyukjinKwon Thanks! Merged.


[GitHub] [spark-docker] Yikun commented on a diff in pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #36:
URL: https://github.com/apache/spark-docker/pull/36#discussion_r1184669388


##########
3.4.0/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -64,6 +65,9 @@ RUN set -ex; \
     mv examples /opt/spark/; \
     mv kubernetes/tests /opt/spark/; \
     mv data /opt/spark/; \
+    mv python/pyspark /opt/spark/python/pyspark/; \
+    mv python/lib /opt/spark/python/lib/; \
+    mv R /opt/spark/; \

Review Comment:
   > download/extract spark (maybe keeping python and R files too? they seem relatively small compared to the rest)
   
   This is the key change:
   
   ```
   2.0M	./lib
   11M	./pyspark
   5.6M	./R
   ```
   
   Compared to the complete Docker image size (500-600 MB), I think this is acceptable; otherwise we would have to move these download/extract/move steps into PySpark/R.


[GitHub] [spark-docker] Yikun closed pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun closed pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image
URL: https://github.com/apache/spark-docker/pull/36


[GitHub] [spark-docker] Yikun commented on a diff in pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #36:
URL: https://github.com/apache/spark-docker/pull/36#discussion_r1184669388


##########
3.4.0/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -64,6 +65,9 @@ RUN set -ex; \
     mv examples /opt/spark/; \
     mv kubernetes/tests /opt/spark/; \
     mv data /opt/spark/; \
+    mv python/pyspark /opt/spark/python/pyspark/; \
+    mv python/lib /opt/spark/python/lib/; \
+    mv R /opt/spark/; \

Review Comment:
   > download/extract spark (maybe keeping python and R files too? they seem relatively small compared to the rest)
   
   This is the key change:
   
   ```
   2.0M	./lib
   11M	./pyspark
   5.6M	./R
   ```
   
   Compared to the complete Docker image size (500-600 MB), I think this is acceptable; otherwise we would have to move these download/extract/move steps into the PySpark/R Dockerfiles.


[GitHub] [spark-docker] Yikun commented on pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on PR #36:
URL: https://github.com/apache/spark-docker/pull/36#issuecomment-1534276571

   cc @HyukjinKwon @zhengruifeng 


[GitHub] [spark-docker] Yikun commented on a diff in pull request #36: [SPARK-43365] Refactor Dockerfile and workflow based on base image

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #36:
URL: https://github.com/apache/spark-docker/pull/36#discussion_r1184669388


##########
3.4.0/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -64,6 +65,9 @@ RUN set -ex; \
     mv examples /opt/spark/; \
     mv kubernetes/tests /opt/spark/; \
     mv data /opt/spark/; \
+    mv python/pyspark /opt/spark/python/pyspark/; \
+    mv python/lib /opt/spark/python/lib/; \
+    mv R /opt/spark/; \

Review Comment:
   > download/extract spark (maybe keeping python and R files too? they seem relatively small compared to the rest)
   
   This is the key change:
   
   ```
   2.0M	./lib
   11M	./pyspark
   5.6M	./R
   ```
   
   Compared to the complete Docker image size (500-600 MB), I think this is acceptable; otherwise we would have to keep these download/extract/move steps in the PySpark/R Dockerfiles.
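   The arithmetic below gives a rough check of that trade-off. It uses the sizes reported above, which `du -h` rounds, so the percentages are approximate:
   
   ```latex
   2.0 + 11 + 5.6 = 18.6\ \text{MB}, \qquad
   \frac{18.6}{600} \approx 3.1\%, \qquad
   \frac{18.6}{500} \approx 3.7\%
   ```
   
   Keeping the Python and R files in the base image therefore adds roughly 3-4% to a 500-600 MB image.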