Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2024/01/17 09:10:59 UTC

[PR] [SPARK-46745][INFRA] Purge pip cache in dockerfile [spark]

zhengruifeng opened a new pull request, #44768:
URL: https://github.com/apache/spark/pull/44768

   ### What changes were proposed in this pull request?
   Purge the pip cache in the Dockerfile.
   
   
   ### Why are the changes needed?
   To save about 4G of disk space in the image build (available space goes from 15G to 19G in the `df -h` output):
   
   Before:
   
   https://github.com/zhengruifeng/spark/actions/runs/7541725028/job/20530432798
   
   ```
   #45 [39/39] RUN df -h
   #45 0.090 Filesystem      Size  Used Avail Use% Mounted on
   #45 0.090 overlay          84G   70G   15G  83% /
   #45 0.090 tmpfs            64M     0   64M   0% /dev
   #45 0.090 shm              64M     0   64M   0% /dev/shm
   #45 0.090 /dev/root        84G   70G   15G  83% /etc/resolv.conf
   #45 0.090 tmpfs           7.9G     0  7.9G   0% /proc/acpi
   #45 0.090 tmpfs           7.9G     0  7.9G   0% /sys/firmware
   #45 0.090 tmpfs           7.9G     0  7.9G   0% /proc/scsi
   #45 DONE 2.0s
   ```
   
   After:
   
   https://github.com/zhengruifeng/spark/actions/runs/7549204209/job/20552796796
   
   ```
   #48 [42/43] RUN python3.12 -m pip cache purge
   #48 0.670 Files removed: 392
   #48 DONE 0.7s
   
   #49 [43/43] RUN df -h
   #49 0.075 Filesystem      Size  Used Avail Use% Mounted on
   #49 0.075 overlay          84G   65G   19G  79% /
   #49 0.075 tmpfs            64M     0   64M   0% /dev
   #49 0.075 shm              64M     0   64M   0% /dev/shm
   #49 0.075 /dev/root        84G   65G   19G  79% /etc/resolv.conf
   #49 0.075 tmpfs           7.9G     0  7.9G   0% /proc/acpi
   #49 0.075 tmpfs           7.9G     0  7.9G   0% /sys/firmware
   #49 0.075 tmpfs           7.9G     0  7.9G   0% /proc/scsi
   ```
   ### Does this PR introduce _any_ user-facing change?
   No, infra-only.
   
   ### How was this patch tested?
   CI.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46745][INFRA] Purge pip cache in dockerfile [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #44768:
URL: https://github.com/apache/spark/pull/44768#issuecomment-1897745898

   > Yes, this will reduce the final status of filesystems.
   > 
   > At the same time, this increases the number of layers, doesn't it?
   > 
   > I'm curious about the download size, @zhengruifeng . Could you check the result of
   > 
   > ```
   > $ docker images
   > ```
   
   The image built from the current PR is 11.1GB:
   
   ```
   ruifeng.zheng@xxx:~/spark$ docker images | grep cleanup
   test_cleanup                                                                                                      0.1              97b0f1ca0bb6   43 seconds ago   11.1GB
   ruifeng.zheng@xxx:~/spark$
   ```
   
   
   We can probably combine those pip commands to control the number of layers.
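   
   A minimal sketch of what combining the commands could look like, assuming a single `RUN` per Python version. The package names below are placeholders for illustration, not the list the Dockerfile actually installs. Because each `RUN` commits its own layer, purging in the same `RUN` as the install keeps the cache out of that layer and avoids an extra layer for the purge:
   
   ```dockerfile
   # Sketch only: "numpy pandas" is a placeholder package list (assumption).
   # Install and purge in one RUN so the pip cache never lands in the committed layer.
   RUN python3.12 -m pip install numpy pandas && \
       python3.12 -m pip cache purge
   
   # Alternative under the same assumption: do not write the cache in the first place.
   RUN python3.12 -m pip install --no-cache-dir numpy pandas
   ```
   
   Either form could then be checked with `docker images` to compare the resulting image size.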


Re: [PR] [SPARK-46745][INFRA] Purge pip cache in dockerfile [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #44768:
URL: https://github.com/apache/spark/pull/44768#issuecomment-1897762527

   Thanks, @dongjoon-hyun and @HyukjinKwon, for the reviews!


Re: [PR] [SPARK-46745][INFRA] Purge pip cache in dockerfile [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #44768: [SPARK-46745][INFRA] Purge pip cache in dockerfile
URL: https://github.com/apache/spark/pull/44768


Re: [PR] [SPARK-46745][INFRA] Purge pip cache in dockerfile [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44768:
URL: https://github.com/apache/spark/pull/44768#issuecomment-1897761945

   Merged to master. Thank you, @zhengruifeng and @HyukjinKwon .


Re: [PR] [SPARK-46745][INFRA] Purge pip cache in dockerfile [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44768:
URL: https://github.com/apache/spark/pull/44768#discussion_r1455031288


##########
.github/workflows/build_and_test.yml:
##########
@@ -417,10 +417,6 @@ jobs:
     - name: Free up disk space
       shell: 'script -q -e -c "bash {0}"'
       run: |
-        if [[ "$MODULES_TO_TEST" != *"pyspark-ml"* ]] && [[ "$BRANCH" != "branch-3.5" ]]; then

Review Comment:
   This check no longer makes much sense, since we install those libraries for each Python version.


