Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/06 21:28:13 UTC

[GitHub] [arrow] kszucs commented on a change in pull request #9891: ARROW-12112: [CI] Reduce footprint of conda-integration image

kszucs commented on a change in pull request #9891:
URL: https://github.com/apache/arrow/pull/9891#discussion_r608192333



##########
File path: ci/docker/conda-integration.dockerfile
##########
@@ -25,32 +25,28 @@ ARG node=14
 ARG jdk=8
 ARG go=1.15
 
+# Uninstall unused space-consuming packages
+# (XXX: it would be better not to install them, but they are used by other
+#  builds which are also based on conda-cpp)
+RUN conda uninstall -q clang llvmdev valgrind
+
+# Install Archery and integration dependencies
 COPY ci/conda_env_archery.yml /arrow/ci/
 RUN conda install -q \
         --file arrow/ci/conda_env_archery.yml \
         numpy \
         maven=${maven} \
         nodejs=${node} \
         openjdk=${jdk} && \
-    conda clean --all
+    conda clean --all --force-pkgs-dirs
 
-RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+# Install Rust with only the needed components
+# (rustfmt is needed for tonic-build to compile the protobuf definitions)
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --profile=minimal -y && \
+    $HOME/.cargo/bin/rustup component add rustfmt
 
 ENV GOROOT=/opt/go \
     GOBIN=/opt/go/bin \
     GOPATH=/go \
     PATH=/opt/go/bin:$PATH
 RUN wget -nv -O - https://dl.google.com/go/go${go}.linux-${arch}.tar.gz | tar -xzf - -C /opt
-
-ENV ARROW_BUILD_INTEGRATION=ON \

Review comment:
   > Hmm... what is the rationale for putting some environment variables here and some others in docker-compose, then?
   
   To avoid duplicating the environment variables many times in `docker-compose.yml`.
   For example, we set the following in `conda-python.dockerfile`:
   
   ```dockerfile
   ENV ARROW_PYTHON=ON \
       ARROW_BUILD_STATIC=OFF \
       ARROW_BUILD_TESTS=OFF \
       ARROW_BUILD_UTILITIES=OFF \
       ARROW_TENSORFLOW=ON \
       ARROW_USE_GLOG=OFF
   ```
   
   If we remove that, we'd need to repeat these variables in all of the child services:
   ```
         - conda-python-pandas
         - conda-python-dask
         - conda-python-hdfs
         - conda-python-jpype
         - conda-python-turbodbc
         - conda-python-kartothek
         - conda-python-spark
   ```
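   
   As a rough illustration of that duplication (a hypothetical sketch: the `services`/`environment` layout is assumed and not copied from the actual `docker-compose.yml`; only the variable and service names come from above), each child service would need to carry a block like this:
   
   ```yaml
   # Hypothetical sketch, not the real docker-compose.yml.
   # Values are quoted because unquoted ON/OFF may be parsed as YAML booleans.
   services:
     conda-python-pandas:
       environment:
         ARROW_PYTHON: "ON"
         ARROW_BUILD_STATIC: "OFF"
         ARROW_BUILD_TESTS: "OFF"
         ARROW_BUILD_UTILITIES: "OFF"
         ARROW_TENSORFLOW: "ON"
         ARROW_USE_GLOG: "OFF"
     conda-python-dask:
       environment:
         ARROW_PYTHON: "ON"
         ARROW_BUILD_STATIC: "OFF"
         ARROW_BUILD_TESTS: "OFF"
         ARROW_BUILD_UTILITIES: "OFF"
         ARROW_TENSORFLOW: "ON"
         ARROW_USE_GLOG: "OFF"
     # The same six variables would be repeated for each remaining child service.
   ```
   
   With the `ENV` instruction in the base dockerfile, images built on top of it inherit the values and none of this repetition is needed.
   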
   > 
   > (also, I don't think anything inherits from conda-integration)
   
   Not at the moment, but setting them in the dockerfile keeps it consistent with the rest of the dockerfiles.



