Posted to commits@airflow.apache.org by po...@apache.org on 2020/06/09 09:04:45 UTC

[airflow] 32/36: Replaces cloud-provider CLIs in CI image with scripts running containers (#9129)

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit c0534d83ebbb5550f82842cca4c890432983fe49
Author: Jarek Potiuk <ja...@polidea.com>
AuthorDate: Thu Jun 4 19:12:09 2020 +0200

    Replaces cloud-provider CLIs in CI image with scripts running containers (#9129)
    
    The CLIs are replaced with scripts that pull and run
    Docker images when they are needed.
    
    Added Azure CLI as well.
    
    Closes: #8946 #8947 #8785
    (cherry picked from commit a39e9a352050f96dedadc39ab3d985065971c98c)
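    
    For reference, each generated wrapper is essentially a one-line shell
    script along these lines (see the Dockerfile.ci changes below):
    
        # /usr/bin/aws: mount the host's ~/.aws and run the official AWS CLI image
        docker run --rm -it -v ${HOST_HOME}/.aws:/root/.aws amazon/aws-cli:latest "$@"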
---
 BREEZE.rst                                        | 43 ++++++++++++++++--
 Dockerfile.ci                                     | 55 ++++++++++-------------
 TESTING.rst                                       | 10 ++---
 scripts/ci/_utils.sh                              |  4 +-
 scripts/ci/docker-compose/forward-credentials.yml |  6 +--
 scripts/ci/docker-compose/local-prod.yml          |  1 +
 scripts/ci/docker-compose/local.yml               |  2 +-
 7 files changed, 72 insertions(+), 49 deletions(-)

diff --git a/BREEZE.rst b/BREEZE.rst
index 7648b9d..aec975c 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -105,8 +105,9 @@ Docker in WSL 2
     and git pull the Airflow repo there.
 
 - **WSL 2 Memory Usage** :
-    WSL 2 can consume a lot of memory under the process name "Vmmem". To reclaim
-    the memory after development you can:
+    WSL 2 can consume a lot of memory under the process name "Vmmem". To reclaim the memory after
+    development you can:
+
       * On the Linux distro clear cached memory: ``sudo sysctl -w vm.drop_caches=3``
       * If no longer using Docker you can quit Docker Desktop
        (right click system tray icon and select "Quit Docker Desktop")
@@ -195,7 +196,7 @@ On macOS, 2GB of RAM are available for your Docker containers by default, but mo
 On Windows with WSL 2, expect the Linux Distro and Docker containers to use 7 - 8 GB of RAM.
 
 Airflow Directory Structure inside Docker
------------------------------------------
+=========================================
 
 When you are in the CI container, the following directories are used:
 
@@ -231,6 +232,42 @@ from your ``logs`` directory in the Airflow sources, so all logs created in the
 visible in the host as well. Every time you enter the container, the ``logs`` directory is
 cleaned so that logs do not accumulate.
 
+CLIs for cloud providers
+========================
+
+For development convenience we installed simple wrappers for the most common cloud providers' CLIs. Those
+CLIs are not installed when you build or pull the image - the corresponding docker images are pulled
+the first time you attempt to use them. Each image is downloaded to and executed by your host's docker
+engine, so once pulled it stays available until you remove it from your host.
+
+For each of those CLIs, credentials are taken automatically from the configuration you have defined in
+your ${HOME} directory on the host.
+
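+Under the hood, each alias is a tiny shell script that runs the corresponding docker image with your
+host configuration directory mounted. As a rough sketch (the actual scripts are generated in
+``Dockerfile.ci``), running ``aws ...`` effectively executes:
+
+.. code-block:: bash
+
+    # HOST_HOME is your home directory on the host (forwarded by docker-compose)
+    docker run --rm -it -v ${HOST_HOME}/.aws:/root/.aws amazon/aws-cli:latest "$@"
+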
+These are the currently installed CLIs (available as aliases to the corresponding docker commands):
+
++-----------------------+----------+-------------------------------------------------+-------------------+
+| Cloud Provider        | CLI tool | Docker image                                    | Configuration dir |
++=======================+==========+=================================================+===================+
+| Amazon Web Services   | aws      | amazon/aws-cli:latest                           | .aws              |
++-----------------------+----------+-------------------------------------------------+-------------------+
+| Microsoft Azure       | az       | mcr.microsoft.com/azure-cli:latest              | .azure            |
++-----------------------+----------+-------------------------------------------------+-------------------+
+| Google Cloud Platform | bq       | gcr.io/google.com/cloudsdktool/cloud-sdk:latest | .config/gcloud    |
+|                       +----------+-------------------------------------------------+-------------------+
+|                       | gcloud   | gcr.io/google.com/cloudsdktool/cloud-sdk:latest | .config/gcloud    |
+|                       +----------+-------------------------------------------------+-------------------+
+|                       | gsutil   | gcr.io/google.com/cloudsdktool/cloud-sdk:latest | .config/gcloud    |
++-----------------------+----------+-------------------------------------------------+-------------------+
+
+Each of the CLIs also has an accompanying ``*-update`` alias (for example ``aws-update``) which
+pulls the latest image for the tool. Note that all Google Cloud Platform tools are served by one
+image and are updated together. See the example session below.
+
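+For example, assuming you have AWS credentials configured in ``${HOME}/.aws`` on the host, a session
+in Breeze could look like this (the first invocation of each tool pulls its image):
+
+.. code-block:: bash
+
+    aws s3 ls          # lists your S3 buckets using the amazon/aws-cli image
+    gcloud config list # shows your gcloud configuration using the cloud-sdk image
+    aws-update         # pulls the latest amazon/aws-cli image
+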
+Also, in case you run several different Breeze containers in parallel (from different directories,
+with different versions), the docker images for the cloud provider CLI tools are shared, so if you
+update an image for one Breeze container, it is also updated for all the other containers.
+
+
 Using the Airflow Breeze Environment
 =====================================
 
diff --git a/Dockerfile.ci b/Dockerfile.ci
index d15cff0..8035fa2 100644
--- a/Dockerfile.ci
+++ b/Dockerfile.ci
@@ -165,36 +165,6 @@ RUN echo "Pip version: ${PIP_VERSION}"
 
 RUN pip install --upgrade pip==${PIP_VERSION}
 
-# Install Google SDK
-ENV GCLOUD_HOME="/opt/gcloud" CLOUDSDK_PYTHON=python${PYTHON_MAJOR_MINOR_VERSION}
-
-RUN GCLOUD_VERSION="274.0.1" \
-    && GCOUD_URL="https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${GCLOUD_VERSION}-linux-x86_64.tar.gz" \
-    && GCLOUD_TMP_FILE="/tmp/gcloud.tar.gz" \
-    && export CLOUDSDK_CORE_DISABLE_PROMPTS=1 \
-    && mkdir -p /opt/gcloud \
-    && curl "${GCOUD_URL}" -o "${GCLOUD_TMP_FILE}"\
-    && tar xzf "${GCLOUD_TMP_FILE}" --strip-components 1 -C "${GCLOUD_HOME}" \
-    && rm -rf "${GCLOUD_TMP_FILE}" \
-    && ${GCLOUD_HOME}/bin/gcloud components install beta \
-    && echo '. /opt/gcloud/completion.bash.inc' >> /etc/bash.bashrc
-
-ENV PATH="$PATH:${GCLOUD_HOME}/bin"
-
-# Install AWS CLI
-# Unfortunately, AWS does not provide a versioned bundle
-ENV AWS_HOME="/opt/aws"
-
-RUN AWS_TMP_DIR="/tmp/awscli/" \
-    && AWS_TMP_BUNDLE="${AWS_TMP_DIR}/awscli-bundle.zip" \
-    && AWS_URL="https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" \
-    && mkdir -pv "${AWS_TMP_DIR}" \
-    && curl "${AWS_URL}" -o "${AWS_TMP_BUNDLE}" \
-    && unzip "${AWS_TMP_BUNDLE}" -d "${AWS_TMP_DIR}" \
-    && "${AWS_TMP_DIR}/awscli-bundle/install" -i "${AWS_HOME}" -b /usr/local/bin/aws \
-    && echo "complete -C '${AWS_HOME}/bin/aws_completer' aws" >> /etc/bash.bashrc \
-    && rm -rf "${AWS_TMP_DIR}"
-
 ARG HOME=/root
 ENV HOME=${HOME}
 
@@ -206,8 +176,8 @@ ENV AIRFLOW_SOURCES=${AIRFLOW_SOURCES}
 
 WORKDIR ${AIRFLOW_SOURCES}
 
-RUN mkdir -pv ${AIRFLOW_HOME} \
-    mkdir -pv ${AIRFLOW_HOME}/dags \
+RUN mkdir -pv ${AIRFLOW_HOME} && \
+    mkdir -pv ${AIRFLOW_HOME}/dags && \
     mkdir -pv ${AIRFLOW_HOME}/logs
 
 # Increase the value here to force reinstalling Apache Airflow pip dependencies
@@ -337,6 +307,27 @@ RUN if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
         pip install ${ADDITIONAL_PYTHON_DEPS}; \
     fi
 
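+# Install wrappers for common cloud provider CLIs - each wrapper runs the corresponding official docker image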
+RUN \
+    AWSCLI_IMAGE="amazon/aws-cli:latest" && \
+    AZURECLI_IMAGE="mcr.microsoft.com/azure-cli:latest" && \
+    GCLOUD_IMAGE="gcr.io/google.com/cloudsdktool/cloud-sdk:latest" && \
+    echo "docker run --rm -it -v \${HOST_HOME}/.aws:/root/.aws ${AWSCLI_IMAGE} \"\$@\"" \
+        > /usr/bin/aws && \
+    echo "docker pull ${AWSCLI_IMAGE}" > /usr/bin/aws-update && \
+    echo "docker run --rm -it -v \${HOST_HOME}/.azure:/root/.azure ${AZURECLI_IMAGE} \"\$@\"" \
+        > /usr/bin/az && \
+    echo "docker pull ${AZURECLI_IMAGE}" > /usr/bin/az-update && \
+    echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} bq \"\$@\"" \
+        > /usr/bin/bq && \
+    echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/bq-update && \
+    echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} gcloud \"\$@\"" \
+        > /usr/bin/gcloud && \
+    echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/gcloud-update && \
+    echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} gsutil \"\$@\"" \
+        > /usr/bin/gsutil && \
+    echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/gsutil-update && \
+    chmod a+x /usr/bin/aws /usr/bin/az /usr/bin/bq /usr/bin/gcloud /usr/bin/gsutil
+
 WORKDIR ${AIRFLOW_SOURCES}
 
 ENV PATH="${HOME}:${PATH}"
diff --git a/TESTING.rst b/TESTING.rst
index fc8a046..ea6884e 100644
--- a/TESTING.rst
+++ b/TESTING.rst
@@ -621,12 +621,10 @@ credentials stored in your ``home`` directory. Use this feature with care as it
 visible to anything that you have installed inside the Docker container.
 
 Currently forwarded credentials are:
-  * all credentials stored in ``${HOME}/.config`` (for example, GCP credentials)
-  * credentials stored in ``${HOME}/.gsutil`` for ``gsutil`` tool from GCS
-  * credentials stored in ``${HOME}/.aws``, ``${HOME}/.boto``, and ``${HOME}/.s3`` (for AWS authentication)
-  * credentials stored in ``${HOME}/.docker`` for docker
-  * credentials stored in ``${HOME}/.kube`` for kubectl
-
+  * credentials stored in ``${HOME}/.aws`` for the ``aws`` Amazon Web Services client
+  * credentials stored in ``${HOME}/.azure`` for the ``az`` Microsoft Azure client
+  * credentials stored in ``${HOME}/.config`` for the ``gcloud`` Google Cloud Platform client (among others)
+  * credentials stored in ``${HOME}/.docker`` for the ``docker`` client
 
 Adding a New System Test
 --------------------------
diff --git a/scripts/ci/_utils.sh b/scripts/ci/_utils.sh
index d0b8251..3bb6ee2 100644
--- a/scripts/ci/_utils.sh
+++ b/scripts/ci/_utils.sh
@@ -257,7 +257,6 @@ function generate_local_mounts_list {
         "$prefix".flake8:/opt/airflow/.flake8:cached
         "$prefix".github:/opt/airflow/.github:cached
         "$prefix".inputrc:/root/.inputrc:cached
-        "$prefix".kube:/root/.kube:cached
         "$prefix".rat-excludes:/opt/airflow/.rat-excludes:cached
         "$prefix"CHANGELOG.txt:/opt/airflow/CHANGELOG.txt:cached
         "$prefix"LICENSE:/opt/airflow/LICENSE:cached
@@ -754,7 +753,7 @@ function get_remote_image_info() {
 # Note that this only matters if you have any of the important files changed since the last build
 # of your image such as Dockerfile.ci, setup.py etc.
 #
-MAGIC_CUT_OFF_NUMBER_OF_LAYERS=34
+MAGIC_CUT_OFF_NUMBER_OF_LAYERS=41
 
 # Compares layers from both remote and local image and set FORCE_PULL_IMAGES to true in case
 # More than the last NN layers are different.
@@ -1805,7 +1804,6 @@ function delete_cluster() {
     echo
     echo "Deleted cluster ${KIND_CLUSTER_NAME}"
     echo
-    rm -rf "${HOME}/.kube/*"
 }
 
 function perform_kind_cluster_operation() {
diff --git a/scripts/ci/docker-compose/forward-credentials.yml b/scripts/ci/docker-compose/forward-credentials.yml
index e2d5f75..875b1ce 100644
--- a/scripts/ci/docker-compose/forward-credentials.yml
+++ b/scripts/ci/docker-compose/forward-credentials.yml
@@ -23,9 +23,7 @@ services:
     # To inside docker. Use with care - your credentials will be available to
     # Everything you install in Docker
     volumes:
+      - ${HOME}/.aws:/root/.aws:cached
+      - ${HOME}/.azure:/root/.azure:cached
       - ${HOME}/.config:/root/.config:cached
-      - ${HOME}/.boto:/root/.boto:cached
       - ${HOME}/.docker:/root/.docker:cached
-      - ${HOME}/.gsutil:/root/.gsutil:cached
-      - ${HOME}/.kube:/root/.kube:cached
-      - ${HOME}/.s3:/root/.s3:cached
diff --git a/scripts/ci/docker-compose/local-prod.yml b/scripts/ci/docker-compose/local-prod.yml
index 6092323..ae8317d 100644
--- a/scripts/ci/docker-compose/local-prod.yml
+++ b/scripts/ci/docker-compose/local-prod.yml
@@ -39,4 +39,5 @@ services:
     environment:
       - HOST_USER_ID
       - HOST_GROUP_ID
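+      # Forward the host HOME so the cloud provider CLI wrappers can mount the host credential directories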
+      - HOST_HOME=${HOME}
       - PYTHONDONTWRITEBYTECODE
diff --git a/scripts/ci/docker-compose/local.yml b/scripts/ci/docker-compose/local.yml
index 42609e6..793a322 100644
--- a/scripts/ci/docker-compose/local.yml
+++ b/scripts/ci/docker-compose/local.yml
@@ -32,7 +32,6 @@ services:
       - ../../../.flake8:/opt/airflow/.flake8:cached
       - ../../../.github:/opt/airflow/.github:cached
       - ../../../.inputrc:/root/.inputrc:cached
-      - ../../../.kube:/root/.kube:cached
       - ../../../.rat-excludes:/opt/airflow/.rat-excludes:cached
       - ../../../CHANGELOG.txt:/opt/airflow/CHANGELOG.txt:cached
       - ../../../LICENSE:/opt/airflow/LICENSE:cached
@@ -60,6 +59,7 @@ services:
     environment:
       - HOST_USER_ID
       - HOST_GROUP_ID
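+      # Forward the host HOME so the cloud provider CLI wrappers can mount the host credential directories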
+      - HOST_HOME=${HOME}
       - PYTHONDONTWRITEBYTECODE
     ports:
       - "${WEBSERVER_HOST_PORT}:8080"