Posted to commits@airflow.apache.org by po...@apache.org on 2020/06/09 09:04:45 UTC
[airflow] 32/36: Replaces cloud-provider CLIs in CI image with scripts running containers (#9129)
This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git
commit c0534d83ebbb5550f82842cca4c890432983fe49
Author: Jarek Potiuk <ja...@polidea.com>
AuthorDate: Thu Jun 4 19:12:09 2020 +0200
Replaces cloud-provider CLIs in CI image with scripts running containers (#9129)
The CLIs are replaced with scripts that pull and run
docker images when they are needed.
The Azure CLI is added as well.
Closes: #8946 #8947 #8785
(cherry picked from commit a39e9a352050f96dedadc39ab3d985065971c98c)
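
For illustration, the generated wrapper for ``aws`` ends up as a one-line shell
script on the image's PATH. This is a sketch reconstructed from the Dockerfile.ci
change below; HOST_HOME is exported into the CI container by docker-compose:

    # /usr/bin/aws -- generated at image build time; note there is no shebang,
    # so the file is executed by the default shell
    docker run --rm -it -v ${HOST_HOME}/.aws:/root/.aws amazon/aws-cli:latest "$@"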
---
BREEZE.rst | 43 ++++++++++++++++--
Dockerfile.ci | 55 ++++++++++-------------
TESTING.rst | 10 ++---
scripts/ci/_utils.sh | 4 +-
scripts/ci/docker-compose/forward-credentials.yml | 6 +--
scripts/ci/docker-compose/local-prod.yml | 1 +
scripts/ci/docker-compose/local.yml | 2 +-
7 files changed, 72 insertions(+), 49 deletions(-)
diff --git a/BREEZE.rst b/BREEZE.rst
index 7648b9d..aec975c 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -105,8 +105,9 @@ Docker in WSL 2
and git pull the Airflow repo there.
- **WSL 2 Memory Usage** :
- WSL 2 can consume a lot of memory under the process name "Vmmem". To reclaim
- the memory after development you can:
+ WSL 2 can consume a lot of memory under the process name "Vmmem". To reclaim the memory after
+ development you can:
+
* On the Linux distro clear cached memory: ``sudo sysctl -w vm.drop_caches=3``
* If no longer using Docker you can quit Docker Desktop
(right click the system tray icon and select "Quit Docker Desktop")
@@ -195,7 +196,7 @@ On macOS, 2GB of RAM are available for your Docker containers by default, but mo
On Windows WSL 2, expect the Linux distro and Docker containers to use 7-8 GB of RAM.
Airflow Directory Structure inside Docker
------------------------------------------
+=========================================
When you are in the CI container, the following directories are used:
@@ -231,6 +232,42 @@ from your ``logs`` directory in the Airflow sources, so all logs created in the
visible in the host as well. Every time you enter the container, the ``logs`` directory is
cleaned so that logs do not accumulate.
+CLIs for cloud providers
+========================
+
+For development convenience, we install simple wrappers for the most common cloud provider CLIs. Those
+CLIs are not installed when you build or pull the image - they are downloaded as docker images the
+first time you attempt to use them. Each image is downloaded and executed by your host's docker engine,
+so once downloaded it stays around until you remove it from your host.
+
+For each of those CLIs, credentials are taken (automatically) from the credentials you have defined in
+your ${HOME} directory on the host.
+
+These are the currently installed CLIs (they are available as aliases to the docker commands):
+
++-----------------------+----------+-------------------------------------------------+-------------------+
+| Cloud Provider | CLI tool | Docker image | Configuration dir |
++=======================+==========+=================================================+===================+
+| Amazon Web Services | aws | amazon/aws-cli:latest | .aws |
++-----------------------+----------+-------------------------------------------------+-------------------+
+| Microsoft Azure | az | mcr.microsoft.com/azure-cli:latest | .azure |
++-----------------------+----------+-------------------------------------------------+-------------------+
+| Google Cloud Platform | bq | gcr.io/google.com/cloudsdktool/cloud-sdk:latest | .config/gcloud |
+| +----------+-------------------------------------------------+-------------------+
+| | gcloud | gcr.io/google.com/cloudsdktool/cloud-sdk:latest | .config/gcloud |
+| +----------+-------------------------------------------------+-------------------+
+| | gsutil | gcr.io/google.com/cloudsdktool/cloud-sdk:latest | .config/gcloud |
++-----------------------+----------+-------------------------------------------------+-------------------+
+
+For each of the CLIs there is also an accompanying ``*-update`` alias (for example ``aws-update``), which
+pulls the latest image for the tool. Note that all Google Cloud Platform tools are served by a single
+image and are updated together.
+
+Also, in case you run several different Breeze containers in parallel (from different directories,
+with different versions), the docker images for the cloud provider CLI tools are shared, so if you
+update them for one Breeze container, they get updated for all the other containers as well.
+
+
Using the Airflow Breeze Environment
=====================================
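
As a quick usage sketch, inside the CI container the aliases behave like the
corresponding CLIs; actual output depends on the credentials in your host ${HOME}:

    aws s3 ls            # lists your S3 buckets via the amazon/aws-cli image
    gcloud auth list     # shows the accounts picked up from ~/.config/gcloud
    aws-update           # pulls the latest amazon/aws-cli image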
diff --git a/Dockerfile.ci b/Dockerfile.ci
index d15cff0..8035fa2 100644
--- a/Dockerfile.ci
+++ b/Dockerfile.ci
@@ -165,36 +165,6 @@ RUN echo "Pip version: ${PIP_VERSION}"
RUN pip install --upgrade pip==${PIP_VERSION}
-# Install Google SDK
-ENV GCLOUD_HOME="/opt/gcloud" CLOUDSDK_PYTHON=python${PYTHON_MAJOR_MINOR_VERSION}
-
-RUN GCLOUD_VERSION="274.0.1" \
- && GCOUD_URL="https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${GCLOUD_VERSION}-linux-x86_64.tar.gz" \
- && GCLOUD_TMP_FILE="/tmp/gcloud.tar.gz" \
- && export CLOUDSDK_CORE_DISABLE_PROMPTS=1 \
- && mkdir -p /opt/gcloud \
- && curl "${GCOUD_URL}" -o "${GCLOUD_TMP_FILE}"\
- && tar xzf "${GCLOUD_TMP_FILE}" --strip-components 1 -C "${GCLOUD_HOME}" \
- && rm -rf "${GCLOUD_TMP_FILE}" \
- && ${GCLOUD_HOME}/bin/gcloud components install beta \
- && echo '. /opt/gcloud/completion.bash.inc' >> /etc/bash.bashrc
-
-ENV PATH="$PATH:${GCLOUD_HOME}/bin"
-
-# Install AWS CLI
-# Unfortunately, AWS does not provide a versioned bundle
-ENV AWS_HOME="/opt/aws"
-
-RUN AWS_TMP_DIR="/tmp/awscli/" \
- && AWS_TMP_BUNDLE="${AWS_TMP_DIR}/awscli-bundle.zip" \
- && AWS_URL="https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" \
- && mkdir -pv "${AWS_TMP_DIR}" \
- && curl "${AWS_URL}" -o "${AWS_TMP_BUNDLE}" \
- && unzip "${AWS_TMP_BUNDLE}" -d "${AWS_TMP_DIR}" \
- && "${AWS_TMP_DIR}/awscli-bundle/install" -i "${AWS_HOME}" -b /usr/local/bin/aws \
- && echo "complete -C '${AWS_HOME}/bin/aws_completer' aws" >> /etc/bash.bashrc \
- && rm -rf "${AWS_TMP_DIR}"
-
ARG HOME=/root
ENV HOME=${HOME}
@@ -206,8 +176,8 @@ ENV AIRFLOW_SOURCES=${AIRFLOW_SOURCES}
WORKDIR ${AIRFLOW_SOURCES}
-RUN mkdir -pv ${AIRFLOW_HOME} \
- mkdir -pv ${AIRFLOW_HOME}/dags \
+RUN mkdir -pv ${AIRFLOW_HOME} && \
+ mkdir -pv ${AIRFLOW_HOME}/dags && \
mkdir -pv ${AIRFLOW_HOME}/logs
# Increase the value here to force reinstalling Apache Airflow pip dependencies
@@ -337,6 +307,27 @@ RUN if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
pip install ${ADDITIONAL_PYTHON_DEPS}; \
fi
+RUN \
+ AWSCLI_IMAGE="amazon/aws-cli:latest" && \
+ AZURECLI_IMAGE="mcr.microsoft.com/azure-cli:latest" && \
+ GCLOUD_IMAGE="gcr.io/google.com/cloudsdktool/cloud-sdk:latest" && \
+ echo "docker run --rm -it -v \${HOST_HOME}/.aws:/root/.aws ${AWSCLI_IMAGE} \"\$@\"" \
+ > /usr/bin/aws && \
+ echo "docker pull ${AWSCLI_IMAGE}" > /usr/bin/aws-update && \
+ echo "docker run --rm -it -v \${HOST_HOME}/.azure:/root/.azure ${AZURECLI_IMAGE} \"\$@\"" \
+ > /usr/bin/az && \
+ echo "docker pull ${AZURECLI_IMAGE}" > /usr/bin/az-update && \
+ echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} bq \"\$@\"" \
+ > /usr/bin/bq && \
+ echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/bq-update && \
+ echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} gcloud \"\$@\"" \
+ > /usr/bin/gcloud && \
+ echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/gcloud-update && \
+ echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} gsutil \"\$@\"" \
+ > /usr/bin/gsutil && \
+ echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/gsutil-update && \
+ chmod a+x /usr/bin/aws /usr/bin/az /usr/bin/bq /usr/bin/gcloud /usr/bin/gsutil
+
WORKDIR ${AIRFLOW_SOURCES}
ENV PATH="${HOME}:${PATH}"
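
Note that the wrappers above run ``docker run`` against the host's docker engine,
so the ``-v`` source paths must be host paths; that is why the compose files below
export HOST_HOME=${HOME} rather than relying on the container's own HOME. A sketch
of what a call expands to (assuming the host's docker socket is mounted into the
CI container, as Breeze does):

    # HOST_HOME=/home/you is set by docker-compose from the host's ${HOME}
    aws sts get-caller-identity
    # effectively runs:
    #   docker run --rm -it -v /home/you/.aws:/root/.aws amazon/aws-cli:latest sts get-caller-identity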
diff --git a/TESTING.rst b/TESTING.rst
index fc8a046..ea6884e 100644
--- a/TESTING.rst
+++ b/TESTING.rst
@@ -621,12 +621,10 @@ credentials stored in your ``home`` directory. Use this feature with care as it
visible to anything that you have installed inside the Docker container.
Currently forwarded credentials are:
- * all credentials stored in ``${HOME}/.config`` (for example, GCP credentials)
- * credentials stored in ``${HOME}/.gsutil`` for ``gsutil`` tool from GCS
- * credentials stored in ``${HOME}/.aws``, ``${HOME}/.boto``, and ``${HOME}/.s3`` (for AWS authentication)
- * credentials stored in ``${HOME}/.docker`` for docker
- * credentials stored in ``${HOME}/.kube`` for kubectl
-
+ * credentials stored in ``${HOME}/.aws`` for the ``aws`` Amazon Web Services client
+ * credentials stored in ``${HOME}/.azure`` for the ``az`` Microsoft Azure client
+ * credentials stored in ``${HOME}/.config`` for the ``gcloud`` Google Cloud Platform client (among others)
+ * credentials stored in ``${HOME}/.docker`` for the docker client
Adding a New System Test
--------------------------
diff --git a/scripts/ci/_utils.sh b/scripts/ci/_utils.sh
index d0b8251..3bb6ee2 100644
--- a/scripts/ci/_utils.sh
+++ b/scripts/ci/_utils.sh
@@ -257,7 +257,6 @@ function generate_local_mounts_list {
"$prefix".flake8:/opt/airflow/.flake8:cached
"$prefix".github:/opt/airflow/.github:cached
"$prefix".inputrc:/root/.inputrc:cached
- "$prefix".kube:/root/.kube:cached
"$prefix".rat-excludes:/opt/airflow/.rat-excludes:cached
"$prefix"CHANGELOG.txt:/opt/airflow/CHANGELOG.txt:cached
"$prefix"LICENSE:/opt/airflow/LICENSE:cached
@@ -754,7 +753,7 @@ function get_remote_image_info() {
# Note that this only matters if you have any of the important files changed since the last build
# of your image such as Dockerfile.ci, setup.py etc.
#
-MAGIC_CUT_OFF_NUMBER_OF_LAYERS=34
+MAGIC_CUT_OFF_NUMBER_OF_LAYERS=41
# Compares layers from both remote and local image and set FORCE_PULL_IMAGES to true in case
# More than the last NN layers are different.
@@ -1805,7 +1804,6 @@ function delete_cluster() {
echo
echo "Deleted cluster ${KIND_CLUSTER_NAME}"
echo
- rm -rf "${HOME}/.kube/*"
}
function perform_kind_cluster_operation() {
diff --git a/scripts/ci/docker-compose/forward-credentials.yml b/scripts/ci/docker-compose/forward-credentials.yml
index e2d5f75..875b1ce 100644
--- a/scripts/ci/docker-compose/forward-credentials.yml
+++ b/scripts/ci/docker-compose/forward-credentials.yml
@@ -23,9 +23,7 @@ services:
# To inside docker. Use with care - your credentials will be available to
# Everything you install in Docker
volumes:
+ - ${HOME}/.aws:/root/.aws:cached
+ - ${HOME}/.azure:/root/.azure:cached
- ${HOME}/.config:/root/.config:cached
- - ${HOME}/.boto:/root/.boto:cached
- ${HOME}/.docker:/root/.docker:cached
- - ${HOME}/.gsutil:/root/.gsutil:cached
- - ${HOME}/.kube:/root/.kube:cached
- - ${HOME}/.s3:/root/.s3:cached
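
With these mounts in place, the wrapper scripts baked into the image find the host
credentials at the paths they expect. A hypothetical invocation (this assumes
Breeze's ``--forward-credentials`` switch, which enables this compose file):

    ./breeze --forward-credentials
    # inside the container, the host's ~/.aws, ~/.azure, ~/.config and ~/.docker
    # are then available under /root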
diff --git a/scripts/ci/docker-compose/local-prod.yml b/scripts/ci/docker-compose/local-prod.yml
index 6092323..ae8317d 100644
--- a/scripts/ci/docker-compose/local-prod.yml
+++ b/scripts/ci/docker-compose/local-prod.yml
@@ -39,4 +39,5 @@ services:
environment:
- HOST_USER_ID
- HOST_GROUP_ID
+ - HOST_HOME=${HOME}
- PYTHONDONTWRITEBYTECODE
diff --git a/scripts/ci/docker-compose/local.yml b/scripts/ci/docker-compose/local.yml
index 42609e6..793a322 100644
--- a/scripts/ci/docker-compose/local.yml
+++ b/scripts/ci/docker-compose/local.yml
@@ -32,7 +32,6 @@ services:
- ../../../.flake8:/opt/airflow/.flake8:cached
- ../../../.github:/opt/airflow/.github:cached
- ../../../.inputrc:/root/.inputrc:cached
- - ../../../.kube:/root/.kube:cached
- ../../../.rat-excludes:/opt/airflow/.rat-excludes:cached
- ../../../CHANGELOG.txt:/opt/airflow/CHANGELOG.txt:cached
- ../../../LICENSE:/opt/airflow/LICENSE:cached
@@ -60,6 +59,7 @@ services:
environment:
- HOST_USER_ID
- HOST_GROUP_ID
+ - HOST_HOME=${HOME}
- PYTHONDONTWRITEBYTECODE
ports:
- "${WEBSERVER_HOST_PORT}:8080"