Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/30 20:08:21 UTC

[GitHub] [airflow] martin-kokos commented on a change in pull request #7832: Add production image support

URL: https://github.com/apache/airflow/pull/7832#discussion_r400462180
 
 

 ##########
 File path: IMAGES.rst
 ##########
 @@ -0,0 +1,304 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+.. contents:: :local:
+
+Airflow docker images
+=====================
+
+Airflow has two images (built from Dockerfiles):
+
+* CI image (Dockerfile.ci) - used for running tests and local development
+* Production image (Dockerfile) - used to run production-ready Airflow installations
+
+Image naming conventions
+========================
+
+The images are named as follows:
+
+``apache/airflow:<BRANCH_OR_TAG>-python<PYTHON_MAJOR_MINOR_VERSION>[-ci][-manifest]``
+
+where:
+
+* BRANCH_OR_TAG - branch or tag used when creating the image. Examples: master, v1-10-test, 1.10.10
+  The ``master`` and ``v1-10-test`` labels are built from branches so they change over time. The ``1.10.*`` and,
+  in the future, ``2.*`` labels are built from git tags and they are "fixed" once built.
+* PYTHON_MAJOR_MINOR_VERSION - version of python used to build the image. Examples: 3.5, 3.7
+* The ``-ci`` suffix is added for CI images
+* The ``-manifest`` is added for manifest images (see below for explanation of manifest images)
+
+Building docker images
+======================
+
+The easiest way to build those images is to use `<BREEZE.rst>`_.
+
+You can build the CI image using this command:
+
+.. code-block::
+
+  ./breeze build-image
+
+You can build the production image using this command:
+
+.. code-block::
+
+  ./breeze build-image --production-image
+
+By adding ``--python <PYTHON_MAJOR_MINOR_VERSION>`` parameter you can build the
+image version for the chosen python version.
+
+The images are built with default extras - different extras for CI and production image and you
+can change the extras via the ``--extras`` parameters. You can see default extras used via
+``./breeze flags``.
+
+For example if you want to build python 3.7 version of production image with
+"all" extras installed you should run this command:
+
+.. code-block::
+
+  ./breeze build-image --python 3.7 --extras "all" --production-image
+
+The command that builds the CI image is optimized to minimize the time needed to rebuild the image when
+the source code of Airflow evolves. This means that if you already have the image locally downloaded and
+built, the scripts will determine whether the rebuild is needed in the first place. Then the scripts will
+make sure that a minimal number of steps are executed to rebuild parts of the image (for example,
+PIP dependencies) and will give you an image consistent with the one used during Continuous Integration.
+
+The command that builds the production image is optimised for the size of the image.
+
+By default, Breeze builds the images using local sources of Apache Airflow. However,
+you can also build production images from GitHub sources by providing the ``--install-airflow-version``
+parameter to Breeze. This will install Airflow inside the production image using sources downloaded from
+the specified tag or branch. Internally, Airflow will be installed using the command:
+
+.. code-block::
+
+    pip install https://github.com/apache/airflow/archive/<tag>.tar.gz#egg=apache-airflow \
+       --constraint https://raw.githubusercontent.com/apache/airflow/<tag>/requirements/requirements-python3.7.txt
+
+
+Technical details of Airflow images
+===================================
+
+The CI image is used by Breeze as a shell image, but it is also used during CI builds on Travis.
+The image is a single-segment image that contains an Airflow installation with "all" dependencies installed.
+It is optimised for rebuild speed (AIRFLOW_CONTAINER_CI_OPTIMISED_BUILD flag set to "true").
+It installs PIP dependencies from the current branch first, so that any changes in setup.py do not trigger
+reinstalling of all dependencies. A second installation step then re-installs the dependencies
+from the latest sources so that we are sure that the latest dependencies are installed.
+
+The production image is a multi-segment image. The first segment "airflow-build-image" contains all the
+build essentials and related dependencies needed to install Airflow locally. By default the image is
+built from a released version of Airflow from GitHub, but by providing some extra arguments you can also
+build it from local sources. This is particularly useful in the CI environment where we are using the image
+to run Kubernetes tests. See below for the list of arguments that should be provided to build
+the production image from the local sources.
+
+Note! Breeze by default builds the production image from local sources. You can change its behaviour by
+providing the ``--install-airflow-version`` parameter, where you can specify the
+tag/branch used to download the Airflow package from the GitHub repository. You can
+also change the repository itself by adding ``--dockerhub-user`` and ``--dockerhub-repo`` flag values.
+
+Manually building the images
+----------------------------
+
+You can build the images with the standard ``docker build`` command, but this will only build the
+default versions of the images and will not use the DockerHub versions of the images as cache.
+
+
+CI images
+.........
+
+The following arguments can be used for CI images:
+
+* ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster" - Base python image
+* ARG AIRFLOW_VERSION="2.0.0.dev0" - version of Airflow
+* ARG PYTHON_MAJOR_MINOR_VERSION="3.6" - major/minor version of Python (should match base image)
+* ARG DEPENDENCIES_EPOCH_NUMBER="2" - increasing this number will reinstall all apt dependencies
+* ARG KUBECTL_VERSION="v1.15.3" - version of kubectl installed
+* ARG KIND_VERSION="v0.6.1" - version of kind installed
+* ARG PIP_NO_CACHE_DIR="true" - if true, then no pip cache will be stored
+* ARG PIP_VERSION="19.0.2" - version of PIP to use
+* ARG HOME=/root - Home directory of the root user (CI image has root user as default)
+* ARG AIRFLOW_HOME=/root/airflow - Airflow's HOME (that's where logs and sqlite databases are stored)
+* ARG AIRFLOW_SOURCES=/opt/airflow - Mounted sources of Airflow
+* ARG PIP_DEPENDENCIES_EPOCH_NUMBER="3" - increasing that number will reinstall all PIP dependencies
+* ARG CASS_DRIVER_NO_CYTHON="1" - if set to 1 no CYTHON compilation is done for cassandra driver (much faster)
+* ARG AIRFLOW_CONTAINER_CI_OPTIMISED_BUILD="true" if set then PIP dependencies are installed from repo first
+      before they are reinstalled from local sources. This allows for incremental faster builds when
+      requirements change
+* ARG AIRFLOW_REPO=apache/airflow - the repository from which PIP dependencies are installed (CI optimised)
+* ARG AIRFLOW_BRANCH=master - the branch from which PIP dependencies are installed (CI optimised)
+* ARG AIRFLOW_CI_BUILD_EPOCH="1" - increasing this value will reinstall PIP dependencies from the repository
+      from scratch
+* ARG AIRFLOW_EXTRAS="all" - extras to install
+* ARG ADDITIONAL_PYTHON_DEPS="" - additional python dependencies to install
+
+Here are some examples of how CI images can be built manually. The CI image is always built from local sources.
+
+This builds the CI image in version 3.7 with default extras ("all").
+
+.. code-block::
+
+  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7
+
+
+This builds the CI image in version 3.7 with just  extras ("all").
+
+.. code-block::
+
+  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7
+
+
+Production images
+.................
+
+The following arguments can be used for production images:
+
+* ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster" - Base python image
+* ARG PYTHON_MAJOR_MINOR_VERSION="3.6" - major/minor version of Python (should match base image)
+* ARG AIRFLOW_VERSION="2.0.0.dev0" - version of Airflow
+* ARG AIRFLOW_ORG="apache" - Github organisation from which Airflow is installed (when installed from repo)
+* ARG AIRFLOW_REPO="airflow" - Github repository from which Airflow is installed (when installed from repo)
+* ARG AIRFLOW_GIT_REFERENCE="master" - reference (branch or tag) from Github repository from which
+    Airflow is installed (when installed from repo)
+* ARG REQUIREMENTS_GIT_REFERENCE="master" - reference (branch or tag) from Github repository from which
+    requirements are downloaded for constraints (when installed from repo).
+* ARG WWW_FOLDER="www" - folder where www pages are generated - it should be set to www_rbac in case
+    of 1.10 image builds.
+* ARG AIRFLOW_EXTRAS=(see Dockerfile) - default extras with which Airflow is installed
+* ARG AIRFLOW_HOME=/opt/airflow - Airflow's HOME (that's where logs and sqlite databases are stored)
+* ARG AIRFLOW_UID="50000" - Airflow user UID
+* ARG AIRFLOW_GID="50000" - Airflow group GID
+* ARG PIP_VERSION="19.0.2" - version of PIP to use
+* ARG CASS_DRIVER_BUILD_CONCURRENCY="8" - Number of processors to use for cassandra PIP install (speeds up
+       installing in case cassandra extra is used).
+
+The following variables should be overridden only if the production image is built from local sources
+rather than from the GitHub repository.
+
+* COPY_SOURCE - should be set to "." if the image is built from sources
+* COPY_TARGET - should be set to an Airflow source directory inside the container if the image is built
+  from sources
+* AIRFLOW_SOURCES - should point to the same directory as COPY_TARGET in case the production image is
+  installed from the local sources rather than from the GitHub repository. If left empty it points to
+  sources from the GitHub repository derived from AIRFLOW_ORG, AIRFLOW_REPO, AIRFLOW_GIT_REFERENCE
+* CONSTRAINT_REQUIREMENTS - should point to a locally available requirements file if the image is built
+  from sources. If left empty it will point to the right requirements file from the GitHub repository
+  derived from AIRFLOW_ORG, AIRFLOW_REPO, REQUIREMENTS_GIT_REFERENCE
+* ENTRYPOINT_FILE - should point to the entrypoint.sh file when installing from sources. If left empty
+  it points to sources from the GitHub repository derived from
+  AIRFLOW_ORG, AIRFLOW_REPO, REQUIREMENTS_GIT_REFERENCE
+
+The below will build the production image in version 3.6 with default extras from master branch in Github:
+
+.. code-block::
+
+  docker build .
+
+The below will build the production image in version 3.7 with default extras from 1.10.9 tag and
+requirements taken from v1-10-test branch in Github. Note that versions 1.10.9 and below
+have no requirements so requirements should be taken from head of the v1-10-test branch. Once we
+release 1.10.10 we can take them from the 1.10.10 tag.
+
+.. code-block::
+
+  docker build . --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 --build-arg AIRFLOW_GIT_REFERENCE=1.10.9 \
+    --build-arg REQUIREMENTS_GIT_REFERENCE=v1-10-test --build-arg WWW_FOLDER="www_rbac"
+
+The below will build the production image in version 3.7 with default extras from master branch in Github.
+
+.. code-block::
+
+  docker build . --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7
+
+The below will build the production image in version 3.7 with default extras from current sources.
+
+.. code-block::
+
+  docker build . --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 --build-arg COPY_SOURCE=. \
+    --build-arg COPY_TARGET=/opt/airflow --build-arg AIRFLOW_SOURCES=/opt/airflow \
+    --build-arg CONSTRAINT_REQUIREMENTS=requirements/requirements-python3.7.txt" \
 
 Review comment:
   A starting double-quote fell out. I appreciate your work. Thank you.
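
   For reference, here is a rough sketch of how the flagged argument might look with the opening quote restored. It assumes the missing quote is the only problem on that line. The snippet collects the arguments visible in the diff as positional parameters and prints the resulting command instead of running it, so it can be tried without Docker. The diff is cut off after the trailing backslash, so any arguments that follow are not covered here.

```shell
#!/bin/sh
# Sketch only: collect the build arguments shown in the diff so the shell
# itself checks that every quote is balanced.
# Assumption: the fix is simply the opening quote on CONSTRAINT_REQUIREMENTS.
set -- \
  --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
  --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
  --build-arg COPY_SOURCE=. \
  --build-arg COPY_TARGET=/opt/airflow \
  --build-arg AIRFLOW_SOURCES=/opt/airflow \
  --build-arg CONSTRAINT_REQUIREMENTS="requirements/requirements-python3.7.txt"

# Print the command rather than executing it (no Docker required).
echo "docker build . $*"
```

   With the quote unbalanced, as in the diff, the shell would keep reading the following lines as part of the string instead of ending the argument there.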

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services