Posted to commits@airflow.apache.org by ka...@apache.org on 2021/01/14 09:26:45 UTC

[airflow] branch master updated: Structure and small content improvements in installation.rst (#13661)

This is an automated email from the ASF dual-hosted git repository.

kamilbregula pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
     new 39f0365  Structure and small content improvements in installation.rst (#13661)
39f0365 is described below

commit 39f03656d95eb9904f6ef2b6e64e14b25fa7d215
Author: Kamil Breguła <mi...@users.noreply.github.com>
AuthorDate: Thu Jan 14 10:26:31 2021 +0100

    Structure and small content improvements in installation.rst (#13661)
    
    * Structure improvements in installation.rst file

    * fixup! Structure improvements in installation.rst file
---
 docs/apache-airflow/installation.rst          | 227 +++++++++++++-------------
 docs/apache-airflow/production-deployment.rst |   2 +
 2 files changed, 119 insertions(+), 110 deletions(-)

diff --git a/docs/apache-airflow/installation.rst b/docs/apache-airflow/installation.rst
index 28b773f..dd31ba5 100644
--- a/docs/apache-airflow/installation.rst
+++ b/docs/apache-airflow/installation.rst
@@ -21,9 +21,16 @@ Installation
 
 .. contents:: :local:
 
+This page describes installations using the ``apache-airflow`` package `published in
+PyPI <https://pypi.org/project/apache-airflow/>`__, but some information may be useful during
+installation with other tools as well.
+
+.. note::
+
+    Airflow is also distributed as a Docker image (OCI Image). For more information, see: :ref:`docker_image`.
 
 Prerequisites
--------------
+'''''''''''''
 
 Airflow is tested with:
 
@@ -38,90 +45,44 @@ Airflow is tested with:
 * Kubernetes: 1.16.9, 1.17.5, 1.18.6
 
 **Note:** MySQL 5.x versions are unable to or have limitations with
-running multiple schedulers -- please see the "Scheduler" docs. MariaDB is not tested/recommended.
+running multiple schedulers -- please see: :doc:`/scheduler`. MariaDB is not tested/recommended.
 
 **Note:** SQLite is used in Airflow tests. Do not use it in production. We recommend
 using the latest stable version of SQLite for local development.
 
-Getting Airflow
-'''''''''''''''
-
-Airflow is published as ``apache-airflow`` package in PyPI. Installing it however might be sometimes tricky
-because Airflow is a bit of both a library and application. Libraries usually keep their dependencies open and
-applications usually pin them, but we should do neither and both at the same time. We decided to keep
-our dependencies as open as possible (in ``setup.cfg`` and ``setup.py``) so users can install different
-version of libraries if needed. This means that from time to time plain ``pip install apache-airflow`` will
-not work or will produce unusable Airflow installation.
+Note that Airflow 2.0.0 has been tested with Python 3.6, 3.7, and 3.8, but does not yet
+support Python 3.9.
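+
+You can check which Python version you have (the installation scripts below use the same command
+to pick the matching constraints file):
+
+.. code-block:: bash
+
+    python --version
+    # For example: Python 3.6.12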
 
-In order to have repeatable installation, however, starting from **Airflow 1.10.10** and updated in
-**Airflow 1.10.13** we also keep a set of "known-to-be-working" constraint files in the
-``constraints-master``, ``constraints-2-0`` and ``constraints-1-10`` orphan branches.
-Those "known-to-be-working" constraints are per major/minor python version. You can use them as constraint
-files when installing Airflow from PyPI. Note that you have to specify correct Airflow version
-and python versions in the URL.
+Installation tools
+''''''''''''''''''
 
 The official way of installing Airflow is with the ``pip`` tool.
 There was a recent (November 2020) change in the resolver, so currently only version 20.2.4 is
 officially supported, although you might have success with version 20.3.3+ (to be confirmed if all initial
-issues from ``pip`` 20.3.0 release have been fixed in 20.3.3).
+issues from the ``pip`` 20.3.0 release have been fixed in 20.3.3). In order to install Airflow you need to
+either downgrade pip to version 20.2.4 (``pip install --upgrade pip==20.2.4``) or, if you use pip 20.3,
+add the option ``--use-deprecated legacy-resolver`` to your ``pip install`` command.
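+
+For example, either of the following works (run only one of them):
+
+.. code-block:: bash
+
+    # Option 1: downgrade pip to the officially supported version
+    pip install --upgrade pip==20.2.4
+
+    # Option 2: keep pip 20.3 and fall back to the old resolver
+    pip install --use-deprecated legacy-resolver apache-airflow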
 
 While there are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
 `pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
-``pip``- especially when it comes to constraint vs. requirements management.
+``pip`` - especially when it comes to constraint vs. requirements management.
 Installing via ``Poetry`` or ``pip-tools`` is not currently supported. If you wish to install Airflow
-using those tools you should use the constraint files described below and convert them to appropriate
+using those tools you should use the :ref:`constraint files <installation:constraints>` and convert them to the appropriate
 format and workflow that your tool requires.
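+
+One possible starting point for ``pip-tools`` (a sketch only -- this workflow is not officially
+supported, and it assumes your ``pip-tools`` version understands ``-c`` constraint lines in its
+input files):
+
+.. code-block:: bash
+
+    # Download the constraints file for your Airflow and Python versions
+    curl -o constraints.txt "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.6.txt"
+
+    # Reference it from requirements.in, then compile a pinned requirements.txt
+    echo "-c constraints.txt" > requirements.in
+    echo "apache-airflow==2.0.0" >> requirements.in
+    pip-compile requirements.in
+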
 
-  **Prerequisites**
-
-  On Debian based Linux OS:
-
-  .. code-block:: bash
-
-      sudo apt-get update
-      sudo apt-get install build-essential
-
-
-1. Installing just Airflow
-
-.. note::
-
-   On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
-   does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
-   of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
-   ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3, you need to add option
-   ``--use-deprecated legacy-resolver`` to your pip install command.
-
-
-.. code-block:: bash
-
-    AIRFLOW_VERSION=2.0.0
-    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
-    # For example: 3.6
-    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
-    # For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.6.txt
-    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
-
-Please note that with respect to Python 3 support, Airflow 2.0.0 has been
-tested with Python 3.6, 3.7, and 3.8, but does not yet support Python 3.9.
-
-2. Installing with extras (for example postgres, google)
-
-.. note::
-
-   On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
-   does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
-   of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
-   ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3, you need to add option
-   ``--use-deprecated legacy-resolver`` to your pip install command.
+.. _installation:extra_packages:
 
+Extra Packages
+''''''''''''''
 
-.. code-block:: bash
+The ``apache-airflow`` PyPI basic package only installs what's needed to get started.
+Subpackages can be installed depending on what will be useful in your
+environment. For instance, if you don't need connectivity with Postgres,
+you won't have to go through the trouble of installing the ``postgres-devel``
+yum package, or whatever equivalent applies on the distribution you are using.
 
-    AIRFLOW_VERSION=2.0.0
-    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
-    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
-    pip install "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
+Behind the scenes, Airflow does conditional imports of operators that require
+these extra dependencies.
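+
+You can check from the command line whether a given extra's dependencies are importable (an
+illustrative check only; it assumes the module path used by the ``postgres`` provider):
+
+.. code-block:: bash
+
+    # Succeeds only when the "postgres" provider/extra is installed
+    python -c "import airflow.providers.postgres.hooks.postgres"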
 
 Most of the extras are linked to a corresponding providers package. For example "amazon" extra
 has a corresponding ``apache-airflow-providers-amazon`` providers package to be installed. When you install
@@ -129,26 +90,25 @@ Airflow with such extras, the necessary provider packages are installed automati
 PyPI for those packages). However you can freely upgrade and install provider packages independently from
 the main Airflow installation.
 
-Python versions support
-'''''''''''''''''''''''
+For the list of the subpackages and what they enable, see: :doc:`extra-packages-ref`.
 
-As of Airflow 2.0 we agreed to certain rules we follow for Python support. They are based on the official
-release schedule of Python, nicely summarized in the
-`Python Developer's Guide <https://devguide.python.org/#status-of-python-branches>`_
+.. _installation:provider_packages:
 
-1. We end support for Python versions when they reach EOL (For Python 3.6 it means that we will stop supporting it
-   on 23.12.2021).
+Provider packages
+'''''''''''''''''
 
-2. The "oldest" supported version of Python is the default one. "Default" is only meaningful in terms of
-   "smoke tests" in CI PRs which are run using this default version.
+Unlike Apache Airflow 1.10, Airflow 2.0 is delivered in multiple, separate, but connected packages.
+The core of the Airflow scheduling system is delivered as the ``apache-airflow`` package and there are around
+60 provider packages which can be installed separately as so-called "Airflow Provider packages".
+The default Airflow installation doesn't have many integrations and you have to install them yourself.
 
-3. We support a new version of Python after it is officially released, as soon as we manage to make
-   it works in our CI pipeline (which might not be immediate) and release a new version of Airflow
-   (non-Patch version) based on this CI set-up.
+You can even develop and install your own providers for Airflow. For more information,
+see: :doc:`apache-airflow-providers:index`.
 
+For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
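+
+For example, the Amazon integration is a single additional package (shown without constraints for
+brevity; in practice you may want to add the constraints file described below):
+
+.. code-block:: bash
+
+    pip install apache-airflow-providers-amazon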
 
-Requirements
-''''''''''''
+System dependencies
+'''''''''''''''''''
 
 You need certain system-level requirements in order to install Airflow. Those are requirements that are known
 to be needed for Linux systems (tested on Ubuntu Buster LTS):
@@ -171,44 +131,83 @@ to be needed for Linux system (Tested on Ubuntu Buster LTS) :
 
 You also need database client packages (Postgres or MySQL) if you want to use those databases.
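+
+For example, on Debian-based Linux distributions the build essentials can be installed with:
+
+.. code-block:: bash
+
+    sudo apt-get update
+    sudo apt-get install build-essential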
 
-If the ``airflow`` command is not getting recognized (can happen on Windows when using WSL), then
-ensure that ``~/.local/bin`` is in your ``PATH`` environment variable, and add it in if necessary:
+.. _installation:constraints:
 
-.. code-block:: bash
+Constraints files
+'''''''''''''''''
 
-    PATH=$PATH:~/.local/bin
+Airflow installation might sometimes be tricky because Airflow is a bit of both a library and an application.
+Libraries usually keep their dependencies open and applications usually pin them, but we should do neither
+and both at the same time. We decided to keep our dependencies as open as possible
+(in ``setup.cfg`` and ``setup.py``) so users can install different
+versions of libraries if needed. This means that from time to time plain ``pip install apache-airflow`` will
+not work or will produce an unusable Airflow installation.
 
-.. _installation:extra_packages:
+In order to have a repeatable installation, starting from **Airflow 1.10.10** and updated in
+**Airflow 1.10.13** we also keep a set of "known-to-be-working" constraint files in the
+``constraints-master``, ``constraints-2-0`` and ``constraints-1-10`` orphan branches, and we create a tag
+for each released version, e.g. ``constraints-2.0.0``. This way, we keep a tested and working set of dependencies.
 
-Extra Packages
-''''''''''''''
+Those "known-to-be-working" constraints are per major/minor python version. You can use them as constraint
+files when installing Airflow from PyPI. Note that you have to specify correct Airflow version
+and python versions in the URL.
 
-The ``apache-airflow`` PyPI basic package only installs what's needed to get started.
-Subpackages can be installed depending on what will be useful in your
-environment. For instance, if you don't need connectivity with Postgres,
-you won't have to go through the trouble of installing the ``postgres-devel``
-yum package, or whatever equivalent applies on the distribution you are using.
+You can create the URL to the file substituting the variables in the template below.
 
-Behind the scenes, Airflow does conditional imports of operators that require
-these extra dependencies.
+.. code-block::
 
-For the list of the subpackages and what they enable, see: :doc:`extra-packages-ref`.
+  https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt
 
-.. _installation:provider_packages:
+where:
 
-Provider packages
-'''''''''''''''''
+- ``AIRFLOW_VERSION`` - Airflow version (e.g. ``2.0.0``) or ``master``, ``2-0``, ``1-10`` for the latest development version
+- ``PYTHON_VERSION`` - Python version, e.g. ``3.8``, ``3.7``
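+
+For example, for Airflow 2.0.0 and Python 3.8 this template resolves to:
+
+.. code-block::
+
+  https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.8.txt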
 
-Unlike Apache Airflow 1.10, the Airflow 2.0 is delivered in multiple, separate, but connected packages.
-The core of Airflow scheduling system is delivered as ``apache-airflow`` package and there are around
-60 providers packages which can be installed separately as so called "Airflow Provider packages".
-The default Airflow installation doesn't have many integrations and you have to install them yourself.
+Installation script
+'''''''''''''''''''
 
-You can even develop and install your own providers for Airflow. For more information,
-see: :doc:`apache-airflow-providers:index`
+In order to simplify the installation, we have prepared a script that will select the
+:ref:`constraints file <installation:constraints>` compatible with your Python version.
 
-For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
+**Plain installation:**
+
+If you don't need to install any extras, you can use the command set below:
+
+.. code-block:: bash
+
+    AIRFLOW_VERSION=2.0.0
+    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
+    # For example: 3.6
+    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
+    # For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.6.txt
+    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
+
+**Installing with extras:**
+
+If you need to install additional :ref:`extra packages <installation:extra_packages>`, you can use the script below.
+
+.. code-block:: bash
+
+    AIRFLOW_VERSION=2.0.0
+    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
+    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
+    pip install "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
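+
+After the installation completes, you can run a quick sanity check (assuming the ``airflow``
+entrypoint is on your ``PATH``):
+
+.. code-block:: bash
+
+    airflow version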
+
+Python versions support
+'''''''''''''''''''''''
+
+As of Airflow 2.0 we agreed to certain rules we follow for Python support. They are based on the official
+release schedule of Python, nicely summarized in the
+`Python Developer's Guide <https://devguide.python.org/#status-of-python-branches>`_.
 
+1. We end support for Python versions when they reach EOL (for Python 3.6 this means that we will stop
+   supporting it on 23.12.2021).
+
+2. The "oldest" supported version of Python is the default one. "Default" is only meaningful in terms of
+   "smoke tests" in CI PRs which are run using this default version.
+
+3. We support a new version of Python after it is officially released, as soon as we manage to make
+   it work in our CI pipeline (which might not be immediate) and release a new version of Airflow
+   (non-patch version) based on this CI set-up.
 
 Initializing Airflow Database
 '''''''''''''''''''''''''''''
@@ -225,16 +224,24 @@ run tasks:
 
     airflow db init
 
-Docker image
-''''''''''''
-
-Airflow is also distributed as a Docker image (OCI Image). For more information, see: :doc:`production-deployment`
 
 Troubleshooting
 '''''''''''''''
 
 This section describes how to troubleshoot installation issues.
 
+Airflow command is not recognized
+"""""""""""""""""""""""""""""""""
+
+If the ``airflow`` command is not getting recognized (can happen on Windows when using WSL), then
+ensure that ``~/.local/bin`` is in your ``PATH`` environment variable, and add it in if necessary:
+
+.. code-block:: bash
+
+    PATH=$PATH:~/.local/bin
+
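+This change lasts only for your current shell session; to make it persistent you can append it to
+your shell profile (a sketch, assuming ``bash``):
+
+.. code-block:: bash
+
+    echo 'export PATH=$PATH:~/.local/bin' >> ~/.bashrc
+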
+You can also start Airflow with ``python -m airflow``.
+
 ``Symbol not found: _Py_GetArgcArgv``
 """""""""""""""""""""""""""""""""""""
 
diff --git a/docs/apache-airflow/production-deployment.rst b/docs/apache-airflow/production-deployment.rst
index 78903b2..0d21abc 100644
--- a/docs/apache-airflow/production-deployment.rst
+++ b/docs/apache-airflow/production-deployment.rst
@@ -108,6 +108,8 @@ Strategies for mitigation:
 * When running on kubernetes, use a ``livenessProbe`` on the scheduler deployment to fail if the scheduler has not heartbeat in a while.
   `Example: <https://github.com/apache/airflow/blob/190066cf201e5b0442bbbd6df74efecae523ee76/chart/templates/scheduler/scheduler-deployment.yaml#L118-L136>`_.
 
+.. _docker_image:
+
 Production Container Images
 ===========================