Posted to commits@airflow.apache.org by ka...@apache.org on 2021/01/14 09:26:45 UTC
[airflow] branch master updated: Structure and small content
improvements in installation.rst (#13661)
This is an automated email from the ASF dual-hosted git repository.
kamilbregula pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/master by this push:
new 39f0365 Structure and small content improvements in installation.rst (#13661)
39f0365 is described below
commit 39f03656d95eb9904f6ef2b6e64e14b25fa7d215
Author: Kamil Breguła <mi...@users.noreply.github.com>
AuthorDate: Thu Jan 14 10:26:31 2021 +0100
Structure and small content improvements in installation.rst (#13661)
* Structure improvements in installation.rst file
* fixup! Structure improvements in installation.rst file
---
docs/apache-airflow/installation.rst | 227 +++++++++++++-------------
docs/apache-airflow/production-deployment.rst | 2 +
2 files changed, 119 insertions(+), 110 deletions(-)
diff --git a/docs/apache-airflow/installation.rst b/docs/apache-airflow/installation.rst
index 28b773f..dd31ba5 100644
--- a/docs/apache-airflow/installation.rst
+++ b/docs/apache-airflow/installation.rst
@@ -21,9 +21,16 @@ Installation
.. contents:: :local:
+This page describes installations using the ``apache-airflow`` package `published on
+PyPI <https://pypi.org/project/apache-airflow/>`__, but some of the information may be useful during
+installation with other tools as well.
+
+.. note::
+
+ Airflow is also distributed as a Docker image (OCI Image). For more information, see: :ref:`docker_image`
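For instance, you can pull a released image from the ``apache/airflow`` repository on Docker Hub (a quick illustration; pick the tag matching your target version):

.. code-block:: bash

    docker pull apache/airflow:2.0.0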
Prerequisites
--------------
+'''''''''''''
Airflow is tested with:
@@ -38,90 +45,44 @@ Airflow is tested with:
* Kubernetes: 1.16.9, 1.17.5, 1.18.6
**Note:** MySQL 5.x versions are unable to or have limitations with
-running multiple schedulers -- please see the "Scheduler" docs. MariaDB is not tested/recommended.
+running multiple schedulers -- please see: :doc:`/scheduler`. MariaDB is not tested/recommended.
**Note:** SQLite is used in Airflow tests. Do not use it in production. We recommend
using the latest stable version of SQLite for local development.
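A quick, illustrative way to check which SQLite version your Python is linked against:

.. code-block:: bash

    python -c "import sqlite3; print(sqlite3.sqlite_version)"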
-Getting Airflow
-'''''''''''''''
-
-Airflow is published as ``apache-airflow`` package in PyPI. Installing it however might be sometimes tricky
-because Airflow is a bit of both a library and application. Libraries usually keep their dependencies open and
-applications usually pin them, but we should do neither and both at the same time. We decided to keep
-our dependencies as open as possible (in ``setup.cfg`` and ``setup.py``) so users can install different
-version of libraries if needed. This means that from time to time plain ``pip install apache-airflow`` will
-not work or will produce unusable Airflow installation.
+Please note that with respect to Python 3 support, Airflow 2.0.0 has been
+tested with Python 3.6, 3.7, and 3.8, but does not yet support Python 3.9.
-In order to have repeatable installation, however, starting from **Airflow 1.10.10** and updated in
-**Airflow 1.10.13** we also keep a set of "known-to-be-working" constraint files in the
-``constraints-master``, ``constraints-2-0`` and ``constraints-1-10`` orphan branches.
-Those "known-to-be-working" constraints are per major/minor python version. You can use them as constraint
-files when installing Airflow from PyPI. Note that you have to specify correct Airflow version
-and python versions in the URL.
+Installation tools
+''''''''''''''''''
The official way of installing Airflow is with the ``pip`` tool.
There was a recent (November 2020) change in the ``pip`` resolver, so currently only version 20.2.4 is officially
supported, although you might have success with version 20.3.3+ (to be confirmed if all initial
-issues from ``pip`` 20.3.0 release have been fixed in 20.3.3).
+issues from the ``pip`` 20.3.0 release have been fixed in 20.3.3). In order to install Airflow you need to
+either downgrade pip to version 20.2.4 (``pip install --upgrade pip==20.2.4``) or, if you use pip 20.3, add the option
+``--use-deprecated legacy-resolver`` to your pip install command.
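For example, the two workarounds look like this (a sketch; pick one, not both):

.. code-block:: bash

    # Option 1: downgrade pip to the officially supported version
    pip install --upgrade pip==20.2.4

    # Option 2: keep pip 20.3+ but fall back to the old resolver
    pip install apache-airflow --use-deprecated legacy-resolver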
While there are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
-``pip``- especially when it comes to constraint vs. requirements management.
+``pip`` - especially when it comes to constraint vs. requirements management.
Installing via ``Poetry`` or ``pip-tools`` is not currently supported. If you wish to install Airflow
-using those tools you should use the constraint files described below and convert them to appropriate
+using those tools you should use the :ref:`constraint files <installation:constraints>` and convert them to appropriate
format and workflow that your tool requires.
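As a rough sketch of such a conversion (an illustrative workflow, not an officially supported one; verify the ``-c`` constraint include against your tool's documentation):

.. code-block:: bash

    # Download the published constraints (the URL pattern is described below)
    curl -sL "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.8.txt" -o constraints.txt
    # Feed them into a pip-tools input file as a constraint include
    printf 'apache-airflow==2.0.0\n-c constraints.txt\n' > requirements.in
    pip-compile requirements.in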
- **Prerequisites**
-
- On Debian based Linux OS:
-
- .. code-block:: bash
-
- sudo apt-get update
- sudo apt-get install build-essential
-
-
-1. Installing just Airflow
-
-.. note::
-
- On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
- does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
- of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
- ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3, you need to add option
- ``--use-deprecated legacy-resolver`` to your pip install command.
-
-
-.. code-block:: bash
-
- AIRFLOW_VERSION=2.0.0
- PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
- # For example: 3.6
- CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
- # For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.6.txt
- pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
-
-Please note that with respect to Python 3 support, Airflow 2.0.0 has been
-tested with Python 3.6, 3.7, and 3.8, but does not yet support Python 3.9.
-
-2. Installing with extras (for example postgres, google)
-
-.. note::
-
- On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
- does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
- of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
- ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3, you need to add option
- ``--use-deprecated legacy-resolver`` to your pip install command.
+.. _installation:extra_packages:
+Extra Packages
+''''''''''''''
-.. code-block:: bash
+The ``apache-airflow`` PyPI basic package only installs what's needed to get started.
+Subpackages can be installed depending on what will be useful in your
+environment. For instance, if you don't need connectivity with Postgres,
+you won't have to go through the trouble of installing the ``postgres-devel``
+yum package, or whatever equivalent applies on the distribution you are using.
- AIRFLOW_VERSION=2.0.0
- PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
- CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
- pip install "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
+Behind the scenes, Airflow does conditional imports of operators that require
+these extra dependencies.
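For instance, a quick way to see this in action (the import below assumes the ``postgres`` extra, i.e. the ``apache-airflow-providers-postgres`` package, is installed):

.. code-block:: bash

    # Succeeds only when the "postgres" extra's dependencies are present;
    # otherwise the import raises ImportError
    python -c "from airflow.providers.postgres.hooks.postgres import PostgresHook; print('postgres extra available')"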
Most of the extras are linked to a corresponding provider package. For example, the "amazon" extra
has a corresponding ``apache-airflow-providers-amazon`` provider package to be installed. When you install
@@ -129,26 +90,25 @@ Airflow with such extras, the necessary provider packages are installed automati
PyPI for those packages). However, you can freely upgrade and install provider packages independently of
the main Airflow installation.
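For example, to upgrade just the Amazon provider package without touching the core Airflow installation (any provider package works the same way):

.. code-block:: bash

    pip install --upgrade apache-airflow-providers-amazon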
-Python versions support
-'''''''''''''''''''''''
+For the list of the subpackages and what they enable, see: :doc:`extra-packages-ref`.
-As of Airflow 2.0 we agreed to certain rules we follow for Python support. They are based on the official
-release schedule of Python, nicely summarized in the
-`Python Developer's Guide <https://devguide.python.org/#status-of-python-branches>`_
+.. _installation:provider_packages:
-1. We end support for Python versions when they reach EOL (For Python 3.6 it means that we will stop supporting it
- on 23.12.2021).
+Provider packages
+'''''''''''''''''
-2. The "oldest" supported version of Python is the default one. "Default" is only meaningful in terms of
- "smoke tests" in CI PRs which are run using this default version.
+Unlike Apache Airflow 1.10, Airflow 2.0 is delivered in multiple, separate, but connected packages.
+The core of the Airflow scheduling system is delivered as the ``apache-airflow`` package, and there are around
+60 provider packages which can be installed separately as so-called "Airflow Provider packages".
+The default Airflow installation doesn't have many integrations and you have to install them yourself.
-3. We support a new version of Python after it is officially released, as soon as we manage to make
- it works in our CI pipeline (which might not be immediate) and release a new version of Airflow
- (non-Patch version) based on this CI set-up.
+You can even develop and install your own providers for Airflow. For more information,
+see: :doc:`apache-airflow-providers:index`
+For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
-Requirements
-''''''''''''
+System dependencies
+'''''''''''''''''''
You need certain system-level requirements in order to install Airflow. These requirements are known
to be needed for Linux systems (tested on Ubuntu Buster LTS):
@@ -171,44 +131,83 @@ to be needed for Linux system (Tested on Ubuntu Buster LTS) :
You also need database client packages (Postgres or MySQL) if you want to use those databases.
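As a sketch for Debian-based systems (``build-essential`` matches the prerequisites mentioned earlier in this diff; the client library package names are common choices, not taken from this page):

.. code-block:: bash

    sudo apt-get update
    sudo apt-get install build-essential
    # Database client development headers, if you use these databases
    sudo apt-get install libpq-dev                     # Postgres
    sudo apt-get install default-libmysqlclient-dev    # MySQL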
-If the ``airflow`` command is not getting recognized (can happen on Windows when using WSL), then
-ensure that ``~/.local/bin`` is in your ``PATH`` environment variable, and add it in if necessary:
+.. _installation:constraints:
-.. code-block:: bash
+Constraints files
+'''''''''''''''''
- PATH=$PATH:~/.local/bin
+Airflow installation can sometimes be tricky because Airflow is a bit of both a library and an application.
+Libraries usually keep their dependencies open and applications usually pin them, but we should do neither
+and both at the same time. We decided to keep our dependencies as open as possible
+(in ``setup.cfg`` and ``setup.py``) so users can install different
+versions of libraries if needed. This means that from time to time plain ``pip install apache-airflow`` will
+not work or will produce an unusable Airflow installation.
-.. _installation:extra_packages:
+In order to have a repeatable installation, starting from **Airflow 1.10.10** and updated in
+**Airflow 1.10.13** we also keep a set of "known-to-be-working" constraint files in the
+``constraints-master``, ``constraints-2-0`` and ``constraints-1-10`` orphan branches, and we create a tag
+for each released version, e.g. ``constraints-2.0.0``. This way, we keep a tested and working set of dependencies.
-Extra Packages
-''''''''''''''
+Those "known-to-be-working" constraints are per major/minor python version. You can use them as constraint
+files when installing Airflow from PyPI. Note that you have to specify correct Airflow version
+and python versions in the URL.
-The ``apache-airflow`` PyPI basic package only installs what's needed to get started.
-Subpackages can be installed depending on what will be useful in your
-environment. For instance, if you don't need connectivity with Postgres,
-you won't have to go through the trouble of installing the ``postgres-devel``
-yum package, or whatever equivalent applies on the distribution you are using.
+You can create the URL to the file by substituting the variables in the template below.
-Behind the scenes, Airflow does conditional imports of operators that require
-these extra dependencies.
+.. code-block::
-For the list of the subpackages and what they enable, see: :doc:`extra-packages-ref`.
+ https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt
-.. _installation:provider_packages:
+where:
-Provider packages
-'''''''''''''''''
+- ``AIRFLOW_VERSION`` - Airflow version (e.g. ``2.0.0``) or ``master``, ``2-0``, ``1-10`` for the latest development version
+- ``PYTHON_VERSION`` - Python version, e.g. ``3.8``, ``3.7``
-Unlike Apache Airflow 1.10, the Airflow 2.0 is delivered in multiple, separate, but connected packages.
-The core of Airflow scheduling system is delivered as ``apache-airflow`` package and there are around
-60 providers packages which can be installed separately as so called "Airflow Provider packages".
-The default Airflow installation doesn't have many integrations and you have to install them yourself.
+Installation script
+'''''''''''''''''''
-You can even develop and install your own providers for Airflow. For more information,
-see: :doc:`apache-airflow-providers:index`
+In order to simplify the installation, we have prepared a script that will select the :ref:`constraints file <installation:constraints>` compatible with your Python version.
-For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
+**Plain installation:**
+
+If you don't need to install any extras, you can use the command set below:
+
+.. code-block:: bash
+
+ AIRFLOW_VERSION=2.0.0
+ PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
+ # For example: 3.6
+ CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
+ # For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.6.txt
+ pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
+
+**Installing with extras:**
+
+If you need to install additional :ref:`extra packages <installation:extra_packages>`, you can use the script below.
+
+.. code-block:: bash
+
+ AIRFLOW_VERSION=2.0.0
+ PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
+ CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
+ pip install "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
+
+Python versions support
+'''''''''''''''''''''''
+
+As of Airflow 2.0 we agreed to certain rules we follow for Python support. They are based on the official
+release schedule of Python, nicely summarized in the
+`Python Developer's Guide <https://devguide.python.org/#status-of-python-branches>`_:
+1. We end support for Python versions when they reach EOL (for Python 3.6 this means that we will stop supporting it
+   on 23.12.2021).
+
+2. The "oldest" supported version of Python is the default one. "Default" is only meaningful in terms of
+ "smoke tests" in CI PRs which are run using this default version.
+
+3. We support a new version of Python after it is officially released, as soon as we manage to make
+   it work in our CI pipeline (which might not be immediate) and release a new version of Airflow
+   (non-patch version) based on this CI set-up.
Initializing Airflow Database
'''''''''''''''''''''''''''''
@@ -225,16 +224,24 @@ run tasks:
airflow db init
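A minimal first-run sketch (the user-creation step is standard in Airflow 2.0, though not part of the text shown in this hunk):

.. code-block:: bash

    airflow db init
    # Create an admin user for the web UI (prompts for a password)
    airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com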
-Docker image
-''''''''''''
-
-Airflow is also distributed as a Docker image (OCI Image). For more information, see: :doc:`production-deployment`
Troubleshooting
'''''''''''''''
This section describes how to troubleshoot installation issues.
+Airflow command is not recognized
+"""""""""""""""""""""""""""""""""
+
+If the ``airflow`` command is not recognized (this can happen on Windows when using WSL), then
+ensure that ``~/.local/bin`` is in your ``PATH`` environment variable, and add it if necessary:
+
+.. code-block:: bash
+
+ PATH=$PATH:~/.local/bin
+
+You can also start Airflow with ``python -m airflow``.
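To make the ``PATH`` change persistent across shell sessions (assuming bash), you can append it to your profile:

.. code-block:: bash

    echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc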
+
``Symbol not found: _Py_GetArgcArgv``
"""""""""""""""""""""""""""""""""""""
diff --git a/docs/apache-airflow/production-deployment.rst b/docs/apache-airflow/production-deployment.rst
index 78903b2..0d21abc 100644
--- a/docs/apache-airflow/production-deployment.rst
+++ b/docs/apache-airflow/production-deployment.rst
@@ -108,6 +108,8 @@ Strategies for mitigation:
* When running on kubernetes, use a ``livenessProbe`` on the scheduler deployment to fail if the scheduler has not heartbeat in a while.
`Example: <https://github.com/apache/airflow/blob/190066cf201e5b0442bbbd6df74efecae523ee76/chart/templates/scheduler/scheduler-deployment.yaml#L118-L136>`_.
+.. _docker_image:
+
Production Container Images
===========================