Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/14 17:47:40 UTC

[GitHub] [airflow] potiuk opened a new pull request #11529: Behaviour to install all airflow providers added

potiuk opened a new pull request #11529:
URL: https://github.com/apache/airflow/pull/11529


   In Airflow 2.0 we decided to split Airflow into separate providers.
   This means that when you prepare the core airflow package, providers
   are not installed by default. This is not very convenient for
   local development, though, or for docker images built from sources,
   where you would like to install all providers by default.
   
   A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable controls
   this behaviour now. If it is set to "true", all packages including
   provider packages are installed. If missing or set to false, only
   the core package is installed.
   
   For Breeze, the default is set to "true", as in those cases you
   want to install all providers in your environment. The same applies
   if you build the production image from sources. However, when you
   build an image from a GitHub tag or a pip package, you should specify
   appropriate extras to install the required provider packages.
   
   Note that if you install Airflow via 'pip install .' from sources
   in local virtualenv, provider packages are not going to be
   installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".
   
   Fixes #11489
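
The behaviour described above can be sketched in Python. This is a minimal illustration, not the code from this pull request: the helper names `providers_requested` and `select_packages` are hypothetical, and treating every value other than "true" as "off" is an assumption.

```python
import os

# Variable name as given in the PR description above.
FLAG = "INSTALL_ALL_AIRFLOW_PROVIDERS"


def providers_requested(environ=None):
    """Return True only when the flag is exactly "true" (assumption: any other value is off)."""
    env = os.environ if environ is None else environ
    return env.get(FLAG) == "true"


def select_packages(all_packages, environ=None):
    """Hypothetical helper: keep airflow.providers* only when the flag is on."""
    if providers_requested(environ):
        return list(all_packages)
    return [
        name
        for name in all_packages
        if name != "airflow.providers" and not name.startswith("airflow.providers.")
    ]
```

A setup script could pass the result to `packages=` when calling `setup()`; whether the real change works this way is not stated in the description.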
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#issuecomment-708572246


   Hey @ashb @kaxil @turbaszek, others. I looked at how to solve the installation problem described in #11489 and the failing Kubernetes builds (likely caused by the provider split) and I believe I found the best approach. I think we want to keep this:
   
   * in any setup where we use packages, we want to install separately airflow and providers
   * in any setup where we use sources, we want to install airflow + providers together for development convenience
   
   The proposal I have is a variable INSTALL_PROVIDERS_FROM_SOURCES. When this flag is "true", airflow will install providers from sources; when it is missing or anything else but "true", it will install only airflow core. By default in Breeze and when building images from sources I set this variable to "true" (but it can be set to false by the --skip-installing-airflow-providers flag, so that you can also install a "bare" airflow in Breeze or prepare a "bare" image without any providers easily). Together with the in-progress #11464 (I will have to add conditional dependencies there and rebase on top of this) it will have exactly the desired effect:
   
   * when installing airflow with Breeze from sources (both prod and CI), we continue installing all providers from sources like we have in 1.10. Extras will not cause installing of apache-airflow-providers-* as those dependencies will be disabled in this case. You can also disable installing all providers with ``--skip-installing-airflow-providers``
   
   * when installing airflow with Breeze from PyPI or GitHub, only the providers required by "extras" will be installed
   
   * when installing airflow locally with "." (without -e) only bare Airflow will be installed; providers will only be available if you happen to be in the airflow directory or if you install provider packages manually. You can also set INSTALL_PROVIDERS_FROM_SOURCES="true" before installation and then all providers will be installed as in 1.10.
   
   * when installing airflow locally with -e '.', providers will be automatically installed and available. Since 'airflow' is taken directly from sources, providers are there and they will be importable.
   
   
   I tested all the above scenarios and I think it makes perfect sense. Let me know what you think.
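
The four scenarios listed in this comment can be summarised as a small decision function. This is only a sketch of the rules as stated above: the `InstallMethod` enum, the `providers_outcome` function and the outcome strings are hypothetical names introduced for illustration, and none of them exist in Airflow or Breeze.

```python
from enum import Enum


class InstallMethod(Enum):
    """Hypothetical labels for the installation scenarios in the comment."""

    BREEZE_SOURCES = "Breeze, from sources"
    PACKAGE = "Breeze, from PyPI or GitHub"
    LOCAL = "pip install ."
    LOCAL_EDITABLE = "pip install -e ."


def providers_outcome(method, flag_value=None):
    """Return which providers end up available for a given scenario.

    flag_value is the value of INSTALL_PROVIDERS_FROM_SOURCES, or None if unset.
    """
    if method is InstallMethod.PACKAGE:
        # Only the providers pulled in by the requested extras.
        return "providers required by extras"
    if method is InstallMethod.LOCAL_EDITABLE:
        # airflow is read from the source tree, so airflow/providers is importable.
        return "all providers importable from sources"
    if method is InstallMethod.BREEZE_SOURCES and flag_value is None:
        # Breeze sets the flag to "true" unless providers are explicitly skipped.
        flag_value = "true"
    # Anything other than exactly "true" means core only.
    return "all providers from sources" if flag_value == "true" else "core only"
```

For example, `providers_outcome(InstallMethod.LOCAL)` yields "core only", while passing `flag_value="true"` for the same method yields "all providers from sources".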
   
   
   
   





[GitHub] [airflow] kaxil commented on a change in pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#discussion_r504927758



##########
File path: INSTALL
##########
@@ -42,6 +42,17 @@ python setup.py install
 pip install . \
   --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
 
+By default `pip install` in Airflow 2.0 installs only the provider packages that are needed by the extras,
+however if you want to install all providers (which was default behaviour in 1.10.*)
+you can do it by setting environment variable INSTALL_PROVIDERS_FROM_SOURCES to `true`.
+
+INSTALL_PROVIDERS_FROM_SOURCES="true" pip install . \
+  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
+
+
+You can also install airflow in development mode where not

Review comment:
       incomplete sentence
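
As an illustration of the command quoted from INSTALL above, the sketch below assembles the same `pip install` invocation from Python. The function names are hypothetical; the URL pattern is generalised from the single constraints URL shown in the diff, so other branch or Python version values are an assumption.

```python
def constraints_url(branch="master", python_version="3.6"):
    """Build a constraints URL following the pattern of the one quoted in INSTALL."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{branch}/constraints-{python_version}.txt"
    )


def pip_install_invocation(all_providers=False):
    """Return (extra environment variables, argv) for installing from the source tree."""
    env = {"INSTALL_PROVIDERS_FROM_SOURCES": "true"} if all_providers else {}
    argv = ["pip", "install", ".", "--constraint", constraints_url()]
    return env, argv
```

The pair could then be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)` from the root of an Airflow checkout.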







[GitHub] [airflow] github-actions[bot] commented on pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#issuecomment-708676387


   [The Workflow run](https://github.com/apache/airflow/actions/runs/307373711) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] potiuk commented on a change in pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#discussion_r504924002



##########
File path: IMAGES.rst
##########
@@ -390,6 +390,22 @@ The following build arguments (``--build-arg`` in docker build command) can be u
 |                                          |                                          | one of the folders included in           |
 |                                          |                                          | dockerignore                             |

Review comment:
       ```suggestion
   |                                          |                                          | dockeri gnore                            |
   ```







[GitHub] [airflow] potiuk merged pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #11529:
URL: https://github.com/apache/airflow/pull/11529


   





[GitHub] [airflow] kaxil commented on a change in pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#discussion_r504927597



##########
File path: IMAGES.rst
##########
@@ -388,7 +388,23 @@ The following build arguments (``--build-arg`` in docker build command) can be u
 |                                          |                                          | file has to be in docker context so      |
 |                                          |                                          | it's best to place such file in          |
 |                                          |                                          | one of the folders included in           |
-|                                          |                                          | dockerignore                             |
+|                                          |                                          | dockeri gnore                            |

Review comment:
       ```suggestion
   |                                          |                                          | dockerignore                             |
   ```







[GitHub] [airflow] potiuk commented on a change in pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#discussion_r504924221



##########
File path: docs/production-deployment.rst
##########
@@ -387,6 +387,14 @@ The following build arguments (``--build-arg`` in docker build command) can be u
 | ``AIRFLOW_BRANCH``                       | ``master``                               | the branch from which PIP dependencies   |
 |                                          |                                          | are pre-installed initially              |
 +------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_CONSTRAINTS_LOCATION``         |                                          | If not empty, it will override the       |
+|                                          |                                          | source of the constraints with the       |
+|                                          |                                          | specified URL or file. Note that the     |
+|                                          |                                          | file has to be in docker context so      |
+|                                          |                                          | it's best to place such file in          |
+|                                          |                                          | one of the folders included in           |
+|                                          |                                          | dockerignore                             |

Review comment:
       ```suggestion
   |                                          |                                          | docker ignore                            |
   ```







[GitHub] [airflow] kaxil commented on a change in pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#discussion_r504930868



##########
File path: CONTRIBUTING.rst
##########
@@ -520,9 +524,94 @@ yandexcloud, all, devel_ci
 
   .. END EXTRAS HERE
 
+Provider packages
+-----------------
 
-Airflow dependencies
---------------------
+Airflow 2.0 is split into core and providers. They are delivered as separate packages:
+
+* apache-airflow: core
+* apache-airflow-providers-*: More than 50 provider packages
+
+In Airflow 1.10 all those providers were installed together within one single package and when you installed
+airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out,
+and not installed together with the core, unless you set INSTALL_PROVIDERS_FROM_SOURCES environment
+variable to ``true``.
+
+In Breeze - which is a development variable, INSTALL_PROVIDERS_FROM_SOURCES is set to true, but you
+can also add ``--skip-installing-airflow-providers`` flag to Breeze to skip installing them when
+building the images.
+
+One watch-out - providers are still always installed (or rather available) if you install airflow from
+sources using ``-e`` (or ``--editable``) flag. In such case airflow is read directly from the sources
+without copying airflow packages to the usual installation location, and since 'providers' folder is
+in this airflow folder - the providers package is importable.
+
+Some of the packages have cross-dependencies with other providers packages. This typically happens for
+transfer operators where operators use hooks from the other providers in case they are transferring
+data between the providers. The list of dependencies is maintained (automatically with pre-commits)
+in the ``airflow/providers/dependencies.json``. Pre-commits are also used to generate dependencies.
+The dependency list is automatically used during pypi packages generation.
+
+Cross-dependencies between provider packages are converted into extras - if you need functionality from
+the other provider package you can install it adding [extra] after the
+apache-airflow-backport-providers-PROVIDER for example ``pip install
+apache-airflow-backport-providers-google[amazon]`` in case you want to use GCP
+transfer operators from Amazon ECS.
+
+If you add a new dependency between different providers packages, it will be detected automatically during
+pre-commit phase and pre-commit will fail - and add entry in dependencies.json so that the package extra
+dependencies are properly added when package is installed.
+
+You can regenerate the whole list of provider dependencies by running this command (you need to have
+``pre-commits`` installed).
+
+.. code-block:: bash
+
+  pre-commit run build-providers-dependencies
+
+
+Here is the list of packages and their extras:
+
+
+  .. START PACKAGE DEPENDENCIES HERE
+
+========================== ===========================
+Package                    Extras
+========================== ===========================
+amazon                     apache.hive,google,imap,mongo,mysql,postgres,ssh
+apache.druid               apache.hive
+apache.hive                amazon,microsoft.mssql,mysql,presto,samba,vertica
+apache.livy                http
+dingding                   http
+discord                    http
+google                     amazon,apache.cassandra,cncf.kubernetes,facebook,microsoft.azure,microsoft.mssql,mysql,postgres,presto,sftp
+hashicorp                  google
+microsoft.azure            google,oracle
+microsoft.mssql            odbc
+mysql                      amazon,presto,vertica
+opsgenie                   http
+postgres                   amazon
+sftp                       ssh
+slack                      http
+snowflake                  slack
+========================== ===========================
+
+  .. END PACKAGE DEPENDENCIES HERE
+
+Backport providers
+------------------
+
+You can also build backport provider packages for Airflow 1.10. They aim to provide a bridge when users
+of Airflow 1.10 want to migrate to Airflow 2.0. The backport packages are named similarly to the
+provider packages, but with "backport" added:
+
+* 'apache-airflow-backport-provider-*

Review comment:
       ```suggestion
   * 'apache-airflow-backport-provider-*'
   ```
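
The quoted CONTRIBUTING.rst text says cross-dependencies between provider packages are converted into extras. A minimal sketch of that conversion follows. It assumes the dependency data is a mapping from provider id to a list of provider ids (the sample rows are copied from the table in the diff), and that package names are formed by replacing dots with hyphens; both helper functions are hypothetical, not the project's packaging code.

```python
# Sample rows copied from the package/extras table quoted above.
DEPENDENCIES = {
    "apache.druid": ["apache.hive"],
    "microsoft.azure": ["google", "oracle"],
    "sftp": ["ssh"],
}


def provider_package_name(provider_id):
    """Assumed naming rule: 'apache.hive' -> 'apache-airflow-providers-apache-hive'."""
    return "apache-airflow-providers-" + provider_id.replace(".", "-")


def extras_for(provider_id, dependencies=DEPENDENCIES):
    """Map each cross-dependency to an extra that installs the other provider package."""
    return {
        dep: [provider_package_name(dep)]
        for dep in dependencies.get(provider_id, [])
    }
```

A mapping of this shape is what setuptools accepts as `extras_require`, so a provider with no cross-dependencies simply gets an empty dictionary.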







[GitHub] [airflow] github-actions[bot] commented on pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#issuecomment-708590292


   [The Workflow run](https://github.com/apache/airflow/actions/runs/307107015) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] kaxil commented on a change in pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#discussion_r504930067



##########
File path: CONTRIBUTING.rst
##########
@@ -520,9 +524,94 @@ yandexcloud, all, devel_ci
 
   .. END EXTRAS HERE
 
+Provider packages
+-----------------
 
-Airflow dependencies
---------------------
+Airflow 2.0 is split into core and providers. They are delivered as separate packages:
+
+* apache-airflow: core
+* apache-airflow-providers-*: More than 50 provider packages
+
+In Airflow 1.10 all those providers were installed together within one single package and when you installed
+airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out,
+and not installed together with the core, unless you set INSTALL_PROVIDERS_FROM_SOURCES environment
+variable to ``true``.

Review comment:
       ```suggestion
   In Airflow 1.10 all those providers were installed together within one single package and when you installed
   airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out,
   and not installed together with the core, unless you set ``INSTALL_PROVIDERS_FROM_SOURCES`` environment
   variable to ``true``.
   ```

##########
File path: CONTRIBUTING.rst
##########
@@ -520,9 +524,94 @@ yandexcloud, all, devel_ci
 
   .. END EXTRAS HERE
 
+Provider packages
+-----------------
 
-Airflow dependencies
---------------------
+Airflow 2.0 is split into core and providers. They are delivered as separate packages:
+
+* apache-airflow: core
+* apache-airflow-providers-*: More than 50 provider packages
+
+In Airflow 1.10 all those providers were installed together within one single package and when you installed
+airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out,
+and not installed together with the core, unless you set INSTALL_PROVIDERS_FROM_SOURCES environment
+variable to ``true``.
+
+In Breeze - which is a development variable, INSTALL_PROVIDERS_FROM_SOURCES is set to true, but you

Review comment:
       ```suggestion
   In Breeze - which is a development variable, ``INSTALL_PROVIDERS_FROM_SOURCES`` is set to true, but you
   ```







[GitHub] [airflow] potiuk commented on pull request #11529: Behaviour to install all airflow providers added

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11529:
URL: https://github.com/apache/airflow/pull/11529#issuecomment-708657357


   All resolved @kaxil!

