Posted to commits@airflow.apache.org by po...@apache.org on 2020/12/09 10:37:19 UTC

[airflow] branch master updated: Adds documentation about custom providers. (#12921)

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
     new 7bd867d  Adds documentation about custom providers. (#12921)
7bd867d is described below

commit 7bd867d74518331a4b7f9bcb2a0f6adffc63fe22
Author: Jarek Potiuk <ja...@polidea.com>
AuthorDate: Wed Dec 9 11:35:54 2020 +0100

    Adds documentation about custom providers. (#12921)
    
    Closes: #11429
---
 docs/apache-airflow-providers/index.rst         | 171 +++++++++++++++++++-----
 docs/apache-airflow/backport-providers.rst      |   5 -
 docs/apache-airflow/concepts.rst                |  41 ++++--
 docs/apache-airflow/howto/connection.rst        |  25 ++++
 docs/apache-airflow/howto/custom-operator.rst   |   3 +-
 docs/apache-airflow/howto/define_extra_link.rst |  26 +++-
 docs/apache-airflow/index.rst                   |   2 +-
 docs/apache-airflow/installation.rst            |   5 +-
 8 files changed, 220 insertions(+), 58 deletions(-)

diff --git a/docs/apache-airflow-providers/index.rst b/docs/apache-airflow-providers/index.rst
index 0e431b7..179ef44 100644
--- a/docs/apache-airflow-providers/index.rst
+++ b/docs/apache-airflow-providers/index.rst
@@ -69,56 +69,159 @@ Separate provider packages provide the possibilities that were not available in
    following the usual tests you have in your environment.
 
 
+Extending Airflow Connections and Extra Links via Providers
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Providers can deliver not only operators, hooks, sensors, and transfer operators to communicate with a
+multitude of external systems, but they can also extend Airflow itself. Airflow has several extension
+capabilities that can be used by providers. Airflow automatically discovers which providers add those
+additional capabilities, and once you install a provider package and restart Airflow, they become
+automatically available to Airflow users.
+
+The capabilities are:
+
+* Adding Extra Links to operators delivered by the provider.
+  See :doc:`apache-airflow:howto/define_extra_link`
+  for a description of what extra links are and an example of a provider registering an operator with extra links.
+
+* Adding custom connection types, extending the connection form, and handling custom form field behaviour for
+  the connections defined by the provider. See :doc:`apache-airflow:howto/connection` for a description of
+  connections and the custom connection capabilities you can define.
+
+How to create your own provider
+"""""""""""""""""""""""""""""""
+
+Adding a provider to Airflow is just a matter of building a Python package and adding the right meta-data to
+the package. We use the standard Python mechanism of defining
+`entry points <https://docs.python.org/3/library/importlib.metadata.html#entry-points>`_ . Your package
+needs to define an ``apache_airflow_provider`` entry point that points to a callable
+implemented by your package which returns a dictionary describing the discoverable capabilities
+of your package. The dictionary has to follow the
+`json-schema specification <https://github.com/apache/airflow/blob/master/airflow/provider.yaml.schema.json>`_.
+
+Most of the schema provides extension points for the documentation (which you might also want to use for
+your own purposes), but the two important fields from the extensibility point of view are (see the sketch
+after this list):
+
+* ``extra-links`` - this field should contain the list of all the operator extra link class names that add
+  the extra links capability. See :doc:`apache-airflow:howto/define_extra_link` for a description of how to
+  add the extra link capability to your operators.
+
+* ``hook-class-names`` - this field should contain the list of all hook class names that provide
+  custom connection types with custom extra fields and field behaviour. See
+  :doc:`apache-airflow:howto/connection` for more details.
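+
+As an illustration, here is a minimal sketch of wiring up such an entry point (all package, module and
+class names below are hypothetical, chosen only for this example):
+
+.. code-block:: python
+
+    # setup.py of your provider package
+    from setuptools import setup
+
+    setup(
+        name="my-company-my-provider",
+        packages=["my_company", "my_company.my_provider"],
+        entry_points={
+            "apache_airflow_provider": [
+                "provider_info=my_company.my_provider:get_provider_info"
+            ]
+        },
+    )
+
+The callable the entry point refers to returns the meta-data dictionary following the json-schema above:
+
+.. code-block:: python
+
+    # my_company/my_provider/__init__.py
+    def get_provider_info():
+        # dictionary keys follow airflow/provider.yaml.schema.json
+        return {
+            "package-name": "my-company-my-provider",
+            "name": "My Company",
+            "description": "Operators and hooks for My Company systems.",
+            "versions": ["1.0.0"],
+            "extra-links": [
+                "my_company.my_provider.operators.MyServiceConsoleLink",
+            ],
+            "hook-class-names": [
+                "my_company.my_provider.hooks.MyServiceHook",
+            ],
+        }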
+
+
+When your providers are installed, you can query the installed providers and their capabilities with the
+``airflow providers`` command. This way you can verify whether your providers are properly recognized and
+whether they define their extensions correctly. See :doc:`cli-and-env-variables-ref` for details of the
+available CLI sub-commands.
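+
+For example, a quick check might be (the exact output depends on the providers you have installed):
+
+.. code-block:: bash
+
+    # list all providers recognized by this Airflow installation
+    airflow providers list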
+
+When you write your own provider, consider following the
+`Naming conventions for provider packages <https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#naming-conventions-for-provider-packages>`_.
+
+
 Q&A for Airflow and Providers
 '''''''''''''''''''''''''''''
 
 Upgrading Airflow 2.0 and Providers
 """""""""""""""""""""""""""""""""""
 
-Q. **When upgrading to a new Airflow version such as 2.0, but possibly 2.0.1 and beyond, is the best practice
-   to also upgrade provider packages at the same time?**
+**When upgrading to a new Airflow version such as 2.0, but possibly 2.0.1 and beyond, is it best practice
+to also upgrade provider packages at the same time?**
 
-A. It depends on your use case. If you have automated or semi-automated verification of your installation,
-   that you can run a new version of Airflow including all provider packages, then definitely go for it.
-   If you rely more on manual testing, it is advised that you upgrade in stages. Depending on your choice
-   you can either upgrade all used provider packages first, and then upgrade Airflow Core or the other way
-   round. The first approach - when you first upgrade all providers is probably safer, as you can do it
-   incrementally, step-by-step replacing provider by provider in your environment.
+It depends on your use case. If you have automated or semi-automated verification of your installation
+that confirms you can run a new version of Airflow including all provider packages, then definitely go for it.
+If you rely more on manual testing, it is advised that you upgrade in stages. Depending on your choice,
+you can either upgrade all used provider packages first and then upgrade Airflow Core, or the other way
+round. The first approach - upgrading all providers first - is probably safer, as you can do it
+incrementally, step-by-step, replacing provider by provider in your environment, as sketched below.
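+
+In practice (the provider package names here are just examples), the provider-first approach boils down to:
+
+.. code-block:: bash
+
+    # upgrade each provider you use, one at a time, verifying your environment in between
+    pip install --upgrade apache-airflow-providers-mysql
+    pip install --upgrade apache-airflow-providers-google
+    # only when all providers work, upgrade the core
+    pip install --upgrade apache-airflow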
 
 Using Backport Providers in Airflow 1.10
 """"""""""""""""""""""""""""""""""""""""
 
-Q. **I have an Airflow version (1.10.12) running and it is stable. However, because of a Cloud provider change,
-   I would like to upgrade the provider package. If I don't need to upgrade the Airflow version anymore,
-   how do I know that this provider version is compatible with my Airflow version?**
-
-
-A. Backport Provider Packages (those are needed in 1.10.* Airflow series) are going to be released for
-   3 months after the release. We will stop releasing new updates to the backport providers afterwards.
-   You will be able to continue using the provider packages that you already use and unless you need to
-   get some new release of the provider that is only released for 2.0, there is no need to upgrade
-   Airflow. This might happen if for example the provider is migrated to use newer version of client
-   libraries or when new features/operators/hooks are added to it. Those changes will only be
-   backported to 1.10.* compatible backport providers up to 3 months after releasing Airflow 2.0.
-   Also we expect more providers, changes and fixes added to the existing providers to come after the
-   3 months pass. Eventually you will have to upgrade to Airflow 2.0 if you would like to make use of those.
-   When it comes to compatibility of providers with different Airflow 2 versions, each
-   provider package will keep its own dependencies, and while we expect those providers to be generally
-   backwards-compatible, particular versions of particular providers might introduce dependencies on
-   specific Airflow versions.
+**I have an Airflow version (1.10.12) running and it is stable. However, because of a Cloud provider change,
+I would like to upgrade the provider package. If I don't need to upgrade the Airflow version anymore,
+how do I know that this provider version is compatible with my Airflow version?**
+
+
+Backport Provider Packages (which are needed for the 1.10.* Airflow series) are going to be released for
+3 months after the Airflow 2.0 release. We will stop releasing new updates to the backport providers afterwards.
+You will be able to continue using the provider packages that you already use, and unless you need to
+get some new release of the provider that is only released for 2.0, there is no need to upgrade
+Airflow. This might happen if, for example, the provider is migrated to use a newer version of client
+libraries or when new features/operators/hooks are added to it. Those changes will only be
+backported to 1.10.* compatible backport providers up to 3 months after releasing Airflow 2.0.
+We also expect more providers, changes, and fixes to be added to the existing providers after the
+3 months pass. Eventually you will have to upgrade to Airflow 2.0 if you would like to make use of those.
+When it comes to compatibility of providers with different Airflow 2 versions, each
+provider package will keep its own dependencies, and while we expect those providers to be generally
+backwards-compatible, particular versions of particular providers might introduce dependencies on
+specific Airflow versions.
 
 Customizing Provider Packages
 """""""""""""""""""""""""""""
 
-Q. **I have an older version of my provider package which we have lightly customized and is working
-   fine with my MSSQL installation. I am upgrading my Airflow version. Do I need to upgrade my provider,
-   or can I keep it as it is.**
+**I have an older version of my provider package which we have lightly customized and is working
+fine with my MSSQL installation. I am upgrading my Airflow version. Do I need to upgrade my provider,
+or can I keep it as it is?**
+
+It depends on the scope of customization. There is no need to upgrade the provider packages to later
+versions unless you want to upgrade to an Airflow version that introduces backwards-incompatible changes.
+Generally speaking, with Airflow 2 we are following the `Semver <https://semver.org/>`_ approach, where
+we will only introduce backwards-incompatible changes in major releases, so all your modifications (as long
+as you have not used internal Airflow classes) should work for all Airflow 2.* versions.
+
+
+Creating your own providers
+"""""""""""""""""""""""""""
+
+**When I write my own provider, do I need to do anything special to make it available to others?**
+
+You do not need to do anything special besides creating the ``apache_airflow_provider`` entry point
+returning properly formatted meta-data (a dictionary with ``extra-links`` and ``hook-class-names`` fields).
+
+
+**Should I name my provider in a specific way, or should it be created in the ``airflow.providers`` package?**
+
+We have quite a number (>70) of providers managed by the community, and we are going to maintain them
+together with Apache Airflow. All those providers have a well-defined structure, follow the
+naming conventions we defined, and are all in the ``airflow.providers`` package. If your intention is
+to contribute your provider, then you should follow those conventions and make a PR to Apache Airflow
+to contribute it. But you are free to use any package name as long as there are no conflicts with other
+names, so preferably choose a package name that is in your "domain".
+
+**Is there a convention for a connection id and type?**
+
+Very good question. Glad that you asked. We usually follow the convention of ``<NAME>_default`` for the
+connection id and just ``<NAME>`` for the connection type. A few examples:
+
+* ``google_cloud_default`` id and ``google_cloud_platform`` type
+* ``aws_default`` id and ``aws`` type
+
+You should follow this convention. It is important to use a unique name for your connection type - it
+should be unique among all providers. If two providers try to add a connection with the same type,
+only one of them will succeed.
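+
+A minimal sketch of how a hook might declare these (``MyServiceHook`` and the names are hypothetical):
+
+.. code-block:: python
+
+    from airflow.hooks.base_hook import BaseHook
+
+
+    class MyServiceHook(BaseHook):
+        # connection type - must be unique across all installed providers
+        conn_type = "my_service"
+        # conventional default connection id for this type
+        default_conn_name = "my_service_default"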
+
+**Can I contribute my own provider to Apache Airflow?**
+
+You can. However, the community only accepts providers that are generic enough and can be managed
+by the community, so we might not always be in the position to accept such contributions. In case you
+have your own, specific provider, you are free to use your own structure and package names and to
+publish the providers in whatever form you find appropriate.
+
+**Can I advertise my own provider to Apache Airflow users and share it with others as package in PyPI?**
+
+Absolutely! We have an `Ecosystem <https://airflow.apache.org/ecosystem/>`_ area on our website where
+we share non-community managed extensions and work for Airflow. Feel free to make a PR to the page, and
+we will evaluate and merge it when we see that such a provider can be useful for the community of
+Airflow users.
+
+**Can I charge for the use of my provider?**
 
-A. It depends on the scope of customization. There is no need to upgrade the provider packages to later
-   versions unless you want to upgrade to Airflow version that introduces backwards-incompatible changes.
-   Generally speaking, with Airflow 2 we are following the `Semver <https://semver.org/>`_  approach where
-   we will introduce backwards-incompatible changes in Major releases, so all your modifications (as long
-   as you have not used internal Airflow classes) should work for All Airflow 2.* versions.
+This is something that is outside of our control and domain. As an Apache project, we are
+commercial-friendly and there are many businesses built around Apache Airflow and many other
+Apache projects. As a community, we provide all the software for free and this will never
+change. What 3rd-party developers do is not under the control of the Apache Airflow community.
 
 
 Content
diff --git a/docs/apache-airflow/backport-providers.rst b/docs/apache-airflow/backport-providers.rst
index 5bfdc38..6d3a730 100644
--- a/docs/apache-airflow/backport-providers.rst
+++ b/docs/apache-airflow/backport-providers.rst
@@ -16,17 +16,12 @@
     under the License.
 
 
-
 Backport Providers
 ------------------
 
 Context: Airflow 2.0 operators, hooks, and secrets
 ''''''''''''''''''''''''''''''''''''''''''''''''''
 
-Currently, stable Apache Airflow versions are from the ``1.10.*`` series. We are working on the future, major version of
-Airflow - 2.0.* series. It is going to be released in 2020. However, the exact time of release depends on
-many factors and is not yet confirmed.
-
 We already have a lot of changes in the operators, transfers, hooks, sensors, secrets for many external systems, but
 they are not used nor tested widely because they are part of the master/2.0 release.
 
diff --git a/docs/apache-airflow/concepts.rst b/docs/apache-airflow/concepts.rst
index 31f5d2e..77b2f5d 100644
--- a/docs/apache-airflow/concepts.rst
+++ b/docs/apache-airflow/concepts.rst
@@ -424,24 +424,39 @@ combining them into a single operator. If it absolutely can't be avoided,
 Airflow does have a feature for operator cross-communication called XCom that is
 described in the section :ref:`XComs <concepts:xcom>`
 
-Airflow provides operators for many common tasks, including:
+Airflow provides many built-in operators for common tasks, including:
 
 - :class:`~airflow.operators.bash.BashOperator` - executes a bash command
 - :class:`~airflow.operators.python.PythonOperator` - calls an arbitrary Python function
 - :class:`~airflow.operators.email.EmailOperator` - sends an email
+
+There are also other commonly used operators that are installed together with Airflow automatically,
+via some pre-installed :doc:`apache-airflow-providers:index` packages (they are always available no
+matter which extras you chose when installing Apache Airflow):
+
 - :class:`~airflow.providers.http.operators.http.SimpleHttpOperator` - sends an HTTP request
-- :class:`~airflow.providers.mysql.operators.mysql.MySqlOperator`,
-  :class:`~airflow.providers.sqlite.operators.sqlite.SqliteOperator`,
-  :class:`~airflow.providers.postgres.operators.postgres.PostgresOperator`,
-  :class:`~airflow.providers.microsoft.mssql.operators.mssql.MsSqlOperator`,
-  :class:`~airflow.providers.oracle.operators.oracle.OracleOperator`,
-  :class:`~airflow.providers.jdbc.operators.jdbc.JdbcOperator`, etc. - executes a SQL command
-
-In addition to these basic building blocks, there are many more specific
-operators: :class:`~airflow.providers.docker.operators.docker.DockerOperator`,
-:class:`~airflow.providers.apache.hive.operators.hive.HiveOperator`, :class:`~airflow.providers.amazon.aws.operators.s3_file_transform.S3FileTransformOperator`,
-:class:`~airflow.providers.mysql.transfers.presto_to_mysql.PrestoToMySqlOperator`,
-:class:`~airflow.providers.slack.operators.slack.SlackAPIOperator`... you get the idea!
+- :class:`~airflow.providers.sqlite.operators.sqlite.SqliteOperator` - executes SQL commands in an SQLite database
+
+In addition to these basic building blocks, there are many more specific operators developed by the
+community that you can add by installing community-maintained provider packages. You can install them
+by adding an extra (for example ``[mysql]``) when installing Airflow, or by installing additional
+packages manually (for example the ``apache-airflow-providers-mysql`` package), as shown below.
+
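+.. code-block:: bash
+
+    # either of these makes the MySQL provider operators importable
+    pip install 'apache-airflow[mysql]'
+    pip install apache-airflow-providers-mysql
+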
+Some examples of popular operators are:
+
+- :class:`~airflow.providers.mysql.operators.mysql.MySqlOperator`
+- :class:`~airflow.providers.postgres.operators.postgres.PostgresOperator`
+- :class:`~airflow.providers.microsoft.mssql.operators.mssql.MsSqlOperator`
+- :class:`~airflow.providers.oracle.operators.oracle.OracleOperator`
+- :class:`~airflow.providers.jdbc.operators.jdbc.JdbcOperator`
+- :class:`~airflow.providers.docker.operators.docker.DockerOperator`
+- :class:`~airflow.providers.apache.hive.operators.hive.HiveOperator`
+- :class:`~airflow.providers.amazon.aws.operators.s3_file_transform.S3FileTransformOperator`
+- :class:`~airflow.providers.mysql.transfers.presto_to_mysql.PrestoToMySqlOperator`
+- :class:`~airflow.providers.slack.operators.slack.SlackAPIOperator`
+
+But there are many, many more - you can see the full list by following the providers documentation
+at :doc:`apache-airflow-providers:index`.
 
 Operators are only loaded by Airflow if they are assigned to a DAG.
 
diff --git a/docs/apache-airflow/howto/connection.rst b/docs/apache-airflow/howto/connection.rst
index bc1d431..25f1646 100644
--- a/docs/apache-airflow/howto/connection.rst
+++ b/docs/apache-airflow/howto/connection.rst
@@ -322,3 +322,28 @@ Passwords cannot be manipulated or read without the key. For information on conf
 
 In addition to retrieving connections from environment variables or the metastore database, you can enable
 a secrets backend to retrieve connections. For more details see :doc:`/security/secrets/secrets-backend/index`.
+
+
+Custom connection types
+-----------------------
+
+Airflow allows you to define custom connection types - including modification of the add/edit form for the
+connections. Custom connection types are defined in community-maintained providers, but you can also add
+custom providers that add their own connection types. See :doc:`apache-airflow-providers:index`
+for a description of how to add your own connection type via custom providers.
+
+The custom connection types are defined via Hooks delivered by the providers. The Hooks can implement
+methods defined in the protocol class :class:`~airflow.hooks.base_hook.DiscoverableHook`. Note that your custom
+Hook should not derive from that class; the class is merely there to document expectations about the class
+fields and methods that your Hook might define.
+
+By implementing those methods in your hooks and exposing them via the ``hook-class-names`` array in
+the provider meta-data, you can customize Airflow by:
+
+* Adding a custom connection type
+* Adding automated Hook creation from the connection type
+* Adding custom form widgets to display and edit custom "extra" parameters in your connection URL
+* Hiding fields that are not used for your connection
+* Adding placeholders showing examples of how fields should be formatted
+
+You can read more about the details of how to add a custom connection type in :doc:`apache-airflow-providers:index`.
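+
+For illustration, a minimal sketch of a Hook providing a custom connection type might look like this
+(the class, connection type, and field names are hypothetical; the fields and static methods follow
+the :class:`~airflow.hooks.base_hook.DiscoverableHook` protocol mentioned above):
+
+.. code-block:: python
+
+    from airflow.hooks.base_hook import BaseHook
+
+
+    class MyServiceHook(BaseHook):
+        """Hypothetical hook exposing a custom "my_service" connection type."""
+
+        conn_name_attr = "my_service_conn_id"
+        default_conn_name = "my_service_default"
+        conn_type = "my_service"
+        hook_name = "My Service"
+
+        @staticmethod
+        def get_connection_form_widgets():
+            """Add a custom "Account" field to the add/edit connection form."""
+            from flask_appbuilder.fieldwidgets import BS3TextFieldWidget
+            from flask_babel import lazy_gettext
+            from wtforms import StringField
+
+            return {
+                "extra__my_service__account": StringField(
+                    lazy_gettext("Account"), widget=BS3TextFieldWidget()
+                ),
+            }
+
+        @staticmethod
+        def get_ui_field_behaviour():
+            """Hide unused fields, relabel the rest, and add placeholders."""
+            return {
+                "hidden_fields": ["port", "schema"],
+                "relabeling": {"host": "My Service URL"},
+                "placeholders": {"login": "user@example.com"},
+            }
+
+Remember that the Hook only becomes discoverable once its class name is listed in the
+``hook-class-names`` array of your provider meta-data.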
diff --git a/docs/apache-airflow/howto/custom-operator.rst b/docs/apache-airflow/howto/custom-operator.rst
index f42ed3b..f6bb443 100644
--- a/docs/apache-airflow/howto/custom-operator.rst
+++ b/docs/apache-airflow/howto/custom-operator.rst
@@ -91,7 +91,8 @@ Hooks act as an interface to communicate with the external shared resources in a
 For example, multiple tasks in a DAG can require access to a MySQL database. Instead of
 creating a connection per task, you can retrieve a connection from the hook and utilize it.
 Hook also helps to avoid storing connection auth parameters in a DAG.
-See :doc:`connection` for how to create and manage connections.
+See :doc:`connection` for how to create and manage connections and :doc:`apache-airflow-providers:index` for
+details of how to add your custom connection types via providers.
 
 Let's extend our previous example to fetch name from MySQL:
 
diff --git a/docs/apache-airflow/howto/define_extra_link.rst b/docs/apache-airflow/howto/define_extra_link.rst
index 2ab2659..d9277ae 100644
--- a/docs/apache-airflow/howto/define_extra_link.rst
+++ b/docs/apache-airflow/howto/define_extra_link.rst
@@ -55,15 +55,15 @@ The following code shows how to add extra links to an operator:
             self.log.info("Hello World!")
 
 You can also add a global operator extra link that will be available to
-all the operators through an airflow plugin. Learn more about it in the
-:ref:`plugin example <plugin-example>`.
+all the operators through an Airflow plugin or through Airflow providers. You can learn more about it in the
+:ref:`plugin example <plugin-example>` and in :doc:`apache-airflow-providers:index`.
 
 
 Add or override Links to Existing Operators
 -------------------------------------------
 
 You can also add (or override) an extra link to existing operators
-through an Airflow plugin.
+through an Airflow plugin or custom provider.
 
 For example, the following Airflow plugin will add an Operator Link on all
 tasks using :class:`~airflow.providers.amazon.aws.transfers.gcs_to_s3.GCSToS3Operator` operator.
@@ -130,3 +130,23 @@ Console, but if we wanted to change that link we could:
     class AirflowExtraLinkPlugin(AirflowPlugin):
         name = "extra_link_plugin"
         operator_extra_links = [BigQueryConsoleLink(), ]
+
+
+**Adding Operator Links via Providers**
+
+As explained in :doc:`apache-airflow-providers:index`, when you create your own Airflow Provider, you can
+specify the list of extra links that your operators provide. This happens by including the extra link
+class names in the ``provider-info`` information stored in your Provider's package meta-data.
+
+Example meta-data required in your provider-info dictionary (this is part of the meta-data currently
+returned by the ``apache-airflow-providers-google`` provider):
+
+.. code-block:: yaml
+
+    extra-links:
+      - airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleLink
+      - airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleIndexableLink
+      - airflow.providers.google.cloud.operators.mlengine.AIPlatformConsoleLink
+
+
+You can include as many operator extra links as you want.
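+
+For reference, each entry in the list points at an operator extra link class, i.e. a subclass of
+``BaseOperatorLink`` such as this hypothetical sketch:
+
+.. code-block:: python
+
+    from airflow.models.baseoperator import BaseOperatorLink
+
+
+    class MyServiceConsoleLink(BaseOperatorLink):
+        """Hypothetical extra link pointing at an external console."""
+
+        name = "My Service Console"
+
+        def get_link(self, operator, dttm):
+            # the returned URL is shown as a button on the task instance page
+            return "https://console.example.com/jobs"
+
+This mirrors the plugin-based examples earlier on this page.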
diff --git a/docs/apache-airflow/index.rst b/docs/apache-airflow/index.rst
index 96c32a4..39db70b 100644
--- a/docs/apache-airflow/index.rst
+++ b/docs/apache-airflow/index.rst
@@ -97,11 +97,11 @@ Content
     lineage
     dag-serialization
     modules_management
-    backport-providers
     smart-sensor
     changelog
     best-practices
     production-deployment
+    backport-providers
     faq
     privacy_notice
 
diff --git a/docs/apache-airflow/installation.rst b/docs/apache-airflow/installation.rst
index ad878dd..9b7cd5d 100644
--- a/docs/apache-airflow/installation.rst
+++ b/docs/apache-airflow/installation.rst
@@ -149,10 +149,13 @@ Unlike Apache Airflow 1.10, the Airflow 2.0 is delivered in multiple, separate,
 The core of the Airflow scheduling system is delivered as the ``apache-airflow`` package and there are around
 60 provider packages which can be installed separately as so-called "Airflow Provider packages".
 The default Airflow installation doesn't have many integrations and you have to install them yourself.
-For more information, see: :doc:`apache-airflow-providers:index`
+
+You can even develop and install your own providers for Airflow. For more information,
+see: :doc:`apache-airflow-providers:index`
 
 For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
 
+
 Initializing Airflow Database
 '''''''''''''''''''''''''''''