Posted to commits@airflow.apache.org by ka...@apache.org on 2018/05/25 08:37:25 UTC

incubator-airflow git commit: [AIRFLOW-2523] Add how-to for managing GCP connections

Repository: incubator-airflow
Updated Branches:
  refs/heads/master 66f00bbf7 -> 4c0d67f0d


[AIRFLOW-2523] Add how-to for managing GCP connections

I'd like to have how-to guides for all connection types, or at least the
different categories of connection types. I found it difficult to figure
out how to manage a GCP connection, so this commit adds a how-to guide for
it.

Also, since creating and editing connections really aren't all that
different, the PR renames the "creating connections" how-to to "managing
connections".

Closes #3419 from tswast/howto


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4c0d67f0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4c0d67f0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4c0d67f0

Branch: refs/heads/master
Commit: 4c0d67f0d0094a5ef8a5ec2407fe91d16af01129
Parents: 66f00bb
Author: Tim Swast <sw...@google.com>
Authored: Fri May 25 09:37:29 2018 +0100
Committer: Kaxil Naik <ka...@apache.org>
Committed: Fri May 25 09:37:29 2018 +0100

----------------------------------------------------------------------
 docs/concepts.rst                 |  28 ++++---
 docs/howto/create-connection.rst  |   8 --
 docs/howto/index.rst              |   2 +-
 docs/howto/manage-connections.rst | 135 +++++++++++++++++++++++++++++++++
 docs/howto/secure-connections.rst |   7 --
 docs/img/connection_create.png    | Bin 0 -> 41547 bytes
 docs/img/connection_edit.png      | Bin 0 -> 53636 bytes
 docs/img/connections.png          | Bin 93057 -> 48442 bytes
 docs/integration.rst              |   3 +
 9 files changed, 152 insertions(+), 31 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/concepts.rst
----------------------------------------------------------------------
diff --git a/docs/concepts.rst b/docs/concepts.rst
index c28b10f..866f916 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -308,6 +308,8 @@ UI. As slots free up, queued tasks start running based on the
 Note that by default tasks aren't assigned to any pool and their
 execution parallelism is only limited to the executor's setting.
 
+.. _concepts-connections:
+
 Connections
 ===========
 
@@ -324,16 +326,12 @@ from ``BaseHook``, Airflow will choose one connection randomly, allowing
 for some basic load balancing and fault tolerance when used in conjunction
 with retries.
 
-Airflow also has the ability to reference connections via environment
-variables from the operating system. The environment variable needs to be
-prefixed with ``AIRFLOW_CONN_`` to be considered a connection. When
-referencing the connection in the Airflow pipeline, the ``conn_id`` should
-be the name of the variable without the prefix. For example, if the ``conn_id``
-is named ``postgres_master`` the environment variable should be named
-``AIRFLOW_CONN_POSTGRES_MASTER`` (note that the environment variable must be
-all uppercase). Airflow assumes the value returned from the environment
-variable to be in a URI format (e.g.
-``postgres://user:password@localhost:5432/master`` or ``s3://accesskey:secretkey@S3``).
+Many hooks have a default ``conn_id``, so operators using that hook do not
+need to supply an explicit connection ID. For example, the default
+``conn_id`` for the :class:`~airflow.hooks.postgres_hook.PostgresHook` is
+``postgres_default``.
+
+See :doc:`howto/manage-connections` for how to create and manage connections.
 
 Queues
 ======
@@ -410,7 +408,7 @@ Variables
 Variables are a generic way to store and retrieve arbitrary content or
 settings as a simple key value store within Airflow. Variables can be
 listed, created, updated and deleted from the UI (``Admin -> Variables``),
-code or CLI. In addition, json settings files can be bulk uploaded through 
+code or CLI. In addition, json settings files can be bulk uploaded through
 the UI. While your pipeline code definition and most of your constants
 and variables should be defined in code and stored in source control,
 it can be useful to have some variables or configuration items
@@ -427,18 +425,18 @@ The second call assumes ``json`` content and will be deserialized into
 ``bar``. Note that ``Variable`` is a sqlalchemy model and can be used
 as such.
 
-You can use a variable from a jinja template with the syntax : 
+You can use a variable from a jinja template with the syntax :
 
 .. code:: bash
 
     echo {{ var.value.<variable_name> }}
-    
-or if you need to deserialize a json object from the variable : 
+
+or if you need to deserialize a json object from the variable :
 
 .. code:: bash
 
     echo {{ var.json.<variable_name> }}
-    
+
 
 Branching
 =========

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/howto/create-connection.rst
----------------------------------------------------------------------
diff --git a/docs/howto/create-connection.rst b/docs/howto/create-connection.rst
deleted file mode 100644
index ba9f444..0000000
--- a/docs/howto/create-connection.rst
+++ /dev/null
@@ -1,8 +0,0 @@
-Creating a Connection
-=====================
-
-Connections in Airflow pipelines can be created using environment variables.
-The environment variable needs to have a prefix of ``AIRFLOW_CONN_`` for
-Airflow with the value in a URI format to use the connection properly. Please
-see the :doc:`../../concepts` documentation for more information on environment
-variables and connections.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/howto/index.rst
----------------------------------------------------------------------
diff --git a/docs/howto/index.rst b/docs/howto/index.rst
index 5c22a5d..1342ed8 100644
--- a/docs/howto/index.rst
+++ b/docs/howto/index.rst
@@ -12,8 +12,8 @@ configuring an Airflow environment.
 
     set-config
     initialize-database
+    manage-connections
     secure-connections
-    create-connection
     write-logs
     executor/use-celery
     executor/use-dask

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/howto/manage-connections.rst
----------------------------------------------------------------------
diff --git a/docs/howto/manage-connections.rst b/docs/howto/manage-connections.rst
new file mode 100644
index 0000000..f520315
--- /dev/null
+++ b/docs/howto/manage-connections.rst
@@ -0,0 +1,135 @@
+Managing Connections
+=====================
+
+Airflow needs to know how to connect to your environment. Information
+such as hostnames, ports, logins and passwords for other systems and services
+is handled in the ``Admin->Connection`` section of the UI. The pipeline code
+you author references the ``conn_id`` of the Connection objects.
+
+.. image:: ../img/connections.png
+
+Connections can be created and managed using either the UI or environment
+variables.
+
+See the :ref:`Connections Concepts <concepts-connections>` documentation for
+more information.
+
+Creating a Connection with the UI
+---------------------------------
+
+Open the ``Admin->Connection`` section of the UI. Click the ``Create`` link
+to create a new connection.
+
+.. image:: ../img/connection_create.png
+
+1. Fill in the ``Conn Id`` field with the desired connection ID. It is
+   recommended that you use lower-case characters and separate words with
+   underscores.
+2. Choose the connection type with the ``Conn Type`` field.
+3. Fill in the remaining fields. See
+   :ref:`manage-connections-connection-types` for a description of the fields
+   belonging to the different connection types.
+4. Click the ``Save`` button to create the connection.
+
+Editing a Connection with the UI
+--------------------------------
+
+Open the ``Admin->Connection`` section of the UI. Click the pencil icon next
+to the connection you wish to edit in the connection list.
+
+.. image:: ../img/connection_edit.png
+
+Modify the connection properties and click the ``Save`` button to save your
+changes.
+
+Creating a Connection with Environment Variables
+------------------------------------------------
+
+Connections in Airflow pipelines can be created using environment variables.
+The environment variable name must have the prefix ``AIRFLOW_CONN_`` and its
+value must be in URI format for Airflow to use the connection properly.
+
+When referencing the connection in the Airflow pipeline, the ``conn_id``
+should be the name of the variable without the prefix. For example, if the
+``conn_id`` is named ``postgres_master`` the environment variable should be
+named ``AIRFLOW_CONN_POSTGRES_MASTER`` (note that the environment variable
+must be all uppercase). Airflow assumes the value returned from the
+environment variable to be in a URI format (e.g.
+``postgres://user:password@localhost:5432/master`` or
+``s3://accesskey:secretkey@S3``).
+
+.. _manage-connections-connection-types:
+
+Connection Types
+----------------
+
+.. _connection-type-GCP:
+
+Google Cloud Platform
+~~~~~~~~~~~~~~~~~~~~~
+
+The Google Cloud Platform connection type enables the :ref:`GCP Integrations
+<GCP>`.
+
+Authenticating to GCP
+'''''''''''''''''''''
+
+There are two ways to connect to GCP using Airflow.
+
+1. Use `Application Default Credentials
+   <https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default>`_,
+   such as via the metadata server when running on Google Compute Engine.
+2. Use a `service account
+   <https://cloud.google.com/docs/authentication/#service_accounts>`_ key
+   file (JSON format) on disk.
+
+Default Connection IDs
+''''''''''''''''''''''
+
+The following connection IDs are used by default.
+
+``bigquery_default``
+    Used by the :class:`~airflow.contrib.hooks.bigquery_hook.BigQueryHook`
+    hook.
+
+``google_cloud_datastore_default``
+    Used by the :class:`~airflow.contrib.hooks.datastore_hook.DatastoreHook`
+    hook.
+
+``google_cloud_default``
+    Used by the
+    :class:`~airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook`,
+    :class:`~airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook`,
+    :class:`~airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook`,
+    :class:`~airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook`, and
+    :class:`~airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook` hooks.
+
+Configuring the Connection
+''''''''''''''''''''''''''
+
+Project Id (required)
+    The Google Cloud project ID to connect to.
+
+Keyfile Path
+    Path to a `service account
+    <https://cloud.google.com/docs/authentication/#service_accounts>`_ key
+    file (JSON format) on disk.
+
+    Not required if using application default credentials.
+
+Keyfile JSON
+    Contents of a `service account
+    <https://cloud.google.com/docs/authentication/#service_accounts>`_ key
+    file (JSON format) on disk. It is recommended to :doc:`Secure your connections <secure-connections>` if using this method to authenticate.
+
+    Not required if using application default credentials.
+
+Scopes (comma separated)
+    A list of comma-separated `Google Cloud scopes
+    <https://developers.google.com/identity/protocols/googlescopes>`_ to
+    authenticate with.
+
+    .. note::
+        Scopes are ignored when using application default credentials. See
+        issue `AIRFLOW-2522
+        <https://issues.apache.org/jira/browse/AIRFLOW-2522>`_.
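
As an illustration of the environment-variable convention documented in
docs/howto/manage-connections.rst above (not part of this commit), the
following minimal Python sketch derives the variable name from a ``conn_id``
and splits the example URI into its parts. The helper names
``conn_env_var_name`` and ``parse_conn_uri`` are hypothetical, and the field
names in the returned dictionary are an assumption made for illustration. The
sketch uses only the standard library and is not Airflow's own parsing code.

```python
import os
from urllib.parse import urlsplit


def conn_env_var_name(conn_id):
    """Return the environment variable name for a conn_id (hypothetical helper).

    Follows the documented rule: the AIRFLOW_CONN_ prefix plus the conn_id,
    all uppercase.
    """
    return "AIRFLOW_CONN_" + conn_id.upper()


def parse_conn_uri(uri):
    """Split a connection URI into its parts (hypothetical helper).

    The keys below are illustrative names, not a documented Airflow schema.
    """
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        "password": parts.password,
        "schema": parts.path.lstrip("/"),
    }


# Define the connection the way the how-to describes, using the example URI.
name = conn_env_var_name("postgres_master")
os.environ[name] = "postgres://user:password@localhost:5432/master"

# Read the value back and inspect the pieces encoded in the URI.
fields = parse_conn_uri(os.environ[name])
print(name)
print(fields)
```

In practice the variable would be exported in the environment of the Airflow
processes rather than set from Python; setting it in ``os.environ`` here only
keeps the sketch self-contained.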

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/howto/secure-connections.rst
----------------------------------------------------------------------
diff --git a/docs/howto/secure-connections.rst b/docs/howto/secure-connections.rst
index 5d468b3..b9c1fa1 100644
--- a/docs/howto/secure-connections.rst
+++ b/docs/howto/secure-connections.rst
@@ -1,13 +1,6 @@
 Securing Connections
 ====================
 
-Airflow needs to know how to connect to your environment. Information
-such as hostname, port, login and passwords to other systems and services is
-handled in the ``Admin->Connection`` section of the UI. The pipeline code you
-will author will reference the 'conn_id' of the Connection objects.
-
-.. image:: ../img/connections.png
-
 By default, Airflow will save the passwords for the connection in plain text
 within the metadata database. The ``crypto`` package is highly recommended
 during installation. The ``crypto`` package does require that your operating

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/img/connection_create.png
----------------------------------------------------------------------
diff --git a/docs/img/connection_create.png b/docs/img/connection_create.png
new file mode 100644
index 0000000..8a574d4
Binary files /dev/null and b/docs/img/connection_create.png differ

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/img/connection_edit.png
----------------------------------------------------------------------
diff --git a/docs/img/connection_edit.png b/docs/img/connection_edit.png
new file mode 100644
index 0000000..c6d14da
Binary files /dev/null and b/docs/img/connection_edit.png differ

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/img/connections.png
----------------------------------------------------------------------
diff --git a/docs/img/connections.png b/docs/img/connections.png
index d07a130..3a28473 100644
Binary files a/docs/img/connections.png and b/docs/img/connections.png differ

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c0d67f0/docs/integration.rst
----------------------------------------------------------------------
diff --git a/docs/integration.rst b/docs/integration.rst
index 9fa9bbb..3d43685 100644
--- a/docs/integration.rst
+++ b/docs/integration.rst
@@ -316,6 +316,9 @@ Airflow has extensive support for the Google Cloud Platform. But note that most
 Operators are in the contrib section. Meaning that they have a *beta* status, meaning that
 they can have breaking changes between minor releases.
 
+See the :ref:`GCP connection type <connection-type-GCP>` documentation to
+configure connections to GCP.
+
 Logging
 '''''''