Posted to commits@airflow.apache.org by po...@apache.org on 2022/01/23 13:21:54 UTC

[airflow] branch v2-2-test updated (eb3ded7 -> 1b1bfc5)

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.


    from eb3ded7  Hide version selector for non-versioned packages (#21041)
     new 3714aa3  Adds back documentation about context usage in Python/@task (#18868)
     new ce341b6  Updating explicit arg example in TaskFlow API tutorial doc (#18907)
     new a718042  Adds Pendulum 1.x -> 2.x upgrade documentation (#18955)
     new d183404  Update CSV ingest code for tutorial (#18960)
     new 0b45b27  Add docker-compose explanation to conn localhost (#19076)
     new 2cc9ed0  Doc: Improve tutorial documentation and code (#19186)
     new d46674a  docs: reorder imports in tutorials 🎨 (#19035)
     new fbad277  Fix PostgresHook import in tutorial (#19374)
     new ce21b8e  Change the name of link to ASF downloads (#19441)
     new 7d37b0e  Clean up ``default_args`` usage in docs (#19803)
     new 4608eaa  Fix example code in Doc (#19824)
     new 346656f  Add requirements.txt description (#20048)
     new 239b1dc  Correct set-up-database.rst (#20090)
     new 8ecdcb9  Fix typo in MySQL Database creation code (Set up DB docs)  (#20102)
     new e2bb598  Fix grammar and typos in "Logging for Tasks" guide (#20146)
     new e0b262c  Deprecate smart sensors (#20151)
     new 312577e  Removes unnecessary --upgrade option from our examples (#20537)
     new 543a78b  Improve documentation on ``Params`` (#20567)
     new 111e8c1  Update operators.rst (#20640)
     new 7cd3fd6  Compare taskgroup and subdag (#20700)
     new 2dbe1e9  Update metric name in documentation (#20764)
     new c528166  Python3 requisite start local (#20777)
     new d5870f0  Doc: Added an enum param example (#20841)
     new 1b1bfc5  Fix grammar in ``dags.rst`` (#20988)

The 24 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 UPDATING.md                                        |  11 +-
 airflow/example_dags/example_subdag_operator.py    |  14 +-
 airflow/example_dags/tutorial.py                   |  51 +++---
 airflow/example_dags/tutorial_etl_dag.py           |  14 +-
 airflow/jobs/scheduler_job.py                      |  12 ++
 .../google/cloud/example_dags/example_functions.py |   2 +-
 airflow/sensors/base.py                            |   8 +
 docs/apache-airflow/best-practices.rst             |   2 +-
 docs/apache-airflow/concepts/dags.rst              |  84 +++++++--
 docs/apache-airflow/concepts/deferring.rst         |   3 +-
 docs/apache-airflow/concepts/operators.rst         |  14 ++
 docs/apache-airflow/concepts/params.rst            | 152 ++++++++++++++---
 docs/apache-airflow/concepts/smart-sensors.rst     |  26 ++-
 docs/apache-airflow/dag-run.rst                    |  16 +-
 docs/apache-airflow/faq.rst                        |   3 +-
 docs/apache-airflow/howto/operator/python.rst      |  10 ++
 docs/apache-airflow/howto/set-up-database.rst      |   4 +-
 docs/apache-airflow/installation/index.rst         |   2 +-
 .../installation/installing-from-pypi.rst          |   2 +-
 .../installation/installing-from-sources.rst       |   2 +-
 docs/apache-airflow/lineage.rst                    |   4 +-
 .../logging-monitoring/logging-tasks.rst           |  36 ++--
 docs/apache-airflow/logging-monitoring/metrics.rst |   8 +-
 docs/apache-airflow/pipeline_example.csv           | 190 ++++++++++-----------
 docs/apache-airflow/start/docker.rst               |   2 +
 docs/apache-airflow/start/local.rst                |   2 +
 docs/apache-airflow/timezone.rst                   |  14 +-
 docs/apache-airflow/tutorial.rst                   | 138 ++++++++-------
 docs/apache-airflow/tutorial_taskflow_api.rst      |  44 +++++
 docs/apache-airflow/upgrading-from-1-10/index.rst  |  15 ++
 .../installing-helm-chart-from-sources.rst         |   2 +-
 docs/installing-providers-from-sources.rst         |   2 +-
 32 files changed, 578 insertions(+), 311 deletions(-)

[airflow] 23/24: Doc: Added an enum param example (#20841)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit d5870f0d644d957654f70fee53597651a67a2d8a
Author: Matt Rixman <58...@users.noreply.github.com>
AuthorDate: Wed Jan 12 23:48:35 2022 -0700

    Doc: Added an enum param example (#20841)
    
    More examples make it easier to compare our docs with the json-schema docs and figure out how they work together.
    I ended up doing something similar to this in my code and figured I'd contribute an example.
    
    Co-authored-by: Matt Rixman <Ma...@users.noreply.github.com>
    (cherry picked from commit 8dc68d47048d559cf4b76874d8d5e7a5af6359b6)
---
 docs/apache-airflow/concepts/params.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/apache-airflow/concepts/params.rst b/docs/apache-airflow/concepts/params.rst
index ef266ea..fd4a167 100644
--- a/docs/apache-airflow/concepts/params.rst
+++ b/docs/apache-airflow/concepts/params.rst
@@ -134,6 +134,9 @@ JSON Schema Validation
             # a required param which can be of multiple types
             "dummy": Param(type=["null", "number", "string"]),
 
+            # an enum param, must be one of three values
+            "enum_param": Param("foo", enum=["foo", "bar", 42]),
+
             # a param which uses json-schema formatting
             "email": Param(
                 default="example@example.com",
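
For context, a minimal sketch (not part of the commit; the DAG id and dates are illustrative) of how such an enum ``Param`` might be declared on a DAG in Airflow 2.2:

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.models.param import Param

    with DAG(
        "param_example",  # illustrative DAG id
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
        params={
            # value must be one of the listed choices; "foo" is the default
            "enum_param": Param("foo", enum=["foo", "bar", 42]),
        },
    ) as dag:
        ...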

[airflow] 24/24: Fix grammar in ``dags.rst`` (#20988)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 1b1bfc5b4a1a3bb843e4912ac750bdb038ce2881
Author: aabhaschopra <51...@users.noreply.github.com>
AuthorDate: Fri Jan 21 19:59:26 2022 +0530

    Fix grammar in ``dags.rst`` (#20988)
    
    grammar correction
    
    (cherry picked from commit 754d8bcb5a2d461b71bebfa261a0c41a995d79e4)
---
 docs/apache-airflow/concepts/dags.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/concepts/dags.rst b/docs/apache-airflow/concepts/dags.rst
index 8d9b387..3edaf35 100644
--- a/docs/apache-airflow/concepts/dags.rst
+++ b/docs/apache-airflow/concepts/dags.rst
@@ -260,7 +260,7 @@ The task_id returned by the Python function has to reference a task directly dow
 
     .. image:: /img/branch_note.png
 
-    The paths of the branching task are ``branch_a``, ``join`` and ``branch_b``. Since ``join`` is a downstream task of ``branch_a``, it will be still be run, even though it was not returned as part of the branch decision.
+    The paths of the branching task are ``branch_a``, ``join`` and ``branch_b``. Since ``join`` is a downstream task of ``branch_a``, it will still be run, even though it was not returned as part of the branch decision.
 
 The ``BranchPythonOperator`` can also be used with XComs allowing branching context to dynamically decide what branch to follow based on upstream tasks. For example:
 

[airflow] 03/24: Adds Pendulum 1.x -> 2.x upgrade documentation (#18955)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit a7180427e2957d518be122a044c00fd30700c8f5
Author: Pradyumna Rahul <pr...@gmail.com>
AuthorDate: Thu Oct 14 23:26:28 2021 +0530

    Adds Pendulum 1.x -> 2.x upgrade documentation (#18955)
    
    closes: #18634
    
    Adds documentation about the upgrade from Pendulum `1.x` to `2.x` as discussed in the issue.
    
    Assumptions that were made:
    
    - Most of the Pendulum changes are already documented in the official Pendulum docs.
    
    Added the following:
    
    - Mention the upgrade from `1.x` to `2.x`
    - Added an example of a code snippet that will now throw errors
    - Added a link to the official Pendulum `2.x` docs that discuss the changes from `1.x` to `2.x`
    
    The macros documentation mentioned in the issue was already pointing to the updated Pendulum documentation, so no changes were needed there. For instance, consider the link for the macro [prev_execution_date](https://pendulum.eustace.io/docs/#introduction)
    
    (cherry picked from commit 141d9f2d5d3e47fe7beebd6a56953df1f727746e)
---
 docs/apache-airflow/upgrading-from-1-10/index.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/apache-airflow/upgrading-from-1-10/index.rst b/docs/apache-airflow/upgrading-from-1-10/index.rst
index 603796c..85c04a5 100644
--- a/docs/apache-airflow/upgrading-from-1-10/index.rst
+++ b/docs/apache-airflow/upgrading-from-1-10/index.rst
@@ -355,6 +355,21 @@ Old Keys                 New keys
 
 For more information, visit https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth
 
+**Breaking Change in Pendulum Support**
+
+Airflow has upgraded from Pendulum 1.x to Pendulum 2.x.
+This comes with a few breaking changes as certain methods and their definitions in Pendulum 2.x
+have changed or have been removed.
+
+For instance the following snippet will now throw errors:
+
+.. code-block:: python
+
+    execution_date.format("YYYY-MM-DD HH:mm:ss", formatter="alternative")
+
+as the ``formatter`` option is not supported in Pendulum 2.x and ``alternative`` is used by default.
+
+For more information, visit https://pendulum.eustace.io/blog/pendulum-2.0.0-is-out.html
 
 Step 6: Upgrade Configuration settings
 '''''''''''''''''''''''''''''''''''''''''''
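
For readers hitting the error above, a hedged sketch of the Pendulum 2.x equivalent: the ``formatter`` argument is gone and the "alternative" tokens are the default, so the call reduces to:

.. code-block:: python

    # Pendulum 2.x: no ``formatter`` argument; these tokens are the default style
    execution_date.format("YYYY-MM-DD HH:mm:ss")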

[airflow] 12/24: Add requirements.txt description (#20048)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 346656ff38e8d2b7b94fa879ee86be253194ed48
Author: Dmytro Kazanzhy <dk...@gmail.com>
AuthorDate: Sun Dec 5 23:49:25 2021 +0200

    Add requirements.txt description (#20048)
    
    (cherry picked from commit 7627de383e5cdef91ca0871d8107be4e5f163882)
---
 docs/apache-airflow/howto/operator/python.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/apache-airflow/howto/operator/python.rst b/docs/apache-airflow/howto/operator/python.rst
index ec56b70..170adfe 100644
--- a/docs/apache-airflow/howto/operator/python.rst
+++ b/docs/apache-airflow/howto/operator/python.rst
@@ -80,6 +80,16 @@ Otherwise you won't have access to the most context variables of Airflow in ``op
 If you want the context related to datetime objects like ``data_interval_start`` you can add ``pendulum`` and
 ``lazy_object_proxy``.
 
+If additional parameters for package installation are needed, pass them in ``requirements.txt`` as in the example below:
+
+.. code-block::
+
+  SomePackage==0.2.1 --pre --index-url http://some.archives.com/archives
+  AnotherPackage==1.4.3 --no-index --find-links /my/local/archives
+
+All supported options are listed in the `requirements file format <https://pip.pypa.io/en/stable/reference/requirements-file-format/#supported-options>`_.
+
+
 Templating
 ^^^^^^^^^^
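
A hedged sketch (package and task names are illustrative) of the operator these requirements feed into; here ``requirements`` is given inline as a list of pip specifiers rather than a file:

.. code-block:: python

    from airflow.operators.python import PythonVirtualenvOperator


    def callable_in_venv():
        import colorama  # resolved inside the task's virtualenv

        print(colorama.__version__)


    virtualenv_task = PythonVirtualenvOperator(
        task_id="virtualenv_task",
        python_callable=callable_in_venv,
        requirements=["colorama==0.4.0"],  # illustrative pin
        system_site_packages=False,
    )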
 

[airflow] 05/24: Add docker-compose explanation to conn localhost (#19076)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 0b45b27580dcef38fa25c9f1ab44a6a71bb2583a
Author: Fernando Bugni <fe...@gmail.com>
AuthorDate: Tue Oct 19 17:24:32 2021 -0300

    Add docker-compose explanation to conn localhost (#19076)
    
    (cherry picked from commit fe2890246a7c3d014daa909cb19ac523edb15005)
---
 docs/apache-airflow/start/docker.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/apache-airflow/start/docker.rst b/docs/apache-airflow/start/docker.rst
index 22763d9..023f669 100644
--- a/docs/apache-airflow/start/docker.rst
+++ b/docs/apache-airflow/start/docker.rst
@@ -82,6 +82,8 @@ This file contains several service definitions:
 - ``postgres`` - The database.
 - ``redis`` - `The redis <https://redis.io/>`__ - broker that forwards messages from scheduler to worker.
 
+In general, if you want to use Airflow locally, your DAGs may try to connect to servers which are running on the host. In order to achieve that, an extra configuration must be added in ``docker-compose.yaml``. For example, on Linux the configuration must be in the section ``services: airflow-worker``, adding ``extra_hosts: - "host.docker.internal:host-gateway"``; and use ``host.docker.internal`` instead of ``localhost``. This configuration varies across platforms. Please see the document [...]
+
 All these services allow you to run Airflow with :doc:`CeleryExecutor </executor/celery>`. For more information, see :doc:`/concepts/overview`.
 
 Some directories in the container are mounted, which means that their contents are synchronized between your computer and the container.

[airflow] 22/24: Python3 requisite start local (#20777)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit c528166d696416950144d2891e14d0441ecac5d4
Author: Michael Robinson <68...@users.noreply.github.com>
AuthorDate: Mon Jan 10 14:55:33 2022 -0500

    Python3 requisite start local (#20777)
    
    (cherry picked from commit bb5bd64948596f284dd70929010724946cbfa414)
---
 docs/apache-airflow/start/local.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/apache-airflow/start/local.rst b/docs/apache-airflow/start/local.rst
index 6441306..8d0c71e 100644
--- a/docs/apache-airflow/start/local.rst
+++ b/docs/apache-airflow/start/local.rst
@@ -24,6 +24,8 @@ This quick start guide will help you bootstrap an Airflow standalone instance on
 
 .. note::
 
+   Successful installation requires a Python 3 environment.
+
    Only ``pip`` installation is currently officially supported.
 
    While there have been successes with using other tools like `poetry <https://python-poetry.org/>`_ or

[airflow] 01/24: Adds back documentation about context usage in Python/@task (#18868)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 3714aa321a52d1130885523c73d72845f12b9009
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Mon Oct 11 17:12:37 2021 +0200

    Adds back documentation about context usage in Python/@task (#18868)
    
    There were many questions recently along the line of
    "How do I access context from a TaskFlow task". Surprisingly,
    the paragraph about accessing the current context was removed from
    the "Concepts" docs (where it had been present since Airflow 2.0.0) but was
    never added to the "TaskFlow Tutorial" where it actually belongs.
    
    Also, the Python Operator's description of passing context
    variables as kwargs was removed when the `provide_context`
    parameter was removed (it was only present in the docstring
    of `provide_context`; you could likely deduce this behaviour
    from several examples, but it was not mentioned anywhere).
    
    This PR adds the description with examples to the Python operator
    docs, and adds a similar description to the TaskFlow tutorial, including
    the possibility of using `get_current_context` deep down the
    call stack to retrieve the context variables even if they are not
    passed via kwargs.
    
    (cherry picked from commit d6c8730ebc737e0200091cdf58231cd3cf6afd84)
---
 docs/apache-airflow/tutorial_taskflow_api.rst | 42 +++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/docs/apache-airflow/tutorial_taskflow_api.rst b/docs/apache-airflow/tutorial_taskflow_api.rst
index 23c982b..076f48b 100644
--- a/docs/apache-airflow/tutorial_taskflow_api.rst
+++ b/docs/apache-airflow/tutorial_taskflow_api.rst
@@ -409,6 +409,48 @@ method of the Python operator.
 Current context is accessible only during the task execution. The context is not accessible during
 ``pre_execute`` or ``post_execute``. Calling this method outside execution context will raise an error.
 
+Accessing context variables in decorated tasks
+----------------------------------------------
+
+When running your callable, Airflow will pass a set of keyword arguments that can be used in your
+function. This set of kwargs correspond exactly to what you can use in your jinja templates.
+For this to work, you need to define ``**kwargs`` in your function header, or you can add directly the
+keyword arguments you would like to get - for example with the below code your callable will get
+the values of ``ti`` and ``next_ds`` context variables.
+
+With explicit arguments:
+
+.. code-block:: python
+
+   @task
+   def my_python_callable(ti, next_ds):
+       pass
+
+With kwargs:
+
+.. code-block:: python
+
+   @task
+   def my_python_callable(**kwargs):
+       ti = kwargs["ti"]
+       next_ds = kwargs["next_ds"]
+
+Also, sometimes you might want to access the context somewhere deep in the stack - and you do not want to pass
+the context variables from the task callable. You can do it via the ``get_current_context``
+method of the Python operator.
+
+.. code-block:: python
+
+    from airflow.operators.python import get_current_context
+
+
+    def some_function_in_your_library():
+        context = get_current_context()
+        ti = context["ti"]
+
+Current context is accessible only during the task execution. The context is not accessible during
+``pre_execute`` or ``post_execute``. Calling this method outside execution context will raise an error.
+
 
 What's Next?
 ------------

[airflow] 19/24: Update operators.rst (#20640)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 111e8c167c76c7a55e02b7290a9b8114e680b8e2
Author: Marcin Molak <fr...@gmail.com>
AuthorDate: Thu Jan 6 16:36:09 2022 +0100

    Update operators.rst (#20640)
    
    (cherry picked from commit fa802ede6c4763c8f432100ca78a313f147a77a0)
---
 docs/apache-airflow/concepts/operators.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/apache-airflow/concepts/operators.rst b/docs/apache-airflow/concepts/operators.rst
index c66c2eb..13020f1 100644
--- a/docs/apache-airflow/concepts/operators.rst
+++ b/docs/apache-airflow/concepts/operators.rst
@@ -208,3 +208,17 @@ In this case, ``order_data`` argument is passed: ``{"1001": 301.27, "1002": 433.
 Airflow uses Jinja's `NativeEnvironment <https://jinja.palletsprojects.com/en/2.11.x/nativetypes/>`_
 when ``render_template_as_native_obj`` is set to ``True``.
 With ``NativeEnvironment``, rendering a template produces a native Python type.
+
+.. _concepts:reserved-keywords:
+
+Reserved params keyword
+-----------------------
+
+In Apache Airflow 2.2.0, the ``params`` variable is used during DAG serialization. Please do not use that name in third-party operators.
+If you upgrade your environment and get the following error:
+
+.. code-block::
+
+    AttributeError: 'str' object has no attribute '__module__'
+
+change the name of ``params`` in your operators.
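
A hedged sketch (operator and argument names are illustrative) of such a rename in a third-party operator:

.. code-block:: python

    from airflow.models.baseoperator import BaseOperator


    class MyOperator(BaseOperator):
        # was: def __init__(self, params=None, **kwargs) - ``params`` now clashes
        # with the attribute Airflow 2.2.0 uses during DAG serialization
        def __init__(self, my_params=None, **kwargs):
            super().__init__(**kwargs)
            self.my_params = my_params or {}

        def execute(self, context):
            self.log.info("Running with %s", self.my_params)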

[airflow] 07/24: docs: reorder imports in tutorials 🎨 (#19035)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit d46674a8109e31e31ebbe7b9803a53f28a8dd58e
Author: Meysam <Me...@gmail.com>
AuthorDate: Fri Oct 29 04:48:03 2021 +0300

    docs: reorder imports in tutorials 🎨 (#19035)
    
    Co-authored-by: Kaxil Naik <ka...@gmail.com>
    (cherry picked from commit 04e5fefe2f6eb40165fccbea2eb4b72bf353938a)
---
 docs/apache-airflow/tutorial.rst | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/docs/apache-airflow/tutorial.rst b/docs/apache-airflow/tutorial.rst
index acb7e84..d40f8a9 100644
--- a/docs/apache-airflow/tutorial.rst
+++ b/docs/apache-airflow/tutorial.rst
@@ -415,10 +415,9 @@ Let's break this down into 2 steps: get data & merge data:
 
 .. code-block:: python
 
-  from airflow.decorators import dag, task
-  from airflow.hooks.postgres import PostgresHook
-  from datetime import datetime, timedelta
   import requests
+  from airflow.decorators import task
+  from airflow.hooks.postgres import PostgresHook
 
 
   @task
@@ -447,6 +446,10 @@ Here we are passing a ``GET`` request to get the data from the URL and save it i
 
 .. code-block:: python
 
+  from airflow.decorators import task
+  from airflow.hooks.postgres import PostgresHook
+
+
   @task
   def merge_data():
       query = """
@@ -475,10 +478,11 @@ Lets look at our DAG:
 
 .. code-block:: python
 
-  from airflow.decorators import dag, task
-  from airflow.hooks.postgres_hook import PostgresHook
   from datetime import datetime, timedelta
+
   import requests
+  from airflow.decorators import dag, task
+  from airflow.hooks.postgres import PostgresHook
 
 
   @dag(

[airflow] 08/24: Fix PostgresHook import in tutorial (#19374)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit fbad2776c21abcdbd78aa55ea59a24bca9fe5c04
Author: Will Douglas <wi...@gmail.com>
AuthorDate: Wed Nov 3 01:00:12 2021 -0600

    Fix PostgresHook import in tutorial (#19374)
    
    (cherry picked from commit 338822b41ecffa61b2cf47aed3c3845004c95f60)
---
 docs/apache-airflow/tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/tutorial.rst b/docs/apache-airflow/tutorial.rst
index d40f8a9..7e27d54 100644
--- a/docs/apache-airflow/tutorial.rst
+++ b/docs/apache-airflow/tutorial.rst
@@ -447,7 +447,7 @@ Here we are passing a ``GET`` request to get the data from the URL and save it i
 .. code-block:: python
 
   from airflow.decorators import task
-  from airflow.hooks.postgres import PostgresHook
+  from airflow.providers.postgres.hooks.postgres import PostgresHook
 
 
   @task

[airflow] 10/24: Clean up ``default_args`` usage in docs (#19803)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 7d37b0e3ed9a62480e56a21b137b3b78e5fcc259
Author: Josh Fell <48...@users.noreply.github.com>
AuthorDate: Thu Nov 25 06:47:23 2021 -0500

    Clean up ``default_args`` usage in docs (#19803)
    
    This PR aligns `default_args` usage within docs to updates that have been made to example DAGs across the board. The main types of updates include:
    - Removing `start_date` from being declared in `default_args`.
    - Removing the pattern of declaring `default_args` separately from the `DAG()` object.
    - Updating `default_args` values to more relevant examples.
    - Replacing `DummyOperator` with another operator to make some other `default_args` updates relevant and applicable.
    
    (cherry picked from commit 744d11bdb2acd52794a959572695943df8729a37)
---
 airflow/example_dags/example_subdag_operator.py    | 14 +++---
 airflow/example_dags/tutorial.py                   | 51 ++++++++++------------
 airflow/example_dags/tutorial_etl_dag.py           | 14 +++---
 .../google/cloud/example_dags/example_functions.py |  2 +-
 docs/apache-airflow/best-practices.rst             |  2 +-
 docs/apache-airflow/concepts/dags.rst              | 39 +++++++++++------
 docs/apache-airflow/dag-run.rst                    | 16 +++----
 docs/apache-airflow/faq.rst                        |  3 +-
 docs/apache-airflow/lineage.rst                    |  4 +-
 docs/apache-airflow/timezone.rst                   | 14 +++---
 docs/apache-airflow/tutorial.rst                   |  1 +
 11 files changed, 78 insertions(+), 82 deletions(-)

diff --git a/airflow/example_dags/example_subdag_operator.py b/airflow/example_dags/example_subdag_operator.py
index f27aec7..424dc7f 100644
--- a/airflow/example_dags/example_subdag_operator.py
+++ b/airflow/example_dags/example_subdag_operator.py
@@ -27,12 +27,12 @@ from airflow.utils.dates import days_ago
 
 DAG_NAME = 'example_subdag_operator'
 
-args = {
-    'owner': 'airflow',
-}
-
 with DAG(
-    dag_id=DAG_NAME, default_args=args, start_date=days_ago(2), schedule_interval="@once", tags=['example']
+    dag_id=DAG_NAME,
+    default_args={"retries": 2},
+    start_date=days_ago(2),
+    schedule_interval="@once",
+    tags=['example'],
 ) as dag:
 
     start = DummyOperator(
@@ -41,7 +41,7 @@ with DAG(
 
     section_1 = SubDagOperator(
         task_id='section-1',
-        subdag=subdag(DAG_NAME, 'section-1', args),
+        subdag=subdag(DAG_NAME, 'section-1', dag.default_args),
     )
 
     some_other_task = DummyOperator(
@@ -50,7 +50,7 @@ with DAG(
 
     section_2 = SubDagOperator(
         task_id='section-2',
-        subdag=subdag(DAG_NAME, 'section-2', args),
+        subdag=subdag(DAG_NAME, 'section-2', dag.default_args),
     )
 
     end = DummyOperator(
diff --git a/airflow/example_dags/tutorial.py b/airflow/example_dags/tutorial.py
index 38d4cbe..1049772 100644
--- a/airflow/example_dags/tutorial.py
+++ b/airflow/example_dags/tutorial.py
@@ -34,37 +34,34 @@ from airflow.operators.bash import BashOperator
 
 # [END import_module]
 
-# [START default_args]
-# These args will get passed on to each operator
-# You can override them on a per-task basis during operator initialization
-default_args = {
-    'owner': 'airflow',
-    'depends_on_past': False,
-    'email': ['airflow@example.com'],
-    'email_on_failure': False,
-    'email_on_retry': False,
-    'retries': 1,
-    'retry_delay': timedelta(minutes=5),
-    # 'queue': 'bash_queue',
-    # 'pool': 'backfill',
-    # 'priority_weight': 10,
-    # 'end_date': datetime(2016, 1, 1),
-    # 'wait_for_downstream': False,
-    # 'dag': dag,
-    # 'sla': timedelta(hours=2),
-    # 'execution_timeout': timedelta(seconds=300),
-    # 'on_failure_callback': some_function,
-    # 'on_success_callback': some_other_function,
-    # 'on_retry_callback': another_function,
-    # 'sla_miss_callback': yet_another_function,
-    # 'trigger_rule': 'all_success'
-}
-# [END default_args]
 
 # [START instantiate_dag]
 with DAG(
     'tutorial',
-    default_args=default_args,
+    # [START default_args]
+    # These args will get passed on to each operator
+    # You can override them on a per-task basis during operator initialization
+    default_args={
+        'depends_on_past': False,
+        'email': ['airflow@example.com'],
+        'email_on_failure': False,
+        'email_on_retry': False,
+        'retries': 1,
+        'retry_delay': timedelta(minutes=5),
+        # 'queue': 'bash_queue',
+        # 'pool': 'backfill',
+        # 'priority_weight': 10,
+        # 'end_date': datetime(2016, 1, 1),
+        # 'wait_for_downstream': False,
+        # 'sla': timedelta(hours=2),
+        # 'execution_timeout': timedelta(seconds=300),
+        # 'on_failure_callback': some_function,
+        # 'on_success_callback': some_other_function,
+        # 'on_retry_callback': another_function,
+        # 'sla_miss_callback': yet_another_function,
+        # 'trigger_rule': 'all_success'
+    },
+    # [END default_args]
     description='A simple tutorial DAG',
     schedule_interval=timedelta(days=1),
     start_date=datetime(2021, 1, 1),
diff --git a/airflow/example_dags/tutorial_etl_dag.py b/airflow/example_dags/tutorial_etl_dag.py
index d284452..8dd0ea4 100644
--- a/airflow/example_dags/tutorial_etl_dag.py
+++ b/airflow/example_dags/tutorial_etl_dag.py
@@ -37,18 +37,14 @@ from airflow.operators.python import PythonOperator
 
 # [END import_module]
 
-# [START default_args]
-# These args will get passed on to each operator
-# You can override them on a per-task basis during operator initialization
-default_args = {
-    'owner': 'airflow',
-}
-# [END default_args]
-
 # [START instantiate_dag]
 with DAG(
     'tutorial_etl_dag',
-    default_args=default_args,
+    # [START default_args]
+    # These args will get passed on to each operator
+    # You can override them on a per-task basis during operator initialization
+    default_args={'retries': 2},
+    # [END default_args]
     description='ETL DAG tutorial',
     schedule_interval=None,
     start_date=datetime(2021, 1, 1),
diff --git a/airflow/providers/google/cloud/example_dags/example_functions.py b/airflow/providers/google/cloud/example_dags/example_functions.py
index 03749ba..b32d718 100644
--- a/airflow/providers/google/cloud/example_dags/example_functions.py
+++ b/airflow/providers/google/cloud/example_dags/example_functions.py
@@ -75,7 +75,7 @@ body = {"name": FUNCTION_NAME, "entryPoint": GCF_ENTRYPOINT, "runtime": GCF_RUNT
 # [END howto_operator_gcf_deploy_body]
 
 # [START howto_operator_gcf_default_args]
-default_args = {'owner': 'airflow'}
+default_args = {'retries': '3'}
 # [END howto_operator_gcf_default_args]
 
 # [START howto_operator_gcf_deploy_variants]
diff --git a/docs/apache-airflow/best-practices.rst b/docs/apache-airflow/best-practices.rst
index 5ebed3b..951e6b4 100644
--- a/docs/apache-airflow/best-practices.rst
+++ b/docs/apache-airflow/best-practices.rst
@@ -504,7 +504,7 @@ This is an example test want to verify the structure of a code-generated DAG aga
         with DAG(
             dag_id=TEST_DAG_ID,
             schedule_interval="@daily",
-            default_args={"start_date": DATA_INTERVAL_START},
+            start_date=DATA_INTERVAL_START,
         ) as dag:
             MyCustomOperator(
                 task_id=TEST_TASK_ID,
diff --git a/docs/apache-airflow/concepts/dags.rst b/docs/apache-airflow/concepts/dags.rst
index 563264e..8aa4955 100644
--- a/docs/apache-airflow/concepts/dags.rst
+++ b/docs/apache-airflow/concepts/dags.rst
@@ -195,16 +195,19 @@ Otherwise, you must pass it into each Operator with ``dag=``.
 Default Arguments
 -----------------
 
-Often, many Operators inside a DAG need the same set of default arguments (such as their ``start_date``). Rather than having to specify this individually for every Operator, you can instead pass ``default_args`` to the DAG when you create it, and it will auto-apply them to any operator tied to it::
+Often, many Operators inside a DAG need the same set of default arguments (such as their ``retries``). Rather than having to specify this individually for every Operator, you can instead pass ``default_args`` to the DAG when you create it, and it will auto-apply them to any operator tied to it::
 
-    default_args = {
-        'start_date': datetime(2016, 1, 1),
-        'owner': 'airflow'
-    }
 
-    with DAG('my_dag', default_args=default_args) as dag:
-        op = DummyOperator(task_id='dummy')
-        print(op.owner)  # "airflow"
+
+    with DAG(
+        dag_id='my_dag',
+        start_date=datetime(2016, 1, 1),
+        schedule_interval='@daily',
+        catchup=False,
+        default_args={'retries': 2},
+    ) as dag:
+        op = BashOperator(task_id='dummy', bash_command='echo Hello World!')
+        print(op.retries)  # 2
 
 
 .. _concepts:dag-decorator:
@@ -464,12 +467,18 @@ Dependency relationships can be applied across all tasks in a TaskGroup with the
 
 TaskGroup also supports ``default_args`` like DAG, it will overwrite the ``default_args`` in DAG level::
 
-    with DAG(dag_id='dag1', default_args={'start_date': datetime(2016, 1, 1), 'owner': 'dag'}):
-        with TaskGroup('group1', default_args={'owner': 'group'}):
+    with DAG(
+        dag_id='dag1',
+        start_date=datetime(2016, 1, 1),
+        schedule_interval="@daily",
+        catchup=False,
+        default_args={'retries': 1},
+    ):
+        with TaskGroup('group1', default_args={'retries': 3}):
             task1 = DummyOperator(task_id='task1')
-            task2 = DummyOperator(task_id='task2', owner='task2')
-            print(task1.owner) # "group"
-            print(task2.owner) # "task2"
+            task2 = BashOperator(task_id='task2', bash_command='echo Hello World!', retries=2)
+            print(task1.retries) # 3
+            print(task2.retries) # 2
 
 If you want to see a more advanced use of TaskGroup, you can look at the ``example_task_group.py`` example DAG that comes with Airflow.
 
@@ -539,7 +548,9 @@ This is especially useful if your tasks are built dynamically from configuration
     ### My great DAG
     """
 
-    dag = DAG("my_dag", default_args=default_args)
+    dag = DAG(
+        "my_dag", start_date=datetime(2021, 1, 1), schedule_interval="@daily", catchup=False
+    )
     dag.doc_md = __doc__
 
     t = BashOperator("foo", dag=dag)
diff --git a/docs/apache-airflow/dag-run.rst b/docs/apache-airflow/dag-run.rst
index 39bd9d2..90bb404 100644
--- a/docs/apache-airflow/dag-run.rst
+++ b/docs/apache-airflow/dag-run.rst
@@ -114,19 +114,13 @@ in the configuration file. When turned off, the scheduler creates a DAG run only
     from datetime import datetime, timedelta
 
 
-    default_args = {
-        "owner": "airflow",
-        "depends_on_past": False,
-        "email": ["airflow@example.com"],
-        "email_on_failure": False,
-        "email_on_retry": False,
-        "retries": 1,
-        "retry_delay": timedelta(minutes=5),
-    }
-
     dag = DAG(
         "tutorial",
-        default_args=default_args,
+        default_args={
+            "depends_on_past": True,
+            "retries": 1,
+            "retry_delay": timedelta(minutes=3),
+        },
         start_date=datetime(2015, 12, 1),
         description="A simple tutorial DAG",
         schedule_interval="@daily",
diff --git a/docs/apache-airflow/faq.rst b/docs/apache-airflow/faq.rst
index 599a1f6..857e685 100644
--- a/docs/apache-airflow/faq.rst
+++ b/docs/apache-airflow/faq.rst
@@ -173,7 +173,8 @@ What's the deal with ``start_date``?
 
 ``start_date`` is partly legacy from the pre-DagRun era, but it is still
 relevant in many ways. When creating a new DAG, you probably want to set
-a global ``start_date`` for your tasks using ``default_args``. The first
+a global ``start_date`` for your tasks. This can be done by declaring your
+``start_date`` directly in the ``DAG()`` object. The first
 DagRun to be created will be based on the ``min(start_date)`` for all your
 tasks. From that point on, the scheduler creates new DagRuns based on
 your ``schedule_interval`` and the corresponding task instances run as your
diff --git a/docs/apache-airflow/lineage.rst b/docs/apache-airflow/lineage.rst
index f0b79aa..9b8bb71 100644
--- a/docs/apache-airflow/lineage.rst
+++ b/docs/apache-airflow/lineage.rst
@@ -32,11 +32,11 @@ works.
 
     from datetime import datetime, timedelta
 
-    from airflow.operators.bash import BashOperator
-    from airflow.operators.dummy import DummyOperator
     from airflow.lineage import AUTO
     from airflow.lineage.entities import File
     from airflow.models import DAG
+    from airflow.operators.bash import BashOperator
+    from airflow.operators.dummy import DummyOperator
 
     FILE_CATEGORIES = ["CAT1", "CAT2", "CAT3"]
 
diff --git a/docs/apache-airflow/timezone.rst b/docs/apache-airflow/timezone.rst
index f11a750..32e5223 100644
--- a/docs/apache-airflow/timezone.rst
+++ b/docs/apache-airflow/timezone.rst
@@ -86,15 +86,13 @@ and ``end_dates`` in your DAG definitions. This is mostly in order to preserve b
 case a naive ``start_date`` or ``end_date`` is encountered the default time zone is applied. It is applied
 in such a way that it is assumed that the naive date time is already in the default time zone. In other
 words if you have a default time zone setting of ``Europe/Amsterdam`` and create a naive datetime ``start_date`` of
-``datetime(2017,1,1)`` it is assumed to be a ``start_date`` of Jan 1, 2017 Amsterdam time.
+``datetime(2017, 1, 1)`` it is assumed to be a ``start_date`` of Jan 1, 2017 Amsterdam time.
 
 .. code-block:: python
 
-    default_args = dict(start_date=datetime(2016, 1, 1), owner="airflow")
-
-    dag = DAG("my_dag", default_args=default_args)
-    op = DummyOperator(task_id="dummy", dag=dag)
-    print(op.owner)  # Airflow
+    dag = DAG("my_dag", start_date=datetime(2017, 1, 1), default_args={"retries": 3})
+    op = BashOperator(task_id="dummy", bash_command="echo Hello World!", dag=dag)
+    print(op.retries)  # 3
 
 Unfortunately, during DST transitions, some datetimes don’t exist or are ambiguous.
 In such situations, pendulum raises an exception. That’s why you should always create aware
@@ -134,9 +132,7 @@ using ``pendulum``.
 
     local_tz = pendulum.timezone("Europe/Amsterdam")
 
-    default_args = dict(start_date=datetime(2016, 1, 1, tzinfo=local_tz), owner="airflow")
-
-    dag = DAG("my_tz_dag", default_args=default_args)
+    dag = DAG("my_tz_dag", start_date=datetime(2016, 1, 1, tzinfo=local_tz))
     op = DummyOperator(task_id="dummy", dag=dag)
     print(dag.timezone)  # <Timezone [Europe/Amsterdam]>
 
diff --git a/docs/apache-airflow/tutorial.rst b/docs/apache-airflow/tutorial.rst
index 7e27d54..babb8d6 100644
--- a/docs/apache-airflow/tutorial.rst
+++ b/docs/apache-airflow/tutorial.rst
@@ -77,6 +77,7 @@ of default parameters that we can use when creating tasks.
 
 .. exampleinclude:: /../../airflow/example_dags/tutorial.py
     :language: python
+    :dedent: 4
     :start-after: [START default_args]
     :end-before: [END default_args]
 

[airflow] 16/24: Deprecate smart sensors (#20151)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit e0b262c0139fa595fdbc6fd697c59e4cd514fb05
Author: Jed Cunningham <66...@users.noreply.github.com>
AuthorDate: Wed Dec 15 11:57:25 2021 -0700

    Deprecate smart sensors (#20151)
    
    Smart sensors are being replaced with Deferrable Operators. As they were
    marked as an early-access feature, we can remove them before Airflow 3.
    
    (cherry picked from commit 77813b40db99683dcf14b557f9cddc50080c9a6a)
---
 UPDATING.md                                    | 11 ++++++++++-
 airflow/jobs/scheduler_job.py                  | 12 ++++++++++++
 airflow/sensors/base.py                        |  8 ++++++++
 docs/apache-airflow/concepts/deferring.rst     |  3 ++-
 docs/apache-airflow/concepts/smart-sensors.rst | 26 +++++++++++++++++---------
 5 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/UPDATING.md b/UPDATING.md
index 718da8b..a75b2d6 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -27,6 +27,7 @@ assists users migrating to a new version.
 **Table of contents**
 
 - [Main](#main)
+- [Airflow 2.2.4](#airflow-224)
 - [Airflow 2.2.3](#airflow-223)
 - [Airflow 2.2.2](#airflow-222)
 - [Airflow 2.2.1](#airflow-221)
@@ -80,9 +81,17 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+## Airflow 2.2.4
+
+### Smart sensors deprecated
+
+Smart sensors, an "early access" feature added in Airflow 2, are now deprecated and will be removed in Airflow 2.4.0. They have been superseded by Deferrable Operators, added in Airflow 2.2.0.
+
+See [Migrating to Deferrable Operators](https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/smart-sensors.html#migrating-to-deferrable-operators) for details on how to migrate.
+
 ## Airflow 2.2.3
 
-No breaking changes.
+Continuing the effort to bind TaskInstance to a DagRun, XCom entries are now also tied to a DagRun. Use the ``run_id`` argument to specify the DagRun instead.
 
 ## Airflow 2.2.2
 
diff --git a/airflow/jobs/scheduler_job.py b/airflow/jobs/scheduler_job.py
index 44c4df7..2fedf80 100644
--- a/airflow/jobs/scheduler_job.py
+++ b/airflow/jobs/scheduler_job.py
@@ -49,6 +49,7 @@ from airflow.stats import Stats
 from airflow.ti_deps.dependencies_states import EXECUTION_STATES
 from airflow.utils import timezone
 from airflow.utils.callback_requests import DagCallbackRequest, TaskCallbackRequest
+from airflow.utils.docs import get_docs_url
 from airflow.utils.event_scheduler import EventScheduler
 from airflow.utils.retries import MAX_DB_RETRIES, retry_db_transaction, run_with_db_retries
 from airflow.utils.session import create_session, provide_session
@@ -146,6 +147,17 @@ class SchedulerJob(BaseJob):
 
         self.dagbag = DagBag(dag_folder=self.subdir, read_dags_from_db=True, load_op_links=False)
 
+        if conf.getboolean('smart_sensor', 'use_smart_sensor'):
+            compatible_sensors = set(
+                map(lambda l: l.strip(), conf.get('smart_sensor', 'sensors_enabled').split(','))
+            )
+            docs_url = get_docs_url('concepts/smart-sensors.html#migrating-to-deferrable-operators')
+            warnings.warn(
+                f'Smart sensors are deprecated, yet can be used for {compatible_sensors} sensors.'
+                f' Please use Deferrable Operators instead. See {docs_url} for more info.',
+                DeprecationWarning,
+            )
+
     def register_signals(self) -> None:
         """Register signals that stop child processes"""
         signal.signal(signal.SIGINT, self._exit_gracefully)
diff --git a/airflow/sensors/base.py b/airflow/sensors/base.py
index 039a21a..a2ef9c4 100644
--- a/airflow/sensors/base.py
+++ b/airflow/sensors/base.py
@@ -19,6 +19,7 @@
 import datetime
 import hashlib
 import time
+import warnings
 from datetime import timedelta
 from typing import Any, Callable, Dict, Iterable
 
@@ -39,6 +40,7 @@ from airflow.utils import timezone
 # Google Provider before 3.0.0 imported apply_defaults from here.
 # See  https://github.com/apache/airflow/issues/16035
 from airflow.utils.decorators import apply_defaults  # noqa: F401
+from airflow.utils.docs import get_docs_url
 
 
 class BaseSensorOperator(BaseOperator, SkipMixin):
@@ -154,6 +156,12 @@ class BaseSensorOperator(BaseOperator, SkipMixin):
         :param context: TaskInstance template context from the ti.
         :return: boolean
         """
+        docs_url = get_docs_url('concepts/smart-sensors.html#migrating-to-deferrable-operators')
+        warnings.warn(
+            'Your sensor is using Smart Sensors, which are deprecated.'
+            f' Please use Deferrable Operators instead. See {docs_url} for more info.',
+            DeprecationWarning,
+        )
         poke_context = self.get_poke_context(context)
         execution_context = self.get_execution_context(context)
 
diff --git a/docs/apache-airflow/concepts/deferring.rst b/docs/apache-airflow/concepts/deferring.rst
index d9126c4..ca810d7 100644
--- a/docs/apache-airflow/concepts/deferring.rst
+++ b/docs/apache-airflow/concepts/deferring.rst
@@ -49,6 +49,7 @@ That's it; everything else will be automatically handled for you. If you're upgr
 
 Note that you cannot yet use the deferral ability from inside custom PythonOperator/TaskFlow Python functions; it is only available to traditional, class-based Operators at the moment.
 
+.. _deferring/writing:
 
 Writing Deferrable Operators
 ----------------------------
@@ -163,4 +164,4 @@ Note that every extra ``triggerer`` you run will result in an extra persistent c
 Smart Sensors
 -------------
 
-Deferrable Operators essentially supersede :doc:`Smart Sensors <smart-sensors>`, and should be preferred for almost all situations. They do solve fundamentally the same problem; Smart Sensors, however, only work for certain Sensor workload styles, have no redundancy, and require a custom DAG to run at all times.
+Deferrable Operators supersede :doc:`Smart Sensors <smart-sensors>`. They do solve fundamentally the same problem; Smart Sensors, however, only work for certain Sensor workload styles, have no redundancy, and require a custom DAG to run at all times.
diff --git a/docs/apache-airflow/concepts/smart-sensors.rst b/docs/apache-airflow/concepts/smart-sensors.rst
index e654d91..a188ea6 100644
--- a/docs/apache-airflow/concepts/smart-sensors.rst
+++ b/docs/apache-airflow/concepts/smart-sensors.rst
@@ -23,15 +23,11 @@ Smart Sensors
 
 .. warning::
 
-  This is an **early-access** feature and might change in incompatible ways in future Airflow versions.
-  However this feature can be considered bug-free, and Airbnb has been using this feature in production
-  since early 2020 and has significantly reduced their costs for heavy use of sensors.
-
-.. note::
-
-  :doc:`Deferrable Operators <deferring>` are a more flexible way to achieve efficient long-running sensors,
-  as well as allowing Operators to also achieve similar efficiency gains. If you are considering writing a
-  new Smart Sensor, you may want to instead write it as a Deferrable Operator.
+  This is a **deprecated early-access** feature that will be removed in Airflow 2.4.0.
+  It is superseded by :doc:`Deferrable Operators <deferring>`, which offer a more flexible way to
+  achieve efficient long-running sensors, as well as allowing operators to also achieve similar
+  efficiency gains. If you are considering writing a new Smart Sensor, you should instead write it
+  as a Deferrable Operator.
 
 The smart sensor is a service (run by a builtin DAG) which greatly reduces Airflow’s infrastructure
 cost by consolidating multiple instances of small, light-weight Sensors into a single process.
@@ -96,3 +92,15 @@ Support new operators in the smart sensor service
     include all key names used for initializing a sensor object.
 *   In ``airflow.cfg``, add the new operator's classname to ``[smart_sensor] sensors_enabled``.
     All supported sensors' classname should be comma separated.
+
+Migrating to Deferrable Operators
+----------------------------------
+
+There is not a direct migration path from Smart Sensors to :doc:`Deferrable Operators <deferring>`.
+You have a few paths forward, depending on your needs and situation:
+
+*   Do nothing - your DAGs will continue to run as-is; however, they will no longer get the optimization smart sensors brought
+*   Deferrable Operator - move to a Deferrable Operator that alleviates the need for a sensor altogether
+*   Deferrable Sensor - move to an async version of the sensor you are already using
+
+See :ref:`Writing Deferrable Operators <deferring/writing>` for details on writing Deferrable Operators and Sensors.
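
A hedged sketch of the "Deferrable Sensor" path above, using the async date-time sensor that ships with Airflow 2.2 (the target time is illustrative):

.. code-block:: python

    import pendulum
    from airflow.sensors.date_time import DateTimeSensorAsync

    # Defers to the triggerer instead of occupying a worker slot while waiting
    wait_until_9am = DateTimeSensorAsync(
        task_id="wait_until_9am",
        target_time=pendulum.datetime(2022, 1, 24, 9, 0, tz="UTC"),
    )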

[airflow] 02/24: Updating explicit arg example in TaskFlow API tutorial doc (#18907)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ce341b6eeb2bb1827eb2e947179e628ff1e1648b
Author: Josh Fell <48...@users.noreply.github.com>
AuthorDate: Tue Oct 12 10:02:01 2021 -0400

    Updating explicit arg example in TaskFlow API tutorial doc (#18907)
    
    (cherry picked from commit b4321de8d17d8f167f2bb3f9ddb1e4ebeed0665e)
---
 docs/apache-airflow/tutorial_taskflow_api.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/apache-airflow/tutorial_taskflow_api.rst b/docs/apache-airflow/tutorial_taskflow_api.rst
index 076f48b..e58f490 100644
--- a/docs/apache-airflow/tutorial_taskflow_api.rst
+++ b/docs/apache-airflow/tutorial_taskflow_api.rst
@@ -416,14 +416,16 @@ When running your callable, Airflow will pass a set of keyword arguments that ca
 function. This set of kwargs correspond exactly to what you can use in your jinja templates.
 For this to work, you need to define ``**kwargs`` in your function header, or you can add directly the
 keyword arguments you would like to get - for example with the below code your callable will get
-the values of ``ti`` and ``next_ds`` context variables.
+the values of ``ti`` and ``next_ds`` context variables. Note that when explicit keyword arguments are used,
+they must be made optional in the function header to avoid ``TypeError`` exceptions during DAG parsing as
+these values are not available until task execution.
 
 With explicit arguments:
 
 .. code-block:: python
 
    @task
-   def my_python_callable(ti, next_ds):
+   def my_python_callable(ti=None, next_ds=None):
        pass
 
 With kwargs:

[airflow] 20/24: Compare taskgroup and subdag (#20700)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 7cd3fd68fbee97ab84420e70d3f17cd6a21b9e84
Author: Alan Ma <al...@gmail.com>
AuthorDate: Sun Jan 9 13:58:26 2022 -0800

    Compare taskgroup and subdag (#20700)
    
    (cherry picked from commit 6b0c52898555641059e149c5ff0d9b46b2d45379)
---
 docs/apache-airflow/concepts/dags.rst | 43 +++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/docs/apache-airflow/concepts/dags.rst b/docs/apache-airflow/concepts/dags.rst
index 8aa4955..8d9b387 100644
--- a/docs/apache-airflow/concepts/dags.rst
+++ b/docs/apache-airflow/concepts/dags.rst
@@ -605,8 +605,47 @@ Some other tips when using SubDAGs:
 
 See ``airflow/example_dags`` for a demonstration.
 
-Note that :doc:`pools` are *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so
-resources could be consumed by SubdagOperators beyond any limits you may have set.
+
+.. note::
+
+    Parallelism is *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so resources could be consumed by SubdagOperators beyond any limits you may have set.
+
+
+
+TaskGroups vs SubDAGs
+----------------------
+
+SubDAGs, while serving a similar purpose as TaskGroups, introduce both performance and functional issues due to their implementation.
+
+* The SubDagOperator starts a BackfillJob, which ignores existing parallelism configurations, potentially oversubscribing the worker environment.
+* SubDAGs have their own DAG attributes. When the SubDAG DAG attributes are inconsistent with the parent DAG, unexpected behavior can occur.
+* You cannot see the "full" DAG in one view, as SubDAGs exist as full-fledged DAGs.
+* SubDAGs introduce all sorts of edge cases and caveats. This can disrupt user experience and expectations.
+
+TaskGroups, on the other hand, are a better option given that they are purely a UI grouping concept. All tasks within the TaskGroup still behave as any other tasks outside of the TaskGroup.
+
+You can see the core differences between these two constructs.
+
++--------------------------------------------------------+--------------------------------------------------------+
+| TaskGroup                                              | SubDAG                                                 |
++========================================================+========================================================+
+| Repeating patterns as part of the same DAG             |  Repeating patterns as a separate DAG                  |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of views and statistics for the DAG            |  Separate set of views and statistics between parent   |
+|                                                        |  and child DAGs                                        |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of DAG configuration                           |  Several sets of DAG configurations                    |
++--------------------------------------------------------+--------------------------------------------------------+
+| Honors parallelism configurations through existing     |  Does not honor parallelism configurations due to      |
+| SchedulerJob                                           |  newly spawned BackfillJob                             |
++--------------------------------------------------------+--------------------------------------------------------+
+| Simple construct declaration with context manager      |  Complex DAG factory with naming restrictions          |
++--------------------------------------------------------+--------------------------------------------------------+
+
+.. note::
+
+    SubDAGs are deprecated, hence TaskGroups are always the preferred choice.
+
 
 
 Packaging DAGs
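
A hedged sketch (DAG id, task ids and commands are illustrative) of the "simple construct declaration with context manager" row in the table above:

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.utils.task_group import TaskGroup

    with DAG("grouping_example", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
        start = BashOperator(task_id="start", bash_command="echo start")

        # Tasks inside the group behave like any other tasks; the group is a UI concept
        with TaskGroup("extract") as extract:
            BashOperator(task_id="step_1", bash_command="echo step 1")
            BashOperator(task_id="step_2", bash_command="echo step 2")

        start >> extract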

[airflow] 15/24: Fix grammar and typos in "Logging for Tasks" guide (#20146)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit e2bb59823971773665bf10a2d8fb9b8a877330e1
Author: Josh Fell <48...@users.noreply.github.com>
AuthorDate: Wed Dec 8 12:14:53 2021 -0500

    Fix grammar and typos in "Logging for Tasks" guide (#20146)
    
    (cherry picked from commit 70818319a038f1d17c179c278930b5b85035085d)
---
 .../logging-monitoring/logging-tasks.rst           | 36 +++++++++++-----------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/docs/apache-airflow/logging-monitoring/logging-tasks.rst b/docs/apache-airflow/logging-monitoring/logging-tasks.rst
index 043f8f7..13cb248 100644
--- a/docs/apache-airflow/logging-monitoring/logging-tasks.rst
+++ b/docs/apache-airflow/logging-monitoring/logging-tasks.rst
@@ -20,18 +20,18 @@
 Logging for Tasks
 =================
 
-Airflow writes logs for tasks in a way that allows to see the logs for each task separately via Airflow UI.
-The Core Airflow implements writing and serving logs locally. However you can also write logs to remote
-services - via community providers, but you can also write your own loggers.
+Airflow writes logs for tasks in a way that allows you to see the logs for each task separately in the Airflow UI.
+Core Airflow implements writing and serving logs locally. However, you can also write logs to remote
+services via community providers, or write your own loggers.
 
-Below we describe the local task logging, but Apache Airflow Community also releases providers for many
-services (:doc:`apache-airflow-providers:index`) and some of them also provide handlers that extend logging
-capability of Apache Airflow. You can see all those providers in :doc:`apache-airflow-providers:core-extensions/logging`.
+Below we describe the local task logging; the Apache Airflow Community also releases providers for many
+services (:doc:`apache-airflow-providers:index`) and some of them provide handlers that extend the logging
+capability of Apache Airflow. You can see all of these providers in :doc:`apache-airflow-providers:core-extensions/logging`.
 
 Writing logs Locally
 --------------------
 
-Users can specify the directory to place log files in ``airflow.cfg`` using
+You can specify the directory to place log files in ``airflow.cfg`` using
 ``base_log_folder``. By default, logs are placed in the ``AIRFLOW_HOME``
 directory.
 
@@ -40,18 +40,18 @@ directory.
 
 The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
 
-In addition, users can supply a remote location to store current logs and backups.
+In addition, you can supply a remote location to store current logs and backups.
 
-In the Airflow Web UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs
+In the Airflow UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs
 can not be found or accessed, local logs will be displayed. Note that logs
-are only sent to remote storage once a task is complete (including failure); In other words, remote logs for
+are only sent to remote storage once a task is complete (including failure). In other words, remote logs for
 running tasks are unavailable (but local logs are available).
 
 
 Troubleshooting
 ---------------
 
-If you want to check which task handler is currently set, you can use ``airflow info`` command as in
+If you want to check which task handler is currently set, you can use the ``airflow info`` command as in
 the example below.
 
 .. code-block:: bash
@@ -67,7 +67,7 @@ the example below.
     Plugins Folder: [/root/airflow/plugins]
     Base Log Folder: [/root/airflow/logs]
 
-You can also use ``airflow config list`` to check that the logging configuration options have valid values.
+You can also run ``airflow config list`` to check that the logging configuration options have valid values.
 
 .. _write-logs-advanced:
 
@@ -75,9 +75,9 @@ Advanced configuration
 ----------------------
 
 Not all configuration options are available from the ``airflow.cfg`` file. Some configuration options require
-that the logging config class be overwritten. This can be done by ``logging_config_class`` option
-in ``airflow.cfg`` file. This option should specify the import path indicating to a configuration compatible with
-:func:`logging.config.dictConfig`. If your file is a standard import location, then you should set a :envvar:`PYTHONPATH` environment.
+that the logging config class be overwritten. This can be done via the ``logging_config_class`` option
+in the ``airflow.cfg`` file. This option should specify the import path to a configuration compatible with
+:func:`logging.config.dictConfig`. If your file is not in a standard import location, you should add its directory to the :envvar:`PYTHONPATH` environment variable.
 
 Follow the steps below to enable custom logging config class:
 
@@ -88,7 +88,7 @@ Follow the steps below to enable custom logging config class:
         export PYTHON_PATH=~/airflow/
 
 #. Create a directory to store the config file e.g. ``~/airflow/config``
-#. Create file called ``~/airflow/config/log_config.py`` with following content:
+#. Create a file called ``~/airflow/config/log_config.py`` with the following contents:
 
     .. code-block:: python
 
@@ -113,14 +113,14 @@ See :doc:`../modules_management` for details on how Python and Airflow manage mo
 External Links
 --------------
 
-When using remote logging, users can configure Airflow to show a link to an external UI within the Airflow Web UI. Clicking the link redirects a user to the external UI.
+When using remote logging, you can configure Airflow to show a link to an external UI within the Airflow Web UI. Clicking the link redirects you to the external UI.
 
 Some external systems require specific configuration in Airflow for redirection to work but others do not.
 
 Serving logs from workers
 -------------------------
 
-Most task handlers send logs upon completion of a task. In order to view logs in real time, airflow automatically starts an http server to serve the logs in the following cases:
+Most task handlers send logs upon completion of a task. In order to view logs in real time, Airflow automatically starts an HTTP server to serve the logs in the following cases:
 
 - If ``SchedulerExecutor`` or ``LocalExecutor`` is used, then when ``airflow scheduler`` is running.
 - If ``CeleryExecutor`` is used, then when ``airflow worker`` is running.
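
For reference, a minimal sketch of what such a ``~/airflow/config/log_config.py`` might look like,
assuming the usual pattern of copying and extending Airflow's default dict config (the ``DEBUG``
tweak is purely illustrative, not something this commit prescribes):

.. code-block:: python

    # Hypothetical ~/airflow/config/log_config.py
    # Assumes PYTHONPATH includes ~/airflow/config and that airflow.cfg sets
    # [logging] logging_config_class = log_config.LOGGING_CONFIG
    from copy import deepcopy

    from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

    # Start from Airflow's default dictConfig and adjust only what is needed.
    LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

    # Illustrative tweak: make task logs more verbose.
    LOGGING_CONFIG["loggers"]["airflow.task"]["level"] = "DEBUG"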

[airflow] 21/24: Update metric name in documentation (#20764)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 2dbe1e9d84212cc167182ea1cc479aa6f1b59cc7
Author: humit <jh...@naver.com>
AuthorDate: Tue Jan 11 02:12:20 2022 +0900

    Update metric name in documentation (#20764)
    
    (cherry picked from commit 6825c9acf435bc0c7b6e77e552c2a33a4966d740)
---
 docs/apache-airflow/logging-monitoring/metrics.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/apache-airflow/logging-monitoring/metrics.rst b/docs/apache-airflow/logging-monitoring/metrics.rst
index 025e310c..c8fd182 100644
--- a/docs/apache-airflow/logging-monitoring/metrics.rst
+++ b/docs/apache-airflow/logging-monitoring/metrics.rst
@@ -101,13 +101,13 @@ Name                                        Description
                                             section (needed to send tasks to the executor) and found it locked by
                                             another process.
 ``sla_email_notification_failure``          Number of failed SLA miss email notification attempts
-``ti.start.<dagid>.<taskid>``               Number of started task in a given dag. Similar to <job_name>_start but for task
-``ti.finish.<dagid>.<taskid>.<state>``      Number of completed task in a given dag. Similar to <job_name>_end but for task
+``ti.start.<dag_id>.<task_id>``             Number of started task in a given dag. Similar to <job_name>_start but for task
+``ti.finish.<dag_id>.<task_id>.<state>``    Number of completed task in a given dag. Similar to <job_name>_end but for task
 ``dag.callback_exceptions``                 Number of exceptions raised from DAG callbacks. When this happens, it
                                             means DAG callback is not working.
 ``celery.task_timeout_error``               Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
-``task_removed_from_dag.<dagid>``           Number of tasks removed for a given dag (i.e. task no longer exists in DAG)
-``task_restored_to_dag.<dagid>``            Number of tasks restored for a given dag (i.e. task instance which was
+``task_removed_from_dag.<dag_id>``          Number of tasks removed for a given dag (i.e. task no longer exists in DAG)
+``task_restored_to_dag.<dag_id>``           Number of tasks restored for a given dag (i.e. task instance which was
                                             previously in REMOVED state in the DB is added to DAG file)
 ``task_instance_created-<operator_name>``   Number of tasks instances created for a given Operator
 ``triggers.blocked_main_thread``            Number of triggers that blocked the main thread (likely due to not being
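
To watch the renamed counters on a development machine, a hedged sketch of a tiny UDP listener that
prints whatever Airflow's StatsD integration emits (it assumes ``statsd_on = True``,
``statsd_host = localhost`` and ``statsd_port = 8125`` in the ``[metrics]`` section of ``airflow.cfg``;
the port is just the conventional default):

.. code-block:: python

    # Minimal StatsD sniffer for observing metric names such as
    # ti.start.<dag_id>.<task_id> from a local Airflow instance.
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 8125))

    while True:
        data, _ = sock.recvfrom(4096)
        # Datagrams look like "airflow.ti.start.my_dag.my_task:1|c"
        print(data.decode("utf-8", errors="replace"))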

[airflow] 11/24: Fix example code in Doc (#19824)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 4608eaa77d7cc77f45ab6046bc7470a39fcc2f37
Author: Ephraim Anierobi <sp...@gmail.com>
AuthorDate: Thu Nov 25 17:06:41 2021 +0100

    Fix example code in Doc (#19824)
    
    (cherry picked from commit 76c598a52e084a99dba4441c326be52d1c26634c)
---
 docs/apache-airflow/concepts/params.rst | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/docs/apache-airflow/concepts/params.rst b/docs/apache-airflow/concepts/params.rst
index 430de5d..c508279 100644
--- a/docs/apache-airflow/concepts/params.rst
+++ b/docs/apache-airflow/concepts/params.rst
@@ -32,21 +32,22 @@ which won't be doing any such validations.
     from airflow.models.param import Param
 
     with DAG(
-      'my_dag',
-      params: {
-        'int_param': Param(10, type='integer', minimum=0, maximum=20),          # a int param with default value
-        'str_param': Param(type='string', minLength=2, maxLength=4),            # a mandatory str param
-        'dummy_param': Param(type=['null', 'number', 'string'])                 # a param which can be None as well
-        'old_param': 'old_way_of_passing',                                      # i.e. no data or type validations
-        'simple_param': Param('im_just_like_old_param'),                        # i.e. no data or type validations
-        'email_param': Param(
-            default='example@example.com',
-            type='string',
-            format='idn-email',
-            minLength=5,
-            maxLength=255,
-        ),
-      }
+        'my_dag',
+        params={
+            'int_param': Param(10, type='integer', minimum=0, maximum=20),  # an int param with a default value
+            'str_param': Param(type='string', minLength=2, maxLength=4),    # a mandatory str param
+            'dummy_param': Param(type=['null', 'number', 'string']),        # a param which can be None as well
+            'old_param': 'old_way_of_passing',                              # i.e. no data or type validations
+            'simple_param': Param('im_just_like_old_param'),                # i.e. no data or type validations
+            'email_param': Param(
+                default='example@example.com',
+                type='string',
+                format='idn-email',
+                minLength=5,
+                maxLength=255,
+            ),
+        },
+    )
 
 ``Param`` make use of `json-schema <https://json-schema.org/>`__ to define the properties and doing the
 validation, so one can use the full json-schema specifications mentioned at
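
Because ``Param`` validation is json-schema based, the corrected ``int_param`` above can be reasoned
about with the ``jsonschema`` package directly. This is only a sketch of the equivalent schema check,
not Airflow's internal validation code:

.. code-block:: python

    # Equivalent json-schema check for Param(10, type='integer', minimum=0, maximum=20).
    from jsonschema import ValidationError, validate

    schema = {"type": "integer", "minimum": 0, "maximum": 20}

    validate(10, schema)  # passes: the default value satisfies the schema

    try:
        validate(42, schema)  # fails: 42 exceeds the declared maximum
    except ValidationError as err:
        print(f"rejected: {err.message}")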

[airflow] 17/24: Removes unnecessary --upgrade option from our examples (#20537)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 312577e1daa31c914e7cca3b40bc6bcbbe519cb6
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Tue Dec 28 21:11:07 2021 +0100

    Removes unnecessary --upgrade option from our examples (#20537)
    
    (cherry picked from commit 2976e6f829e727c01b9c2838e32d210d40e7a03c)
---
 docs/apache-airflow/installation/installing-from-pypi.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/installation/installing-from-pypi.rst b/docs/apache-airflow/installation/installing-from-pypi.rst
index 95d0552..33a7d7c 100644
--- a/docs/apache-airflow/installation/installing-from-pypi.rst
+++ b/docs/apache-airflow/installation/installing-from-pypi.rst
@@ -118,7 +118,7 @@ being installed.
     AIRFLOW_VERSION=|version|
     PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
     CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
-    pip install --upgrade "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
+    pip install "apache-airflow[postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
 
 Installation and upgrading of Airflow providers separately
 ==========================================================

[airflow] 09/24: Change the name of link to ASF downloads (#19441)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ce21b8e514841750adcdaa3ba93cbca3ea077ded
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Sun Nov 7 11:17:01 2021 +0100

    Change the name of link to ASF downloads (#19441)
    
    The ASF used to use mirrors to distribute their software, but
    recently switched to a CDN. The mechanism might change again in
    the future (even with the CDN in place, the ASF 'mirrors' page
    and closer.lua script provide a fully ASF-controlled way to
    switch to the right mechanism). However, technically speaking the
    current solution is not 'mirrors' but a CDN, so it makes
    sense to rename the link to generic downloads.
    
    (cherry picked from commit 3a7e687a369986a35c83ab374ed0e9dc040ecdae)
---
 docs/apache-airflow/installation/index.rst                   | 2 +-
 docs/apache-airflow/installation/installing-from-sources.rst | 2 +-
 docs/helm-chart/installing-helm-chart-from-sources.rst       | 2 +-
 docs/installing-providers-from-sources.rst                   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/apache-airflow/installation/index.rst b/docs/apache-airflow/installation/index.rst
index dfb18f5..afb695c 100644
--- a/docs/apache-airflow/installation/index.rst
+++ b/docs/apache-airflow/installation/index.rst
@@ -63,7 +63,7 @@ More details: :doc:`installing-from-sources`
 
 * This option is best if you expect to build all your software from sources.
 * Apache Airflow is one of the projects that belong to the `Apache Software Foundation <https://www.apache.org/>`__ .
-  It is a requirement for all ASF projects that they can be installed using official sources released via `Official Apache Mirrors <http://ws.apache.org/mirrors.cgi/>`__ .
+  It is a requirement for all ASF projects that they can be installed using official sources released via `Official Apache Downloads <http://ws.apache.org/mirrors.cgi/>`__ .
 * This is the best choice if you have a strong need to `verify the integrity and provenance of the software <https://www.apache.org/dyn/closer.cgi#verify>`__
 
 **Intended users**
diff --git a/docs/apache-airflow/installation/installing-from-sources.rst b/docs/apache-airflow/installation/installing-from-sources.rst
index 87ca997..2a62540 100644
--- a/docs/apache-airflow/installation/installing-from-sources.rst
+++ b/docs/apache-airflow/installation/installing-from-sources.rst
@@ -32,7 +32,7 @@ Released packages
 The ``source``, ``sdist`` and ``whl`` packages released are the "official" sources of installation that you
 can use if you want to verify the origin of the packages and want to verify checksums and signatures of
 the packages. The packages are available via the
-`Official Apache Software Foundations Mirrors <http://ws.apache.org/mirrors.cgi>`_
+`Official Apache Software Foundations Downloads <http://ws.apache.org/mirrors.cgi>`_
 
 
 The |version| downloads are available at:
diff --git a/docs/helm-chart/installing-helm-chart-from-sources.rst b/docs/helm-chart/installing-helm-chart-from-sources.rst
index cfd7a25..63f00b4 100644
--- a/docs/helm-chart/installing-helm-chart-from-sources.rst
+++ b/docs/helm-chart/installing-helm-chart-from-sources.rst
@@ -34,7 +34,7 @@ Released packages
 The sources and packages released are the "official" sources of installation that you can use if
 you want to verify the origin of the packages and want to verify checksums and signatures of the packages.
 The packages are available via the
-`Official Apache Software Foundations Mirrors <http://ws.apache.org/mirrors.cgi>`_
+`Official Apache Software Foundations Downloads <http://ws.apache.org/mirrors.cgi>`_
 
 The downloads are available at:
 
diff --git a/docs/installing-providers-from-sources.rst b/docs/installing-providers-from-sources.rst
index 415956d..c3d9400 100644
--- a/docs/installing-providers-from-sources.rst
+++ b/docs/installing-providers-from-sources.rst
@@ -34,7 +34,7 @@ Released packages
 The ``sdist`` and ``whl`` packages released are the "official" sources of installation that you can use if
 you want to verify the origin of the packages and want to verify checksums and signatures of the packages.
 The packages are available via the
-`Official Apache Software Foundations Mirrors <http://ws.apache.org/mirrors.cgi>`__
+`Official Apache Software Foundations Downloads <http://ws.apache.org/mirrors.cgi>`__
 
 The downloads are available at:
 

[airflow] 04/24: Update CSV ingest code for tutorial (#18960)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit d1834043864f58bbf3efea543271743dc33c4e14
Author: Daniel Robert <da...@acm.org>
AuthorDate: Thu Oct 14 18:22:10 2021 -0400

    Update CSV ingest code for tutorial (#18960)
    
    (cherry picked from commit fb00240db972ed2adb60357e3e9116830b3fad5b)
---
 docs/apache-airflow/pipeline_example.csv | 190 +++++++++++++++----------------
 docs/apache-airflow/tutorial.rst         |  54 ++++-----
 2 files changed, 120 insertions(+), 124 deletions(-)

diff --git a/docs/apache-airflow/pipeline_example.csv b/docs/apache-airflow/pipeline_example.csv
index a1b9f2d..7e600d3 100644
--- a/docs/apache-airflow/pipeline_example.csv
+++ b/docs/apache-airflow/pipeline_example.csv
@@ -1,96 +1,96 @@
 Serial Number,Company Name,Employee Markme,Description,Leave
-9.78819E+12,TALES OF SHIVA,Mark,mark,0
-9.7801E+12,1Q84 THE COMPLETE TRILOGY,HARUKI MURAKAMI,Mark,0
-9.7802E+12,MY KUMAN,Mark,Mark,0
-9.78001E+12,THE GOD OF SMAAL THINGS,ARUNDHATI ROY,4TH HARPER COLLINS,2
-9.78055E+12,THE BLACK CIRCLE,Mark,4TH HARPER COLLINS,0
-9.78813E+12,THE THREE LAWS OF PERFORMANCE,Mark,4TH HARPER COLLINS,0
-9.78938E+12,CHAMarkKYA MANTRA,Mark,4TH HARPER COLLINS,0
-9.78818E+12,59.FLAGS,Mark,4TH HARPER COLLINS,0
-9.78074E+12,THE POWER OF POSITIVE THINKING FROM,Mark,A & A PUBLISHER,0
-9.78938E+12,YOU CAN IF YO THINK YO CAN,PEALE,A & A PUBLISHER,0
-9.78818E+12,DONGRI SE DUBAI TAK (MPH),Mark,A & A PUBLISHER,0
-9.78819E+12,MarkLANDA ADYTAN KOSH,Mark,AADISH BOOK DEPOT,0
-9.78819E+12,MarkLANDA VISHAL SHABD SAGAR,-,AADISH BOOK DEPOT,1
-8187776021,MarkLANDA CONCISE DICT(ENG TO HINDI),Mark,AADISH BOOK DEPOT,0
-9.78938E+12,LIEUTEMarkMarkT GENERAL BHAGAT: A SAGA OF BRAVERY AND LEADERSHIP,Mark,AAM COMICS,2
-9.78938E+12,LN. MarkIK SUNDER SINGH,N.A,AAN COMICS,0
-9.78938E+12,I AM KRISHMark,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,1
-9.78938E+12,DON'T TEACH ME TOLERANCE INDIA,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,0
-9.78938E+12,MUJHE SAHISHNUTA MAT SIKHAO BHARAT,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,0
-9.78938E+12,SECRETS OF DESTINY,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,1
-9.78938E+12,BHAGYA KE RAHASYA (HINDI) SECRET OF DESTINY,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,1
-9.78819E+12,MEIN MANN HOON,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,0
-9.78938E+12,I AM THE MIND,DEEP TRIVEDI,AATMARAM & SONS,0
-9.78035E+12,THE ART OF CHOOSING,SHEEMark IYENGAR,ABACUS,0
-9.78035E+12,IN SPITE OF THE GODS,EDWARD LUCE,ABACUS,1
-9.78819E+12,QUESTIONS & ANWERS ABOUT THE GREAT BIBLE,Mark,ABC PUBLISHERS DISTRIBUTORS,4
-9.78938E+12,NIBANDH EVAM KAHANI LEKHAN { HINDI },Mark,ABHI BOOKS,1
-9.78933E+12,INDIAN ECONOMY SINCE INDEPENDENCE 27TH /E,UMA KAPILA,ACADEMIC FOUNDATION,1
-9.78817E+12,ECONOMIC DEVELOPMENT AND POLICY IN INDIA,UMA KAPILA,ACADEMIC FOUNDATION,1
-9.78933E+12,INDIAN ECONOMY PERFORMANCE 18TH/E  2017-2018,UMA KAPILA,ACADEMIC FOUNDATION,2
-9.78933E+12,INDIAN ECONOMIC DEVELOPMENTSINCE 1947 (NO RETURMarkBLE),UMA KAPILA,ACADEMIC FOUNDATION,1
-9.78938E+12,PRELIMS SPECIAL READING COMPREHENSION PAPER II CSAT,MarkGENDRA PRATAP,ACCESS PUBLISHING INDIA PVT.LTD,0
-9.78938E+12,THE CONSTITUTION OF INDIA 2ND / E,AR KHAN,ACCESS PUBLISHING INDIA PVT.LTD,10
-9.78939E+12,"INDIAN HERITAGE ,ART & CULTURE",MADHUKAR,ACCESS PUBLISHING INDIA PVT.LTD,10
-9.78938E+12,BHARAT KA SAMVIDHAN,AR KHAN,ACCESS PUBLISHING INDIA PVT.LTD,4
-9.78938E+12,"ETHICS, INTEGRITY & APTITUDE ( 3RD/E)","P N ROY ,G SUBBA RAO",ACCESS PUBLISHING INDIA PVT.LTD,10
-9.78938E+12,GENERAL STUDIES PAPER -- I (2016),Mark,ACCESS PUBLISHING INDIA PVT.LTD,0
-9.78938E+12,GENERAL STUDIES PAPER - II (2016),Mark,ACCESS PUBLISHING INDIA PVT.LTD,0
-9.78938E+12,INDIAN AND WORLD GEOGRAPHY 2E,D R KHULLAR,ACCESS PUBLISHING INDIA PVT.LTD,10
-9.78938E+12,VASTUNISTHA PRASHN SANGRAHA: BHARAT KA ITIHAS,MEEMarkKSHI KANT,ACCESS PUBLISHING INDIA PVT.LTD,0
-9.78938E+12,"PHYSICAL, HUMAN AND ECONOMIC GEOGRAPHY",D R KHULLAR,ACCESS PUBLISHING INDIA PVT.LTD,4
-9.78938E+12,WORLD GEOGRAPHY,DR KHULLAR,ACCESS PUBLISHING INDIA PVT.LTD,5
-9.78938E+12,INDIA: MAP ENTRIES IN GEOGRAPHY,MAJID HUSAIN,ACCESS PUBLISHING INDIA PVT.LTD,5
-9.78938E+12,GOOD GOVERMarkNCE IN INDIA 2/ED.,G SUBBA RAO,ACCESS PUBLISHING INDIA PVT.LTD,1
-9.78938E+12,KAMYABI KE SUTRA-CIVIL SEWA PARIKSHA AAP KI MUTTHI MEIN,ASHOK KUMAR,ACCESS PUBLISHING INDIA PVT.LTD,0
-9.78938E+12,GENERAL SCIENCE PRELIRY EXAM,Mark,ACCESS PUBLISHING INDIA PVT.LTD,0
-9.78174E+12,SUCCESS AND DYSLEXIA,SUCCESS AND DYSLEXIA,ACER PRESS,0
-9.78174E+12,AN EXTRAORDIMarkRY SCHOOL,SARA JAMES,ACER PRESS,0
-9.78174E+12,POWERFUL PRACTICES FOR READING IMPROVEMENT,GLASSWELL,ACER PRESS,0
-9.78174E+12,EARLY CHILDHOOD PLAY MATTERS,SHOMark BASS,ACER PRESS,0
-9.78174E+12,LEADING LEARNING AND TEACHING,STEPHEN DINHAM,ACER PRESS,0
-9.78174E+12,READING AND LEARNING DIFFICULTIES,PETER WESTWOOD,ACER PRESS,0
-9.78174E+12,NUMERACY AND LEARNING DIFFICULTIES,PETER WOODLAND],ACER PRESS,0
-9.78174E+12,TEACHING AND LEARNING DIFFICULTIES,PETER WOODLAND,ACER PRESS,0
-9.78174E+12,USING DATA TO IMPROVE LEARNING,ANTHONY SHADDOCK,ACER PRESS,0
-9.78174E+12,PATHWAYS TO SCHOOL SYSTEM IMPROVEMENT,MICHAEL GAFFNEY,ACER PRESS,0
-9.78174E+12,FOR THOSE WHO TEACH,PHIL RIDDEN,ACER PRESS,0
-9.78174E+12,KEYS TO SCHOOL LEADERSHIP,PHIL RIDDEN & JOHN DE NOBILE,ACER PRESS,0
-9.78174E+12,DIVERSE LITERACIES IN EARLY CHILDHOOD,LEONIE ARTHUR,ACER PRESS,0
-9.78174E+12,CREATIVE ARTS IN THE LIVESOF YOUNG CHILDREN,ROBYN EWING,ACER PRESS,0
-9.78174E+12,SOCIAL AND EMOTIOMarkL DEVELOPMENT,ROS LEYDEN AND ERIN SHALE,ACER PRESS,0
-9.78174E+12,DISCUSSIONS IN SCIENCE,TIM SPROD,ACER PRESS,0
-9.78174E+12,YOUNG CHILDREN LEARNING MATHEMATICS,ROBERT HUNTING,ACER PRESS,0
-9.78174E+12,COACHING CHILDREN,KELLY SUMICH,ACER PRESS,1
-9.78174E+12,TEACHING PHYSICAL EDUCATIOMarkL IN PRIMARY SCHOOL,JANET L CURRIE,ACER PRESS,0
-9.78174E+12,ASSESSMENT AND REPORTING,PHIL RIDDEN AND SANDY,ACER PRESS,0
-9.78174E+12,COLLABORATION IN LEARNING,MAL LEE AND LORRAE WARD,ACER PRESS,0
-9.78086E+12,RE-IMAGINING EDUCATIMarkL LEADERSHIP,BRIAN J.CALDWELL,ACER PRESS,0
-9.78086E+12,TOWARDS A MOVING SCHOOL,FLEMING & KLEINHENZ,ACER PRESS,0
-9.78086E+12,DESINGNING A THINKING A CURRICULAM,SUSAN WILKS,ACER PRESS,0
-9.78086E+12,LEADING A DIGITAL SCHOOL,MAL LEE AND MICHEAL GAFFNEY,ACER PRESS,0
-9.78086E+12,NUMERACY,WESTWOOD,ACER PRESS,0
-9.78086E+12,TEACHING ORAL LANGUAGE,JOHN MUNRO,ACER PRESS,0
-9.78086E+12,SPELLING,WESTWOOD,ACER PRESS,0
-9.78819E+12,STORIES OF SHIVA,Mark,ACK,0
-9.78819E+12,JAMSET  JI TATA: THE MAN WHO SAW TOMORROW,,ACK,0
-9.78818E+12,HEROES FROM THE MAHABHARTA { 5-IN-1 },Mark,ACK,0
-9.78818E+12,SURYA,,ACK,0
-9.78818E+12,TALES OF THE MOTHER GODDESS,-,ACK,0
-9.78818E+12,ADVENTURES OF KRISHMark,Mark,ACK,0
-9.78818E+12,MAHATMA GANDHI,Mark,ACK,1
-9.78818E+12,TALES FROM THE PANCHATANTRA 3-IN-1,-,ACK,0
-9.78818E+12,YET MORE TALES FROM THE JATAKAS { 3-IN-1 },AMarkNT PAI,ACK,0
-9.78818E+12,LEGENDARY RULERS OF INDIA,-,ACK,0
-9.78818E+12,GREAT INDIAN CLASSIC,Mark,ACK,0
-9.78818E+12,TULSIDAS ' RAMAYAMark,Mark,ACK,0
-9.78818E+12,TALES OF HANUMAN,-,ACK,0
-9.78818E+12,VALMIKI'S RAMAYAMark,A C K,ACK,1
-9.78818E+12,THE BEST OF INIDAN WIT AND WISDOM,Mark,ACK,0
-9.78818E+12,MORE TALES FROM THE PANCHTANTRA,AMarkNT PAL,ACK,0
-9.78818E+12,THE GREAT MUGHALS {5-IN-1},AMarkNT.,ACK,0
-9.78818E+12,FAMOUS SCIENTISTS,Mark,ACK,0
-9.78818E+12,KOMarkRK,Mark,ACK,0
-9.78818E+12,THE MUGHAL COURT,REEMark,ACK,0
-9.78818E+12,MORE STORIES FROM THE JATAKAS,Mark,ACK,0
+2,TALES OF SHIVA,Mark,mark,0
+3,1Q84 THE COMPLETE TRILOGY,HARUKI MURAKAMI,Mark,0
+4,MY KUMAN,Mark,Mark,0
+5,THE GOD OF SMAAL THINGS,ARUNDHATI ROY,4TH HARPER COLLINS,2
+6,THE BLACK CIRCLE,Mark,4TH HARPER COLLINS,0
+7,THE THREE LAWS OF PERFORMANCE,Mark,4TH HARPER COLLINS,0
+8,CHAMarkKYA MANTRA,Mark,4TH HARPER COLLINS,0
+9,59.FLAGS,Mark,4TH HARPER COLLINS,0
+10,THE POWER OF POSITIVE THINKING FROM,Mark,A & A PUBLISHER,0
+11,YOU CAN IF YO THINK YO CAN,PEALE,A & A PUBLISHER,0
+12,DONGRI SE DUBAI TAK (MPH),Mark,A & A PUBLISHER,0
+13,MarkLANDA ADYTAN KOSH,Mark,AADISH BOOK DEPOT,0
+14,MarkLANDA VISHAL SHABD SAGAR,-,AADISH BOOK DEPOT,1
+15,MarkLANDA CONCISE DICT(ENG TO HINDI),Mark,AADISH BOOK DEPOT,0
+16,LIEUTEMarkMarkT GENERAL BHAGAT: A SAGA OF BRAVERY AND LEADERSHIP,Mark,AAM COMICS,2
+17,LN. MarkIK SUNDER SINGH,N.A,AAN COMICS,0
+18,I AM KRISHMark,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,1
+19,DON'T TEACH ME TOLERANCE INDIA,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,0
+20,MUJHE SAHISHNUTA MAT SIKHAO BHARAT,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,0
+21,SECRETS OF DESTINY,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,1
+22,BHAGYA KE RAHASYA (HINDI) SECRET OF DESTINY,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,1
+23,MEIN MANN HOON,DEEP TRIVEDI,AATMAN INNOVATIONS PVT LTD,0
+24,I AM THE MIND,DEEP TRIVEDI,AATMARAM & SONS,0
+25,THE ART OF CHOOSING,SHEEMark IYENGAR,ABACUS,0
+26,IN SPITE OF THE GODS,EDWARD LUCE,ABACUS,1
+27,QUESTIONS & ANWERS ABOUT THE GREAT BIBLE,Mark,ABC PUBLISHERS DISTRIBUTORS,4
+28,NIBANDH EVAM KAHANI LEKHAN { HINDI },Mark,ABHI BOOKS,1
+29,INDIAN ECONOMY SINCE INDEPENDENCE 27TH /E,UMA KAPILA,ACADEMIC FOUNDATION,1
+30,ECONOMIC DEVELOPMENT AND POLICY IN INDIA,UMA KAPILA,ACADEMIC FOUNDATION,1
+31,INDIAN ECONOMY PERFORMANCE 18TH/E  2017-2018,UMA KAPILA,ACADEMIC FOUNDATION,2
+32,INDIAN ECONOMIC DEVELOPMENTSINCE 1947 (NO RETURMarkBLE),UMA KAPILA,ACADEMIC FOUNDATION,1
+33,PRELIMS SPECIAL READING COMPREHENSION PAPER II CSAT,MarkGENDRA PRATAP,ACCESS PUBLISHING INDIA PVT.LTD,0
+34,THE CONSTITUTION OF INDIA 2ND / E,AR KHAN,ACCESS PUBLISHING INDIA PVT.LTD,10
+35,"INDIAN HERITAGE ,ART & CULTURE",MADHUKAR,ACCESS PUBLISHING INDIA PVT.LTD,10
+36,BHARAT KA SAMVIDHAN,AR KHAN,ACCESS PUBLISHING INDIA PVT.LTD,4
+37,"ETHICS, INTEGRITY & APTITUDE ( 3RD/E)","P N ROY ,G SUBBA RAO",ACCESS PUBLISHING INDIA PVT.LTD,10
+38,GENERAL STUDIES PAPER -- I (2016),Mark,ACCESS PUBLISHING INDIA PVT.LTD,0
+39,GENERAL STUDIES PAPER - II (2016),Mark,ACCESS PUBLISHING INDIA PVT.LTD,0
+40,INDIAN AND WORLD GEOGRAPHY 2E,D R KHULLAR,ACCESS PUBLISHING INDIA PVT.LTD,10
+41,VASTUNISTHA PRASHN SANGRAHA: BHARAT KA ITIHAS,MEEMarkKSHI KANT,ACCESS PUBLISHING INDIA PVT.LTD,0
+42,"PHYSICAL, HUMAN AND ECONOMIC GEOGRAPHY",D R KHULLAR,ACCESS PUBLISHING INDIA PVT.LTD,4
+43,WORLD GEOGRAPHY,DR KHULLAR,ACCESS PUBLISHING INDIA PVT.LTD,5
+44,INDIA: MAP ENTRIES IN GEOGRAPHY,MAJID HUSAIN,ACCESS PUBLISHING INDIA PVT.LTD,5
+45,GOOD GOVERMarkNCE IN INDIA 2/ED.,G SUBBA RAO,ACCESS PUBLISHING INDIA PVT.LTD,1
+46,KAMYABI KE SUTRA-CIVIL SEWA PARIKSHA AAP KI MUTTHI MEIN,ASHOK KUMAR,ACCESS PUBLISHING INDIA PVT.LTD,0
+47,GENERAL SCIENCE PRELIRY EXAM,Mark,ACCESS PUBLISHING INDIA PVT.LTD,0
+48,SUCCESS AND DYSLEXIA,SUCCESS AND DYSLEXIA,ACER PRESS,0
+49,AN EXTRAORDIMarkRY SCHOOL,SARA JAMES,ACER PRESS,0
+50,POWERFUL PRACTICES FOR READING IMPROVEMENT,GLASSWELL,ACER PRESS,0
+51,EARLY CHILDHOOD PLAY MATTERS,SHOMark BASS,ACER PRESS,0
+52,LEADING LEARNING AND TEACHING,STEPHEN DINHAM,ACER PRESS,0
+53,READING AND LEARNING DIFFICULTIES,PETER WESTWOOD,ACER PRESS,0
+54,NUMERACY AND LEARNING DIFFICULTIES,PETER WOODLAND],ACER PRESS,0
+55,TEACHING AND LEARNING DIFFICULTIES,PETER WOODLAND,ACER PRESS,0
+56,USING DATA TO IMPROVE LEARNING,ANTHONY SHADDOCK,ACER PRESS,0
+57,PATHWAYS TO SCHOOL SYSTEM IMPROVEMENT,MICHAEL GAFFNEY,ACER PRESS,0
+58,FOR THOSE WHO TEACH,PHIL RIDDEN,ACER PRESS,0
+59,KEYS TO SCHOOL LEADERSHIP,PHIL RIDDEN & JOHN DE NOBILE,ACER PRESS,0
+60,DIVERSE LITERACIES IN EARLY CHILDHOOD,LEONIE ARTHUR,ACER PRESS,0
+61,CREATIVE ARTS IN THE LIVESOF YOUNG CHILDREN,ROBYN EWING,ACER PRESS,0
+62,SOCIAL AND EMOTIOMarkL DEVELOPMENT,ROS LEYDEN AND ERIN SHALE,ACER PRESS,0
+63,DISCUSSIONS IN SCIENCE,TIM SPROD,ACER PRESS,0
+64,YOUNG CHILDREN LEARNING MATHEMATICS,ROBERT HUNTING,ACER PRESS,0
+65,COACHING CHILDREN,KELLY SUMICH,ACER PRESS,1
+66,TEACHING PHYSICAL EDUCATIOMarkL IN PRIMARY SCHOOL,JANET L CURRIE,ACER PRESS,0
+67,ASSESSMENT AND REPORTING,PHIL RIDDEN AND SANDY,ACER PRESS,0
+68,COLLABORATION IN LEARNING,MAL LEE AND LORRAE WARD,ACER PRESS,0
+69,RE-IMAGINING EDUCATIMarkL LEADERSHIP,BRIAN J.CALDWELL,ACER PRESS,0
+70,TOWARDS A MOVING SCHOOL,FLEMING & KLEINHENZ,ACER PRESS,0
+71,DESIGNING A THINKING A CURRICULAM,SUSAN WILKS,ACER PRESS,0
+72,LEADING A DIGITAL SCHOOL,MAL LEE AND MICHEAL GAFFNEY,ACER PRESS,0
+73,NUMERACY,WESTWOOD,ACER PRESS,0
+74,TEACHING ORAL LANGUAGE,JOHN MUNRO,ACER PRESS,0
+75,SPELLING,WESTWOOD,ACER PRESS,0
+76,STORIES OF SHIVA,Mark,ACK,0
+77,JAMSET  JI TATA: THE MAN WHO SAW TOMORROW,,ACK,0
+78,HEROES FROM THE MAHABHARTA { 5-IN-1 },Mark,ACK,0
+79,SURYA,,ACK,0
+80,TALES OF THE MOTHER GODDESS,-,ACK,0
+81,ADVENTURES OF KRISHMark,Mark,ACK,0
+82,MAHATMA GANDHI,Mark,ACK,1
+83,TALES FROM THE PANCHATANTRA 3-IN-1,-,ACK,0
+84,YET MORE TALES FROM THE JATAKAS { 3-IN-1 },AMarkNT PAI,ACK,0
+85,LEGENDARY RULERS OF INDIA,-,ACK,0
+86,GREAT INDIAN CLASSIC,Mark,ACK,0
+87,TULSIDAS ' RAMAYAMark,Mark,ACK,0
+88,TALES OF HANUMAN,-,ACK,0
+89,VALMIKI'S RAMAYAMark,A C K,ACK,1
+90,THE BEST OF INDIAN WIT AND WISDOM,Mark,ACK,0
+91,MORE TALES FROM THE PANCHTANTRA,AMarkNT PAL,ACK,0
+92,THE GREAT MUGHALS {5-IN-1},AMarkNT.,ACK,0
+93,FAMOUS SCIENTISTS,Mark,ACK,0
+94,KOMarkRK,Mark,ACK,0
+95,THE MUGHAL COURT,REEMark,ACK,0
+96,MORE STORIES FROM THE JATAKAS,Mark,ACK,0
diff --git a/docs/apache-airflow/tutorial.rst b/docs/apache-airflow/tutorial.rst
index df87f96..d9587bc 100644
--- a/docs/apache-airflow/tutorial.rst
+++ b/docs/apache-airflow/tutorial.rst
@@ -395,7 +395,7 @@ Create a Employee table in postgres using this
   create table "Employees_temp"
   (
       "Serial Number" numeric not null
-   constraint employees_pk
+   constraint employees_temp_pk
               primary key,
       "Company Name" text,
       "Employee Markme" text,
@@ -410,29 +410,23 @@ Let's break this down into 2 steps: get data & merge data:
 
   @task
   def get_data():
+      data_path = "/usr/local/airflow/dags/files/employees.csv"
+
       url = "https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/pipeline_example.csv"
 
       response = requests.request("GET", url)
 
-      with open("/usr/local/airflow/dags/files/employees.csv", "w") as file:
+      with open(data_path, "w") as file:
           for row in response.text.split("\n"):
-              file.write(row)
+              if row:
+                  file.write(row + "\n")
 
       postgres_hook = PostgresHook(postgres_conn_id="LOCAL")
       conn = postgres_hook.get_conn()
       cur = conn.cursor()
-      with open("/usr/local/airflow/dags/files/employees.csv", "r") as file:
-          cur.copy_from(
-              f,
-              "Employees_temp",
-              columns=[
-                  "Serial Number",
-                  "Company Name",
-                  "Employee Markme",
-                  "Description",
-                  "Leave",
-              ],
-              sep=",",
+      with open(data_path, "r") as file:
+          cur.copy_expert(
+              "COPY \"Employees_temp\" FROM stdin WITH CSV HEADER DELIMITER AS ','", file
           )
       conn.commit()
 
@@ -468,6 +462,12 @@ Lets look at our DAG:
 
 .. code-block:: python
 
+  from airflow.decorators import dag, task
+  from airflow.hooks.postgres_hook import PostgresHook
+  from datetime import datetime, timedelta
+  import requests
+
+
   @dag(
       schedule_interval="0 0 * * *",
       start_date=datetime(2021, 1, 1),
@@ -477,29 +477,25 @@ Lets look at our DAG:
   def Etl():
       @task
       def get_data():
+          # NOTE: configure this as appropriate for your airflow environment
+          data_path = "/usr/local/airflow/dags/files/employees.csv"
+
           url = "https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/pipeline_example.csv"
 
           response = requests.request("GET", url)
 
-          with open("/usr/local/airflow/dags/files/employees.csv", "w") as file:
+          with open(data_path, "w") as file:
               for row in response.text.split("\n"):
-                  file.write(row)
+                  if row:
+                      file.write(row + "\n")
 
           postgres_hook = PostgresHook(postgres_conn_id="LOCAL")
           conn = postgres_hook.get_conn()
           cur = conn.cursor()
-          with open("/usr/local/airflow/dags/files/employees.csv", "r") as file:
-              cur.copy_from(
+          with open(data_path, "r") as file:
+              cur.copy_expert(
+                  "COPY \"Employees_temp\" FROM stdin WITH CSV HEADER DELIMITER AS ','",
                   file,
-                  "Employees_temp",
-                  columns=[
-                      "Serial Number",
-                      "Company Name",
-                      "Employee Markme",
-                      "Description",
-                      "Leave",
-                  ],
-                  sep=",",
               )
           conn.commit()
 
@@ -530,7 +526,7 @@ Lets look at our DAG:
   dag = Etl()
 
 This dag runs daily at 00:00.
-Add this python file to airflow/dags folder and go back to the main folder and run
+Add this python file to airflow/dags folder (e.g. ``dags/etl.py``) and go back to the main folder and run
 
 .. code-block:: bash
 
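A quick, hedged way to confirm that the ``COPY`` in ``get_data`` actually loaded the CSV is to count
the rows in the staging table through the same ``LOCAL`` connection (the provider-package import path
below is an assumption; adjust it to wherever ``PostgresHook`` lives in your installation):

.. code-block:: python

    # Sanity check that get_data() populated the staging table.
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    hook = PostgresHook(postgres_conn_id="LOCAL")
    row_count = hook.get_first('SELECT COUNT(*) FROM "Employees_temp"')[0]
    print(f"Employees_temp now holds {row_count} rows")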

[airflow] 18/24: Improve documentation on ``Params`` (#20567)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 543a78bd5b0b376c6591c384066961c0ae508e47
Author: Matt Rixman <58...@users.noreply.github.com>
AuthorDate: Mon Jan 3 20:40:10 2022 -0700

    Improve documentation on ``Params`` (#20567)
    
    I think that this doc could be improved by adding examples of how to reference the params in your dag. (Also, the current example code causes this: #20559.)
    
    While trying to find the right place to work a few reference examples in, I ended up rewriting quite a lot of it.
    Let me know if you think that this is an improvement.
    
    I haven't yet figured out how to build this and view it locally, and I'd want to do that as a sanity check before merging it, but I figured I'd get feedback on what I've written before I do that.
    
    (cherry picked from commit 064efbeae7c2560741c5a8928799482ef795e100)
---
 docs/apache-airflow/concepts/params.rst | 146 ++++++++++++++++++++++++++------
 1 file changed, 119 insertions(+), 27 deletions(-)

diff --git a/docs/apache-airflow/concepts/params.rst b/docs/apache-airflow/concepts/params.rst
index c508279..ef266ea 100644
--- a/docs/apache-airflow/concepts/params.rst
+++ b/docs/apache-airflow/concepts/params.rst
@@ -15,16 +15,21 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _concepts:params:
+
 Params
 ======
 
-Params are Airflow's concept of providing runtime configuration to tasks when a dag gets triggered manually.
-Params are configured while defining the dag & tasks, that can be altered while doing a manual trigger. The
-ability to update params while triggering a DAG depends on the flag ``core.dag_run_conf_overrides_params``,
-so if that flag is ``False``, params would behave like constants.
+Params are how Airflow provides runtime configuration to tasks.
+When you trigger a DAG manually, you can modify its Params before the dagrun starts.
+If the user-supplied values don't pass validation, Airflow shows a warning instead of creating the dagrun.
+(For scheduled runs, the default values are used.)
+
+Adding Params to a DAG
+----------------------
 
-To use them, one can use the ``Param`` class for complex trigger-time validations or simply use primitive types,
-which won't be doing any such validations.
+To add Params to a :class:`~airflow.models.dag.DAG`, initialize it with the ``params`` kwarg.
+Use a dictionary that maps Param names to either a :class:`~airflow.models.param.Param` or an object indicating the parameter's default value.
 
 .. code-block::
 
@@ -32,33 +37,120 @@ which won't be doing any such validations.
     from airflow.models.param import Param
 
     with DAG(
-        'my_dag',
+        "the_dag",
         params={
-            'int_param': Param(10, type='integer', minimum=0, maximum=20),  # a int param with default value
-            'str_param': Param(type='string', minLength=2, maxLength=4),    # a mandatory str param
-            'dummy_param': Param(type=['null', 'number', 'string'])         # a param which can be None as well
-            'old_param': 'old_way_of_passing',                              # i.e. no data or type validations
-            'simple_param': Param('im_just_like_old_param'),                # i.e. no data or type validations
-            'email_param': Param(
-                default='example@example.com',
-                type='string',
-                format='idn-email',
-                minLength=5,
-                maxLength=255,
-            ),
+            "x": Param(5, type="integer", minimum=3),
+            "y": 6
         },
+    ) as the_dag:
+
+Referencing Params in a Task
+----------------------------
+
+Params are stored as ``params`` in the :ref:`template context <templates-ref>`.
+So you can reference them in a template.
+
+.. code-block::
+
+    PythonOperator(
+        task_id="from_template",
+        op_args=[
+            "{{ params.x + 10 }}",
+        ],
+        python_callable=(
+            lambda x: print(x)
+        ),
+    )
+
+Even though Params can use a variety of types, the default behavior of templates is to provide your task with a string.
+You can change this by setting ``render_template_as_native_obj=True`` while initializing the :class:`~airflow.models.dag.DAG`.
+
+.. code-block::
+
+    with DAG(
+        "the_dag",
+        params={"x": Param(5, type="integer", minimum=3)},
+        render_template_as_native_obj=True
+    ) as the_dag:
+
+
+This way, the Param's type is respected when it's provided to your task.
+
+.. code-block::
+
+    # prints <class 'str'> by default
+    # prints <class 'int'> if render_template_as_native_obj=True
+    PythonOperator(
+        task_id="template_type",
+        op_args=[
+            "{{ params.x }}",
+        ],
+        python_callable=(
+            lambda x: print(type(x))
+        ),
     )
 
-``Param`` make use of `json-schema <https://json-schema.org/>`__ to define the properties and doing the
-validation, so one can use the full json-schema specifications mentioned at
-https://json-schema.org/draft/2020-12/json-schema-validation.html to define the construct of a ``Param``
-objects.
+Another way to access your param is via a task's ``context`` kwarg.
 
-Also, it worthwhile to note that if you have any DAG which uses a mandatory param value, i.e. a ``Param``
-object with no default value or ``null`` as an allowed type, that DAG schedule has to be ``None``. However,
-if such ``Param`` has been defined at task level, Airflow has no way to restrict that & the task would be
-failing at the execution time.
+.. code-block::
+
+    def print_x(**context):
+        print(context["params"]["x"])
+
+    PythonOperator(
+        task_id="print_x",
+        python_callable=print_x,
+    )
+
+Task-level Params
+-----------------
+
+You can also add Params to individual tasks.
+
+.. code-block::
+
+    PythonOperator(
+        task_id="print_x",
+        params={"x": 10},
+        python_callable=print_x,
+    )
+
+If there's already a dag param with that name, the task-level default will take precedence over the dag-level default.
+If a user supplies their own value when the DAG is triggered, Airflow ignores all defaults and uses the user's value.
+
+JSON Schema Validation
+----------------------
+
+:class:`~airflow.models.param.Param` makes use of `json-schema <https://json-schema.org/>`__, so you can use the full json-schema specifications mentioned at https://json-schema.org/draft/2020-12/json-schema-validation.html to define ``Param`` objects.
+
+.. code-block::
+
+    with DAG(
+        "my_dag",
+        params={
+            # an int with a default value
+            "int_param": Param(10, type="integer", minimum=0, maximum=20),
+
+            # a required param which can be of multiple types
+            "dummy": Param(type=["null", "number", "string"]),
+
+            # a param which uses json-schema formatting
+            "email": Param(
+                default="example@example.com",
+                type="string",
+                format="idn-email",
+                minLength=5,
+                maxLength=255,
+            ),
+        },
+    ) as my_dag:
 
 .. note::
     As of now, for security reasons, one can not use Param objects derived out of custom classes. We are
     planning to have a registration system for custom Param classes, just like we've for Operator ExtraLinks.
+
+Disabling Runtime Param Modification
+------------------------------------
+
+The ability to update params while triggering a DAG depends on the flag ``core.dag_run_conf_overrides_params``.
+Setting this config to ``False`` will effectively turn your default params into constants.
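
Putting the rewritten page together, a hedged, self-contained sketch of a DAG that declares a
validated Param and reads it both from a template and from the task context (the ``params_demo``
and ``show_x`` names are illustrative only, not taken from the docs):

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.models.param import Param
    from airflow.operators.python import PythonOperator

    with DAG(
        "params_demo",  # illustrative dag_id
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # manual triggering, so Params can be edited
        params={"x": Param(5, type="integer", minimum=3)},
        render_template_as_native_obj=True,  # keep the integer type in templates
    ) as dag:

        def show_x(value, **context):
            # The same Param is reachable both ways.
            print("from template:", value)
            print("from context :", context["params"]["x"])

        PythonOperator(
            task_id="show_x",
            op_args=["{{ params.x + 10 }}"],
            python_callable=show_x,
        )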

[airflow] 06/24: Doc: Improve tutorial documentation and code (#19186)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 2cc9ed00b22245fcc9c9feb4dac36817a1447ff5
Author: Kian Yang Lee <ke...@gmail.com>
AuthorDate: Wed Oct 27 02:37:01 2021 +0800

    Doc: Improve tutorial documentation and code (#19186)
    
    1. Added instructions on adding a postgres connection.
    2. Modified the SQL to use proper syntax.
    3. Removed redundant lines when writing to CSV.
    4. Added QUOTE argument for copy_expert
    
    (cherry picked from commit f83099cd4c2eaecfd363fce4562aa19e975c146c)
---
 docs/apache-airflow/tutorial.rst | 99 ++++++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 44 deletions(-)

diff --git a/docs/apache-airflow/tutorial.rst b/docs/apache-airflow/tutorial.rst
index d9587bc..acb7e84 100644
--- a/docs/apache-airflow/tutorial.rst
+++ b/docs/apache-airflow/tutorial.rst
@@ -377,73 +377,86 @@ We need to have docker and postgres installed.
 We will be using this `docker file <https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#docker-compose-yaml>`_
 Follow the instructions properly to set up Airflow.
 
-Create a Employee table in postgres using this
+Create an Employee table in postgres using this:
 
 .. code-block:: sql
 
-  create table "Employees"
+  CREATE TABLE "Employees"
   (
-      "Serial Number" numeric not null
-   constraint employees_pk
-              primary key,
-      "Company Name" text,
-      "Employee Markme" text,
-      "Description" text,
-      "Leave" integer
+      "Serial Number" NUMERIC PRIMARY KEY,
+      "Company Name" TEXT,
+      "Employee Markme" TEXT,
+      "Description" TEXT,
+      "Leave" INTEGER
   );
 
-  create table "Employees_temp"
+  CREATE TABLE "Employees_temp"
   (
-      "Serial Number" numeric not null
-   constraint employees_temp_pk
-              primary key,
-      "Company Name" text,
-      "Employee Markme" text,
-      "Description" text,
-      "Leave" integer
+      "Serial Number" NUMERIC PRIMARY KEY,
+      "Company Name" TEXT,
+      "Employee Markme" TEXT,
+      "Description" TEXT,
+      "Leave" INTEGER
   );
 
+We also need to add a connection to postgres. Go to the UI and click "Admin" >> "Connections". Specify the following for each field:
+
+- Conn id: LOCAL
+- Conn Type: postgres
+- Host: postgres
+- Schema: <DATABASE_NAME>
+- Login: airflow
+- Password: airflow
+- Port: 5432
+
+After that, you can test your connection and, if you followed all the steps correctly, it should show a success notification. Proceed with saving the connection; we are now ready to write the DAG.
 
 Let's break this down into 2 steps: get data & merge data:
 
 .. code-block:: python
 
+  from airflow.decorators import dag, task
+  from airflow.hooks.postgres import PostgresHook
+  from datetime import datetime, timedelta
+  import requests
+
+
   @task
   def get_data():
-      data_path = "/usr/local/airflow/dags/files/employees.csv"
+      # NOTE: configure this as appropriate for your airflow environment
+      data_path = "/opt/airflow/dags/files/employees.csv"
 
       url = "https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/pipeline_example.csv"
 
       response = requests.request("GET", url)
 
       with open(data_path, "w") as file:
-          for row in response.text.split("\n"):
-              if row:
-                  file.write(row + "\n")
+          file.write(response.text)
 
       postgres_hook = PostgresHook(postgres_conn_id="LOCAL")
       conn = postgres_hook.get_conn()
       cur = conn.cursor()
       with open(data_path, "r") as file:
           cur.copy_expert(
-              "COPY \"Employees_temp\" FROM stdin WITH CSV HEADER DELIMITER AS ','", file
+              "COPY \"Employees_temp\" FROM STDIN WITH CSV HEADER DELIMITER AS ',' QUOTE '\"'",
+              file,
           )
       conn.commit()
 
-Here we are passing a ``GET`` request to get the data from the URL and save it in ``employees.csv`` file on our Airflow instance and we are dumping the file into a temporary table before merging the data to the final employees table
+Here we send a ``GET`` request to fetch the data from the URL, save it in the ``employees.csv`` file on our Airflow instance, and then dump the file into a temporary table before merging the data into the final employees table.
 
 .. code-block:: python
 
   @task
   def merge_data():
       query = """
-          delete
-          from "Employees" e using "Employees_temp" et
-          where e."Serial Number" = et."Serial Number";
+          DELETE FROM "Employees" e
+          USING "Employees_temp" et
+          WHERE e."Serial Number" = et."Serial Number";
 
-          insert into "Employees"
-          select *
-          from "Employees_temp";
+          INSERT INTO "Employees"
+          SELECT *
+          FROM "Employees_temp";
       """
       try:
           postgres_hook = PostgresHook(postgres_conn_id="LOCAL")
@@ -455,7 +468,7 @@ Here we are passing a ``GET`` request to get the data from the URL and save it i
       except Exception as e:
           return 1
 
-Here we are first looking for duplicate values and removing them before we insert new values in our final table
+Here we are first looking for duplicate values and removing them before we insert new values in our final table.
 
 
 Lets look at our DAG:
@@ -478,23 +491,21 @@ Lets look at our DAG:
       @task
       def get_data():
           # NOTE: configure this as appropriate for your airflow environment
-          data_path = "/usr/local/airflow/dags/files/employees.csv"
+          data_path = "/opt/airflow/dags/files/employees.csv"
 
           url = "https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/pipeline_example.csv"
 
           response = requests.request("GET", url)
 
           with open(data_path, "w") as file:
-              for row in response.text.split("\n"):
-                  if row:
-                      file.write(row + "\n")
+              file.write(response.text)
 
           postgres_hook = PostgresHook(postgres_conn_id="LOCAL")
           conn = postgres_hook.get_conn()
           cur = conn.cursor()
           with open(data_path, "r") as file:
               cur.copy_expert(
-                  "COPY \"Employees_temp\" FROM stdin WITH CSV HEADER DELIMITER AS ','",
+                  "COPY \"Employees_temp\" FROM STDIN WITH CSV HEADER DELIMITER AS ',' QUOTE '\"'",
                   file,
               )
           conn.commit()
@@ -502,13 +513,13 @@ Lets look at our DAG:
       @task
       def merge_data():
           query = """
-                  delete
-                  from "Employees" e using "Employees_temp" et
-                  where e."Serial Number" = et."Serial Number";
+                  DELETE FROM "Employees" e
+                  USING "Employees_temp" et
+                  WHERE e."Serial Number" = et."Serial Number";
 
-                  insert into "Employees"
-                  select *
-                  from "Employees_temp";
+                  INSERT INTO "Employees"
+                  SELECT *
+                  FROM "Employees_temp";
                   """
           try:
               postgres_hook = PostgresHook(postgres_conn_id="LOCAL")
@@ -526,14 +537,14 @@ Lets look at our DAG:
   dag = Etl()
 
 This dag runs daily at 00:00.
-Add this python file to airflow/dags folder (e.g. ``dags/etl.py``) and go back to the main folder and run
+Add this python file to airflow/dags folder (e.g. ``dags/etl.py``) and go back to the main folder and run:
 
 .. code-block:: bash
 
   docker-compose up airflow-init
   docker-compose up
 
-Go to your browser and go to the site http://localhost:8080/home and trigger your DAG Airflow Example
+Open http://localhost:8080/home in your browser and trigger your DAG Airflow Example:
 
 .. image:: img/new_tutorial-1.png
 
@@ -541,7 +552,7 @@ Go to your browser and go to the site http://localhost:8080/home and trigger you
 .. image:: img/new_tutorial-2.png
 
 The DAG ran successfully as we can see the green boxes. If there had been an error the boxes would be red.
-Before the DAG run my local table had 10 rows after the DAG run it had approx 100 rows
+Before the DAG run my local table had 10 rows; after the DAG run it had approximately 100 rows.
 
 
 What's Next?
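
As a footnote to the new connection instructions, the same ``LOCAL`` connection can also be described
in code. This is a hedged alternative to clicking through the UI; the ``schema`` value below stands in
for the ``<DATABASE_NAME>`` placeholder used above:

.. code-block:: python

    # Build the same "LOCAL" Postgres connection object and print its URI.
    # The printed URI could be exported as AIRFLOW_CONN_LOCAL instead of
    # creating the connection through Admin >> Connections.
    from airflow.models.connection import Connection

    local_conn = Connection(
        conn_id="LOCAL",
        conn_type="postgres",
        host="postgres",
        schema="airflow",  # substitute your <DATABASE_NAME>
        login="airflow",
        password="airflow",
        port=5432,
    )

    print(local_conn.get_uri())  # e.g. postgres://airflow:airflow@postgres:5432/airflow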

[airflow] 14/24: Fix typo in MySQL Database creation code (Set up DB docs) (#20102)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 8ecdcb9c2f8167d997d123a1aebc1d19de1e2ccb
Author: Richard Pelgrim <68...@users.noreply.github.com>
AuthorDate: Wed Dec 8 12:27:58 2021 +0100

    Fix typo in MySQL Database creation code (Set up DB docs)  (#20102)
    
    * Fix typo in MySQL Database creation code
    
    Character set `utf8` is an alias for `utf8mb3` (see the docs linked below), so the collation should be set to `utf8mb3_unicode_ci`.
    Using `COLLATE utf8mb4_unicode_ci` (current code) throws the following error:
    `ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'utf8'`
    
    (cherry picked from commit 9347832633a41b7b1e759d32ee756e7987f14a0d)
---
 docs/apache-airflow/howto/set-up-database.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/howto/set-up-database.rst b/docs/apache-airflow/howto/set-up-database.rst
index f2f7fb6..14051b3 100644
--- a/docs/apache-airflow/howto/set-up-database.rst
+++ b/docs/apache-airflow/howto/set-up-database.rst
@@ -238,7 +238,7 @@ In the example below, a database ``airflow_db`` and user  with username ``airflo
 
 .. code-block:: sql
 
-   CREATE DATABASE airflow_db CHARACTER SET utf8 COLLATE utf8mb4_unicode_ci;
+   CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    CREATE USER 'airflow_user' IDENTIFIED BY 'airflow_pass';
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user';
 

[airflow] 13/24: Correct set-up-database.rst (#20090)

Posted by po...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 239b1dcd5907d3e3c80ceb1aa7f9c8ec01921aa6
Author: Pavan Sabnis <pv...@gmail.com>
AuthorDate: Tue Dec 7 00:57:58 2021 -0800

    Correct set-up-database.rst (#20090)
    
    (cherry picked from commit 6d42b0e611c6b68c611dc25f3de2661e268cf3fe)
---
 docs/apache-airflow/howto/set-up-database.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/howto/set-up-database.rst b/docs/apache-airflow/howto/set-up-database.rst
index a3d3666..f2f7fb6 100644
--- a/docs/apache-airflow/howto/set-up-database.rst
+++ b/docs/apache-airflow/howto/set-up-database.rst
@@ -64,7 +64,7 @@ Setting up a SQLite Database
 ----------------------------
 
 SQLite database can be used to run Airflow for development purpose as it does not require any database server
-(the database is stored in a local file). There are a few limitations of using the SQLite database (for example
+(the database is stored in a local file). There are many limitations of using the SQLite database (for example
 it only works with Sequential Executor) and it should NEVER be used for production.
 
 There is a minimum version of sqlite3 required to run Airflow 2.0+ - minimum version is 3.15.0. Some of the