Posted to commits@airflow.apache.org by ka...@apache.org on 2021/09/11 11:00:18 UTC

[airflow] branch v2-1-test updated (45eb384 -> 1621890)

This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a change to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.


 discard 45eb384  Fixes warm shutdown for celery worker. (#18068)
 discard c81aa2b  Add Changelog for 2.1.4
 discard 2ef6ab1  Bump version to 2.1.4
 discard c1e9f80  Do not let create_dagrun overwrite explicit run_id (#17728)
     new a3728cf  Do not let create_dagrun overwrite explicit run_id (#17728)
     new 58b562c  Bump version to 2.1.4
     new 9bb26b7  Fixes warm shutdown for celery worker. (#18068)
     new ccdc121  Regression on pid reset to allow task start after heartbeat (#17333)
     new 1621890  Add Changelog for 2.1.4

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (45eb384)
            \
             N -- N -- N   refs/heads/v2-1-test (1621890)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG.txt                  | 1 +
 airflow/models/taskinstance.py | 1 +
 tests/conftest.py              | 1 -
 3 files changed, 2 insertions(+), 1 deletion(-)

[airflow] 05/05: Add Changelog for 2.1.4

Posted by ka...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 1621890d705f20a6a294ce6ff2900ef73b96ab78
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Fri Sep 10 15:11:27 2021 +0100

    Add Changelog for 2.1.4
---
 CHANGELOG.txt | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index ecbb6b6..4318c22 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -1,3 +1,43 @@
+Airflow 2.1.4, 2021-09-15
+-------------------------
+
+Bug Fixes
+"""""""""
+
+- Fix deprecation error message rather than silencing it (#18126)
+- Limit the number of queued dagruns created by the Scheduler (#18065)
+- Fix ``DagRun`` execution order from queued to running not being properly followed (#18061)
+- Fix ``max_active_runs`` not allowing moving of queued dagruns to running (#17945)
+- Avoid redirect loop for users with no permissions (#17838)
+- Avoid endless redirect loop when user has no roles (#17613)
+- Fix log links on graph TI modal (#17862)
+- Hide variable import form if user lacks permission (#18000)
+- Improve dag/task concurrency check (#17786)
+- Fix Clear task instances endpoint resets all DAG runs bug (#17961)
+- Fixes incorrect parameter passed to views (#18083) (#18085)
+- Fix Sentry handler from ``LocalTaskJob`` causing error (#18119)
+- Limit ``colorlog`` version (6.x is incompatible) (#18099)
+- Only show Pause/Unpause tooltip on hover (#17957)
+- Improve graph view load time for dags with open groups (#17821)
+- Increase width for Run column (#17817)
+- Fix wrong query on running tis (#17631)
+- Add root to tree refresh url (#17633)
+- Do not delete running DAG from the UI (#17630)
+- Improve discoverability of Provider packages' functionality
+- Do not let ``create_dagrun`` overwrite explicit ``run_id`` (#17728)
+- BugFix: Regression on pid reset to allow task start after heartbeat (#17333)
+
+Doc only changes
+""""""""""""""""
+
+- Update version added fields in airflow/config_templates/config.yml (#18128)
+- Improve the description of how to handle dynamic task generation (#17963)
+- Improve cross-links to operators and hooks references (#17622)
+- Doc: Fix replacing Airflow version for Docker stack (#17711)
+- Make the providers operators/hooks reference much more usable (#17768)
+- Update description about the new ``connection-types`` provider meta-data
+- Suggest to use secrets backend for variable when it contains sensitive data (#17319)
+
 Airflow 2.1.3, 2021-08-21
 -------------------------
 

[airflow] 04/05: Regression on pid reset to allow task start after heartbeat (#17333)

Posted by ka...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ccdc121383d1f3a75fce2a3a7db1d980e8e6d3bf
Author: nmehraein <ni...@mehraein.fr>
AuthorDate: Tue Aug 3 12:36:57 2021 +0200

    Regression on pid reset to allow task start after heartbeat (#17333)
    
    Regression on PID reset to allow task start after heartbeat
    
    Co-authored-by: Nicolas MEHRAEIN <ni...@adevinta.com>
    (cherry picked from commit ed99eaafc479aedbbe2d618da878195a132abb1a)
---
 airflow/models/taskinstance.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/airflow/models/taskinstance.py b/airflow/models/taskinstance.py
index c715f22..b492469 100644
--- a/airflow/models/taskinstance.py
+++ b/airflow/models/taskinstance.py
@@ -1030,6 +1030,7 @@ class TaskInstance(Base, LoggingMixin):
         self.refresh_from_db(session=session, lock_for_update=True)
         self.job_id = job_id
         self.hostname = get_hostname()
+        self.pid = None
 
         if not ignore_all_deps and not ignore_ti_state and self.state == State.SUCCESS:
             Stats.incr('previously_succeeded', 1, 1)
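
For context, here is a minimal, hypothetical sketch (not the actual Airflow
code) of the heartbeat check that motivates the `self.pid = None` reset in
the hunk above: LocalTaskJob compares the pid recorded on the task instance
with the pid of the process it launched and kills the task on a mismatch,
so a stale pid left over from a previous attempt could kill a freshly
started task. The names `heartbeat_check`, `recorded_pid` and `runner_pid`
are illustrative only.

    # Simplified illustration only; not Airflow's API.
    import os

    def heartbeat_check(recorded_pid, runner_pid):
        """Fail if the recorded pid does not match the launched process."""
        if recorded_pid is not None and recorded_pid != runner_pid:
            raise RuntimeError(
                f"Recorded pid {recorded_pid} does not match runner pid "
                f"{runner_pid}; terminating task."
            )

    # With the fix, the pid is cleared when the task (re)starts, so the
    # first heartbeat after a restart cannot trip over a stale value:
    heartbeat_check(recorded_pid=None, runner_pid=os.getpid())  # passes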

[airflow] 01/05: Do not let create_dagrun overwrite explicit run_id (#17728)

Posted by ka...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit a3728cf83ddee9e1458fea9c706aa55916d42f2b
Author: Tzu-ping Chung <tp...@astronomer.io>
AuthorDate: Thu Aug 19 22:33:09 2021 +0800

    Do not let create_dagrun overwrite explicit run_id (#17728)
    
    Previously, DAG.create_dagrun() had a weird behavior: when *all* of
    run_id, execution_date, and run_type were provided, the function would
    ignore the run_id argument and overwrite it by auto-generating a run_id
    with DagRun.generate_run_id(). This fixes the logic to respect the
    explicit run_id value.
    
    I don't think any of the "Airflow proper" code would be affected by
    this, but the dag_maker fixture used in the test suite needs to be
    tweaked a bit to continue working.
    
    (cherry picked from commit 50771e0f66803d0a0a0b552ab77f4e6be7d1088b)
---
 airflow/models/dag.py |  9 +++++----
 tests/conftest.py     | 17 ++++++++++-------
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/airflow/models/dag.py b/airflow/models/dag.py
index 4ac2ace..a1419fe 100644
--- a/airflow/models/dag.py
+++ b/airflow/models/dag.py
@@ -1767,15 +1767,16 @@ class DAG(LoggingMixin):
         :param dag_hash: Hash of Serialized DAG
         :type dag_hash: str
         """
-        if run_id and not run_type:
+        if run_id:  # Infer run_type from run_id if needed.
             if not isinstance(run_id, str):
                 raise ValueError(f"`run_id` expected to be a str is {type(run_id)}")
-            run_type: DagRunType = DagRunType.from_run_id(run_id)
-        elif run_type and execution_date:
+            if not run_type:
+                run_type = DagRunType.from_run_id(run_id)
+        elif run_type and execution_date is not None:  # Generate run_id from run_type and execution_date.
             if not isinstance(run_type, DagRunType):
                 raise ValueError(f"`run_type` expected to be a DagRunType is {type(run_type)}")
             run_id = DagRun.generate_run_id(run_type, execution_date)
-        elif not run_id:
+        else:
             raise AirflowException(
                 "Creating DagRun needs either `run_id` or both `run_type` and `execution_date`"
             )
diff --git a/tests/conftest.py b/tests/conftest.py
index 0873ac4..3d053cf 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -459,13 +459,16 @@ def dag_maker(request):
 
         def create_dagrun(self, **kwargs):
             dag = self.dag
-            defaults = dict(
-                run_id='test',
-                state=State.RUNNING,
-                execution_date=self.start_date,
-                start_date=self.start_date,
-            )
-            kwargs = {**defaults, **kwargs}
+            kwargs = {
+                "state": State.RUNNING,
+                "execution_date": self.start_date,
+                "start_date": self.start_date,
+                **kwargs,
+            }
+            # Need to provide run_id if the user does not either provide one
+            # explicitly, or pass run_type for inference in dag.create_dagrun().
+            if "run_id" not in kwargs and "run_type" not in kwargs:
+                kwargs["run_id"] = "test"
             self.dag_run = dag.create_dagrun(**kwargs)
             return self.dag_run
 
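
To make the new dispatch concrete, here is a hedged, self-contained sketch
of the three call patterns create_dagrun() now distinguishes; the helper
resolve_run_id below is a hypothetical stand-in for the relevant part of
DAG.create_dagrun(), not Airflow's actual API.

    from datetime import datetime

    def resolve_run_id(run_id=None, run_type=None, execution_date=None):
        if run_id:  # an explicit run_id is now always respected
            if run_type is None:
                # crude stand-in for DagRunType.from_run_id()
                run_type = run_id.split("__", 1)[0]
            return run_id, run_type
        if run_type and execution_date is not None:
            # generate a run_id, as DagRun.generate_run_id() does
            return f"{run_type}__{execution_date.isoformat()}", run_type
        raise ValueError(
            "need either run_id, or both run_type and execution_date"
        )

    # Before the fix, passing all three arguments silently regenerated
    # the run_id; now the explicit value wins:
    run_id, _ = resolve_run_id(
        run_id="my_custom_run",
        run_type="manual",
        execution_date=datetime(2021, 9, 15),
    )
    assert run_id == "my_custom_run"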

[airflow] 02/05: Bump version to 2.1.4

Posted by ka...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 58b562cf63c43a892da0e7bad0c1b362e28ce8d1
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Fri Sep 10 15:06:48 2021 +0100

    Bump version to 2.1.4
---
 README.md | 16 ++++++++--------
 setup.py  |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 722fad2..8981e3a 100644
--- a/README.md
+++ b/README.md
@@ -82,7 +82,7 @@ Airflow is not a streaming solution, but it is often used to process real-time d
 
 Apache Airflow is tested with:
 
-|                      | Main version (dev)        | Stable version (2.1.3)   |
+|                      | Main version (dev)        | Stable version (2.1.4)   |
 | -------------------- | ------------------------- | ------------------------ |
 | Python               | 3.6, 3.7, 3.8, 3.9        | 3.6, 3.7, 3.8, 3.9       |
 | Kubernetes           | 1.20, 1.19, 1.18          | 1.20, 1.19, 1.18         |
@@ -142,15 +142,15 @@ them to appropriate format and workflow that your tool requires.
 
 
 ```bash
-pip install apache-airflow==2.1.3 \
- --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.3/constraints-3.7.txt"
+pip install apache-airflow==2.1.4 \
+ --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.7.txt"
 ```
 
 2. Installing with extras (for example postgres,google)
 
 ```bash
-pip install apache-airflow[postgres,google]==2.1.3 \
- --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.3/constraints-3.7.txt"
+pip install apache-airflow[postgres,google]==2.1.4 \
+ --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.7.txt"
 ```
 
 For information on installing provider packages check
@@ -231,7 +231,7 @@ packages:
 * **Airflow Providers**: SemVer rules apply to changes in the particular provider's code only.
   SemVer MAJOR and MINOR versions for the packages are independent from Airflow version.
   For example `google 4.1.0` and `amazon 3.0.3` providers can happily be installed
-  with `Airflow 2.1.3`. If there are limits of cross-dependencies between providers and Airflow packages,
+  with `Airflow 2.1.4`. If there are limits of cross-dependencies between providers and Airflow packages,
   they are present in providers as `install_requires` limitations. We aim to keep backwards
   compatibility of providers with all previously released Airflow 2 versions but
   there will be sometimes breaking changes that might make some, or all
@@ -254,7 +254,7 @@ Apache Airflow version life cycle:
 
 | Version | Current Patch/Minor | State     | First Release | Limited Support | EOL/Terminated |
 |---------|---------------------|-----------|---------------|-----------------|----------------|
-| 2       | 2.1.3               | Supported | Dec 17, 2020  | Dec 2021        | TBD            |
+| 2       | 2.1.4               | Supported | Dec 17, 2020  | Dec 2021        | TBD            |
 | 1.10    | 1.10.15             | EOL       | Aug 27, 2018  | Dec 17, 2020    | June 17, 2021  |
 | 1.9     | 1.9.0               | EOL       | Jan 03, 2018  | Aug 27, 2018    | Aug 27, 2018   |
 | 1.8     | 1.8.2               | EOL       | Mar 19, 2017  | Jan 03, 2018    | Jan 03, 2018   |
@@ -280,7 +280,7 @@ They are based on the official release schedule of Python and Kubernetes, nicely
 
 2. The "oldest" supported version of Python/Kubernetes is the default one. "Default" is only meaningful
    in terms of "smoke tests" in CI PRs which are run using this default version and default reference
-   image available. Currently ``apache/airflow:latest`` and ``apache/airflow:2.1.3` images
+   image available. Currently ``apache/airflow:latest`` and ``apache/airflow:2.1.4`` images
    are both Python 3.6 images, however the first MINOR/MAJOR release of Airflow release after 23.12.2021 will
    become Python 3.7 images.
 
diff --git a/setup.py b/setup.py
index 33cb4f9..33e1c8d 100644
--- a/setup.py
+++ b/setup.py
@@ -41,7 +41,7 @@ PY39 = sys.version_info >= (3, 9)
 
 logger = logging.getLogger(__name__)
 
-version = '2.1.3'
+version = '2.1.4'
 
 my_dir = dirname(__file__)
 

[airflow] 03/05: Fixes warm shutdown for celery worker. (#18068)

Posted by ka...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 9bb26b7d51fdd21576b21b79b2f9f3efd3db398c
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Fri Sep 10 20:13:31 2021 +0200

    Fixes warm shutdown for celery worker. (#18068)
    
    The way dumb-init propagated the signal by default
    prevented the celery worker from handling termination well.
    
    The default behaviour of dumb-init is to propagate signals to the
    process group rather than to the single child it uses. This is
    protective behaviour, in case a user runs a 'bash -c' command
    without 'exec' - in this case signals should be sent not only
    to the bash but also to the process(es) it creates, otherwise
    bash exits without propagating the signal and you need a second
    signal to kill all processes.
    
    However, some airflow processes (in particular the airflow celery
    worker) behave in a responsible way and handle the signals
    appropriately: when the first signal is received, the worker
    switches to offline mode and lets all workers terminate (until the
    grace period expires), resulting in a Warm Shutdown.
    
    Therefore we can disable the protection of dumb-init in the Helm
    Chart and let it propagate the signal only to the single child it
    spawns. The documentation of the image was also updated to include
    an explanation of signal propagation. For explicitness, the
    DUMB_INIT_SETSID variable has been set to 1 in the image as well.
    
    Fixes #18066
    
    (cherry picked from commit 9e13e450032f4c71c54d091e7f80fe685204b5b4)
---
 Dockerfile                                     |  1 +
 chart/templates/workers/worker-deployment.yaml |  3 ++
 docs/docker-stack/entrypoint.rst               | 41 ++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/Dockerfile b/Dockerfile
index e08a050..de9248c 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -479,6 +479,7 @@ LABEL org.apache.airflow.distro="debian" \
   org.opencontainers.image.title="Production Airflow Image" \
   org.opencontainers.image.description="Reference, production-ready Apache Airflow image"
 
+ENV DUMB_INIT_SETSID="1"
 
 ENTRYPOINT ["/usr/bin/dumb-init", "--", "/entrypoint"]
 CMD []
diff --git a/chart/templates/workers/worker-deployment.yaml b/chart/templates/workers/worker-deployment.yaml
index 38e4e6d..7ae2627 100644
--- a/chart/templates/workers/worker-deployment.yaml
+++ b/chart/templates/workers/worker-deployment.yaml
@@ -169,6 +169,9 @@ spec:
           envFrom:
           {{- include "custom_airflow_environment_from" . | default "\n  []" | indent 10 }}
           env:
+            # Only signal the main process, not the process group, to make Warm Shutdown work properly
+            - name: DUMB_INIT_SETSID
+              value: "0"
           {{- include "custom_airflow_environment" . | indent 10 }}
           {{- include "standard_airflow_environment" . | indent 10 }}
           {{- if .Values.workers.kerberosSidecar.enabled }}
diff --git a/docs/docker-stack/entrypoint.rst b/docs/docker-stack/entrypoint.rst
index a999892..4b64904 100644
--- a/docs/docker-stack/entrypoint.rst
+++ b/docs/docker-stack/entrypoint.rst
@@ -161,6 +161,47 @@ If there are any other arguments - they are simply passed to the "airflow" comma
   > docker run -it apache/airflow:2.1.0-python3.6 version
   2.1.0
 
+Signal propagation
+------------------
+
+Airflow uses ``dumb-init`` to run as "init" in the entrypoint, in order to propagate
+signals and reap child processes properly. This means that the process you run does not have
+to install signal handlers to work properly and be killed when the container is gracefully terminated.
+The behaviour of signal propagation is configured by the ``DUMB_INIT_SETSID`` variable, which is set
+to ``1`` by default - meaning that signals are propagated to the whole process group. You can
+set it to ``0`` to enable the ``single-child`` behaviour of ``dumb-init``, which propagates
+signals only to the single child process.
+
+The table below summarizes ``DUMB_INIT_SETSID`` possible values and their use cases.
+
++----------------+----------------------------------------------------------------------+
+| Variable value | Use case                                                             |
++----------------+----------------------------------------------------------------------+
+| 1 (default)    | Propagates signals to all processes in the process group of the main |
+|                | process running in the container.                                    |
+|                |                                                                      |
+|                | If you run your processes via ``["bash", "-c"]`` command and bash    |
+|                | spawns new processes without ``exec``, this will help to terminate   |
+|                | your container gracefully as all processes will receive the signal.  |
++----------------+----------------------------------------------------------------------+
+| 0              | Propagates signals to the main process only.                         |
+|                |                                                                      |
+|                | This is useful if your main process handles signals gracefully.      |
+|                | A good example is warm shutdown of Celery workers. The ``dumb-init`` |
+|                | in this case will only propagate the signals to the main process,    |
+|                | but not to the processes that are spawned in the same process        |
+|                | group as the main one. For example in case of Celery, the main       |
+|                | process will put the worker in "offline" mode, and will wait         |
+|                | until all running tasks complete, and only then it will              |
+|                | terminate all processes.                                             |
+|                |                                                                      |
+|                | For Airflow's Celery worker, you should set the variable to 0        |
+|                | and use the ``["celery", "worker"]`` command directly.               |
+|                | If you are running it through the ``["bash", "-c"]`` command,        |
+|                | you need to start the worker via ``exec airflow celery worker``      |
+|                | as the last command executed.                                        |
++----------------+----------------------------------------------------------------------+
+
 Additional quick test options
 -----------------------------
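
As a standalone illustration of the difference between the two settings
(independent of Airflow and dumb-init - all names here are hypothetical),
the sketch below spawns a small bash process tree and signals either the
whole process group, as DUMB_INIT_SETSID=1 does, or only the main process,
as DUMB_INIT_SETSID=0 does:

    # POSIX-only demo: compare group-wide vs single-process signalling.
    import os
    import signal
    import subprocess
    import time

    # A bash parent that spawns a sleeping child without `exec`.
    proc = subprocess.Popen(
        ["bash", "-c", "sleep 300 & wait"],
        start_new_session=True,  # like dumb-init: child gets its own group
    )
    time.sleep(0.5)  # give bash a moment to fork the sleep

    # DUMB_INIT_SETSID=1 behaviour: signal the whole group, so both bash
    # and its `sleep` child terminate immediately.
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)

    # DUMB_INIT_SETSID=0 behaviour would instead be:
    #     os.kill(proc.pid, signal.SIGTERM)
    # Only the main process receives SIGTERM; a well-behaved main process
    # (like a Celery worker) can then drain its children gracefully - the
    # warm shutdown this commit enables.
    proc.wait()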