Posted to commits@airflow.apache.org by "Yati (JIRA)" <ji...@apache.org> on 2018/01/15 11:02:00 UTC

[jira] [Created] (AIRFLOW-2001) Make sensors relinquish their execution slots

Yati created AIRFLOW-2001:
-----------------------------

             Summary: Make sensors relinquish their execution slots
                 Key: AIRFLOW-2001
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2001
             Project: Apache Airflow
          Issue Type: Bug
          Components: db, scheduler
            Reporter: Yati
            Assignee: Yati


A sensor task instance should not take up an execution slot for the entirety of its lifetime (as is currently the case). Indeed, for the reasons outlined below, it would be better if the scheduler preempted sensor execution by parking the sensor away from its slot until the next poll.

Some sensors wait for a condition that only an external party can make true (e.g., the materialization, by external means, of certain rows in a table). By external, I mean external to the Airflow installation in question, such that the producing entity itself does not need an execution slot in an Airflow pool. If all sensors and their dependencies were of this nature, there would be no issue. Unfortunately, a lot of real-world DAGs have sensors that depend on results produced by another task, typically in some other DAG, but scheduled by the same Airflow scheduler.

Consider a simple example (arrow direction represents "must happen before", just like in Airflow): DAG1(a >> b) and DAG2(c:sensor(DAG1.b) >> d). In other words, the opening task c of the second DAG has a sensor dependency on the ending task b of the first DAG. Imagine we have a single pool with 10 execution slots, and somehow task instances of c fill up the pool while the corresponding task instances of DAG1.b have not had a chance to execute (in the real world this happens because of, say, back-fills, or reprocessing by clearing those sensor instances and their upstream tasks). This is a deadlock, since no progress can be made: the sensors have filled up the pool waiting on tasks that themselves will never get a chance to run. This problem has been [acknowledged here|https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls].
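
To make the deadlock concrete, here is a toy simulation of the scenario. It is plain Python, not Airflow code: the function run_blocking_pool, the (kind, key) task tuples, and the tick-based loop are all made up for illustration, and the only numbers taken from the example are the 10 slots and the sensor/producer relationship. A sensor that gets a slot holds it until its producer has run.

```python
from collections import deque


def run_blocking_pool(queue, slots=10, max_ticks=50):
    """Toy model: a sensor keeps its slot until its producer has finished.

    `queue` is a list of (kind, key) tuples in the order the scheduler
    would admit them; kind is "sensor" or "producer" (hypothetical names).
    Returns the set of keys whose sensors were eventually satisfied.
    """
    pending = deque(queue)
    holding = []                      # sensors currently occupying a slot
    produced, satisfied = set(), set()
    for _ in range(max_ticks):
        free = slots - len(holding)
        # Admit queued tasks in FIFO order while slots are free.
        while free > 0 and pending:
            kind, key = pending.popleft()
            free -= 1
            if kind == "producer":
                produced.add(key)     # finishes within the tick, frees its slot
            else:
                holding.append(key)   # sensor now sits in the slot
        # Every sensor polls; only fulfilled sensors give their slot back.
        still_waiting = []
        for key in holding:
            if key in produced:
                satisfied.add(key)
            else:
                still_waiting.append(key)
        holding = still_waiting
        if not pending and not holding:
            break
    return satisfied


sensors = [("sensor", i) for i in range(10)]      # instances of DAG2.c
producers = [("producer", i) for i in range(10)]  # instances of DAG1.b

# Sensors admitted first: they fill all 10 slots and nothing ever completes.
deadlocked = run_blocking_pool(sensors + producers)
# Producers admitted first: everything completes.
healthy = run_blocking_pool(producers + sensors)
print(len(deadlocked), len(healthy))
```

In this model the outcome depends entirely on admission order, which is the fragility described above.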

One way (suggested by Fokko) to solve this is to always run sensors in their own pool, and to be careful with the concurrency settings of sensor tasks. This is what a lot of users do now, but there are better solutions. Since all the sensor interface allows for is a poll, we can, after each poll, "park" the sensor: release its execution slot and yield it to other tasks. In the above scenario, sensor tasks would no longer fill up the pool, as each one would be polled, determined to be still unfulfilled, and then parked away, thereby giving other tasks a chance to run.
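
A rough sketch of the parking idea, again as a toy simulation rather than actual scheduler code. The function run_parking_pool, the (kind, key) task tuples, and the choice to re-queue a parked sensor at the back of the queue are assumptions made for illustration; the real scheduler change may look quite different.

```python
from collections import deque


def run_parking_pool(queue, slots=10, max_ticks=50):
    """Toy model: an unfulfilled sensor gives its slot back after one poll.

    `queue` is a list of (kind, key) tuples; kind is "sensor" or
    "producer" (hypothetical names). Returns the keys of satisfied sensors.
    """
    pending = deque(queue)
    produced, satisfied = set(), set()
    for _ in range(max_ticks):
        if not pending:
            break
        parked = []
        # Each tick, at most `slots` tasks get to run.
        for _ in range(min(slots, len(pending))):
            kind, key = pending.popleft()
            if kind == "producer":
                produced.add(key)           # producer runs to completion
            elif key in produced:
                satisfied.add(key)          # poll succeeded, sensor is done
            else:
                parked.append((kind, key))  # poll failed: release the slot
        # Parked sensors go to the back of the queue for a later poll.
        pending.extend(parked)
    return satisfied


# Worst case: all 10 sensors are queued ahead of their 10 producers.
worst_case = [("sensor", i) for i in range(10)] + [("producer", i) for i in range(10)]
result = run_parking_pool(worst_case)
print(sorted(result))
```

Even with the sensors queued first, they poll once, step aside, let the producers run, and succeed on the next poll.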

This would likely require some changes to the DB, and of course to the scheduler.


