You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Qian Yu (Jira)" <ji...@apache.org> on 2019/11/03 10:11:00 UTC

[jira] [Commented] (AIRFLOW-2279) Clearing Tasks Across DAGs

    [ https://issues.apache.org/jira/browse/AIRFLOW-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965639#comment-16965639 ] 

Qian Yu commented on AIRFLOW-2279:
----------------------------------

I'm in the process of working on something I call ClearTaskOperator. In fact, I already have a working PR. I was using it for a different purpose (rerunning completed tasks). However, the exact same operator can be used to achieve the intention of [~asoni-stripe] .

[https://github.com/apache/airflow/pull/6392]

 

For example, if we want to make sure when task_A on dag1 is cleared, it always clears a task_B on a dag2, we can do make the dag1 look like this:
{code:python}
task_A >> clear_Task_B
{code}
 

where
{code:python}
clear_Task_B = ClearTaskOperator(external_task_id="task_B", external_dag_id="dag2")
{code}
So the change is actually very simple, just adding a new operator, without even touching the core Airflow code.

I saw [~gsilk] has an interesting requirement that I did not consider when coming up with the PR, that is to limit the number tasks cleared at the same time. But adding this is trivial because it is just an additional check in the execute() function.

And some other improvements I can think of for the PR is to make it behave like an ExternalTaskSensor, i.e. it only clears the target tasks if they are done. If the target tasks are still running, it'll just wait and reschedule itself for a later time to try clearing them again.

If there are interest in this new operator or these additional features, pls mention that on my PR and I'll continue to develop it to make it better and hopefully get more support from upstream.

> Clearing Tasks Across DAGs
> --------------------------
>
>                 Key: AIRFLOW-2279
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2279
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Achal Soni
>            Priority: Major
>         Attachments: cross_dag_ui_screenshot.png
>
>
> At Stripe, we commonly have discrete dags that depend on each other by leveraging ExternalTaskSensors. We also find ourselves routinely wanting to not only clear tasks and their downstream tasks in a particular dag, but also their downstream tasks in their dependent dags (linked by ExternalTaskSensors). 
> We currently have extended Airflow to handle this by modifying the webapp and cli tool to optionally clear dependent tasks across multiple dags (see attached screenshot). 
> We want to open the floor for discussion with the larger Airflow community about the usage of ExternalTaskSensors and specifically how to handle clearing across dags. We are interested in learning more about the accepted practices in this regard, and are very open/willing to contribute in this area if there is interest!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)