Posted to dev@airflow.apache.org by Maxime Beauchemin <ma...@gmail.com> on 2018/07/01 20:01:16 UTC

Re: Deprecating Run task from Airflow webUI

Few thoughts:
* in our environment at Lyft, cleared tasks do get picked up by the
scheduler. Is there an issue opened for the bug you are referring to? Is
that on 1.9.0?
* "clearing" in both the web UI and CLI also flips the DagRun state back to
running, since the intent of clearing is usually to get the scheduler to pick
it up. There may be caveats when the DagRun doesn't exist, or is a DagRun
of the "backfill" type. Maybe `clear` should make sure that there's a
proper DagRun.
* "clear" is confusing as a verb to use when the actual intent is to
reprocess...
* another [backward compatible] option is to put the feature behind a
feature flag and set that flag to False in your environment
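A minimal sketch of what that guard could look like, assuming a hypothetical
[webserver] run_task_enabled option (not an existing Airflow setting):

from airflow import configuration as conf

def run_task_enabled():
    # "run_task_enabled" under [webserver] is a hypothetical option, not an
    # existing Airflow setting; default to disabled when it isn't configured.
    try:
        return conf.getboolean("webserver", "run_task_enabled")
    except Exception:
        return False

The web view's Run handler could check this flag and simply flash a message
and redirect when it returns False.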

Whether the web server and the upcoming REST API should be able to talk to
an executor is a bigger question. We may want Airflow to allow running
arbitrary DAGs (through the upcoming DagFetcher abstraction) on arbitrary
executors, and the REST API (web server) may need to communicate with the
executor for that purpose. Though that's far enough ahead that it's mostly
unrelated to your current concern.

I'll go a bit further and say that eventually we should have a way to run
local, "in-development" DAGs on a remote executor in an ad hoc fashion. In
next-gen Airflow that would go through publishing a DAG to the DagFetcher
abstraction (say git://{org}/{repo}/{gitref_say_a_branch}/{path/to/dag.py}
), and running say `airflow test` or `airflow backfill` through the REST API
to get that to run remotely on k8s (through the KubernetesExecutor), for
instance. I think this may require the REST API talking to the executor.
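Purely to illustrate that flow, a hedged sketch of what such a call might
look like (the endpoint, payload fields, and URI below are all hypothetical;
nothing like this exists today):

import requests

# Hypothetical endpoint and payload; illustrative only.
payload = {
    "dag_uri": "git://my-org/my-repo/my-branch/dags/my_dag.py",  # DagFetcher-style locator
    "command": "backfill",            # or "test"
    "start_date": "2018-06-01",
    "end_date": "2018-06-02",
    "executor": "KubernetesExecutor",
}
resp = requests.post("http://airflow-web:8080/api/experimental/adhoc_runs", json=payload)
print(resp.status_code, resp.text)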

Max

On Fri, Jun 29, 2018 at 8:03 PM Feng Lu <fe...@google.com.invalid> wrote:

> Re-attaching the image..
>
> On Fri, Jun 29, 2018 at 4:54 PM Feng Lu <fe...@google.com> wrote:
>
>> Hi all,
>>
>> Please take a look at our proposal to deprecate Run task in Airflow
>> webUI.
>>
>> *What?*
>> Deprecate Run task support in the Airflow webUI and make it a no-op for
>> now.
>>
>> [image: xGAcOrcLJE4.png]
>>
>> *Why?*
>>
>>    1. It only works with CeleryExecutor
>>    <https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L1001-L1003>
>>    and results in an inconsistent experience with other types of Executors.
>>    2. It requires the Airflow webserver to have a direct connection to the
>>    CeleryExecutor message backend, which widens the system's attack surface.
>>    In many cases, users may want to restrict access to the Celery messaging
>>    backend as much as possible.
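For reference, the executor check behind point 1 boils down to something like
this (paraphrased sketch, not a verbatim copy of the linked views.py):

def can_run_from_ui(executor):
    # Paraphrased sketch of the guard in airflow/www/views.py (not verbatim):
    # the "Run" action only proceeds when the configured executor is
    # Celery-based; any other executor just gets an error message.
    try:
        from airflow.executors.celery_executor import CeleryExecutor
    except ImportError:
        return False
    return isinstance(executor, CeleryExecutor)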
>>
>> *Mitigation:*
>> This Run task feature is mainly for re-executing a previously running task
>> that got stuck in the running state and was deleted manually.
>> It's currently a two-step process:
>> 1. Navigate to the task instance view page and delete the running task
>> that's stuck
>> 2. Go back to the DAG/task view and click "Run"
>>
>> We propose to combine the two steps: after a running task is deleted,
>> the Airflow scheduler will automatically re-schedule (which it does today)
>> and re-queue the task (there's a bug that needs to be fixed).
>>
>> *Fix:*
>> The scheduler currently does not automatically re-queue the task even
>> though the task instance has changed from the running to the scheduled state.
>> The heartbeat check incorrectly returns success in this case. The root cause
>> is that LocalTaskJob doesn't set the job state to failed (details
>> <https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2674-L2681>)
>> when a running task is externally deleted, which confuses the heartbeat
>> check
>> <https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L443>.
>> Once this is fixed, a killed running task instance will be
>> auto-scheduled/enqueued for execution (verified locally).
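Roughly, the change could look like the following (a simplified, hypothetical
sketch; the actual LocalTaskJob code in jobs.py is more involved):

from airflow.utils.state import State

def heartbeat_callback(job, ti):
    # Hypothetical, simplified sketch of the proposed fix; not the actual
    # jobs.py code. If the task instance is no longer in the RUNNING state,
    # it was externally deleted/changed, so mark the LocalTaskJob itself as
    # failed and terminate the task runner; the heartbeat check then stops
    # reporting success and the scheduler can re-queue the task.
    if ti.state != State.RUNNING:
        job.state = State.FAILED
        job.task_runner.terminate()
        job.terminating = True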
>>
>> Thank you.
>>
>> Feng
>>
>>
>>

Re: Deprecating Run task from Airflow webUI

Posted by Feng Lu <fe...@google.com.INVALID>.
Thank you Max, kindly see my reply inline:
On Sun, Jul 1, 2018 at 1:01 PM Maxime Beauchemin <ma...@gmail.com>
wrote:

> Few thoughts:
> * in our environment at Lyft, cleared tasks do get picked up by the
> scheduler. Is there an issue opened for the bug you are referring to? Is
> that on 1.9.0?
> * "clearing" in both the web UI and CLI also flips the DagRun state back
> to running, since the intent of clearing is usually to get the scheduler to
> pick it up. There may be caveats when the DagRun doesn't exist, or is a
> DagRun of the "backfill" type. Maybe `clear` should make sure that there's
> a proper DagRun.
> * "clear" is confusing as a verb to use when the actual intent is to
> reprocess...
> * another [backward compatible] option is to put the feature behind a
> feature flag and set that flag to False in your environment
>
We reproduced the bug in 1.9 and it appears that this will be an issue for
master as well (will confirm).

>
> Whether the web server and the upcoming REST API should be able to talk to
> an executor is a bigger question. We may want Airflow to allow running
> arbitrary DAGs (through the upcoming DagFetcher abstraction) on arbitrary
> executors, and the REST API (web server) may need to communicate with the
> executor for that purpose. Though that's far enough ahead that it's mostly
> unrelated to your current concern.
>

> I'll go a bit further and say that eventually we should have a way to run
> local, "in-development" DAGs on a remote executor in an ad hoc fashion. In
> next-gen Airflow that would go through publishing a DAG to the DagFetcher
> abstraction (say git://{org}/{repo}/{gitref_say_a_branch}/{path/to/dag.py}
> ), and running say `airflow test` or `airflow backfill` through the REST API
> to get that to run remotely on k8s (through the KubernetesExecutor), for
> instance. I think this may require the REST API talking to the executor.
>
+1 to the idea of DAG testing/execution in a remote Airflow setup.
To implement remote testing/backfill as a RESTful API, we would need to
define the corresponding resource collection/object, which would live in the
metadata database.
The executor can then start DAG runs based on resource updates in the
metadata database.
It seems that we can decouple the API server and the executor this way.
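As a rough sketch, such a resource could be a small table along these lines
(names and columns are purely illustrative, not an existing Airflow model):

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class AdHocRunRequest(Base):
    # Illustrative resource written by the REST API; the scheduler/executor
    # polls for PENDING rows and turns them into DAG runs.
    __tablename__ = "adhoc_run_request"

    id = Column(Integer, primary_key=True)
    dag_uri = Column(String(500))    # e.g. a DagFetcher-style git:// locator
    command = Column(String(50))     # "test" or "backfill"
    start_date = Column(DateTime)
    end_date = Column(DateTime)
    state = Column(String(20), default="PENDING")  # PENDING -> RUNNING -> SUCCESS/FAILED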
