Posted to commits@airflow.apache.org by "Stanislav Pak (JIRA)" <ji...@apache.org> on 2017/07/26 00:01:38 UTC

[jira] [Updated] (AIRFLOW-1463) Clear state of pending task when it fails due to DAG import error

     [ https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislav Pak updated AIRFLOW-1463:
-----------------------------------
    Description: 
Our pipeline code is deployed almost simultaneously to all Airflow boxes: the scheduler+webserver box and the worker boxes. A common Python package is deployed to those boxes on every other code push (3-5 deployments per hour). Due to installation specifics, a DAG that imports a module from that package might fail to import. If the DAG import fails when a worker runs a task, the task is still removed from the queue but its state is not changed, so the task stays in the PENDING state forever.

Besides the case described above, this can also happen because of DAG update lag in the scheduler: a task can be scheduled with the old DAG, and the worker can then run the task with a new DAG that fails to import.

There might be other scenarios in which this happens.

Proposal:
Catch errors when importing the DAG on task run and clear the task instance state if the import fails. This should fix transient issues of this kind.


  was:
Our pipelines related code is deployed almost simultaneously on all airflow boxes: scheduler+webserver box, workers boxes. Some common python package is deployed on those boxes on every other code push (3-5 deployments per hour). Due to installation specifics, a DAG that imports module from that package might fail. If DAG import fails when worker runs a task, the task is still removed from the queue but task state is not changed, so in this case the task stays in PENDING state forever.

Beside the described case, there is scenario when it happens because of DAG update lag in scheduler. A task can be scheduler with old DAG and worker can run the task with new DAG that fails to be imported.

There might be other scenarios when it happens.

Proposal:
Catch errors when importing DAG on task run and clear task instance state if import fails. This should fix transient issues of this kind.



> Clear state of pending task when it fails due to DAG import error
> -----------------------------------------------------------------
>
>                 Key: AIRFLOW-1463
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1463
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: cli
>         Environment: Ubuntu 14.04
> Airflow 1.8.0
> SQS backed task queue, AWS RDS backed meta storage
> The DAG folder is synced by a script on code push: an archive is downloaded from S3, unpacked, and moved, then an install script is run. The airflow executable is replaced with a symlink pointing to the latest version of the code; no Airflow processes are restarted.
>            Reporter: Stanislav Pak
>            Assignee: Stanislav Pak
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Our pipeline code is deployed almost simultaneously to all Airflow boxes: the scheduler+webserver box and the worker boxes. A common Python package is deployed to those boxes on every other code push (3-5 deployments per hour). Due to installation specifics, a DAG that imports a module from that package might fail to import. If the DAG import fails when a worker runs a task, the task is still removed from the queue but its state is not changed, so the task stays in the PENDING state forever.
> Besides the case described above, this can also happen because of DAG update lag in the scheduler: a task can be scheduled with the old DAG, and the worker can then run the task with a new DAG that fails to import.
> There might be other scenarios in which this happens.
> Proposal:
> Catch errors when importing the DAG on task run and clear the task instance state if the import fails. This should fix transient issues of this kind.
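
The proposal above could be sketched roughly as follows. This is only an illustration in plain Python, not Airflow's actual code: TaskInstanceStub, import_dag_module and run_task are hypothetical names, and treating "clear" as setting the state to None is an assumption made for the sketch.

```python
import importlib.util
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger(__name__)


@dataclass
class TaskInstanceStub:
    """Hypothetical stand-in for a task instance record (not an Airflow class)."""
    task_id: str
    state: Optional[str] = "pending"


def import_dag_module(path):
    """Import a DAG definition file; propagates any error the file raises."""
    spec = importlib.util.spec_from_file_location("dag_under_load", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_task(ti, dag_path, execute):
    """Run a task, clearing its state if the DAG file cannot be imported.

    `execute` is a hypothetical callable that receives the imported module.
    Returns True if the task ran, False if the import failed.
    """
    try:
        module = import_dag_module(dag_path)
    except Exception:
        # The import failed, e.g. because a shared package was mid-deployment.
        # Assumption: "clearing" means resetting the state to None so that
        # the scheduler can pick the task up again instead of leaving it stuck.
        log.exception("DAG import failed for %s; clearing state", ti.task_id)
        ti.state = None
        return False
    ti.state = "running"
    execute(module)
    ti.state = "success"
    return True
```

With a wrapper of this shape, a transient import failure leaves the task eligible for rescheduling rather than stuck in its previous state; a real change would need to update the state in the metadata database rather than on an in-memory object.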



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)