Posted to commits@airflow.apache.org by "Michael A Perez (Jira)" <ji...@apache.org> on 2019/10/01 21:28:00 UTC

[jira] [Commented] (AIRFLOW-5191) SubDag is marked failed

    [ https://issues.apache.org/jira/browse/AIRFLOW-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942326#comment-16942326 ] 

Michael A Perez commented on AIRFLOW-5191:
------------------------------------------

Hello Oliver,

I recently had to address this issue with a DAG that attempts a CloudSQLImport and falls back to a bulk upsert if the import task fails.

Basically, when a task instance fails ([https://github.com/apache/airflow/blob/master/airflow/models/taskinstance.py#L1047]) it announces the failure and sets itself to failed. I've found that updating the task's state after the fact, e.g. from a BranchPythonOperator, doesn't do the trick; overriding {{on_failure_callback}}, however, does!

The function I pass as my task's {{on_failure_callback}} parameter looks like:
{code:python}
from airflow.utils import state


def unfail(context):
    """Needed to prevent the task failure from propagating."""
    # Flip this task instance's state from FAILED to SKIPPED.
    context['ti'].set_state(state.State.SKIPPED)
{code}
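
For illustration, here is a minimal sketch of how this could be wired into an import-with-fallback layout like the one described above. The DAG/task names, the {{PythonOperator}} placeholders and the "all_done" trigger rule are illustrative assumptions, not copied from my actual DAG; the part that matters is attaching {{unfail}} as the import task's {{on_failure_callback}}:
{code:python}
# Illustrative sketch only: DAG/task names and the trigger rule are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils import state


def unfail(context):
    """Needed to prevent the task failure from propagating."""
    context['ti'].set_state(state.State.SKIPPED)


def do_cloudsql_import(**kwargs):
    pass  # placeholder for the real CloudSQL import call, which can fail


def do_bulk_upsert(**kwargs):
    pass  # placeholder for the bulk-upsert fallback


with DAG('import_with_fallback',
         start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:

    cloudsql_import = PythonOperator(
        task_id='cloudsql_import',
        python_callable=do_cloudsql_import,
        provide_context=True,
        on_failure_callback=unfail,  # flip a failed attempt to SKIPPED
    )

    bulk_upsert = PythonOperator(
        task_id='bulk_upsert',
        python_callable=do_bulk_upsert,
        provide_context=True,
        trigger_rule='all_done',  # run once the import attempt is done, whatever its state
    )

    cloudsql_import >> bulk_upsert
{code}
How you skip the fallback when the import succeeds is up to you (e.g. a branch or short-circuit); the point here is only that the callback keeps a failed import attempt from marking the whole run failed.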


TL;DR you have to change the failing task's state using its {{on_failure_callback}} parameter.

> SubDag is marked failed 
> ------------------------
>
>                 Key: AIRFLOW-5191
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5191
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, DagRun
>    Affects Versions: 1.10.4
>         Environment: CentOS 7, Maria-DB, python 3.6.7, Airflow 1.10.4
>            Reporter: Oliver Ricken
>            Priority: Blocker
>
> Dear all,
> after upgrading from Airflow 1.10.2 to 1.10.4, we are experiencing strange and very problematic behaviour of SubDags (which are crucial to our environment and used frequently).
> Tasks inside the SubDag that fail and await retry ("up_for_retry") mark the SubDag as "failed" (while in 1.10.2, the SubDag remained in the "running" state). This is particularly problematic for downstream tasks that depend on the state of the SubDag. Since we have downstream tasks triggered on "all_done", the downstream task is triggered by the "failed" SubDag even though a SubDag-internal task is still awaiting retry and might (in our case: most likely) yield successfully processed data. This data is thus not available to the prematurely triggered task downstream of the SubDag.
> This is a severe problem for us, and we would roll back to 1.10.2 if there is no quick solution or work-around to this issue!
> We urgently need help on this matter.
> Thanks a lot in advance; any suggestions and input are highly appreciated!
> Cheers
> Oliver
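
For anyone trying to reproduce this, a minimal sketch of the layout described in the report above (a SubDag whose inner task retries, plus a downstream task triggered on "all_done"); all DAG and task names are illustrative assumptions, not taken from the reporter's environment:
{code:python}
# Illustrative sketch of the reported layout; all names are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {'owner': 'airflow', 'start_date': datetime(2019, 1, 1)}


def build_subdag(parent_dag_id, child_id):
    subdag = DAG(dag_id='{}.{}'.format(parent_dag_id, child_id),
                 default_args=DEFAULT_ARGS,
                 schedule_interval=None)
    # Stand-in for the inner task; the real one can fail and go "up_for_retry",
    # which in 1.10.4 reportedly flips the SubDag run to "failed".
    DummyOperator(task_id='flaky_task',
                  retries=3,
                  retry_delay=timedelta(minutes=5),
                  dag=subdag)
    return subdag


with DAG('parent_dag', default_args=DEFAULT_ARGS, schedule_interval=None) as dag:
    section = SubDagOperator(task_id='section',
                             subdag=build_subdag('parent_dag', 'section'))
    # Fires as soon as the SubDag task is "done", even if the SubDag run was
    # marked failed while an inner task is still awaiting retry.
    downstream = DummyOperator(task_id='downstream', trigger_rule='all_done')
    section >> downstream
{code}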


