You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Alex Van Boxel <al...@vanboxel.be> on 2017/01/08 10:38:51 UTC

Trigger behaviour with skipped upstream tasks (request for opinions)

This is a moved discussion from a GitHub Pull request (as Bolke suggested):

I'll try to copy paste the discussion as best I can:
*PR*: https://github.com/apache/incubator-airflow/pull/1961

*Original commit message:*
This commit fixes the premature finishing of DAGs when using the
branching operator. The fix is making sure a operator in a DAG
is marked as SKIPPED only when all the upstream operators are skipped.

*Remark for Max:*

I thought the desired behavior was to propagate the skipping so that the
DAG run would end and allow the scheduler to move on. Ultimately your DAG
will stop when concurrent number of DAG runs is reached. Whether the
skipping state propagates or not, user action need to be taken for things
to move forward.

I think it's fair to think that people may expect either behavior. People
may have designed existing DAGs based on how this feature behaves. I think
the main reason I'd prefer the current behavior is that is allows the
scheduler to move forward.

In either case the branching section in the concepts part of the docs
should be more clear about using the trigger rule one_success in the
joining task. It should also be clear about how user should expect the
skipped state to propagate. Thinking about it, the trigger rule all_success in
a joining task is not logical. We could almost raise when/if detecting it.


*Reply from Alex:*

My understanding was that skipped == success, because when a skipped
reaches the end the DAG is marked as success (even with half of my tasks
unfinished). And actually this commit 2630361
<https://github.com/apache/incubator-airflow/commit/2630361ca24737c28f458825b20ab11c9c996b17>convinced
my that I had it right. That's why I wrote this PR.

I understand your point because you actually look at it from the schedulers
point of view, I look at it from the users point of view. I already know 2
people in my team that fell into the trap with join nodes. And documenting
it alone isn't enough, I would then be prepared to change this PR into a raise
when we detect this with ALL_SUCCESS trigger.

Still the first DAG I wrote will never work with the only alternative I
have (ONE_SUCCESS), because the branch operator starts a branch that fans
out to a lot of parallel tasks that all join together to the same node
where the skipped branch arrive. This will never work with the ONE_SUCESS
because as soon as one of the parallel branches succeed the rest will just
stop. I know I can hack around this by introducing a DummyOperator where
all successful branches arrive, but I feel dirty if I need to do this.

So I propose this:

   - I'll change this PR so the ALL_SUCCESS raises and error.
   - Start on another PR that introduces 2 new triggers:
   -- ALL_SUCCESS_OR_SKIPPED
   -- ALL_FAILED_OR_SKIPPED

What do you think?



-- 
  _/
_/ Alex Van Boxel