Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/24 09:54:50 UTC

[GitHub] [airflow] KulykDmytro opened a new issue #11819: LatestOnlyOperator cascades skipped status despite of trigger_rule set

KulykDmytro opened a new issue #11819:
URL: https://github.com/apache/airflow/issues/11819


   **Apache Airflow version**: 1.10.12
   **Kubernetes version**: 1.18.6
   **What happened**:
   Forwarded from https://github.com/apache/airflow/issues/10686
   When a `LatestOnlyOperator` is set as an upstream task, all downstream tasks are skipped even when their `trigger_rule` is set to any of `all_done`, `none_failed`, or `none_failed_or_skipped`. They behave as if the rule were `all_success`, which contradicts the [documentation](https://airflow.apache.org/docs/stable/concepts.html?highlight=branch#latest-run-only)
   
   ```python
       t_ready = DummyOperator(
           task_id = 'calc_ready',
           trigger_rule = 'none_failed',
           dag=dag)
   ```
   ![image](https://user-images.githubusercontent.com/34435869/97078846-c91a9500-15f7-11eb-8328-1f37730fb650.png)
   
   PS: This looks like a regression: the issue appeared to be fixed by AIRFLOW-4453 but has come back (at least in 1.10.12)
   PR: #7464
   
   **What you expected to happen**:
   Behavior should match the [documentation](https://airflow.apache.org/docs/stable/concepts.html?highlight=branch#latest-run-only) and should not cascade the `skipped` status, as described [here](https://airflow.apache.org/docs/stable/concepts.html?highlight=branch#trigger-rules):
   ```
   Skipped tasks will cascade through trigger rules `all_success` and `all_failed` but not `all_done`, `one_failed`, `one_success`, `none_failed`, `none_failed_or_skipped`, `none_skipped` and `dummy`. 
   ```
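
The rule quoted above can be illustrated with a small standalone model. This is only a sketch of the documented semantics, not Airflow's implementation: the function `resolve` and the state strings are hypothetical names chosen for illustration, and only four trigger rules are covered. It assumes every upstream task has already finished.

```python
from typing import List

# States treated as failures by the rules below (assumption for this sketch).
FAILED_STATES = {"failed", "upstream_failed"}


def resolve(trigger_rule: str, upstream_states: List[str]) -> str:
    """Return 'run', 'skipped' or 'upstream_failed' for a task whose
    upstream tasks have all finished. Hypothetical helper, not an Airflow API."""
    failed = sum(state in FAILED_STATES for state in upstream_states)
    skipped = upstream_states.count("skipped")
    total = len(upstream_states)

    if trigger_rule == "all_success":
        if failed:
            return "upstream_failed"
        # A single skipped upstream cascades the skip.
        return "skipped" if skipped else "run"
    if trigger_rule == "all_done":
        # Runs once everything upstream is done, whatever the outcome.
        return "run"
    if trigger_rule == "none_failed":
        return "upstream_failed" if failed else "run"
    if trigger_rule == "none_failed_or_skipped":
        if failed:
            return "upstream_failed"
        # Skipped only when every upstream task was skipped.
        return "skipped" if skipped == total else "run"
    raise ValueError(f"trigger rule not modelled: {trigger_rule}")


# One skipped and one successful upstream task, as for `taska_*` in the
# reproduction (task1 skipped, task2 succeeded).
states = ["skipped", "success"]
for rule in ["all_success", "all_done", "none_failed", "none_failed_or_skipped"]:
    print(rule, "->", resolve(rule, states))
```

Under this model only `all_success` propagates the skip when one upstream task is skipped and another succeeds, which is the outcome the report expects for the other three rules.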
   **How to reproduce it**:
   ```python
   import datetime as dt
   
   from airflow.models import DAG
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.operators.latest_only_operator import LatestOnlyOperator
   from airflow.utils.dates import days_ago
   #from airflow.utils.trigger_rule import TriggerRule
   
   dag = DAG(
       dag_id='latest_only_with_trigger',
       schedule_interval=dt.timedelta(hours=4),
       start_date=days_ago(2),
       tags=['example']
   )
   
   latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
   task0 = DummyOperator(task_id='task0', dag=dag)
   task1 = DummyOperator(task_id='task1', dag=dag)
   task2 = DummyOperator(task_id='task2', dag=dag)
   
   task0 >> [task1, task2]
   latest_only >> task1 
   tr_list =  ['all_done', 'none_failed', 'none_failed_or_skipped']
   
   for tr in tr_list:
       taska = DummyOperator(dag=dag, task_id=f'taska_{tr}', trigger_rule=tr)
       taskb = DummyOperator(task_id=f'taskb_{tr_list.index(tr)}', dag=dag)
       taskc = DummyOperator(task_id=f'taskc_{tr_list.index(tr)}', dag=dag)
   
       task1 >> [taska, taskb] >> taskc
       task2 >> [taska, taskb]
   ```
   https://user-images.githubusercontent.com/34435869/96783110-18917300-13f6-11eb-927c-331d8c22dd72.png
   
   **Anything else we need to know**:
   In any case, the behavior does not match the [documentation](https://airflow.apache.org/docs/stable/concepts.html?highlight=branch#latest-run-only).
   Even the code snippet given there produces an unexpected result (task4 is skipped):
   ![image](https://user-images.githubusercontent.com/34435869/97078879-1139b780-15f8-11eb-9a90-be03fd2c3f07.png)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on issue #11819: LatestOnlyOperator cascades skipped status despite of trigger_rule set

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on issue #11819:
URL: https://github.com/apache/airflow/issues/11819#issuecomment-720878678


   Just realised this issue is already fixed in `master` branch. @KulykDmytro would you like to try 2.0.0a1? It won't have this problem because `LatestOnlyOperator` has been changed to a subclass of `BaseBranchOperator` which should do the right thing when skipping downstream tasks. Since 2.0.0 is around the corner, I don't know if it's still worth fixing this issue in the 1.10 branch.





[GitHub] [airflow] eladkal commented on issue #11819: LatestOnlyOperator cascades skipped status despite of trigger_rule set

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11819:
URL: https://github.com/apache/airflow/issues/11819#issuecomment-723583993


   @yuqian90 is this fix easy to backport to 1.10? Even if 2.0 is around the corner, this is a functional bug that might cause DAGs not to work as expected. Also, not all organizations will move to 2.0 quickly; it will take time.





[GitHub] [airflow] yuqian90 commented on issue #11819: LatestOnlyOperator cascades skipped status despite of trigger_rule set

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on issue #11819:
URL: https://github.com/apache/airflow/issues/11819#issuecomment-726766782


   > @yuqian90 is this fix easy to backport to 1.10? Even if 2.0 is around the corner, this is a functional bug that might cause DAGs not to work as expected. Also, not all organizations will move to 2.0 quickly; it will take time.
   
   I'd love to help but don't have time at the moment. I can see the change needed is rather small. To achieve what you want, you can make this one-line change to `latest_only_operator.py` and use it as a custom operator:
   
   ```diff
   diff --git a/airflow/operators/latest_only_operator.py b/airflow/operators/latest_only_operator.py
   index c95ceacbd..68ce083a9 100644
   --- a/airflow/operators/latest_only_operator.py
   +++ b/airflow/operators/latest_only_operator.py
   @@ -55,7 +55,7 @@ class LatestOnlyOperator(BaseOperator, SkipMixin):
            if not left_window < now <= right_window:
                self.log.info('Not latest execution, skipping downstream.')
   
   -            downstream_tasks = context['task'].get_flat_relatives(upstream=False)
   +            downstream_tasks = context['task'].get_direct_relatives(upstream=False)
                self.log.debug("Downstream task_ids %s", downstream_tasks)
   
                if downstream_tasks:
   ```
   
   Or alternatively, if you don't mind putting up a PR to cherry-pick this change from master branch and apply it against v1-10-test branch, that'll be even better: https://github.com/apache/airflow/pull/5778
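
To see why the one-line change in the diff matters, the following standalone sketch compares the two sets of tasks on a reduced copy of the reproduction DAG (one `taska`/`taskb`/`taskc` group instead of three). It does not import Airflow: `DOWNSTREAM`, `direct_relatives` and `flat_relatives` are hypothetical stand-ins that only mimic what `get_direct_relatives(upstream=False)` and `get_flat_relatives(upstream=False)` are expected to return.

```python
from typing import Dict, List, Set

# Reduced copy of the reproduction DAG: task_id -> direct downstream task_ids.
DOWNSTREAM: Dict[str, List[str]] = {
    "latest_only": ["task1"],
    "task0": ["task1", "task2"],
    "task1": ["taska", "taskb"],
    "task2": ["taska", "taskb"],
    "taska": ["taskc"],
    "taskb": ["taskc"],
    "taskc": [],
}


def direct_relatives(task_id: str) -> Set[str]:
    """Only the immediate downstream tasks."""
    return set(DOWNSTREAM[task_id])


def flat_relatives(task_id: str) -> Set[str]:
    """Every task reachable downstream, found by a depth-first walk."""
    seen: Set[str] = set()
    stack = list(DOWNSTREAM[task_id])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DOWNSTREAM[current])
    return seen


# Before the change: every transitive descendant is marked skipped outright.
print(sorted(flat_relatives("latest_only")))    # ['task1', 'taska', 'taskb', 'taskc']
# After the change: only the immediate child is marked skipped.
print(sorted(direct_relatives("latest_only")))  # ['task1']
```

With the flat set, `taska` is marked skipped by the operator itself, so its `trigger_rule` is never consulted. With the direct set, only `task1` is skipped, and whether `taska`, `taskb` and `taskc` run is left to the normal trigger-rule evaluation.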

