Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/27 16:12:58 UTC

[GitHub] [airflow] nathadfield opened a new issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

nathadfield opened a new issue #12659:
URL: https://github.com/apache/airflow/issues/12659


   **Apache Airflow version**: 1.10.13
   
   **What happened**:
   
   After upgrading to `v1.10.13` we noticed that tasks in some of our DAGs were not being scheduled.  After some investigation we discovered that commenting out `'depends_on_past': True` made the issue go away.
   
   **What you expected to happen**:
   
   We think the issue might be related to the following change, which was introduced in `1.10.13`:
   
   [AIRFLOW-3607] Only query DB once per DAG run for TriggerRuleDep (#4751)
   
   **How to reproduce it**:
   
   1. Install Airflow v1.10.13 from pip
   2. Start webserver and scheduler
   3. Add the following code as a DAG
   4. Switch the DAG on in the UI.
   
   ```python
   from airflow import models
   from airflow.operators.dummy_operator import DummyOperator
   from datetime import datetime, timedelta
   
   default_args = {
       'owner': 'airflow',
       'start_date': datetime(2018, 10, 31),
       'depends_on_past': True,
       'retries': 3,
       'retry_delay': timedelta(minutes=5)
   }
   
   dag_name = 'my-test-dag'
   
   with models.DAG(dag_name,
                   default_args=default_args,
                   schedule_interval='0 0 * * *',
                   catchup=False,
                   max_active_runs=5,
                   ) as dag:
   
       test = DummyOperator(
           task_id='test'
       ) 
   ```
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734964475


   Closed by #12663 





[GitHub] [airflow] nathadfield commented on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
nathadfield commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734956182


   Nice one @kaxil! Will this force the need for a 1.10.14 then?





[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734905108


   On Master, this was fixed by https://github.com/apache/airflow/pull/7402 & further optimised by https://github.com/apache/airflow/pull/7503
   
   In 1.10.13, these changes were combined into the following 2 commits:
   
   - https://github.com/apache/airflow/commit/cb8d53fbc64af8d6c175d0dda6ae51db65ccc19b#diff-649fbbf224bab54417f03338c27d0fdb3c3336e53a522a13dfd9806c99f63137
   - https://github.com/apache/airflow/commit/268f1bec389096a9a96a958d69ee5d39793791b8#diff-649fbbf224bab54417f03338c27d0fdb3c3336e53a522a13dfd9806c99f63137
   
   I will investigate this further





[GitHub] [airflow] kaxil edited a comment on issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734913826


   > Since v1.10.13 we have also noticed that, for some DAGs, tasks are not being scheduled. They stay forever in a None state. There is nothing in the scheduler logs (DEBUG level). Running the tasks manually works fine, though. In our case, some/most of the DAGs do have `depends_on_past` set to true, but apparently not all of them. So maybe this is something different. I will investigate further and share any relevant info.
   
   Can you check whether the other DAGs (those not using `depends_on_past` that are still stuck) have `task_concurrency` set? @mthoretton 





[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734956335


   > Nice one @kaxil! Will this force the need for a 1.10.14 then?
   
   Yup, indeed. I hope to get it out by next week





[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734910752


   @nathadfield has confirmed the issue does not exist on 2.0.0b3





[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734955267


   https://github.com/apache/airflow/pull/12663 should fix it





[GitHub] [airflow] kaxil edited a comment on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734955267


   https://github.com/apache/airflow/pull/12663 should fix it @nathadfield @mthoretton





[GitHub] [airflow] kaxil edited a comment on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734956335


   > Nice one @kaxil! Will this force the need for a 1.10.14 then?
   
   Yup, indeed. I hope to get it out by early next week





[GitHub] [airflow] kaxil closed issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #12659:
URL: https://github.com/apache/airflow/issues/12659


   





[GitHub] [airflow] mthoretton commented on issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

Posted by GitBox <gi...@apache.org>.
mthoretton commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734912848


   Since v1.10.13 we have also noticed that, for some DAGs, tasks are not being scheduled. They stay forever in a None state. There is nothing in the scheduler logs (DEBUG level). Running the tasks manually works fine, though. In our case, some/most of the DAGs do have `depends_on_past` set to true, but apparently not all of them. So maybe this is something different. I will investigate further and share any relevant info.





[GitHub] [airflow] mthoretton commented on issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

Posted by GitBox <gi...@apache.org>.
mthoretton commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734916448


   I have not checked every DAG we have, but yes, the "broken" DAGs have either `depends_on_past` or `task_concurrency` set. I was not aware of the 2 issues you mentioned above; I will have a look, but it definitely looks related.
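
Checking every DAG by hand is tedious, so the check can be scripted. The sketch below is a minimal, hypothetical helper (the name `find_affected_tasks` is not part of Airflow) that walks a mapping of DAG id to DAG object and lists the tasks that have `depends_on_past` enabled or `task_concurrency` set. It assumes only that each DAG exposes a `tasks` list and that each task exposes `task_id`, `depends_on_past`, and `task_concurrency` attributes.

```python
def find_affected_tasks(dags):
    """Return (dag_id, task_id, settings) for tasks using the suspect settings.

    `dags` is a mapping of dag_id -> DAG-like object with a `tasks` list.
    This helper is illustrative only; it is not an Airflow API.
    """
    affected = []
    for dag_id in sorted(dags):
        for task in dags[dag_id].tasks:
            settings = []
            # Treat a missing attribute as "not enabled".
            if getattr(task, "depends_on_past", False):
                settings.append("depends_on_past")
            # Treat None (or a missing attribute) as "no per-task limit set".
            if getattr(task, "task_concurrency", None) is not None:
                settings.append("task_concurrency")
            if settings:
                affected.append((dag_id, task.task_id, settings))
    return affected
```

With Airflow installed, passing `DagBag().dags` (with `DagBag` imported from `airflow.models`) should work as the input mapping, assuming the DAG folder is configured as usual. A task that appears in the result is only a candidate for this bug, not proof of it.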





[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734913826


   > Since v1.10.13 we have also noticed that, for some DAGs, tasks are not being scheduled. They stay forever in a None state. There is nothing in the scheduler logs (DEBUG level). Running the tasks manually works fine, though. In our case, some/most of the DAGs do have `depends_on_past` set to true, but apparently not all of them. So maybe this is something different. I will investigate further and share any relevant info.
   
   Can you check whether the other DAGs (those not using `depends_on_past` that are still stuck) have `task_concurrency` set?





[GitHub] [airflow] kaxil commented on issue #12659: Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12659:
URL: https://github.com/apache/airflow/issues/12659#issuecomment-734921519


   I can confirm the bug. I was able to reproduce it under `LocalExecutor` with tasks that set `task_concurrency` or `depends_on_past`, using the following DAG:
   
   ```python
   from airflow import models
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.operators.bash_operator import BashOperator
   from datetime import datetime, timedelta
   
   default_args = {
       'owner': 'airflow',
       'start_date': datetime(2018, 10, 31),
       'retries': 3,
       'retry_delay': timedelta(minutes=5)
   }
   
   dag_name = 'dag-bugcheck'
   
   with models.DAG(dag_name,
                   default_args=default_args,
                   schedule_interval='0 0 * * *',
                   catchup=False,
                   max_active_runs=5,
                   ) as dag:
   
       test1 = DummyOperator(
           task_id='test1',
           task_concurrency=10,
       )
   
       test2 = BashOperator(
           task_id='test2',
           bash_command='echo hi',
           depends_on_past=True,
       )
   
       test3 = BashOperator(
           task_id='test3',
           bash_command='echo hi',
       )
   
   ```

