Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/12 18:17:06 UTC

[GitHub] [airflow] joshk-kang opened a new issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

joshk-kang opened a new issue #17585:
URL: https://github.com/apache/airflow/issues/17585


   **Apache Airflow version**: 2.1.2
   
   **Apache Airflow Provider versions** (please include all providers that are relevant to your bug): None are relevant
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A
   
   **Environment**: 
   
   - **Cloud provider or hardware configuration**: 
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   When new tasks with `depends_on_past=True` are added to an existing DAG, they are never scheduled.
   
   **What you expected to happen**:
   The newly added tasks should be able to run even with `depends_on_past=True`
   
   The reason is that the logic that determines whether a task instance is the first one looks at the DAG rather than the task itself: https://github.com/apache/airflow/blob/main/airflow/ti_deps/deps/prev_dagrun_dep.py#L58-L65. The code there should be updated so that a task's first TaskInstance is allowed to run even if the DAG already has previous DagRuns.
   
   **How to reproduce it**:
   Create a DAG with some tasks and let it complete at least one run. Then add another task with `depends_on_past=True` to the existing DAG. It will never be scheduled, and the reported reason is: `depends_on_past is true for this task's DAG, but the previous task instance has not run yet.`
   
   **Anything else we need to know**: NA
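
The difference between a DAG-level and a task-level "first instance" test can be sketched with a toy model. This is not Airflow's code: `ToyDagRun`, `dag_level_check`, `task_level_check` and `DONE_STATES` are hypothetical names invented for illustration, and the real dependency check handles more cases than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical states that let a dependent instance proceed in this toy model.
DONE_STATES = {"success", "skipped"}


@dataclass
class ToyDagRun:
    """Simplified stand-in for one DAG run: task_id -> final state."""
    task_states: Dict[str, str] = field(default_factory=dict)


def dag_level_check(runs: List[ToyDagRun], index: int, task_id: str) -> bool:
    """Reported behavior: 'first' means the DAG has no previous run."""
    if index == 0:
        return True
    previous_state = runs[index - 1].task_states.get(task_id)
    return previous_state in DONE_STATES


def task_level_check(runs: List[ToyDagRun], index: int, task_id: str) -> bool:
    """Proposed behavior: 'first' means the task has no earlier instance."""
    has_history = any(task_id in run.task_states for run in runs[:index])
    if not has_history:
        return True
    return dag_level_check(runs, index, task_id)


# The DAG already ran once with only "old"; "new" is added before the second run.
runs = [ToyDagRun({"old": "success"}), ToyDagRun()]
old_blocked = not dag_level_check(runs, 1, "old")
new_blocked_today = not dag_level_check(runs, 1, "new")
new_blocked_proposed = not task_level_check(runs, 1, "new")
```

In this model the pre-existing task is never blocked, the new task is blocked under the DAG-level test because the previous run has no instance of it, and the task-level test lets it through.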
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal edited a comment on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-902014936


   > Add another task with depends_on_past=True to the existing DAG
   
   I think the issue here is that you assume Airflow knows a newly added task starts from the date you added it, and so you expect `depends_on_past=True` to work. This is not the case. You must provide a `start_date` for the new task for `depends_on_past` to work.
   
   ```
   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from datetime import datetime
   
   default_args = {
       'owner': 'Elad',
       'start_date': datetime(2021, 8, 19),
   }
   
   # my_dag2: task "2nd" inherits start_date from default_args (reported as stuck)
   with DAG('my_dag2', default_args=default_args, schedule_interval='*/5 * * * *') as dag:
       a = DummyOperator(task_id="1st")
       b = DummyOperator(task_id="2nd", depends_on_past=True)
       a >> b
   
   # my_dag3: task "2nd" sets its own start_date (reported as working)
   with DAG('my_dag3', default_args=default_args, schedule_interval='*/5 * * * *') as dag:
       a = DummyOperator(task_id="1st")
       b = DummyOperator(task_id="2nd", depends_on_past=True, start_date=datetime(2021, 8, 19, 15, 15))
       a >> b
   ```
   
   As you can see, `my_dag3` works fine, yet `my_dag2` is stuck:
   
   ![Screen Shot 2021-08-19 at 18 29 18](https://user-images.githubusercontent.com/45845474/130097259-52b79d35-0186-446e-a668-aa12f145c642.png)
   
   ![Screen Shot 2021-08-19 at 18 29 11](https://user-images.githubusercontent.com/45845474/130097294-7996b42f-2b24-4db3-abcc-486f15e86f51.png)
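
One plausible reading of why a task-level `start_date` helps can be sketched in plain Python. The sketch assumes that a task instance counts as the first one when the previous scheduled date falls before the task's own `start_date`. That rule is an assumption about the linked dependency check, and `is_first_instance` and `INTERVAL` are hypothetical names, not Airflow APIs.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=5)  # matches schedule_interval='*/5 * * * *'


def is_first_instance(logical_date: datetime, task_start_date: datetime) -> bool:
    """Assumed rule: first if the previous scheduled date precedes the task's start_date."""
    previous_schedule = logical_date - INTERVAL
    return previous_schedule < task_start_date


dag_start = datetime(2021, 8, 19)           # start_date from default_args
task_start = datetime(2021, 8, 19, 15, 15)  # start_date set on "2nd" in my_dag3
run_date = datetime(2021, 8, 19, 15, 15)

# Same run date, judged against the inherited and the task-level start_date.
first_in_my_dag2 = is_first_instance(run_date, dag_start)
first_in_my_dag3 = is_first_instance(run_date, task_start)
next_in_my_dag3 = is_first_instance(run_date + INTERVAL, task_start)
```

Under that assumption, the 15:15 run is a first instance only when the task carries its own `start_date`, and the following run depends on its predecessor as usual.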
   
   
   





[GitHub] [airflow] github-actions[bot] commented on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-931791087


   This issue has been closed because it has not received a response from the issue author.





[GitHub] [airflow] github-actions[bot] commented on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-926249643


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-897864240


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] github-actions[bot] closed issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #17585:
URL: https://github.com/apache/airflow/issues/17585


   





[GitHub] [airflow] eladkal commented on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-904864396


   > Thanks @eladkal, I am currently using `start_date=days_ago(1)`, for a DAG that runs every 30 minutes, would you suggest using `start_date=days_ago(0)` instead?
   
   No, I'm suggesting not to use dynamic values in `start_date`.
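
A small sketch of why a dynamic value behaves differently from a fixed one: a value computed relative to "now" changes each time the DAG file is parsed, while a literal date does not. `days_ago_like` is a hypothetical stand-in that takes the current time as an argument so the effect is reproducible, and the parse times are made up for illustration.

```python
from datetime import datetime, timedelta


def days_ago_like(n: int, now: datetime) -> datetime:
    """Midnight `n` days before `now`; a stand-in for a dynamic start_date."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


STATIC_START = datetime(2021, 8, 19)  # fixed start_date, as in the example DAGs

# Two hypothetical moments at which the same DAG file gets parsed.
parse_times = [datetime(2021, 8, 24, 9, 30), datetime(2021, 8, 25, 9, 30)]

dynamic_values = [days_ago_like(1, t) for t in parse_times]
static_values = [STATIC_START for _ in parse_times]
```

Here `dynamic_values` holds a different date for each parse, whereas `static_values` repeats the same date, so anything that compares run dates against `start_date` sees a stable reference only in the static case.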





[GitHub] [airflow] eladkal edited a comment on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-902014936


   > Add another task with depends_on_past=True to the existing DAG
   
   I think the issue here is that you assume Airflow knows a newly added task starts from the date you added it, and so you expect `depends_on_past=True` to work. This is not the case. You must provide a `start_date` for the new task for `depends_on_past` to work.
   
   ```
   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from datetime import datetime
   
   default_args = {
       'owner': 'Elad',
       'start_date': datetime(2021, 8, 19),
   }
   
   with DAG('my_dag2', default_args=default_args, schedule_interval='*/5 * * * *') as dag:
       a = DummyOperator(task_id="1st")
       b = DummyOperator(task_id="2nd", wait_for_downstream=True)
       a >> b
   
   with DAG('my_dag3', default_args=default_args, schedule_interval='*/5 * * * *') as dag:
       a = DummyOperator(task_id="1st")
       b = DummyOperator(task_id="2nd", wait_for_downstream=True, start_date=datetime(2021, 8, 19, 15, 15))
       a >> b
   ```
   
   As you can see, `my_dag3` works fine, yet `my_dag2` is stuck:
   
   ![Screen Shot 2021-08-19 at 18 29 18](https://user-images.githubusercontent.com/45845474/130097259-52b79d35-0186-446e-a668-aa12f145c642.png)
   
   ![Screen Shot 2021-08-19 at 18 29 11](https://user-images.githubusercontent.com/45845474/130097294-7996b42f-2b24-4db3-abcc-486f15e86f51.png)
   
   
   





[GitHub] [airflow] joshk-kang commented on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
joshk-kang commented on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-904825754


   Thanks @eladkal, I am currently using `start_date=days_ago(1)`, for a DAG that runs every 30 minutes, would you suggest using `start_date=days_ago(0)` instead?





[GitHub] [airflow] eladkal commented on issue #17585: New Tasks to an existing DAG do not get scheduled if they have depends_on_past = True

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #17585:
URL: https://github.com/apache/airflow/issues/17585#issuecomment-902014936


   > Add another task with depends_on_past=True to the existing DAG
   
   I think the issue here is that you assume Airflow knows a newly added task starts from the date you added it, and so you expect `depends_on_past=True` to work. This is not the case. You must provide a `start_date` for the new task; then it will work as expected.
   
   ```
   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from datetime import datetime
   
   default_args = {
       'owner': 'Elad',
       'start_date': datetime(2021, 8, 19),
   }
   
   with DAG('my_dag2', default_args=default_args, schedule_interval='*/5 * * * *') as dag:
       a = DummyOperator(task_id="1st")
       b = DummyOperator(task_id="2nd", wait_for_downstream=True)
       a >> b
   
   with DAG('my_dag3', default_args=default_args, schedule_interval='*/5 * * * *') as dag:
       a = DummyOperator(task_id="1st")
       b = DummyOperator(task_id="2nd", wait_for_downstream=True, start_date=datetime(2021, 8, 19, 15, 15))
       a >> b
   ```
   
   As you can see, `my_dag3` works fine, yet `my_dag2` is stuck:
   
   ![Screen Shot 2021-08-19 at 18 29 18](https://user-images.githubusercontent.com/45845474/130097259-52b79d35-0186-446e-a668-aa12f145c642.png)
   
   ![Screen Shot 2021-08-19 at 18 29 11](https://user-images.githubusercontent.com/45845474/130097294-7996b42f-2b24-4db3-abcc-486f15e86f51.png)
   
   
   

