Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/22 14:15:30 UTC

[GitHub] [airflow] ashb opened a new issue #15488: Dynamic DAGs that disappear end up stuck in queued state.

ashb opened a new issue #15488:
URL: https://github.com/apache/airflow/issues/15488


   I can observe the same problem with version 2.0.2:
   
   * Tasks fail because a DAG/task has gone missing (we are using dynamically created DAGs, and they can go missing)
   * The scheduler keeps those tasks in the queued state
   * The pool gradually fills up with these queued tasks
   * The whole operation grinds to a halt because of this behaviour
   
   My current remedy:
   
   * Manually remove those queued tasks (a sketch of this cleanup follows below)
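   
   A minimal sketch of what that manual cleanup could look like, assuming direct access to the Airflow metadata database through the ORM; the helper name `fail_stuck_queued` is mine, not an Airflow API:
   
   ```python
   # Hypothetical cleanup sketch: mark QUEUED task instances of a now-missing
   # DAG as FAILED so they stop occupying pool slots. Adapt before running.
   from airflow.models import TaskInstance
   from airflow.utils.session import create_session
   from airflow.utils.state import State
   
   def fail_stuck_queued(dag_id: str) -> int:
       """Fail every queued task instance of the given (missing) DAG."""
       with create_session() as session:
           stuck = (
               session.query(TaskInstance)
               .filter(
                   TaskInstance.dag_id == dag_id,
                   TaskInstance.state == State.QUEUED,
               )
               .all()
           )
           for ti in stuck:
               ti.state = State.FAILED  # frees the pool slot it held
           return len(stuck)
   ```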
   
   My desired solution:
   
   When a DAG/task goes missing while it is queued, it should end up in a failed state.
   
   _Originally posted by @lukas-at-harren in https://github.com/apache/airflow/issues/13542#issuecomment-824575868_





[GitHub] [airflow] alokgarg5 commented on issue #15488: Dynamic DAGs that disappear end up stuck in queued state.

Posted by GitBox <gi...@apache.org>.
alokgarg5 commented on issue #15488:
URL: https://github.com/apache/airflow/issues/15488#issuecomment-825185515


   We are facing the same issue. I am using the Celery executor and upgraded to version 2.0.2; tasks do not get executed.





[GitHub] [airflow] kaxil closed issue #15488: Dynamic DAGs that disappear end up stuck in queued state.

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #15488:
URL: https://github.com/apache/airflow/issues/15488


   





[GitHub] [airflow] ephraimbuddy commented on issue #15488: Dynamic DAGs that disappear end up stuck in queued state.

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #15488:
URL: https://github.com/apache/airflow/issues/15488#issuecomment-850001116


   I reproduced this in 2.1.0 by removing a DAG while its tasks were queued, with the default pool set to 9 slots.
   Below is the DAG I used; to reproduce, trigger the DAG multiple times and then remove the DAG file.
   
   ```python
   from datetime import timedelta
   
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   from airflow.operators.dummy import DummyOperator
   from airflow.utils.dates import days_ago
   
   args = {
       'owner': 'airflow',
   }
   
   with DAG(
       dag_id='example_bash_operator',
       default_args=args,
       schedule_interval='0 0 * * *',
       start_date=days_ago(2),
       dagrun_timeout=timedelta(minutes=60),
       params={"example_key": "example_value"},
   ) as dag:
   
       run_this_last = DummyOperator(
           task_id='run_this_last',
       )
   
       # [START howto_operator_bash]
       run_this = BashOperator(
           task_id='run_after_loop',
           bash_command='echo 1',
       )
       # [END howto_operator_bash]
   
       run_this >> run_this_last
   
       for i in range(7):
           task = BashOperator(
               task_id='runme_' + str(i),
               bash_command='echo "{{ task_instance_key_str }}" && sleep 30',
           )
           task >> run_this
   
       # [START howto_operator_bash_template]
       also_run_this = BashOperator(
           task_id='also_run_this',
           bash_command='echo "run_id={{ run_id }} | dag_run={{ dag_run }}"',
       )
       # [END howto_operator_bash_template]
       also_run_this >> run_this_last
   
   # [START howto_operator_bash_skip]
   this_will_skip = BashOperator(
       task_id='this_will_skip',
       bash_command='echo "hello world"; exit 99;',
       dag=dag,
   )
   # [END howto_operator_bash_skip]
   this_will_skip >> run_this_last
   
   ```
   Notice that when you remove the DAG, the queued tasks remain queued even though the executor reports them as failed.
   Logs:
   ```log
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_4 execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_6 execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_2 execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_3 execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.also_run_this execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.run_after_loop execution_date=2021-05-25 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,902] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_5 execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:48,903] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_0 execution_date=2021-05-26 00:00:00+00:00 exited with status success for try_number 1
   [2021-05-27 21:22:48,903] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_1 execution_date=2021-05-26 00:00:00+00:00 exited with status success for try_number 1
   [2021-05-27 21:22:48,909] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.run_after_loop 2021-05-25 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,910] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_2 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,910] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_3 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,910] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_4 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,910] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_5 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,911] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_6 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,911] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.also_run_this 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:48,922] {file_processor_handler.py:123} WARNING - /root/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
   [2021-05-27 21:22:48,966] {dagbag.py:487} INFO - Filling up the DagBag from /files/dags/example_bash.py
   [2021-05-27 21:22:48,968] {local_executor.py:127} ERROR - Failed to execute task dag_id could not be found: example_bash_operator. Either the dag did not exist or it failed to parse..
   [2021-05-27 21:22:48,975] {dagbag.py:487} INFO - Filling up the DagBag from /files/dags/example_bash.py
   [2021-05-27 21:22:48,978] {local_executor.py:127} ERROR - Failed to execute task dag_id could not be found: example_bash_operator. Either the dag did not exist or it failed to parse..
   [2021-05-27 21:22:49,976] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_1 execution_date=2021-05-27 21:22:21.964648+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:49,976] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.this_will_skip execution_date=2021-05-26 00:00:00+00:00 exited with status failed for try_number 1
   [2021-05-27 21:22:49,980] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.this_will_skip 2021-05-26 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:22:49,980] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_1 2021-05-27 21:22:21.964648+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   ```
   ![Screenshot 2021-05-27 235108](https://user-images.githubusercontent.com/4122866/119906619-7535ad00-bf46-11eb-8006-d01cab7a919b.png)
   ![Screenshot 2021-05-27 235218](https://user-images.githubusercontent.com/4122866/119906710-a0200100-bf46-11eb-94b6-da2b48250929.png)
   
   I reproduced this with the LocalExecutor. Just make sure you trigger the DAG multiple times before removing it, and you will reproduce this.
   
   If you then trigger a different DAG, nothing executes. You will see this in the log:
   ```log
   [2021-05-27 21:27:01,338] {scheduler_job.py:1837} INFO - Resetting orphaned tasks for active dag runs
   [2021-05-27 21:27:01,374] {scheduler_job.py:1904} INFO - Reset the following 35 orphaned TaskInstances:
         <TaskInstance: example_bash_operator.runme_0 2021-05-27 21:22:21.964648+00:00 [scheduled]>
         <TaskInstance: example_bash_operator.runme_3 2021-05-27 21:22:23.739773+00:00 [scheduled]>
   ```
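   
   Because queued task instances still occupy pool slots, the stuck tasks eventually hold all 9 slots of the default pool, which is why nothing else runs. A quick way to confirm the exhaustion, sketched against Airflow's ORM (illustrative usage, not part of the original report):
   
   ```python
   # Illustrative check: print how many slots the default pool has open.
   # With all 9 slots held by stuck QUEUED task instances, this prints 0.
   from airflow.models import Pool
   from airflow.utils.session import create_session
   
   with create_session() as session:
       pool = session.query(Pool).filter(Pool.pool == "default_pool").one()
       print(pool.open_slots(session=session))
   ```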
   
   This no longer happens in master because of this fix: https://github.com/apache/airflow/pull/15929.
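   In essence (a simplified sketch of the post-fix behaviour, not the actual patch): when the executor reports a terminal state for a task instance that the scheduler still sees as queued, the scheduler now fails the task instance instead of only logging an error and leaving it queued.
   
   ```python
   # Simplified sketch of the post-fix scheduler behaviour (not the real
   # code from PR 15929): adopt the executor-reported failure instead of
   # leaving the task instance stuck in QUEUED.
   from airflow.utils.state import State
   
   def handle_executor_report(ti, executor_state):
       if executor_state == State.FAILED and ti.state == State.QUEUED:
           # Pre-fix: only an ERROR log line; the TI stayed QUEUED forever.
           # Post-fix: the TI is failed, freeing its pool slot.
           ti.state = State.FAILED
   ```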
   This is the log in master:
   ```log
   [2021-05-27 21:15:20,964] {local_executor.py:127} ERROR - Failed to execute task dag_id could not be found: example_bash_operator. Either the dag did not exist or it failed to parse..
   [2021-05-27 21:15:21,896] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_6 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,896] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_5 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,896] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_4 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,896] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.this_will_skip execution_date=2021-05-27 21:14:47.796058+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,897] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_3 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,897] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_2 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,897] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_0 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,897] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.also_run_this execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,897] {scheduler_job.py:1215} INFO - Executor reports execution of example_bash_operator.runme_1 execution_date=2021-05-27 21:14:49.905160+00:00 exited with status failed for try_number 1
   [2021-05-27 21:15:21,903] {scheduler_job.py:1244} ERROR - Executor reports task instance <TaskInstance: example_bash_operator.runme_6 2021-05-27 21:14:49.905160+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   [2021-05-27 21:15:21,903] {scheduler_job.py:1251} INFO - Setting task instance <TaskInstance: example_bash_operator.runme_6 2021-05-27 21:14:49.905160+00:00 [queued]> state to failed as reported by executor
   ```
   ![Screenshot 2021-05-27 234108](https://user-images.githubusercontent.com/4122866/119906128-8500c180-bf45-11eb-947e-cf2e93f11826.png)
   
   And the executor continues to execute other DAGs when they are triggered.
   
   





[GitHub] [airflow] kaxil commented on issue #15488: Dynamic DAGs that disappear end up stuck in queued state.

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #15488:
URL: https://github.com/apache/airflow/issues/15488#issuecomment-850350245


   @lukas-at-harren @alokgarg5 Can you try it with master as suggested by @ephraimbuddy and report back if possible?





[GitHub] [airflow] kaxil commented on issue #15488: Dynamic DAGs that disappear end up stuck in queued state.

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #15488:
URL: https://github.com/apache/airflow/issues/15488#issuecomment-887447566


   Closed by https://github.com/apache/airflow/pull/15929; the fix will be released in 2.1.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org