You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/29 18:58:44 UTC

[GitHub] [airflow] Jorricks opened a new pull request #16718: Slow (cleared) tasks would be adopted by Celery.

Jorricks opened a new pull request #16718:
URL: https://github.com/apache/airflow/pull/16718


   Celery executor is currently adopting anything that has ever run before and has been cleared since then.
   We have a DAG that runs over 200 tasks but has a concurrency of 3 (because of the size of some jobs). Therefore, many tasks should be in scheduled for a bit. However, because of the current implementations, if these tasks ever run before they would get adopted by the schedulers executor instance and become stuck forever (which is fixed in another PR).
   Now this PR prevents the tasks that are in scheduled from being picked up as adopted and furthermore it makes sure that cleared tasks are fully reset.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks commented on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-871959384


   The test failures seem random. Can someone rerun the failing parts :)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-872062875


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-872017983


   @Jorricks  We seem to have general problem with kind tests in GitHub public runners #16736, for the other - you can do it yourself:
   
   1) the best way: rebase to latest main and `git push --force-with-lease` - this will make sure you use latest main code
   2) alternatively (if you are already at main) `git commit --amend` and force push again
   3) finally - close and reopen the PR. This will trigger the rebuild.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#discussion_r661217363



##########
File path: airflow/models/taskinstance.py
##########
@@ -172,6 +172,7 @@ def clear_task_instances(
                 # original max_tries or the last attempted try number.
                 ti.max_tries = max(ti.max_tries, ti.prev_attempted_tries)
             ti.state = State.NONE
+            ti.external_executor_id = None

Review comment:
       Could you check this in the unit tests so we don't regress




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks edited a comment on pull request #16718: Slow (cleared) tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks edited a comment on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-870868296


   ToDo:
   - Fix the failing test
   
   I am also thinking if we can improve the query somehow by checking if the orphaned task was from one of the executors that died. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks commented on pull request #16718: Slow (cleared) tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-870868296


   ToDo:
   - Fix the failing test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-872573554


   Have re-runed the test 🤞 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-872573554






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks edited a comment on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks edited a comment on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-871959384


   The test failures seem random. Can someone rerun the failing parts :)?
   Or point me to the actual logs and show me what is going on?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks commented on a change in pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks commented on a change in pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#discussion_r661236103



##########
File path: airflow/models/taskinstance.py
##########
@@ -172,6 +172,7 @@ def clear_task_instances(
                 # original max_tries or the last attempted try number.
                 ti.max_tries = max(ti.max_tries, ti.prev_attempted_tries)
             ti.state = State.NONE
+            ti.external_executor_id = None

Review comment:
       Good one! Will add it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #16718:
URL: https://github.com/apache/airflow/pull/16718


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks edited a comment on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks edited a comment on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-871959384


   The test failures seem random. Can someone rerun the failing parts :)?
   I don't fully grasp the failing test. Can anyone explain why we are checking for orphaned processes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks commented on a change in pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks commented on a change in pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#discussion_r661687438



##########
File path: airflow/models/taskinstance.py
##########
@@ -172,6 +172,7 @@ def clear_task_instances(
                 # original max_tries or the last attempted try number.
                 ti.max_tries = max(ti.max_tries, ti.prev_attempted_tries)
             ti.state = State.NONE
+            ti.external_executor_id = None

Review comment:
       Updated the current test to also verify the `external_executor_id` is reset.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-871178372


   > I was thinking if we can improve the query somehow by checking if the orphaned task was from one of the executors that died. What do you think?
   
   We could if we need to - but the reason I didn't is double adoption: if a scheduler adopts tasks and then dies we want all those tasks (it's own and the ones it adopted) to be adopted again


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks commented on pull request #16718: FIX Slow cleared tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks commented on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-871204760


   > > I was thinking if we can improve the query somehow by checking if the orphaned task was from one of the executors that died. What do you think?
   > 
   > We could if we need to - but the reason I didn't is double adoption: if a scheduler adopts tasks and then dies we want all those tasks (it's own and the ones it adopted) to be adopted again
   
   Yes alright. It sounds like we should just make sure that cleared tasks are not being picked up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Jorricks edited a comment on pull request #16718: Slow (cleared) tasks would be adopted by Celery.

Posted by GitBox <gi...@apache.org>.
Jorricks edited a comment on pull request #16718:
URL: https://github.com/apache/airflow/pull/16718#issuecomment-870868296


   I was thinking if we can improve the query somehow by checking if the orphaned task was from one of the executors that died. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org