Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/11 17:36:41 UTC

[GitHub] [airflow] SamWheating opened a new pull request #20816: Fixing bug in scheduler_job which could cause a pool to become blocked

SamWheating opened a new pull request #20816:
URL: https://github.com/apache/airflow/pull/20816


   While investigating another issue, I found what appears to be a bug introduced in https://github.com/apache/airflow/pull/20178 in the `_executable_task_instances_to_queued` function of the scheduler job. 
   
   Here is the affected piece of code:
   https://github.com/apache/airflow/blob/c9023fad4287213e4d3d77f4c66799c762bff7ba/airflow/jobs/scheduler_job.py#L335-L351
   
   So we're iterating through a list of `(pool_name: string, task_instances: List[TaskInstance])` pairs, but then checking the value of `task_instance.pool_slots`, even though `task_instance` is never explicitly assigned inside this loop.
   
   This means that we actually end up using the value set implicitly by this loop earlier in the function:
   https://github.com/apache/airflow/blob/c9023fad4287213e4d3d77f4c66799c762bff7ba/airflow/jobs/scheduler_job.py#L318-L319
   
   And because of the `continue`, if this task happens to require more slots than the specified pool can provide, then all of the scheduled tasks in this pool will be skipped over. 
   
   I've refactored this check so that it examines every TI in `task_instances` individually and simply removes any non-executable TIs from the list of candidate tasks to be queued. I've updated the test accordingly.
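
For illustration, here is a minimal sketch of the failure mode described above. It is not the actual scheduler code: `FakeTI`, `candidates_buggy`, `candidates_fixed`, and `pool_totals` are hypothetical names invented for this example, and the real function does considerably more. The sketch only shows how a loop variable left over from an earlier loop can cause a whole pool to be skipped, and how a per-TI check avoids that.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeTI:
    """Hypothetical stand-in for a TaskInstance (only the fields used here)."""

    task_id: str
    pool: str
    pool_slots: int


def candidates_buggy(tis: List[FakeTI], pool_totals: Dict[str, int]) -> List[FakeTI]:
    grouped = defaultdict(list)
    for task_instance in tis:
        grouped[task_instance.pool].append(task_instance)

    result = []
    for pool, task_instances in grouped.items():
        # Bug: `task_instance` is still bound to the last item of the loop
        # above, so a single oversized TI makes us skip the whole pool.
        if task_instance.pool_slots > pool_totals[pool]:
            continue
        result.extend(task_instances)
    return result


def candidates_fixed(tis: List[FakeTI], pool_totals: Dict[str, int]) -> List[FakeTI]:
    grouped = defaultdict(list)
    for ti in tis:
        grouped[ti.pool].append(ti)

    result = []
    for pool, task_instances in grouped.items():
        # Check every TI individually and drop only the ones that can never fit.
        result.extend(
            ti for ti in task_instances if ti.pool_slots <= pool_totals[pool]
        )
    return result


tis = [
    FakeTI("a", "small", 1),
    FakeTI("b", "small", 1),
    FakeTI("too_big", "small", 5),
]
pool_totals = {"small": 2}

buggy_ids = [ti.task_id for ti in candidates_buggy(tis, pool_totals)]
fixed_ids = [ti.task_id for ti in candidates_fixed(tis, pool_totals)]
```

With these inputs, the buggy variant returns no candidates at all because the leftover `too_big` TI fails the check for the whole pool, while the fixed variant keeps `a` and `b` and drops only `too_big`.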
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr merged pull request #20816: Fix task instances iteration in a pool to prevent blocking

Posted by GitBox <gi...@apache.org>.
uranusjr merged pull request #20816:
URL: https://github.com/apache/airflow/pull/20816


   





[GitHub] [airflow] SamWheating commented on pull request #20816: Fixing bug in scheduler_job which could cause a pool to become blocked

Posted by GitBox <gi...@apache.org>.
SamWheating commented on pull request #20816:
URL: https://github.com/apache/airflow/pull/20816#issuecomment-1010275557


   I don't think that these test failures are related to my changes 🤔 





[GitHub] [airflow] github-actions[bot] commented on pull request #20816: Fixing bug in scheduler_job which could cause a pool to become blocked

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20816:
URL: https://github.com/apache/airflow/pull/20816#issuecomment-1010316555


   The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] potiuk commented on pull request #20816: Fixing bug in scheduler_job which could cause a pool to become blocked

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20816:
URL: https://github.com/apache/airflow/pull/20816#issuecomment-1010278992


   > I don't think that these test failures are related to my changes 🤔
   
   Looks like the machine was evicted while running the task. You were just unlucky :)

