You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Raul824 (via GitHub)" <gi...@apache.org> on 2023/11/09 11:32:01 UTC

[I] Airflow scheduler restart casuing up_for_retry jobs also to be marked as failed. [airflow]

Raul824 opened a new issue, #35550:
URL: https://github.com/apache/airflow/issues/35550

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   We have jobs with retry which upon failing to run has a retry setup.
   Those include external task sensor and python operator job.
   
   When there is a load of multiple jobs, keda auto scaling up comes in picture and during that up scaling sometimes scheduler restarts which is fine but this scheduler restart marks some of the jobs as failed instead of up_for_retry. There are no logs generated and the failed state task doesn't even count it as an attempt as when we rerun it shows 1st attempt of 100 (which is total number of retries)
   
   
   ### What you think should happen instead
   
   There should be a log for the failure or at-least the job should go for in up_for_retry status.
   
   ### How to reproduce
   
   Trigger 208 jobs at once. Keeping the keda auto scaling workers to 0.
   The scheduler will get overloaded and will restart. Out of 208 jobs some will fail without any logs and upon retry it will be it's first attempt and no status or log of first failure will be found.
   
   ### Operating System
   
   Azure Kubernetes Services
   
   ### Versions of Apache Airflow Providers
   
   2.6.1
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   Every day we have a time where there are 208 task groups which start running in parallel. 
   At the exact same time they start we see this issue re-occurring.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Airflow scheduler restart casuing up_for_retry jobs also to be marked as failed. [airflow]

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis closed issue #35550: Airflow scheduler restart casuing up_for_retry jobs also to be marked as failed.
URL: https://github.com/apache/airflow/issues/35550


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org