You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "sabarnwal (via GitHub)" <gi...@apache.org> on 2023/02/28 07:06:31 UTC

[GitHub] [airflow] sabarnwal opened a new issue, #29800: Crashing of scheduler pod causes tasks to fail.

sabarnwal opened a new issue, #29800:
URL: https://github.com/apache/airflow/issues/29800

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   We have deployed airflow 2.3.3 using helm on our k8s cluster. We are using kubernetes executor for the tasks.
   Issue is, if our scheduler pod crashes, the running pods for those tasks are marked success (successful termination of the pod) and underlying tasks are failed.
   
   
   
   ### What you think should happen instead
   
   According to the doc, 
   
   In cases of scheduler crashes, the scheduler will recover its state using the watcher’s resourceVersion.
   
   When monitoring the Kubernetes cluster’s watcher thread, each event has a monotonically rising number called a resourceVersion. Every time the executor reads a resourceVersion, the executor stores the latest value in the backend database. Because the resourceVersion is stored, the scheduler can restart and continue reading the watcher stream from where it left off. Since the tasks are run independently of the executor and report results directly to the database, scheduler failures will not lead to task failures or re-runs.
   
   ### How to reproduce
   
   On 2.3.3, Trigger a dag manually. After the task has been started, Kill the scheduler pod manually. The running tasks will also be killed, with a sigterm error. 
   
   ### Operating System
   
   PRETTY_NAME="Debian GNU/Linux 11 (bullseye)" NAME="Debian GNU/Linux"
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==4.0.0
   apache-airflow-providers-apache-spark==3.0.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.1.0
   apache-airflow-providers-docker==3.0.0
   apache-airflow-providers-elasticsearch==4.0.0
   apache-airflow-providers-ftp==3.0.0
   apache-airflow-providers-google==8.1.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.0.0
   apache-airflow-providers-http==3.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.0.0
   apache-airflow-providers-mongo==3.0.0
   apache-airflow-providers-mysql==3.0.0
   apache-airflow-providers-odbc==3.0.0
   apache-airflow-providers-postgres==5.0.0
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==3.0.0
   apache-airflow-providers-slack==5.0.0
   apache-airflow-providers-sqlite==3.0.0
   apache-airflow-providers-ssh==3.0.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   Everytime the scheduler pod goes down.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on issue #29800: Crashing of scheduler pod causes tasks to fail.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1500735310

   This issue has been closed because it has not received response from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala commented on issue #29800: Crashing of scheduler pod causes tasks to fail.

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1448995034

   There are 7 different releases after 2.3.3, it is very likely that the problem is solved in one of these releases, can you try upgrading to 2.5.1 and see if the issue is resolved please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] closed issue #29800: Crashing of scheduler pod causes tasks to fail.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #29800: Crashing of scheduler pod causes tasks to fail.
URL: https://github.com/apache/airflow/issues/29800


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on issue #29800: Crashing of scheduler pod causes tasks to fail.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1491121648

   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #29800: Crashing of scheduler pod causes tasks to fail.

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1447685823

   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org