Posted to commits@airflow.apache.org by "sabarnwal (via GitHub)" <gi...@apache.org> on 2023/02/28 07:06:31 UTC
[GitHub] [airflow] sabarnwal opened a new issue, #29800: Crashing of scheduler pod causes tasks to fail.
sabarnwal opened a new issue, #29800:
URL: https://github.com/apache/airflow/issues/29800
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
We deployed Airflow 2.3.3 with Helm on our Kubernetes cluster, and we use the Kubernetes executor for tasks.
The issue is that when our scheduler pod crashes, the worker pods running those tasks terminate and are marked as succeeded (successful termination of the pod), while the underlying tasks are marked as failed.
### What you think should happen instead
According to the docs, this should not happen:
In cases of scheduler crashes, the scheduler will recover its state using the watcher’s resourceVersion.
When monitoring the Kubernetes cluster’s watcher thread, each event has a monotonically rising number called a resourceVersion. Every time the executor reads a resourceVersion, the executor stores the latest value in the backend database. Because the resourceVersion is stored, the scheduler can restart and continue reading the watcher stream from where it left off. Since the tasks are run independently of the executor and report results directly to the database, scheduler failures will not lead to task failures or re-runs.
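The recovery behaviour described in the docs can be illustrated with a small simulation. This is only a sketch of the idea, not Airflow's implementation: `FakeBackend`, `stream_from`, and `watch` are hypothetical names, the event list stands in for the Kubernetes watch stream, and resourceVersion is modelled as an increasing integer for simplicity (in the real Kubernetes API it is an opaque string).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

# (resource_version, pod_name, phase); a stand-in for Kubernetes watch events.
Event = Tuple[int, str, str]


@dataclass
class FakeBackend:
    """Hypothetical stand-in for the metadata database."""
    resource_version: int = 0
    pod_phases: Dict[str, str] = field(default_factory=dict)


def stream_from(events: List[Event], resource_version: int) -> Iterator[Event]:
    """Yield only the events newer than the stored resourceVersion."""
    return (event for event in events if event[0] > resource_version)


def watch(events: List[Event], backend: FakeBackend,
          crash_after: Optional[int] = None) -> int:
    """Process events and persist the latest resourceVersion after each one.

    If crash_after is set, stop after that many events to simulate a
    scheduler crash. Returns the number of events processed.
    """
    processed = 0
    for version, pod, phase in stream_from(events, backend.resource_version):
        if crash_after is not None and processed >= crash_after:
            return processed  # simulated crash: remaining events are unread
        backend.pod_phases[pod] = phase
        backend.resource_version = version
        processed += 1
    return processed


events = [
    (101, "task-a", "Pending"),
    (102, "task-a", "Running"),
    (103, "task-a", "Succeeded"),
]
backend = FakeBackend()
watch(events, backend, crash_after=2)  # scheduler "crashes" after two events
watch(events, backend)                 # restarted scheduler resumes after 102
```

In this model the second call picks up only the event it has not yet seen, so the pod's final phase is recorded without replaying earlier events. The behaviour reported in this issue differs from that expectation: the tasks themselves are killed when the scheduler goes down.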
### How to reproduce
On 2.3.3, trigger a DAG manually. After the task has started, kill the scheduler pod manually. The running tasks are also killed, with a SIGTERM error.
### Operating System
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)" NAME="Debian GNU/Linux"
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==4.0.0
apache-airflow-providers-apache-spark==3.0.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.1.0
apache-airflow-providers-docker==3.0.0
apache-airflow-providers-elasticsearch==4.0.0
apache-airflow-providers-ftp==3.0.0
apache-airflow-providers-google==8.1.0
apache-airflow-providers-grpc==3.0.0
apache-airflow-providers-hashicorp==3.0.0
apache-airflow-providers-http==3.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-microsoft-azure==4.0.0
apache-airflow-providers-mongo==3.0.0
apache-airflow-providers-mysql==3.0.0
apache-airflow-providers-odbc==3.0.0
apache-airflow-providers-postgres==5.0.0
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==3.0.0
apache-airflow-providers-slack==5.0.0
apache-airflow-providers-sqlite==3.0.0
apache-airflow-providers-ssh==3.0.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### Anything else
This happens every time the scheduler pod goes down.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on issue #29800: Crashing of scheduler pod causes tasks to fail.
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1500735310
This issue has been closed because it has not received response from the issue author.
[GitHub] [airflow] hussein-awala commented on issue #29800: Crashing of scheduler pod causes tasks to fail.
Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1448995034
There have been 7 releases since 2.3.3, and it is very likely that the problem was fixed in one of them. Could you try upgrading to 2.5.1 and see if the issue is resolved?
[GitHub] [airflow] github-actions[bot] closed issue #29800: Crashing of scheduler pod causes tasks to fail.
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #29800: Crashing of scheduler pod causes tasks to fail.
URL: https://github.com/apache/airflow/issues/29800
[GitHub] [airflow] github-actions[bot] commented on issue #29800: Crashing of scheduler pod causes tasks to fail.
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1491121648
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #29800: Crashing of scheduler pod causes tasks to fail.
Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29800:
URL: https://github.com/apache/airflow/issues/29800#issuecomment-1447685823
Thanks for opening your first issue here! Be sure to follow the issue template!