Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/06 23:07:28 UTC

[GitHub] [airflow] santosh-d3vpl3x opened a new issue #17479: Stopping worker leaves spark application behind when running in cluster mode

santosh-d3vpl3x opened a new issue #17479:
URL: https://github.com/apache/airflow/issues/17479


   **Apache Airflow version**: 2.1.2 and also previous versions
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): NA
   
   **Environment**: linux
   
   - **Cloud provider or hardware configuration**: NA
   - **OS** (e.g. from /etc/os-release): NA
   - **Kernel** (e.g. `uname -a`): NA
   - **Install tools**: NA
   - **Others**:
   
   **What happened**:
   We launch all our Spark processes in cluster mode on YARN. This keeps resource-hungry driver processes on YARN, while Airflow workers are left with predictable workloads. During maintenance we need to restart workers. Our users reported duplicate data in some instances, and the time periods of the affected job runs coincide with worker restarts. On investigation, I found that the Airflow jobs were marked as failed due to SIGTERM during the worker restart, but the Spark job kept running on the cluster. Because of the retry policy, Airflow then launched another instance of exactly the same Spark job.
   
   **What you expected to happen**:
   The Spark job should have been killed before the worker exited. An implementation for this exists, but it does not seem to do the trick.
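
To make the expected behaviour concrete, here is a minimal sketch of the cleanup I would expect on SIGTERM. It is not Airflow's actual code: the function names (`extract_yarn_app_id`, `build_kill_command`, `make_sigterm_handler`, `install_sigterm_cleanup`) are hypothetical, and the application id pattern is an assumption about what `spark-submit` prints in YARN cluster mode. The only external piece it relies on is the standard `yarn application -kill <application id>` CLI command.

```python
import re
import signal
import subprocess
from typing import Callable, Iterable, List, Optional

# Assumption: YARN application ids look like application_<timestamp>_<sequence>.
_APP_ID_RE = re.compile(r"application_\d+_\d+")


def extract_yarn_app_id(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first YARN application id found in spark-submit output."""
    for line in log_lines:
        match = _APP_ID_RE.search(line)
        if match:
            return match.group(0)
    return None


def build_kill_command(app_id: str) -> List[str]:
    """Build the YARN CLI call that kills the given application."""
    return ["yarn", "application", "-kill", app_id]


def make_sigterm_handler(
    get_app_id: Callable[[], Optional[str]],
    run: Callable[..., object] = subprocess.run,
) -> Callable[[int, object], None]:
    """Create a signal handler that kills the Spark application, then exits."""

    def _handler(signum: int, frame: object) -> None:
        app_id = get_app_id()
        if app_id:
            # Kill the cluster-side application before this process goes away.
            run(build_kill_command(app_id), check=False)
        raise SystemExit(128 + signum)

    return _handler


def install_sigterm_cleanup(get_app_id: Callable[[], Optional[str]]) -> None:
    """Register the handler for SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, make_sigterm_handler(get_app_id))
```

With something along these lines in the process that runs `spark-submit`, a worker restart would kill the YARN application first, so the retry would not start a second copy next to a still-running one.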
   
   **How to reproduce it**:
   - Use a Spark connection configured for YARN cluster mode.
   - Submit a Spark job with the Spark operator. Even a job that just sleeps for a significant time is enough.
   - Restart the Airflow worker while the job is still running.
   
   
   **Anything else we need to know**:
   
   This happens on pretty much every worker restart.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #17479: Stopping worker leaves spark application behind when running in cluster mode

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17479:
URL: https://github.com/apache/airflow/issues/17479#issuecomment-894559447


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

