Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/08 15:15:59 UTC

[GitHub] [airflow] cadet354 opened a new issue #15280: Incorrect handling of DAG stop when using spark-submit in cluster mode on a YARN cluster

cadet354 opened a new issue #15280:
URL: https://github.com/apache/airflow/issues/15280


   **Apache Airflow version**: v2.0.1
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: bare metal
   - **OS** (e.g. from /etc/os-release): Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-65-generic x86_64)
   - **Kernel** (e.g. `uname -a`): Linux 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
   - **Install tools**: miniconda
   - **Others**: python 3.7, hadoop 2.9.2, spark 2.4.7
   
   **What happened**:
   I have two problems:
   1. When a DAG starts spark-submit on YARN with deploy_mode="cluster", Airflow doesn't track the driver state.
   Therefore, when the YARN application fails, the DAG task's state remains "running".
   
   2. When I manually stop the DAG's job, for example by marking it as "failed", the same job keeps running on the YARN cluster.
   This error occurs because an empty environment is passed to subprocess.Popen, so the Hadoop bin directory is missing from PATH:
   
   ERROR - [Errno 2] No such file or directory: 'yarn': 'yarn'
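   
   To illustrate the failure and the proposed environment merge, here is a minimal standalone sketch (the application id and Hadoop path are placeholders, not values taken from the hook):
   ```python
   import os
   import subprocess
   
   # Hypothetical YARN application id and the kill command built from it.
   app_id = "application_1617880000000_0001"
   kill_cmd = ["yarn", "application", "-kill", app_id]
   
   # env_vars as they might be passed to the operator, e.g. putting a Hadoop bin
   # directory (hypothetical location) on PATH.
   user_env = {"PATH": "/opt/hadoop/bin" + os.pathsep + os.environ.get("PATH", "")}
   
   # Merging the worker environment with the user-supplied variables keeps PATH intact;
   # spawning the command without it can fail with the error quoted above.
   env = {**os.environ.copy(), **user_env}
   try:
       proc = subprocess.Popen(kill_cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
       print("kill command exited with", proc.wait())
   except FileNotFoundError as err:
       print(err)  # [Errno 2] No such file or directory: 'yarn'
   ```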
   
   **What you expected to happen**:
   In the first case, the task should move to the "failed" state.
   In the second case, the job should be stopped on the YARN cluster.
   
   **How to reproduce it**:
   Reproduce the first issue: start spark_submit on a YARN cluster with deploy="cluster" and master="yarn", then kill the task in the YARN UI. In Airflow, the task state remains "running".
   Reproduce the second issue: start spark_submit on a YARN cluster with deploy="cluster" and master="yarn", then manually change the running task's state to "failed". On the Hadoop cluster, the same job keeps running.
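   
   For reference, a DAG along these lines reproduces both cases (the connection id, application path, env_vars and dates are placeholders; the Spark connection is assumed to point at master "yarn" with deploy-mode "cluster" in its extra):
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
   
   with DAG(
       dag_id="spark_submit_yarn_cluster_demo",
       start_date=datetime(2021, 4, 1),
       schedule_interval=None,
       catchup=False,
   ) as dag:
       submit_job = SparkSubmitOperator(
           task_id="submit_job",
           conn_id="spark_yarn_cluster",  # placeholder connection: master "yarn", deploy-mode "cluster"
           application="/path/to/app.py",  # placeholder application
           env_vars={"PATH": "/opt/hadoop/bin:/usr/bin:/bin"},  # hypothetical Hadoop bin dir on PATH
       )
   ```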
   
   **Anything else we need to know**:
   I propose the following changes to airflow/providers/apache/spark/hooks/spark_submit.py (SparkSubmitHook):
   1. line 201: 
   ```python
   return 'spark://' in self._connection['master'] and self._connection['deploy_mode'] == 'cluster' 
   ```
   change to 
   ```python
   return ('spark://' in self._connection['master'] or self._connection['master'] == "yarn") and \
       (self._connection['deploy_mode'] == 'cluster')
   ```
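   
   For context, this check lives in the hook's `_resolve_should_track_driver_status` helper; a sketch of how the method would read after the change (docstring paraphrased):
   ```python
   def _resolve_should_track_driver_status(self) -> bool:
       """Whether the hook should poll the driver status after submitting.
   
       Proposed change: also track drivers for cluster-mode submissions to YARN,
       not only for standalone-cluster (spark://) masters.
       """
       return ('spark://' in self._connection['master'] or self._connection['master'] == "yarn") and \
           (self._connection['deploy_mode'] == 'cluster')
   ```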
   2. line 659:
   ```python
   env = None
   ```
   change to 
   ```python
   env = {**os.environ.copy(), **(self._env if self._env else {})}
   ```
   Applying this patch solved the above issues.
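   
   For context, the merged environment would then flow into the YARN kill command roughly as in this simplified sketch of the kill path (Kerberos handling and submit-process cleanup omitted):
   ```python
   import os
   import subprocess
   
   def on_kill(self) -> None:
       """Simplified sketch: kill the tracked YARN application with the merged environment."""
       kill_cmd = f"yarn application -kill {self._yarn_application_id}".split()
       # Merge the worker environment with any env_vars passed to the operator,
       # so PATH additions (e.g. the Hadoop bin directory) reach the subprocess.
       env = {**os.environ.copy(), **(self._env if self._env else {})}
       with subprocess.Popen(kill_cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as yarn_kill:
           self.log.info("YARN app killed with return code: %s", yarn_kill.wait())
   ```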



[GitHub] [airflow] Kimahriman commented on issue #15280: Incorrect handling of DAG stop when using spark-submit in cluster mode on a YARN cluster

Posted by GitBox <gi...@apache.org>.
Kimahriman commented on issue #15280:
URL: https://github.com/apache/airflow/issues/15280#issuecomment-839959371


   FWIW, the first issue identified is still an issue: the spark-submit hook doesn't monitor YARN-based jobs, it can only kill them.



[GitHub] [airflow] eladkal commented on issue #15280: Incorrect handling of DAG stop when using spark-submit in cluster mode on a YARN cluster

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15280:
URL: https://github.com/apache/airflow/issues/15280#issuecomment-816042544


   @cadet354 If you have the fix, you can just open a PR :)



[GitHub] [airflow] kaxil closed issue #15280: Incorrect handling of DAG stop when using spark-submit in cluster mode on a YARN cluster

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #15280:
URL: https://github.com/apache/airflow/issues/15280


   



[GitHub] [airflow] boring-cyborg[bot] commented on issue #15280: Incorrect handling of DAG stop when using spark-submit in cluster mode on a YARN cluster

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15280:
URL: https://github.com/apache/airflow/issues/15280#issuecomment-815907098


   Thanks for opening your first issue here! Be sure to follow the issue template!
   



[GitHub] [airflow] Kimahriman removed a comment on issue #15280: Incorrect handling of DAG stop when using spark-submit in cluster mode on a YARN cluster

Posted by GitBox <gi...@apache.org>.
Kimahriman removed a comment on issue #15280:
URL: https://github.com/apache/airflow/issues/15280#issuecomment-839959371


   FWIW, the first issue identified is still an issue: the spark-submit hook doesn't monitor YARN-based jobs, it can only kill them.

