Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/19 10:16:11 UTC

[GitHub] [airflow] aipatr opened a new issue #19701: Cuda Spawn process error

aipatr opened a new issue #19701:
URL: https://github.com/apache/airflow/issues/19701


   ### Apache Airflow version
   
   2.2.0
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-postgres==2.3.0
   apache-airflow-providers-sqlite==2.0.1
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When training with PyTorch from an Airflow DAG, I get the following error:
   RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
   [2021-11-19, 10:06:11 UTC] {logging_mixin.py:109} WARNING - Failed to execute job 116 for task training_final_model
   Traceback (most recent call last):
   
   
   ### What you expected to happen
   
   There seems to be a configuration problem that prevents Python from spawning processes; it forks them instead.
   
   ### How to reproduce
   
   Try to run a PyTorch training job from an Airflow task.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974138813


   > This solution doesn't work. I set the variable in the config and restarted the webserver and scheduler, but the message is still:
   
   Did you restart the workers? You skipped the part where you should describe your environment, so I have no idea what deployment you have. The setting above should work for Kubernetes and LocalExecutor. But if you use Celery workers, there is another setting you can try: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#pool 
   
   If that does not work, then using PythonVirtualenvOperator is the final option you should try, and it will for sure work.
   
   Yet another option, if you really want a separate process, is to run your command in PythonOperator via the usual "Popen". This is the completely standard way to start a new interpreter. If it does not work for some reason, you could start it via the "system" shell option and then it will work 100%: simply place your Python code in a separate file and run the `python your_file.py` command.
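
As a minimal sketch of the "Popen" route described above (not code from this thread): the helper below uses `subprocess.run`, which is built on `subprocess.Popen`, to execute a script in a fresh interpreter. The name `run_in_fresh_interpreter` is hypothetical, and the sketch assumes the training code lives in its own file.

```python
import subprocess
import sys


def run_in_fresh_interpreter(script_path, *args):
    """Run a Python script in a brand-new interpreter and return its stdout.

    The child is a freshly started interpreter, not a forked copy of the
    Airflow task process, so CUDA is initialized from scratch inside it.
    """
    completed = subprocess.run(
        [sys.executable, script_path, *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError so the calling task fails too
    )
    return completed.stdout
```

A PythonOperator could then take `python_callable=run_in_fresh_interpreter` with the script path passed through `op_args`; which interpreter and path are appropriate depends on the deployment.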
   





[GitHub] [airflow] potiuk edited a comment on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974013673


   > But this would change the behaviour of the whole of Airflow and of every Python process. There should be a better solution. I believe this issue should be discussed and a better solution found. It should be possible to activate the spawn start method using the multiprocessing package!
   
   This is entirely possible even today. You are completely free to use a PythonVirtualenvOperator https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonvirtualenvoperator and run your code in it.
   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-973936931


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk closed issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #19701:
URL: https://github.com/apache/airflow/issues/19701


   





[GitHub] [airflow] aipatr commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
aipatr commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974113351


   > Use this flag: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#execute-tasks-new-python-interpreter - it is slower, but each task runs in a separate, new interpreter instead of a forked process. That should help.
   
   This solution doesn't work. I set the variable in the config and restarted the webserver and scheduler, but the message is still:
   RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
   





[GitHub] [airflow] potiuk commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974002881


   Use this flag: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#execute-tasks-new-python-interpreter - it is slower, but each task runs in a separate, new interpreter instead of a forked process. That should help.





[GitHub] [airflow] aipatr commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
aipatr commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974017300


   Do you know how many packages one would have to install in the new environment? This is unfeasible, to say the least. Couldn't we have variables that are valid only for a particular DAG and that switch it to spawn instead of fork?





[GitHub] [airflow] aipatr commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
aipatr commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974004472


   But this would change the behaviour of the whole of Airflow and of every Python process. There should be a better solution. I believe this issue should be discussed and a better solution found.
   It should be possible to activate the spawn start method using the multiprocessing package!
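
For reference, a minimal sketch of what requesting 'spawn' through the multiprocessing package looks like in plain Python. This is an illustration, not something confirmed to work inside an Airflow task: whether a task process may start children at all depends on the executor. The names `square` and `run_with_spawn` are hypothetical, and `square` stands in for real GPU work.

```python
import multiprocessing as mp


def square(x):
    # Stand-in for GPU work. It must be a module-level function so that a
    # spawned child can import it by name.
    return x * x


def run_with_spawn(values):
    """Map `square` over `values` in workers started with 'spawn'."""
    # get_context returns a context bound to one start method and leaves
    # the process-wide default (usually 'fork' on Linux) untouched.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)
```

Because spawned children re-import the calling module, a script that calls `run_with_spawn` should do so under an `if __name__ == "__main__":` guard.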





[GitHub] [airflow] potiuk edited a comment on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974138813


   > This solution doesn't work. I set the variable in the config and restarted the webserver and scheduler, but the message is still:
   
   Did you restart the workers? You skipped the part where you should describe your environment, so I have no idea what deployment you have. The setting above should work for Kubernetes and LocalExecutor. But if you use Celery workers, there is another setting you can try: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#pool 
   
   If that does not work, then using PythonVirtualenvOperator is the final option you should try, and it will for sure work.
   
   Yet another option, if you really want a separate process, is to run your command in PythonOperator via the usual "Popen". This is the completely standard way to start a new interpreter. If it does not work for some reason, you could start it via the "system" shell option and then it will work 100%: simply place your Python code in a separate file and run the `python your_file.py` command.
   





[GitHub] [airflow] potiuk commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974019390


   > Do you know how many packages one would have to install in the new environment? This is unfeasible, to say the least. Couldn't we have variables that are valid only for a particular DAG and that switch it to spawn instead of fork?
   
   You can install the packages you need in the system and use system_site_packages=True:
   https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/operators/python/index.html#airflow.operators.python.PythonVirtualenvOperator
   This will create a new virtualenv that has access to the packages already installed in the system site-packages, so they do not need to be installed again.
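
A rough sketch of that suggestion, assuming the Airflow 2.2 import path `airflow.operators.python.PythonVirtualenvOperator`. The DAG id, the function names, and the placeholder callable are hypothetical; a real callable would import torch and train the model inside its body.

```python
def train_in_virtualenv():
    """Runs inside the virtualenv; imports must live in the function body."""
    import sys

    # A real task would import torch here and start training. Returning the
    # interpreter path only shows which Python executed the code.
    return sys.executable


def build_dag():
    # Airflow is imported lazily so this module loads without it installed.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonVirtualenvOperator

    with DAG(
        dag_id="cuda_training_example",  # hypothetical DAG id
        start_date=datetime(2021, 11, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        PythonVirtualenvOperator(
            task_id="training_final_model",
            python_callable=train_in_virtualenv,
            requirements=[],  # nothing extra to install
            system_site_packages=True,  # reuse system-installed packages
        )
    return dag
```

In a DAG file, `dag = build_dag()` at module level would make the DAG visible to the scheduler.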
   





[GitHub] [airflow] potiuk commented on issue #19701: Cuda Spawn process error

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19701:
URL: https://github.com/apache/airflow/issues/19701#issuecomment-974013673


   > But this would change the behaviour of the whole of Airflow and of every Python process. There should be a better solution. I believe this issue should be discussed and a better solution found. It should be possible to activate the spawn start method using the multiprocessing package!
   
   This is entirely possible even today. You are completely free to use a PythonVirtualenvOperator https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html and run your code in it.
   

