Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/01 10:18:54 UTC

[GitHub] [airflow] nanohanno opened a new issue, #24779: Return objects from PythonVirtualenvOperator

nanohanno opened a new issue, #24779:
URL: https://github.com/apache/airflow/issues/24779

   ### Apache Airflow version
   
   2.3.0
   
   ### What happened
   
   When a `PythonVirtualenvOperator` task returns an object whose type comes from a package that is installed in the virtualenv but not on the Airflow host system, the following exception is raised:
   ```
   [2022-06-30, 16:10:12 UTC] {taskinstance.py:1889} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/hanno/.pyenv/versions/3.8.12/envs/datamart-py38/lib/python3.8/site-packages/airflow/decorators/base.py", line 179, in execute
       return_value = super().execute(context)
     File "/home/hanno/.pyenv/versions/3.8.12/envs/datamart-py38/lib/python3.8/site-packages/airflow/operators/python.py", line 423, in execute
       return super().execute(context=serializable_context)
     File "/home/hanno/.pyenv/versions/3.8.12/envs/datamart-py38/lib/python3.8/site-packages/airflow/operators/python.py", line 171, in execute
       return_value = self.execute_callable()
     File "/home/hanno/.pyenv/versions/3.8.12/envs/datamart-py38/lib/python3.8/site-packages/airflow/operators/python.py", line 483, in execute_callable
       return self._read_result(output_filename)
     File "/home/hanno/.pyenv/versions/3.8.12/envs/datamart-py38/lib/python3.8/site-packages/airflow/operators/python.py", line 514, in _read_result
       return self.pickling_library.load(file)
   ModuleNotFoundError: No module named 'name_of_custom_module'
   ```
   As I currently understand it, the returned result is unpickled outside the virtualenv in https://github.com/apache/airflow/blob/main/airflow/operators/python.py#L484, which raises the exception because `name_of_custom_module` does not exist outside the virtualenv.
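
The failure mode described above can be reproduced without Airflow. The following is a minimal sketch, not the operator's actual code: `venv_only_module` and `Payload` are hypothetical names, and removing the module from `sys.modules` is only a stand-in for "the package is not installed on the host".

```python
import pickle
import sys
import types

# Build a throwaway module that stands in for a package installed only in the
# virtualenv. "venv_only_module" and "Payload" are hypothetical names.
module = types.ModuleType("venv_only_module")
exec("class Payload:\n    pass\n", module.__dict__)
sys.modules["venv_only_module"] = module

# Pickling works here because the module is importable in this environment.
data = pickle.dumps(module.Payload())

# Simulate the Airflow host, where the package is not available.
del sys.modules["venv_only_module"]

try:
    pickle.loads(data)
    error = None
except ModuleNotFoundError as exc:
    # Unpickling has to import the module that defines the class and cannot.
    error = exc

print(type(error).__name__, error)
```

The pickle stream stores only a reference to the class (module name plus class name), so loading it always requires importing that module in the process doing the loading.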
   
   
   ### What you think should happen instead
   
   It should be possible to easily pass pickled objects from one virtualenv task to another when both have the necessary package installed. Alternatively, the documentation should describe the limitations of virtualenv operators in this respect.
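
One possible workaround, sketched here under assumptions and not an Airflow feature: serialize the object inside the task and return only a plain string, so the host process never has to import the package. `producer` and `consumer` are hypothetical names standing in for two virtualenv tasks, and `Fraction` stands in for a type from a package that only the virtualenvs have. Imports sit inside the functions, as callables run in a virtualenv need them to.

```python
def producer() -> str:
    # Would run inside the first virtualenv.
    import base64
    import pickle
    from fractions import Fraction  # stand-in for a venv-only type

    obj = Fraction(3, 4)
    # Return text only: the host can carry a str without knowing the type.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def consumer(encoded: str):
    # Would run inside the second virtualenv, where the package is importable.
    import base64
    import pickle

    return pickle.loads(base64.b64decode(encoded))


token = producer()          # the host only ever handles a plain string
restored = consumer(token)  # the object is rebuilt where its type exists
print(restored)
```

The trade-off is that the tasks agree on an encoding themselves, and the usual caution about unpickling untrusted data still applies. Returning built-in types (for example a dict or a JSON string) avoids pickle altogether when the data allows it.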
   
   ### How to reproduce
   
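   ```
   from airflow.decorators import task

   @task.virtualenv(task_id="task0", requirements="pandas")
   def pandas_task():
       # pandas is installed only inside the virtualenv, not on the Airflow host.
       import pandas as pd
       df = pd.DataFrame()
       # Returning the DataFrame makes the host unpickle it, which fails without pandas.
       return df
   ```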
   
   ### Operating System
   
   Ubuntu 20.04
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #24779: Return objects from PythonVirtualenvOperator

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #24779: Return objects from PythonVirtualenvOperator
URL: https://github.com/apache/airflow/issues/24779



[GitHub] [airflow] potiuk commented on issue #24779: Return objects from PythonVirtualenvOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24779:
URL: https://github.com/apache/airflow/issues/24779#issuecomment-1174462432

   If you know how unpickling works, what you want to achieve is impossible. This is how Python works; it is nothing specific to Airflow. An object cannot be unpickled in an environment where its type is not importable, and here the result is loaded outside the virtualenv. That's pretty basic behaviour.
   
   But if you think that it needs additional documentation, feel absolutely free to contribute a documentation update. Airflow is created by more than 2100 contributors, mostly users like you, and they contribute to Airflow (including docs) on a daily basis.
   
   Since you experienced that problem, you are probably the best person in the world to contribute that documentation: you know where you'd look for it, so simply adding it there is the best idea. And it's super simple: click "Suggest a change on this Page" and it will open a PR where you will be able to add the appropriate documentation. This is even easier than opening this issue was (it uses the same GitHub UI).
   
   You are absolutely welcome to open such a PR!



[GitHub] [airflow] boring-cyborg[bot] commented on issue #24779: Return objects from PythonVirtualenvOperator

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #24779:
URL: https://github.com/apache/airflow/issues/24779#issuecomment-1172188926

   Thanks for opening your first issue here! Be sure to follow the issue template!
   

