Posted to commits@airflow.apache.org by "Jacob Ferriero (Jira)" <ji...@apache.org> on 2019/09/18 22:20:00 UTC
[jira] [Created] (AIRFLOW-5520) DataflowPythonOperator dependency management requires side effects
Jacob Ferriero created AIRFLOW-5520:
---------------------------------------
Summary: DataflowPythonOperator dependency management requires side effects
Key: AIRFLOW-5520
URL: https://issues.apache.org/jira/browse/AIRFLOW-5520
Project: Apache Airflow
Issue Type: Improvement
Components: gcp
Affects Versions: 1.10.2
Reporter: Jacob Ferriero
When using DataflowPythonOperator it is difficult to manage the Apache Beam version (and other Python dependencies) without affecting your entire Airflow environment. It seems the Dataflow hook just spawns a subprocess and runs the py_file with the host's Python interpreter, so the pipeline shares the Airflow environment's dependencies.
The operator / hook should be improved to isolate the Python dependencies used to run the py_file.
Perhaps this could be achieved in a virtual environment (similar to PythonVirtualenvOperator).
For Beam it is customary to specify a --requirements_file or --setup_file to manage Python dependencies; the operator could install one of these into the venv during setup.
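A minimal sketch of the proposed isolation, assuming a hypothetical helper (run_py_file_in_venv is not an existing Airflow API; names and parameters are illustrative only): create a throwaway virtualenv, optionally install a requirements file into it, then launch the py_file with the venv's interpreter so the Airflow environment is left untouched.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def run_py_file_in_venv(py_file: str, requirements_file: Optional[str] = None) -> None:
    """Hypothetical sketch: run a Beam pipeline file inside a throwaway virtualenv."""
    with tempfile.TemporaryDirectory() as tmp:
        venv_dir = Path(tmp) / "venv"
        # Create an isolated environment (with pip) using the current interpreter.
        subprocess.check_call([sys.executable, "-m", "venv", str(venv_dir)])
        venv_python = venv_dir / "bin" / "python"
        if requirements_file:
            # Install the pipeline's own dependencies into the venv only.
            subprocess.check_call(
                [str(venv_python), "-m", "pip", "install", "-r", requirements_file]
            )
        # The py_file runs with only the venv's dependencies, leaving the
        # Airflow environment's packages (including its Beam version) untouched.
        subprocess.check_call([str(venv_python), py_file])
```

In practice the hook would also forward the pipeline's command-line options (e.g. --requirements_file itself, so Dataflow workers install the same dependencies), but the core idea is simply that the subprocess uses the venv's interpreter rather than Airflow's.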
--
This message was sent by Atlassian Jira
(v8.3.4#803005)