Posted to commits@airflow.apache.org by "Jacob Ferriero (Jira)" <ji...@apache.org> on 2019/09/18 22:20:00 UTC

[jira] [Created] (AIRFLOW-5520) DataflowPythonOperator dependency management requires side effects

Jacob Ferriero created AIRFLOW-5520:
---------------------------------------

             Summary: DataflowPythonOperator dependency management requires side effects
                 Key: AIRFLOW-5520
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5520
             Project: Apache Airflow
          Issue Type: Improvement
          Components: gcp
    Affects Versions: 1.10.2
            Reporter: Jacob Ferriero


When using DataflowPythonOperator it is difficult to manage the Apache Beam version (and other Python dependencies) without affecting your entire Airflow environment. It seems the Dataflow hook just launches the py_file in a subprocess using the worker's own Python interpreter, so any Beam dependencies have to be installed into the Airflow environment itself.
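
For context, the current behaviour is roughly the following (a simplified sketch only, not the actual hook code; the real DataFlowHook method and argument names differ):

import subprocess
import sys

def start_python_dataflow_sketch(py_file, job_args):
    # The pipeline file runs with the same interpreter (and site-packages)
    # as Airflow itself, so Beam and any pipeline dependencies must already
    # be installed in the Airflow environment.
    subprocess.check_call([sys.executable, py_file] + list(job_args))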

The operator / hook should be improved to isolate Python dependencies when running the py_file.

Perhaps this could be achieved with a virtual environment (similar to PythonVirtualenvOperator).

For Beam it is customary to specify a --requirements_file or --setup_file to manage Python dependencies; one of these could be installed into the venv to get it set up.
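
A rough sketch of what that could look like (all names below are illustrative, not an existing Airflow API):

import os
import subprocess
import tempfile
import venv

def run_dataflow_py_in_venv(py_file, job_args, requirements_file=None,
                            beam_version=None):
    # Build a throwaway virtualenv, install Beam (optionally pinned) and the
    # user's requirements into it, then run py_file with that interpreter.
    with tempfile.TemporaryDirectory() as venv_dir:
        venv.create(venv_dir, with_pip=True)
        pip = os.path.join(venv_dir, "bin", "pip")
        python = os.path.join(venv_dir, "bin", "python")
        beam = "apache-beam[gcp]" + ("==" + beam_version if beam_version else "")
        subprocess.check_call([pip, "install", beam])
        if requirements_file:
            subprocess.check_call([pip, "install", "-r", requirements_file])
        # The Airflow worker's own site-packages are left untouched.
        subprocess.check_call([python, py_file] + list(job_args))

This would let each DAG pin its own Beam version without side effects on the shared Airflow environment.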



--
This message was sent by Atlassian Jira
(v8.3.4#803005)