Posted to commits@airflow.apache.org by "Edgar Rodriguez (JIRA)" <ji...@apache.org> on 2017/12/07 20:18:00 UTC

[jira] [Created] (AIRFLOW-1893) PYTHONPATH is not propagated to `run_as_user` context, affecting DAGs using the custom packages

Edgar Rodriguez created AIRFLOW-1893:
----------------------------------------

             Summary: PYTHONPATH is not propagated to `run_as_user` context, affecting DAGs using the custom packages
                 Key: AIRFLOW-1893
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1893
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Edgar Rodriguez
            Assignee: Edgar Rodriguez


When a task runs with {{run_as_user}}, the {{PYTHONPATH}} environment variable is not available in that user's context, because {{sudo}} resets the environment. For instance, a DAG that imports a custom package fails to parse with the following exception:

{code}
[2017-12-06 01:50:08,183] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,183] {models.py:271} INFO - Processed file is not a zip file
[2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:423} INFO - Processing dag_folder as file
[2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:251} INFO - Processing filepath /data/airflow/test_run_as_user.py
[2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:271} INFO - Processed file is not a zip file
[2017-12-06 01:50:08,185] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,185] {models.py:293} ERROR - Failed to import: /data/airflow/test_run_as_user.py
[2017-12-06 01:50:08,185] {base_task_runner.py:92} INFO - Subtask: Traceback (most recent call last):
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 290, in process_file
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     m = imp.load_source(mod_name, filepath)
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/data/airflow/test_run_as_user.py", line 7, in <module>
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     from contrib.date_utils import ds_replace
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: ImportError: No module named contrib.date_utils
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: Traceback (most recent call last):
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/bin/airflow", line 28, in <module>
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     args.func(args)
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 349, in run
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     dag = get_dag(args)
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 132, in get_dag
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     'parse.'.format(args.dag_id))
[2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: airflow.exceptions.AirflowException: dag_id could not be found: test_run_as_user. Either the dag did not exist or it failed to parse.
[2017-12-06 01:51:07,258] {jobs.py:186} DEBUG - [heartbeat]
{code}
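
The {{ImportError}} in the log can be reproduced without Airflow or {{sudo}}. The sketch below is only an illustration: {{hypothetical_contrib_pkg}}, {{make_demo_package}} and {{can_import}} are made-up names, not part of Airflow. It starts a child interpreter with and without {{PYTHONPATH}} and checks whether a package that lives outside the default search path can be imported.

```python
import os
import subprocess
import sys


def make_demo_package(root, name="hypothetical_contrib_pkg"):
    """Create a tiny throwaway package under `root` (illustrative only)."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg_dir, "date_utils.py"), "w") as f:
        f.write("def ds_replace(ds):\n    return ds\n")
    return pkg_dir


def can_import(module_name, pythonpath=None):
    """Return True if a child interpreter can import `module_name`.

    The child gets a copy of the current environment in which PYTHONPATH
    is either removed (pythonpath=None) or set to the given value. This
    mimics what happens to the variable when the environment is reset.
    """
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    exit_code = subprocess.call(
        [sys.executable, "-c", "import " + module_name],
        env=env,
        stderr=subprocess.DEVNULL,  # hide the expected ImportError traceback
    )
    return exit_code == 0
```

With the package created in a temporary directory, the import should succeed only when that directory is passed as {{pythonpath}}, which matches the failure seen when the variable does not reach the {{run_as_user}} context.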

*Possible location of the issue in Airflow*
 {{airflow/airflow/task_runner/base_task_runner.py}}

*Resolution:*
{{sudo}} resets the environment variables for security reasons. Instead of using the {{-E}} flag, which would propagate every variable, we can pass only {{PYTHONPATH}} as part of the command, so that the task has access to the same Python packages as the process that spawns the {{sudo}} command.
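
A minimal sketch of the proposed resolution, not the actual patch: {{build_sudo_prefix}} is a hypothetical helper, and the {{-H}} flag is an assumption about how the command is built. It relies on {{sudo}} accepting {{VAR=value}} assignments before the command, which the sudoers policy on the host may restrict.

```python
import os


def build_sudo_prefix(run_as_user, environ=None):
    """Build the argv prefix used to run a command as `run_as_user`.

    Hypothetical helper: only PYTHONPATH is forwarded, as a VAR=value
    argument to sudo, rather than preserving the whole environment
    with `sudo -E`.
    """
    environ = os.environ if environ is None else environ
    prefix = ["sudo", "-H", "-u", run_as_user]
    pythonpath = environ.get("PYTHONPATH")
    if pythonpath:
        # sudo treats a leading VAR=value argument as an environment
        # assignment for the command that follows.
        prefix.append("PYTHONPATH={}".format(pythonpath))
    return prefix
```

The task command would then be appended to the returned list before it is handed to {{subprocess.Popen}}. When {{PYTHONPATH}} is unset in the parent process, the prefix is left unchanged.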




