Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/12/12 00:49:00 UTC

[jira] [Commented] (AIRFLOW-1893) PYTHONPATH is not propagated to `run_as_user` context, affecting DAGs using the custom packages

    [ https://issues.apache.org/jira/browse/AIRFLOW-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286879#comment-16286879 ] 

ASF subversion and git services commented on AIRFLOW-1893:
----------------------------------------------------------

Commit 9d9727a80a3948615a4085d5168c24394fde5c84 in incubator-airflow's branch refs/heads/master from [~erod]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=9d9727a ]

[AIRFLOW-1893][AIRFLOW-1901] Propagate PYTHONPATH when using impersonation

When using impersonation via `run_as_user`, the PYTHONPATH environment
variable is not propagated, hence there may be issues when depending on
specific custom packages used in DAGs. This PR propagates only the
PYTHONPATH of the process creating the sub-process with impersonation,
if any.

Tested in staging environment; impersonation tests in airflow are not
very portable and fixing them would take additional work, leaving as
TODO and tracking with jira ticket:
https://issues.apache.org/jira/browse/AIRFLOW-1901.

Closes #2860 from edgarRd/erod-pythonpath_run_as_user


> PYTHONPATH is not propagated to `run_as_user` context, affecting DAGs using the custom packages
> -----------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1893
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1893
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Edgar Rodriguez
>            Assignee: Edgar Rodriguez
>
> When running DAGs with {{run_as_user}}, the {{PYTHONPATH}} environment variable is not available in the user's context, given that {{sudo}} wipes out the environment variables. For instance, a DAG using a custom package will fail with the following exception:
> {code}
> [2017-12-06 01:50:08,183] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,183] {models.py:271} INFO - Processed file is not a zip file
> [2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:423} INFO - Processing dag_folder as file
> [2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:251} INFO - Processing filepath /data/airflow/test_run_as_user.py
> [2017-12-06 01:50:08,184] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,184] {models.py:271} INFO - Processed file is not a zip file
> [2017-12-06 01:50:08,185] {base_task_runner.py:92} INFO - Subtask: [2017-12-06 01:50:08,185] {models.py:293} ERROR - Failed to import: /data/airflow/test_run_as_user.py
> [2017-12-06 01:50:08,185] {base_task_runner.py:92} INFO - Subtask: Traceback (most recent call last):
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 290, in process_file
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     m = imp.load_source(mod_name, filepath)
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/data/airflow/test_run_as_user.py", line 7, in <module>
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     from contrib.date_utils import ds_replace
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: ImportError: No module named contrib.date_utils
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: Traceback (most recent call last):
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/bin/airflow", line 28, in <module>
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     args.func(args)
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 349, in run
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     dag = get_dag(args)
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 132, in get_dag
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask:     'parse.'.format(args.dag_id))
> [2017-12-06 01:50:08,186] {base_task_runner.py:92} INFO - Subtask: airflow.exceptions.AirflowException: dag_id could not be found: test_run_as_user. Either the dag did not exist or it failed to parse.
> [2017-12-06 01:51:07,258] {jobs.py:186} DEBUG - [heartbeat]
> {code}
> *Possible location of the issue in Airflow*
>  {{airflow/airflow/task_runner/base_task_runner.py}}
> *Resolution:*
> Since {{sudo}} wipes out the environment variables for security reasons, instead of using the {{-E}} flag to propagate all variables, we can pass just the {{PYTHONPATH}} variable within the command, so the spawned task has access to the same Python packages as the process issuing the {{sudo}} command.
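> The approach can be sketched as follows. This is a minimal illustration, not Airflow's actual implementation; the function name {{build_impersonated_command}} is hypothetical:
> {code}
import os

def build_impersonated_command(run_as_user, airflow_cmd):
    """Sketch: wrap a command in `sudo` for impersonation, forwarding
    only PYTHONPATH rather than the whole environment (`sudo -E`).

    `sudo` resets the environment, so we re-set just the one variable
    the DAG-parsing subprocess needs to import custom packages.
    """
    cmd = ['sudo', '-H', '-u', run_as_user]
    pythonpath = os.environ.get('PYTHONPATH')
    if pythonpath:
        # Re-export PYTHONPATH inside the impersonated context via `env`,
        # so the child Python process sees the parent's module search path.
        cmd += ['env', 'PYTHONPATH={}'.format(pythonpath)]
    return cmd + airflow_cmd
> {code}
> For example, with {{PYTHONPATH=/data/airflow}} set, wrapping {{airflow run test_run_as_user some_task}} would yield {{sudo -H -u airflow_user env PYTHONPATH=/data/airflow airflow run test_run_as_user some_task}}, letting the subtask resolve {{contrib.date_utils}} just like the parent process.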



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)