Posted to yarn-issues@hadoop.apache.org by "Zhankun Tang (JIRA)" <ji...@apache.org> on 2018/12/28 07:06:00 UTC

[jira] [Created] (YARN-9160) Add document for "PYTHONPATH" environment variable setting when using -localization options

Zhankun Tang created YARN-9160:
----------------------------------

             Summary: Add document for "PYTHONPATH" environment variable setting when using -localization options
                 Key: YARN-9160
                 URL: https://issues.apache.org/jira/browse/YARN-9160
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Zhankun Tang
            Assignee: Zhankun Tang


An infrastructure platform might provide users with a Zeppelin notebook and run each job from a command the user types, such as "python entry_script.py ...". This is a better experience for end users because "entry_script.py" appears to live in their local workbench.

The platform might translate this into the following Submarine command when submitting the job:

 
{code:java}
... job run
  --localization entry_script.py:./
  --localization dependency_script1.py:./
  --localization dependency_script2.py:./
  --worker_launch_cmd "python entry_script.py .."
{code}
Or:

 
{code:java}
... job run
  --localization entry_script.py:./
  --localization dependency_scripts_dir:./
  --worker_launch_cmd "python entry_script.py .."
{code}
 

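Both of the above commands fail with a module import error raised from entry_script.py. YARN only creates symbolic links in the container's working directory, while the real script files live in separate cache folders. Python resolves the entry script's symbolic link before adding the script's directory to the module search path, so it looks for the dependencies in the entry script's cache folder and does not find them there.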
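The failure can be reproduced outside YARN. The following is a minimal sketch, assuming a POSIX system with CPython 3.7 or later. The "filecache/10" and "filecache/11" folders, the work dir name, and the helper function are hypothetical stand-ins for what YARN localization creates, not its actual layout.

```python
import os
import subprocess
import sys
import tempfile


def run_symlinked_entry():
    """Run an entry script through a symlink, the way a localized script is run."""
    with tempfile.TemporaryDirectory() as root:
        # Assumption: each localized file lives in its own cache folder.
        cache_entry = os.path.join(root, "filecache", "10")
        cache_dep = os.path.join(root, "filecache", "11")
        work_dir = os.path.join(root, "container_work_dir")
        for directory in (cache_entry, cache_dep, work_dir):
            os.makedirs(directory)

        with open(os.path.join(cache_entry, "entry_script.py"), "w") as f:
            # Print the first search path entry, then try the import.
            f.write("import sys\nprint(sys.path[0])\nimport dependency_script1\n")
        with open(os.path.join(cache_dep, "dependency_script1.py"), "w") as f:
            f.write("VALUE = 1\n")

        # The work dir holds only symbolic links to the cached files.
        os.symlink(os.path.join(cache_entry, "entry_script.py"),
                   os.path.join(work_dir, "entry_script.py"))
        os.symlink(os.path.join(cache_dep, "dependency_script1.py"),
                   os.path.join(work_dir, "dependency_script1.py"))

        env = dict(os.environ)
        env.pop("PYTHONPATH", None)  # start from a clean module search path
        return subprocess.run([sys.executable, "entry_script.py"],
                              cwd=work_dir, env=env,
                              capture_output=True, text=True)


if __name__ == "__main__":
    result = run_symlinked_entry()
    print("sys.path[0]:", result.stdout.strip())
    print("exit code:", result.returncode)
    print(result.stderr.strip().splitlines()[-1])
```

In this sketch the printed sys.path[0] is the entry script's cache folder rather than the work dir, which is why the sibling symbolic link to dependency_script1.py is never considered.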

One possible solution is to localize a single directory containing all the scripts and change the worker_launch_cmd to "cd scripts_dir && python entry_script.py". However, this hurts the user experience because it no longer feels like a local workbench.

Another solution is to use the "PYTHONPATH" environment variable. It preserves the user experience and requires no changes to YARN localization internals.
{code:java}
... job run
 # the entry point
 --localization entry_script.py:<path>/entry_script.py
 # the dependency Python scripts of the entry point
 --localization dependency_scripts_dir:<path>/dependency_scripts_dir
 # the PYTHONPATH env to make dependency available to entry script
 --env PYTHONPATH="<path>/dependency_scripts_dir"
 --worker_launch_cmd "python <path>/entry_script.py ..."{code}
We should document this.
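For the documentation, the effect of PYTHONPATH can be illustrated with a small local simulation. This is a sketch under stated assumptions, not Submarine's implementation: it assumes a POSIX system with CPython 3.7 or later, and the cache folder names, "helper.py", and the run_entry function are hypothetical. The dependency directory is exposed in the work dir through a symbolic link, and PYTHONPATH points at that link, much as the --env option does in the command above.

```python
import os
import subprocess
import sys
import tempfile


def run_entry(with_pythonpath):
    """Run a symlinked entry script, optionally with PYTHONPATH set."""
    with tempfile.TemporaryDirectory() as root:
        # Assumption: stand-ins for YARN cache folders and the container work dir.
        cache_entry = os.path.join(root, "filecache", "20")
        cache_deps = os.path.join(root, "filecache", "21", "dependency_scripts_dir")
        work_dir = os.path.join(root, "work_dir")
        for directory in (cache_entry, cache_deps, work_dir):
            os.makedirs(directory)

        with open(os.path.join(cache_entry, "entry_script.py"), "w") as f:
            f.write("import helper\nprint(helper.greet())\n")
        with open(os.path.join(cache_deps, "helper.py"), "w") as f:
            f.write("def greet():\n    return 'hello from helper'\n")

        # The work dir sees the entry script and the dependency dir as symlinks.
        entry_link = os.path.join(work_dir, "entry_script.py")
        deps_link = os.path.join(work_dir, "dependency_scripts_dir")
        os.symlink(os.path.join(cache_entry, "entry_script.py"), entry_link)
        os.symlink(cache_deps, deps_link, target_is_directory=True)

        env = dict(os.environ)
        env.pop("PYTHONPATH", None)
        if with_pythonpath:
            # Equivalent in spirit to: --env PYTHONPATH="<path>/dependency_scripts_dir"
            env["PYTHONPATH"] = deps_link
        return subprocess.run([sys.executable, entry_link],
                              cwd=work_dir, env=env,
                              capture_output=True, text=True)


if __name__ == "__main__":
    for flag in (False, True):
        result = run_entry(flag)
        print("PYTHONPATH set:", flag, "-> exit code", result.returncode)
```

Without PYTHONPATH the import of the hypothetical helper module fails, and with it the same entry script runs unchanged, so the user's launch command does not need a "cd".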



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
