Posted to submarine-dev@hadoop.apache.org by "Zhankun Tang (Jira)" <ji...@apache.org> on 2019/09/23 02:45:00 UTC
[jira] [Updated] (SUBMARINE-35) [Submarine] Document "PYTHONPATH"
environment variable setting when using -localization options
[ https://issues.apache.org/jira/browse/SUBMARINE-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhankun Tang updated SUBMARINE-35:
----------------------------------
Fix Version/s: 0.2.0
> [Submarine] Document "PYTHONPATH" environment variable setting when using -localization options
> -----------------------------------------------------------------------------------------------
>
> Key: SUBMARINE-35
> URL: https://issues.apache.org/jira/browse/SUBMARINE-35
> Project: Hadoop Submarine
> Issue Type: New Feature
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Major
> Fix For: 0.2.0
>
> Attachments: YARN-9160-trunk.001.patch
>
>
> An infra platform might want to give the user a Zeppelin notebook and run the user's job from the command the user types, such as "python entry_point.py ...". This is better for the end user because "entry_point.py" then appears to live in a local workbench.
> When the platform submits the job, this may translate to a submarine command like the one below:
>
> {code:java}
> ... job run
> --localization entry_script.py:./
> --localization dependency_script1.py:./
> --localization dependency_script2.py:./
> --worker_launch_cmd "python entry_script.py .."
> {code}
> Or
>
> {code:java}
> ... job run
> --localization entry_script.py:./
> --localization dependency_scripts_dir:./
> --worker_launch_cmd "python entry_script.py .."
> {code}
>
> Both of the above commands fail with a module import error raised from the entry script. This is because YARN only creates symbolic links in the container's working directory (the real script files live in different cache folders), and Python's module import does not look in the working directory for them.
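
The failure described above can be reproduced locally without YARN. The following is a minimal sketch, assuming a POSIX system where symlinks can be created freely; the `filecache/<n>` layout and the `container_work_dir` name are illustrative stand-ins, not YARN's actual cache layout. It places each script in its own cache directory, symlinks both into one working directory, and runs the entry script from there.

```python
import os
import subprocess
import sys
import tempfile


def run_entry_via_symlinks():
    """Mimic per-file localization: real files in separate cache dirs,
    symlinks to them in a single working directory (layout is hypothetical)."""
    with tempfile.TemporaryDirectory() as root:
        work_dir = os.path.join(root, "container_work_dir")
        os.mkdir(work_dir)
        files = {
            "entry_script.py": "import dependency_script1\nprint(dependency_script1.VALUE)\n",
            "dependency_script1.py": "VALUE = 42\n",
        }
        for index, (name, source) in enumerate(files.items()):
            # Each localized file lives in its own cache directory.
            cache_dir = os.path.join(root, "filecache", str(index))
            os.makedirs(cache_dir)
            real_path = os.path.join(cache_dir, name)
            with open(real_path, "w") as handle:
                handle.write(source)
            # Only a symlink appears in the working directory.
            os.symlink(real_path, os.path.join(work_dir, name))
        # Make sure an inherited PYTHONPATH does not mask the problem.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
        return subprocess.run(
            [sys.executable, "entry_script.py"],
            cwd=work_dir,
            env=env,
            capture_output=True,
            text=True,
        )


result = run_entry_via_symlinks()
print("exit code:", result.returncode)
print(result.stderr.strip().splitlines()[-1])
```

The interpreter puts the directory of the resolved (real) entry script on the import path, not the working directory that holds the symlinks, so the sibling symlink to `dependency_script1.py` is never consulted.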
> One possible solution is to localize a single directory containing all the scripts and change the worker_launch_cmd to "cd scripts_dir && python entry_script.py". But this degrades the user experience, because it no longer feels like a local workbench.
> Another solution is to use the "PYTHONPATH" environment variable. This keeps the user experience intact and needs no changes to YARN's localization internals.
> {code:java}
> ... job run
> # the entry point
> --localization entry_script.py:<path>/entry_script.py
> # the dependency Python scripts of the entry point
> --localization dependency_scripts_dir:<path>/dependency_scripts_dir
> # the PYTHONPATH env to make dependency available to entry script
> --env PYTHONPATH="<path>/dependency_scripts_dir"
> --worker_launch_cmd "python <path>/entry_script.py ..."
> {code}
> And we should document this.
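
As a starting point for that documentation, here is a small local sketch of the PYTHONPATH approach, again assuming a POSIX system. The cache layout, the `container_work_dir` name, and the module name `submarine_demo_dep` are hypothetical; the working directory plays the role of `<path>` in the command above. The entry script and the dependency directory sit in separate cache directories and are only symlinked into the working directory.

```python
import os
import subprocess
import sys
import tempfile


def run_entry(with_pythonpath):
    """Run a symlinked entry script, optionally with PYTHONPATH pointing
    at the symlinked dependency directory (all names are illustrative)."""
    with tempfile.TemporaryDirectory() as root:
        work_dir = os.path.join(root, "container_work_dir")
        cache_entry = os.path.join(root, "filecache", "0")
        cache_deps = os.path.join(root, "filecache", "1", "dependency_scripts_dir")
        for directory in (work_dir, cache_entry, cache_deps):
            os.makedirs(directory)

        with open(os.path.join(cache_entry, "entry_script.py"), "w") as handle:
            handle.write("import submarine_demo_dep\nprint(submarine_demo_dep.greet())\n")
        with open(os.path.join(cache_deps, "submarine_demo_dep.py"), "w") as handle:
            handle.write("def greet():\n    return 'dependency imported'\n")

        # The working directory only contains symlinks to the cached items.
        entry_link = os.path.join(work_dir, "entry_script.py")
        deps_link = os.path.join(work_dir, "dependency_scripts_dir")
        os.symlink(os.path.join(cache_entry, "entry_script.py"), entry_link)
        os.symlink(cache_deps, deps_link)

        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
        if with_pythonpath:
            # Equivalent in spirit to: --env PYTHONPATH="<path>/dependency_scripts_dir"
            env["PYTHONPATH"] = deps_link
        return subprocess.run(
            [sys.executable, entry_link],
            env=env,
            capture_output=True,
            text=True,
        )


without_env = run_entry(with_pythonpath=False)
with_env = run_entry(with_pythonpath=True)
print("without PYTHONPATH, exit code:", without_env.returncode)
print("with PYTHONPATH, exit code:", with_env.returncode, "stdout:", with_env.stdout.strip())
```

Because PYTHONPATH entries are added to the module search path regardless of where the real entry script lives, the import succeeds without changing the launch command into a "cd ... &&" form.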
--
This message was sent by Atlassian Jira
(v8.3.4#803005)