You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Chuan Liu (JIRA)" <ji...@apache.org> on 2012/07/13 23:34:35 UTC

[jira] [Updated] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated MAPREDUCE-4374:
---------------------------------

    Attachment: MAPREDUCE-4374-branch-1-win-2.patch

{quote}
{code}
static final String DEFAULT_HOME_DIR =
      System.getenv(Shell.WINDOWS ? "USERPROFILE" : "HOME");
{code}
{quote}
I think this is an old private constant variable that is only used in the TaskRunner class; also, it aligns with other private constant variables in the class, e.g. DEFAULT_MAPRED_ADMIN_JAVA_OPTS, DEFAULT_SHELL. So I am in favor to keep it here than put in Shell.

{quote}
Firstly, why do we need this when non-Windows gets away without it? It just seems to be pure Java code. Secondly, if we know what abstract operation FOO we are doing here, then can we push it into Shell.FOO()?
{quote}
About triming leading and trailing quotes on Windows, this is due to a subtle difference of ‘set’ on cmd shell and ‘export’ on bash shell. According to the [set document|http://technet.microsoft.com/en-us/library/bb490998] which I quoted below:
??The characters <, >, |, &, ^ are special command shell characters and must be either preceded by the escape character (^) or enclosed in quotation marks when used in string (that is, "StringContaining&Symbol". If you use quotation marks to enclose a string containing one of the special characters, *the quotation marks are set as part of the environment variable value*.??
On Linux, quotation marks are not set as part of the environment variable by ‘export’.
To launch child tasks with correct environment, we setup the environment by writing a series of ‘set’ or ‘export’ command (cf. TaskRunner.run() method) to TaskController.COMMAND_FILE, and execute the file to launch the Java job (cf. DefaultTaskController.launchTask() method). So when we receive the environment variables on Windows, we need to trim the leading and ending quotes around them in the program.
These are not perfect solutions. However notice this is the existing behavior from the previous change which may work for majority of cases. I only changed the code to make it more concise. Let’s not push this to Shell as I think the code is simply enough to be understood and we can work towards a better abstraction for this (handling ‘set’ vs ‘export’) in the future.

{quote}
Why this code addition? Is this a general bug you have found?
{quote}
Notice this is tmp directory. On Linux /tmp exists by default which is not the case on Windows.

{quote}
Should be easy to get the regex pattern for env vars from Shell?
{quote}
Moved to Shell in the new Patch.

{quote}
Should easily move inside Shell.getTempPath()?
{quote}
The folders here are really only just text string. The test is really just testing if the environment variable is set correctly. We make them folder names to make them more like real world scenario because that is a major use of environment variable in practice.

{quote}
This could easily use Shell.getUserHome()?
{quote}
Again, the purpose here is not to get user home, but to test an environment variable. I don’t think there is a need to create Shell function for this. The environment variables are very platform dependent. So I think we should accept the platform dependence here.

                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win-2.patch, MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira