You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Chuan Liu (JIRA)" <ji...@apache.org> on 2012/06/26 20:53:44 UTC

[jira] [Created] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Chuan Liu created MAPREDUCE-4374:
------------------------------------

             Summary: Fix child task environment variable config and add support for Windows
                 Key: MAPREDUCE-4374
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1-win
            Reporter: Chuan Liu
            Priority: Minor


In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403423#comment-13403423 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4374:
----------------------------------------------------

Just checked that the CLASSPATH separator on Windows is indeed ';', so yeah, it is natural to use it. Same for '%'.

Good to see that we are not breaking any compatibility for existing users and only defining new behaviour for users on Windows.

bq. Why not use existing syntax, i.e. $ and ':' (e.g. '$x=a:b'), to set environment variables on Windows?
Didn't understand this part, the existing syntax is "mapred.child.env=MY_PATH=/tmp", is that how you set vars on Windows?
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402782#comment-13402782 ] 

Ivan Mitic commented on MAPREDUCE-4374:
---------------------------------------

+1, change looks good to me. Agree on your points for using '%' and ';' on Windows.

                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated MAPREDUCE-4374:
---------------------------------

    Attachment: MAPREDUCE-4374-branch-1-win.patch

In this patch, I provide a more complete implementation for child environment variable expansion, as well as adding support for Windows. For this feature, we will have different syntaxes on Windows and Linux. I added some descriptions to the Java doc as well. For example, users can specify env via the following config on Linux.
{code}
<property>
  <name>mapred.child.env</name>
  <value>PATH=$HOME:/opt/bin</value>
</property>
{code}
While on Windows, the equivalent will look like:
{code}
<property>
  <name>mapred.child.env</name>
  <value>PATH=%HOME%;C:\opt\bin</value>
</property>
{code}
For the implementation, I followed the following IEEE POSIX standards except the letter case based on some discussion with my colleagues, i.e. both uppercase and lowercase letters are allowed. From the discussion, it seems it is both common for applications on Linux and Windows to use lower case letters for environment variable, and Hadoop does not need to follow IEEE guideline.  If there are other common use cases in Hadoop community, we can expand the support as well.
“Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit.”
All matching patterns in the string are considered an environment variable, and are expanded to actual values accordingly.

*Why not use existing syntax, i.e. $ and ':' (e.g. '$x=a:b'),  to set environment variables on Windows?*
The most common usage for the environment variables is to provide path holders for the programs, e.g. LD_LIBRARY_PATH, PATH, HOME, etc. Unlike Linux, ':' is common in Windows paths as in 'C:\Windows'. If we use ':' as a separate for different values for the env variable, it will cause confusing during parsing.
We need to either choose another separator, e.g. ';' (semicolon); or escape ':' (colon). Escaping ':' is very ugly in my opinion, and also not a cross platform solution. If we follow the route to use another separator, we are already changing the existing syntax. I think using '%' instead of '$' and ';' will be more natural for Windows users. Since the paths are the most common usages of env variables, and will most likely be different on Windows and Linux, so it should be fine to ask users to adopt different settings on different platforms, since they likely need to change the path settings for different OSes anyway.


I also refactored two related tests to make them run on Windows. For *TestMiniMRChildTask*, the change is essential choosing different syntax to set the child task config for different OSes. For *TestTaskEnvironment*, we removed unnecessary parts that seems to be borrowed from TestJvmManager.
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404122#comment-13404122 ] 

Bikas Saha commented on MAPREDUCE-4374:
---------------------------------------

TaskRunner.java 
Should be easy to move to Shell.getUserHome()?
{code}
  static final String DEFAULT_HOME_DIR =
      System.getenv(Shell.WINDOWS ? "USERPROFILE" : "HOME");
{code}

Firstly, why do we need this when non-Windows gets away without it? It just seems to be pure Java code. Secondly, if we know what abstract operation FOO we are doing here, then can we push it into Shell.FOO()?
{code}
      // line 505
      if (Shell.WINDOWS) {
        // trim leading and trailing quotes on Windows
        String acmeDir = workDir.toString();
        acmeDir = acmeDir.replaceAll("^\"|\"$", "");
        tmpDir = new Path(acmeDir, tmp);
      } else
        tmpDir = new Path(workDir.toString(), tmp);

     ......
  // line 790
  private static void symlink(File workDir, String target, String link)
      throws IOException {
    if (link != null) {
      if (Shell.WINDOWS){
        String path = workDir.toString();
        // strip leading and trailing quotes
        path = path.replaceAll("^\"|\"$", "");
        link = path + "\\" + link;
      }
{code}

{code}
    } else if (createDir) {
      FileSystem localFs = FileSystem.getLocal(conf);
      if (!localFs.exists(tmpDir) && !localFs.mkdirs(tmpDir) && 
          !localFs.getFileStatus(tmpDir).isDir()) {
            throw new IOException("Mkdirs failed to create " +
                tmpDir.toString());
      }
    }
{code}
Why this code addition? Is this a general bug you have found?

Should be easy to get the regex pattern for env vars from Shell?
{code}
          if (Shell.WINDOWS)
            p = Pattern.compile("%([A-Za-z_][A-Za-z0-9_]*?)%");
          else
            p = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");
{code}

TestMiniMRChildTask.java
Should easily move inside Shell.getTempPath()?
{code}
  private final static String TMP_PATH = Shell.WINDOWS ? "C:\\tmp" : "/tmp";
{code}

Same question as above
{code}
  private static void checkEnv(String envName, String expValue, String mode) {
    String envValue = System.getenv(envName).trim();
    // trim leading and trailing quotes on Windows 
    if (Shell.WINDOWS)
      envValue = envValue.replaceAll("^\"|\"$", "");
{code}

I couldn't think of a simple way to move the Shell.Windows inside Shell. Mainly because of the %being also used in String.Format.
{code}
    String envKey = String.format(Shell.WINDOWS ? "MY_PATH=%1$s,HOME=%1$s,"
        + "LD_LIBRARY_PATH=%%LD_LIBRARY_PATH%%;%1$s,"
        + "PATH=%%PATH%%;%1$s,NEW_PATH=%%NEW_PATH%%;%1$s"
        : "MY_PATH=%1$s,HOME=%1$s,LD_LIBRARY_PATH=$LD_LIBRARY_PATH:%1$s,"
            + "PATH=$PATH:%1$s,NEW_PATH=$NEW_PATH:%1$s", TMP_PATH);
{code}

TestTaskEnvironment.java
This could easily use Shell.getUserHome()?
{code}
    ttConf.set("mapred.child.env", (Shell.WINDOWS ? "ROOT=%USERPROFILE%"
        : "ROOT=$HOME"));
    ...
    assertTrue(root, root.contentEquals(System
        .getenv((Shell.WINDOWS ? "USERPROFILE" : "HOME"))));
{code}

Testing
Are there any tests that validate the changes or we should add new ones (Windows and Linux)?

                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Assigned] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu reassigned MAPREDUCE-4374:
------------------------------------

    Assignee: Chuan Liu
    
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404018#comment-13404018 ] 

Chuan Liu commented on MAPREDUCE-4374:
--------------------------------------

{quote}
Didn't understand this part, the existing syntax is "mapred.child.env=MY_PATH=/tmp", is that how you set vars on Windows?
{quote}
Oh, sorry for the confusing. By 'existing', I mean the existing setting for Hadoop on Linux. Because one possible solution is to use the Linux variable setting syntax for Hadoop on Windows. Hope this clarifies your question.
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427851#comment-13427851 ] 

Bikas Saha commented on MAPREDUCE-4374:
---------------------------------------

bq.Notice this is tmp directory. On Linux /tmp exists by default which is not the case on Windows.
I think this is not limited to /tmp. It could be any directory set in the config.
btw, in the if condition is tmpDir exists then the check for it being a dir is skipped I think.

bq. Let’s not push this to Shell as I think the code is simply enough to be understood and we can work towards a better abstraction for this (handling ‘set’ vs ‘export’) in the future.
I see what the issue is. At the same time, this code is duplicated in 4 different places. So IMO, it makes sense to put these in a helper method call. When we make a better abstraction, we will know the single place to look for the old code instead of multiple places. Thoughts?
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win-2.patch, MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Resolved] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-4374.
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 1-win

I just committed this. Thanks Chuan for the patch, and Bikas for the review.

                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>             Fix For: 1-win
>
>         Attachments: MAPREDUCE-4374-branch-1-win-2.patch, MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Chuan Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated MAPREDUCE-4374:
---------------------------------

    Attachment: MAPREDUCE-4374-branch-1-win-2.patch

{quote}
{code}
static final String DEFAULT_HOME_DIR =
      System.getenv(Shell.WINDOWS ? "USERPROFILE" : "HOME");
{code}
{quote}
I think this is an old private constant variable that is only used in the TaskRunner class; also, it aligns with other private constant variables in the class, e.g. DEFAULT_MAPRED_ADMIN_JAVA_OPTS, DEFAULT_SHELL. So I am in favor to keep it here than put in Shell.

{quote}
Firstly, why do we need this when non-Windows gets away without it? It just seems to be pure Java code. Secondly, if we know what abstract operation FOO we are doing here, then can we push it into Shell.FOO()?
{quote}
About triming leading and trailing quotes on Windows, this is due to a subtle difference of ‘set’ on cmd shell and ‘export’ on bash shell. According to the [set document|http://technet.microsoft.com/en-us/library/bb490998] which I quoted below:
??The characters <, >, |, &, ^ are special command shell characters and must be either preceded by the escape character (^) or enclosed in quotation marks when used in string (that is, "StringContaining&Symbol". If you use quotation marks to enclose a string containing one of the special characters, *the quotation marks are set as part of the environment variable value*.??
On Linux, quotation marks are not set as part of the environment variable by ‘export’.
To launch child tasks with correct environment, we setup the environment by writing a series of ‘set’ or ‘export’ command (cf. TaskRunner.run() method) to TaskController.COMMAND_FILE, and execute the file to launch the Java job (cf. DefaultTaskController.launchTask() method). So when we receive the environment variables on Windows, we need to trim the leading and ending quotes around them in the program.
These are not perfect solutions. However notice this is the existing behavior from the previous change which may work for majority of cases. I only changed the code to make it more concise. Let’s not push this to Shell as I think the code is simply enough to be understood and we can work towards a better abstraction for this (handling ‘set’ vs ‘export’) in the future.

{quote}
Why this code addition? Is this a general bug you have found?
{quote}
Notice this is tmp directory. On Linux /tmp exists by default which is not the case on Windows.

{quote}
Should be easy to get the regex pattern for env vars from Shell?
{quote}
Moved to Shell in the new Patch.

{quote}
Should easily move inside Shell.getTempPath()?
{quote}
The folders here are really only just text string. The test is really just testing if the environment variable is set correctly. We make them folder names to make them more like real world scenario because that is a major use of environment variable in practice.

{quote}
This could easily use Shell.getUserHome()?
{quote}
Again, the purpose here is not to get user home, but to test an environment variable. I don’t think there is a need to create Shell function for this. The environment variables are very platform dependent. So I think we should accept the platform dependence here.

                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win-2.patch, MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444306#comment-13444306 ] 

Bikas Saha commented on MAPREDUCE-4374:
---------------------------------------

On a second look, I can see why these changes make sense in being test specific. +1.
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win-2.patch, MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables via the Hadoop config 'mapred.child.env' for child tasks. There are some further fixes and improvements around this feature, e.g. HADOOP-5981 were a bug fix; MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'.  However the current implementation is still not complete. It does not match its documentation or original intend as I believe. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems using them on Windows because ‘:’ appears very often in Windows path as in “C:\”, and environment variables are used very often to hold path names. The Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira