Posted to mapreduce-issues@hadoop.apache.org by "Chuan Liu (JIRA)" <ji...@apache.org> on 2012/06/26 21:09:44 UTC

[jira] [Updated] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated MAPREDUCE-4374:
---------------------------------

    Attachment: MAPREDUCE-4374-branch-1-win.patch

This patch provides a more complete implementation of child environment variable expansion and adds support for Windows. The feature uses different syntaxes on Windows and Linux, and I added a description of both to the Javadoc. For example, on Linux users can specify the environment via the following config:
{code}
<property>
  <name>mapred.child.env</name>
  <value>PATH=$HOME:/opt/bin</value>
</property>
{code}
On Windows, the equivalent looks like:
{code}
<property>
  <name>mapred.child.env</name>
  <value>PATH=%HOME%;C:\opt\bin</value>
</property>
{code}
For the implementation, I followed the IEEE POSIX standard quoted below for variable names, except for letter case: based on discussion with my colleagues, both uppercase and lowercase letters are allowed. It is common for applications on both Linux and Windows to use lowercase letters in environment variable names, so Hadoop does not need to follow the IEEE guideline on this point. If there are other common use cases in the Hadoop community, we can expand the support as well.
“Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit.”
Every pattern in the string that matches this syntax is treated as an environment variable reference and is expanded to its actual value.
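
To make the matching rule concrete, here is a minimal sketch of the expansion, not the code from the patch. The class name, the `expand` method, and the choice to expand unset variables to an empty string are all assumptions made for illustration; only the name rule (letters of either case, digits, and '_', not starting with a digit) and the `$NAME` / `%NAME%` syntaxes come from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the implementation in the attached patch.
class EnvExpansionSketch {
    // POSIX-style name, relaxed to allow lowercase letters as described above.
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_]*";
    private static final Pattern UNIX_VAR = Pattern.compile("\\$(" + NAME + ")");
    private static final Pattern WINDOWS_VAR = Pattern.compile("%(" + NAME + ")%");

    // Replaces every $NAME (Linux) or %NAME% (Windows) with its value from env.
    static String expand(String value, Map<String, String> env, boolean windows) {
        Matcher m = (windows ? WINDOWS_VAR : UNIX_VAR).matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = env.get(m.group(1));
            if (replacement == null) {
                replacement = ""; // assumption: an unset variable expands to nothing
            }
            // quoteReplacement keeps '\' and '$' in the value from being
            // interpreted as regex replacement syntax (matters for Windows paths).
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> linuxEnv = new HashMap<String, String>();
        linuxEnv.put("HOME", "/home/alice");
        // prints PATH=/home/alice:/opt/bin
        System.out.println(expand("PATH=$HOME:/opt/bin", linuxEnv, false));

        Map<String, String> winEnv = new HashMap<String, String>();
        winEnv.put("HOME", "C:\\Users\\alice");
        // prints PATH=C:\Users\alice;C:\opt\bin
        System.out.println(expand("PATH=%HOME%;C:\\opt\\bin", winEnv, true));
    }
}
```

Passing the environment in as a map, rather than reading `System.getenv()` directly, is only a convenience so the sketch can be exercised on either platform.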

*Why not use the existing syntax, i.e. $ and ':' (e.g. '$x=a:b'), to set environment variables on Windows?*
The most common use of environment variables is to hold paths for programs, e.g. LD_LIBRARY_PATH, PATH, HOME, etc. Unlike on Linux, ':' is common in Windows paths, as in 'C:\Windows'. If we use ':' as the separator between the values of an env variable, parsing becomes ambiguous.
We need to either choose another separator, e.g. ';' (semicolon), or escape ':' (colon). Escaping ':' is very ugly in my opinion, and it is not a cross-platform solution either. If we choose another separator, we are already changing the existing syntax, so I think using '%' instead of '$' together with ';' will be more natural for Windows users. Paths are the most common use of env variables and will most likely differ between Windows and Linux, so it should be fine to ask users to adopt different settings on different platforms; they likely need to change the path settings for each OS anyway.
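
The ambiguity is easy to reproduce. The following sketch (a hypothetical helper, not code from the patch) splits a value on a given separator: with ':' a list of two Windows paths falls apart into four fragments, while ';' keeps the two entries intact.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration of why ':' cannot separate Windows paths.
class SeparatorSketch {
    // Splits a multi-valued env setting on the given separator character.
    static String[] splitEntries(String value, char separator) {
        return value.split(Pattern.quote(String.valueOf(separator)));
    }

    public static void main(String[] args) {
        // ':' also matches the drive-letter colon, so each path is cut in two.
        // prints [C, \opt\bin, D, \tools]
        System.out.println(Arrays.toString(splitEntries("C:\\opt\\bin:D:\\tools", ':')));

        // ';' does not occur inside the paths, so both entries survive.
        // prints [C:\opt\bin, D:\tools]
        System.out.println(Arrays.toString(splitEntries("C:\\opt\\bin;D:\\tools", ';')));
    }
}
```

This matches the platform convention Java itself exposes: `java.io.File.pathSeparator` is ";" on Windows and ":" on Unix-like systems.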


I also refactored two related tests so that they run on Windows. For *TestMiniMRChildTask*, the change is essentially to choose a different syntax for the child task config depending on the OS. For *TestTaskEnvironment*, we removed unnecessary parts that seem to have been borrowed from TestJvmManager.
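
As a rough idea of what choosing the syntax per OS can look like, here is a sketch under stated assumptions: the class and method names are hypothetical, the `os.name` check is one common way to detect Windows and not necessarily what the tests use, and the two values are simply the examples from this comment.

```java
import java.util.Locale;

// Hypothetical illustration; the refactored tests may detect the OS differently.
class ChildEnvValueSketch {
    // Assumption: Windows JVMs report an os.name that starts with "Windows".
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Returns the example mapred.child.env value in the syntax for the given OS.
    static String childEnvFor(String osName) {
        return isWindows(osName)
            ? "PATH=%HOME%;C:\\opt\\bin"
            : "PATH=$HOME:/opt/bin";
    }

    public static void main(String[] args) {
        // Output depends on the platform the sketch runs on.
        System.out.println(childEnvFor(System.getProperty("os.name")));
    }
}
```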
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables for child tasks via the Hadoop config 'mapred.child.env'. There have been further fixes and improvements around this feature, e.g. HADOOP-5981 was a bug fix, and MAPREDUCE-478 split the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'. However, the current implementation is still not complete. I believe it does not match its documentation or original intent. Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems on Windows, because ‘:’ appears very often in Windows paths, as in “C:\”, and environment variables are very often used to hold path names. This Jira is created to fix the problem and provide support on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira