Posted to mapreduce-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2014/04/18 22:47:14 UTC

[jira] [Updated] (MAPREDUCE-5850) PATH environment variable contains duplicate values in map and reduce tasks on Windows.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-5850:
-------------------------------------

    Attachment: MAPREDUCE-5850.1.patch

This is only a problem on Windows; it doesn't happen on Linux.  Here is a description of how it happens.

In {{MRJobConfig}}, the default value of {{mapreduce.admin.user.env}} is defined to set the PATH environment variable on Windows so that tasks will be able to find and load hadoop.dll.

{code}
  public final String DEFAULT_MAPRED_ADMIN_USER_ENV = 
      Shell.WINDOWS ? 
          "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin":
          "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native";
{code}

{{TaskAttemptImpl#createCommonContainerLaunchContext}} sets up the base environment.  As part of that, it picks up {{mapreduce.admin.user.env}}.  This is the point where the behavior diverges from Linux.  On Linux, the default value sets LD_LIBRARY_PATH, so the common context won't have a PATH.  On Windows, the default value sets PATH, so the common context will have one.

{code}
    // Add the env variables passed by the admin
    MRApps.setEnvFromInputString(
        environment, 
        conf.get(
            MRJobConfig.MAPRED_ADMIN_USER_ENV, 
            MRJobConfig.DEFAULT_MAPRED_ADMIN_USER_ENV), conf
        );
{code}

Then, at task launch time, we end up setting PATH again via a call to {{TaskAttemptImpl#createContainerLaunchContext}} -> {{MapReduceChildJVM#setVMEnv}} -> {{MRApps#setEnvFromInputString}} -> {{Apps#setEnvFromInputString}}.  This uses {{Apps#addToEnvironment}} to set the new value in the environment, and the logic of this method appends to existing values:

{code}
  @Public
  @Unstable
  public static void addToEnvironment(
      Map<String, String> environment,
      String variable, String value, String classPathSeparator) {
    String val = environment.get(variable);
    if (val == null) {
      val = value;
    } else {
      val = val + classPathSeparator + value;
    }
    environment.put(StringInterner.weakIntern(variable), 
        StringInterner.weakIntern(val));
  }
{code}
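
To make the double append concrete, here is a minimal standalone sketch.  It is not Hadoop code: {{DuplicatePathSketch}}, {{applyAdminEnv}} and {{simulate}} are hypothetical names, string interning is dropped, and the value is treated as a literal string, so the variable expansion that the real {{Apps#setEnvFromInputString}} performs is deliberately skipped.  It only shows that applying the same admin env entry twice through append-style logic leaves the entry in PATH twice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DuplicatePathSketch {
  // Simplified version of the append-or-set logic quoted above (no interning).
  static void addToEnvironment(Map<String, String> environment,
      String variable, String value, String separator) {
    String val = environment.get(variable);
    environment.put(variable, val == null ? value : val + separator + value);
  }

  // Hypothetical stand-in for handling one "NAME=value" admin env entry.
  // Assumption: no %VAR% expansion is done here, unlike the real method.
  static void applyAdminEnv(Map<String, String> environment,
      String entry, String separator) {
    int eq = entry.indexOf('=');
    addToEnvironment(environment, entry.substring(0, eq),
        entry.substring(eq + 1), separator);
  }

  // Applies the Windows default twice, mirroring the two call sites:
  // once for the common context, once at task launch time.
  static String simulate() {
    String adminEnv = "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin";
    Map<String, String> environment = new LinkedHashMap<>();
    applyAdminEnv(environment, adminEnv, ";"); // common launch context
    applyAdminEnv(environment, adminEnv, ";"); // task launch time
    return environment.get("PATH");
  }

  public static void main(String[] args) {
    System.out.println(simulate());
  }
}
```

In this sketch the first call finds no PATH in the map and sets it, and the second call finds the existing value and appends the same string again, which is the duplication the test trips over.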

I haven't been able to come up with a clean fix for this.  We can't change the default value of {{mapreduce.admin.user.env}}, because tasks are dependent on it to find the native code (an absolute must on Windows).  We can't drop the appending behavior, because there are valid use cases dependent on it.  Adding a special case for Windows + PATH seems hacky.  Does anyone else have ideas?

Since this is ultimately harmless, we might consider simply relaxing the assertion in {{TestMiniMRChildTask}}.  I'm attaching a patch that does that.  This passes on Mac and Windows.
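
For illustration only, one way a relaxed check could look is to verify that the expected entry is present in PATH instead of comparing the whole value.  This is a sketch under my own assumptions and not necessarily what the attached patch does; {{RelaxedPathCheckSketch}} and {{containsEntry}} are hypothetical names, and the paths in the usage example are made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class RelaxedPathCheckSketch {
  // Returns true if 'expected' appears as at least one entry of 'path'.
  // Duplicate entries are tolerated because only presence is checked.
  // Assumption: entries are compared exactly (case-sensitive).
  static boolean containsEntry(String path, String expected, String separator) {
    // Pattern.quote keeps the separator from being read as a regex.
    return Arrays.asList(path.split(Pattern.quote(separator)))
        .contains(expected);
  }

  public static void main(String[] args) {
    String path = "C:\\tools;C:\\hadoop\\bin;C:\\tools;C:\\hadoop\\bin";
    System.out.println(containsEntry(path, "C:\\hadoop\\bin", ";"));
  }
}
```

A check of this shape passes whether the entry appears once or twice, so it would not distinguish the Windows behavior from the Linux behavior.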

> PATH environment variable contains duplicate values in map and reduce tasks on Windows.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5850
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5850
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>         Attachments: MAPREDUCE-5850.1.patch
>
>
> The value of the PATH environment variable gets appended twice before execution of a container for a map or reduce task.  This is ultimately harmless at runtime, but it does cause a failure in {{TestMiniMRChildTask}} when running on Windows.


