Posted to common-issues@hadoop.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2017/11/06 23:05:00 UTC

[jira] [Created] (HADOOP-15019) Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST

Philip Zeyliger created HADOOP-15019:
----------------------------------------

             Summary: Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST 
                 Key: HADOOP-15019
                 URL: https://issues.apache.org/jira/browse/HADOOP-15019
             Project: Hadoop Common
          Issue Type: Bug
          Components: bin
            Reporter: Philip Zeyliger


If a user sets {{HADOOP_USER_CLASSPATH_FIRST=true}} and furthermore includes a directory that's already in Hadoop's classpath via {{HADOOP_CLASSPATH}}, that directory will appear later than it should in the eventual $CLASSPATH. I believe this is because the de-duping at https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 is ignoring the "before/after" parameter.
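
The suspected behavior can be illustrated with a minimal sketch. This is not the actual {{hadoop-functions.sh}} code: the function name {{add_classpath}} and the stand-in path {{/some/hadoop/lib}} are hypothetical. It also assumes that user entries are prepended in reverse order so that the first one would normally end up first.

```shell
#!/bin/sh
# Hypothetical simplification of a de-duping classpath helper.
# NOT the real Hadoop implementation; names and paths are illustrative.
CLASSPATH=""

# add_classpath ENTRY [before|after]
add_classpath() {
  entry=$1
  position=${2:-after}
  case ":${CLASSPATH}:" in
    "::")
      # Empty classpath: the entry becomes the whole classpath.
      CLASSPATH=$entry
      ;;
    *":${entry}:"*)
      # Duplicate: dropped without ever looking at $position.
      ;;
    *)
      if [ "$position" = "before" ]; then
        CLASSPATH="${entry}:${CLASSPATH}"
      else
        CLASSPATH="${CLASSPATH}:${entry}"
      fi
      ;;
  esac
}

# Hadoop's own entries, with the conf dir first (stand-in values).
add_classpath /usr/share
add_classpath /some/hadoop/lib

# User entries /usr/share:/tmp:/bin, prepended in reverse order (assumption).
for user_entry in /bin /tmp /usr/share; do
  add_classpath "$user_entry" before
done

printf '%s\n' "$CLASSPATH" | tr ':' '\n' | grep -n .
# 1:/tmp
# 2:/bin
# 3:/usr/share
# 4:/some/hadoop/lib
```

Because {{/usr/share}} is already present when the user entries are processed, the "before" request for it is silently dropped and it stays behind {{/tmp}} and {{/bin}}.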

To reproduce, first build the following trivial Java program:

{code}
$ cat Test.java
public class Test {
  public static void main(String[] args) {
    System.out.println(System.getenv().get("CLASSPATH"));
  }
}
$ javac Test.java
$ jar cf test.jar Test.class
{code}

With that, if you happen to have an entry in HADOOP_CLASSPATH that matches what Hadoop would produce, you'll find the ordering not honored. It's easiest to reproduce this with a match for HADOOP_CONF_DIR, as in the second case below:
{code}
# As you'd expect, /usr/share is first!
$ HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share'
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
1:/usr/share

# Surprise! /usr/share is now on the 3rd line, even though it was first in HADOOP_CLASSPATH.
$ HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share'
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
3:/usr/share
{code}

To reiterate, what's surprising is that an entry that is first in {{HADOOP_CLASSPATH}} can end up not first in the resulting classpath, even with {{HADOOP_USER_CLASSPATH_FIRST=true}}.
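
For contrast, here is a sketch of the ordering one might expect instead, where a "before" request for an already-present entry moves that entry to the front. This is only one possible way to honor the parameter, not a proposed patch; the helper names {{remove_entry}} and {{add_classpath_honoring_position}} and the path {{/some/hadoop/lib}} are hypothetical, and the reverse-order prepending of user entries is again an assumption.

```shell
#!/bin/sh
# Hypothetical sketch: de-duping that honors "before" for duplicates.
# NOT the real Hadoop implementation; names and paths are illustrative.
CLASSPATH=""

# remove_entry LIST ENTRY: print colon-separated LIST without ENTRY.
remove_entry() {
  result=""
  old_ifs=$IFS
  set -f            # avoid glob expansion of entries such as /lib/*
  IFS=:
  for item in $1; do
    if [ "$item" != "$2" ]; then
      result="${result:+$result:}$item"
    fi
  done
  IFS=$old_ifs
  set +f
  printf '%s\n' "$result"
}

# add_classpath_honoring_position ENTRY [before|after]
add_classpath_honoring_position() {
  entry=$1
  position=${2:-after}
  case ":${CLASSPATH}:" in
    "::")
      CLASSPATH=$entry
      ;;
    *":${entry}:"*)
      # Duplicate: keep a single copy, but move it to the front on "before".
      if [ "$position" = "before" ]; then
        rest=$(remove_entry "$CLASSPATH" "$entry")
        CLASSPATH="${entry}${rest:+:$rest}"
      fi
      ;;
    *)
      if [ "$position" = "before" ]; then
        CLASSPATH="${entry}:${CLASSPATH}"
      else
        CLASSPATH="${CLASSPATH}:${entry}"
      fi
      ;;
  esac
}

# Hadoop's own entries, with the conf dir first (stand-in values).
add_classpath_honoring_position /usr/share
add_classpath_honoring_position /some/hadoop/lib

# User entries /usr/share:/tmp:/bin, prepended in reverse order (assumption).
for user_entry in /bin /tmp /usr/share; do
  add_classpath_honoring_position "$user_entry" before
done

printf '%s\n' "$CLASSPATH"
# /usr/share:/tmp:/bin:/some/hadoop/lib
```

With this variant the user-supplied order {{/usr/share:/tmp:/bin}} is preserved at the front of the classpath, and {{/usr/share}} still appears only once.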

I ran into this while configuring {{bin/hive}} with a conf dir that was being used for both HDFS and Hive, and flailing as to why my {{log4j2.properties}} wasn't being read. The one in my conf dir came later in my classpath than one bundled in some Hive jar.


