Posted to common-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2017/11/07 16:04:00 UTC
[jira] [Commented] (HADOOP-15019) Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST

    [ https://issues.apache.org/jira/browse/HADOOP-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242263#comment-16242263 ] 

Allen Wittenauer commented on HADOOP-15019:
-------------------------------------------

bq. My way of reproduction, first build the following trivial Java program:

Or, just use 'hadoop classpath' ...
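
As a rough sketch of that check: the helper below prints the 1-based position of an entry in a colon-separated classpath string. The name classpath_position is hypothetical (it is not part of Hadoop); it only wraps the same tr/grep pipeline used in the issue description.

```shell
# Hypothetical helper, not part of Hadoop: print the 1-based position of
# an exact entry in a colon-separated classpath, or nothing if absent.
classpath_position() {
  local classpath="$1" entry="$2"
  # One entry per line; -x matches whole lines, -F treats the entry as a
  # literal string (classpath entries often contain '*'), -n prefixes the
  # line number, and cut keeps only that number.
  printf '%s\n' "${classpath}" | tr ':' '\n' \
    | grep -n -x -F -- "${entry}" | head -n 1 | cut -d: -f1
}
```

Assuming a working Hadoop install, something like classpath_position "$(hadoop classpath)" /usr/share, run with the same HADOOP_CONF_DIR, HADOOP_USER_CLASSPATH_FIRST and HADOOP_CLASSPATH settings as in the description, should show where the directory ends up without building a test jar.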

I thought this was known, but yes: add_classpath only checks whether an entry already exists; it doesn't care where in the order that entry should be.  The code that handles HADOOP_USER_CLASSPATH_FIRST should probably strip the existing classpath before calling add_classpath.
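
To make the above concrete, here is a minimal sketch of the behavior and of one possible fix. It is not the code in hadoop-functions.sh: demo_add_classpath and demo_prepend_classpath are hypothetical names, and the fix shown (removing an existing copy of the entry before prepending it) is only one reading of "strip".

```shell
# Illustrative sketch only; NOT the code in hadoop-functions.sh.
# demo_add_classpath and demo_prepend_classpath are hypothetical names.
CLASSPATH=""

# De-duping add: if the entry is already present, return without looking
# at the requested position. This mirrors the reported behavior.
demo_add_classpath() {
  local entry="$1" position="${2:-after}"
  case ":${CLASSPATH}:" in
    *":${entry}:"*) return 0 ;;   # duplicate: "before"/"after" is ignored
  esac
  if [ -z "${CLASSPATH}" ]; then
    CLASSPATH="${entry}"
  elif [ "${position}" = "before" ]; then
    CLASSPATH="${entry}:${CLASSPATH}"
  else
    CLASSPATH="${CLASSPATH}:${entry}"
  fi
}

# Possible fix (an assumption): drop any existing copy of the entry
# first, then prepend it, so "before" is honored.
demo_prepend_classpath() {
  local entry="$1" padded prefix suffix
  padded=":${CLASSPATH}:"
  case "${padded}" in
    *":${entry}:"*)
      prefix=${padded%%":${entry}:"*}   # everything before the entry
      suffix=${padded#*":${entry}:"}    # everything after the entry
      padded="${prefix}:${suffix}"
      ;;
  esac
  padded="${padded#:}"                  # trim the padding colons
  CLASSPATH="${padded%:}"
  demo_add_classpath "${entry}" before
}
```

With CLASSPATH set to /etc/conf:/usr/share:/lib, calling demo_add_classpath /usr/share before leaves the order unchanged, while demo_prepend_classpath /usr/share moves /usr/share to the front.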

> Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST 
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-15019
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15019
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: bin
>            Reporter: Philip Zeyliger
>
> If a user sets {{HADOOP_USER_CLASSPATH_FIRST=true}} and furthermore includes a directory that's already in Hadoop's classpath via {{HADOOP_CLASSPATH}}, that directory will appear later than it should in the eventual $CLASSPATH. I believe this is because the de-duping at https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 is ignoring the "before/after" parameter.
> My way of reproduction, first build the following trivial Java program:
> {code}
> $ cat Test.java
> public class Test {
>   public static void main(String[] args) {
>     System.out.println(System.getenv().get("CLASSPATH"));
>   }
> }
> $ javac Test.java
> $ jar cf test.jar Test.class
> {code}
> With that, if you happen to have an entry in HADOOP_CLASSPATH that matches what Hadoop would produce, you'll find the ordering not honored. It's easiest to reproduce this with a match for HADOOP_CONF_DIR, as in the second case below:
> {code}
> # As you'd expect, /usr/share is first!
> $ HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share'
> WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
> 1:/usr/share
> # Surprise! /usr/share is now on the 3rd line, even though it was first in HADOOP_CLASSPATH.
> $ HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share'
> WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
> 3:/usr/share
> {code}
> To reiterate, what's surprising is that an entry that's first in HADOOP_CLASSPATH can end up not first in the resulting classpath.
> I ran into this configuring {{bin/hive}} with a confdir that was being used for both HDFS and Hive, and flailing as to why my {{log4j2.properties}} wasn't being read. The one in my conf dir was lower in my classpath than one bundled in some Hive jar.


