Posted to common-issues@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2011/06/12 08:03:51 UTC
[jira] [Updated] (HADOOP-6605) Add JAVA_HOME detection to hadoop-config

     [ https://issues.apache.org/jira/browse/HADOOP-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-6605:
--------------------------------

    Attachment: hadoop-6605-3.patch

Thanks for the feedback everyone. Popping the stack, Hadoop requires the user set JAVA_HOME for two reasons:

# We want to add tools.jar to the classpath, and JAVA_HOME lets the user specify a base directory to look in (other than that of the default java, which may be from a JRE and therefore not have tools.jar). This is no longer an issue since HADOOP-7374 removed tools.jar from the classpath.
# We want to respect JAVA_HOME even if there is already a java in the path. I.e., users and admins can easily configure Hadoop to use a java other than the default system java. This makes sense given that Hadoop is picky. Therefore it makes sense to auto-detect JAVA_HOME only if it is not set (which all versions of the patch do) and we can determine a reasonable value.

On OSX, there is an API (java_home(1)) that does this: it returns a path suitable for setting JAVA_HOME, based on the enabled/preferred JVMs as set by Java Preferences. I think we agree it makes sense to use this.
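
For illustration only, here is a minimal sketch of what the OSX case could look like. It is not the attached patch: the function name detect_java_home_osx is hypothetical, and it assumes java_home(1) is installed at /usr/libexec/java_home. It leaves an already-set JAVA_HOME alone and only falls back to java_home(1) otherwise.

```shell
# Hypothetical helper, not taken from hadoop-6605-3.patch.
detect_java_home_osx() {
  # Respect a value already set by hadoop-env.sh or the environment.
  if [ -n "${JAVA_HOME:-}" ]; then
    printf '%s\n' "$JAVA_HOME"
    return 0
  fi
  # Assumption: java_home(1) lives at /usr/libexec/java_home on OS X.
  if [ "$(uname -s)" = "Darwin" ] && [ -x /usr/libexec/java_home ]; then
    /usr/libexec/java_home 2>/dev/null
    return $?
  fi
  # Nothing detected; the caller can still bail with the usual error.
  return 1
}
```

A caller might use it as: JAVA_HOME=$(detect_java_home_osx) && export JAVA_HOME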

On Linux, there is no single API that works across distributions. Even though alternatives is widely available, it works differently on different distributions (also, it indicates where the java binary lives, not where JAVA_HOME is, though you could determine that with readlink). There are well-known locations where Java is installed that you can check to reasonably detect JAVA_HOME. This is the approach taken by the previous patch. I've provided data that shows that checking a set of directories does not measurably impact the execution time (therefore "too much work" sounds like a philosophical objection rather than a technical objection to me). I've found that globbing is not an issue in practice because the glob does not match more than one installation on a given system. This is because the JDK was resolved via a packaging dependency and the package updates itself rather than having multiple versions installed. People who manually install multiple JDKs typically set JAVA_HOME explicitly, and therefore the detection is not used. There are no alternative proposals for autodetecting JAVA_HOME on Linux, and I'm not going to spend any more time on this part for now, so I'm dropping this case from the patch.
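
To make the two Linux ideas above concrete, here is a rough sketch. It is not code from any version of the patch. Both function names are hypothetical, the second assumes a readlink that supports -f (as GNU coreutils does), and the caller supplies the candidate directories.

```shell
# Hypothetical: print the first candidate directory that contains bin/java.
# Unmatched globs are passed through literally and simply fail the -x test.
java_home_from_dirs() {
  for dir in "$@"; do
    if [ -x "$dir/bin/java" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Hypothetical: follow symlinks (such as those managed by alternatives)
# from the java on the PATH, then strip the trailing /bin/java.
java_home_from_path() {
  java_bin=$(command -v java) || return 1
  real_bin=$(readlink -f "$java_bin") || return 1
  case "$real_bin" in
    */bin/java) printf '%s\n' "${real_bin%/bin/java}" ;;
    *) return 1 ;;
  esac
}
```

A call such as java_home_from_dirs /usr/lib/jvm/java-6-sun /usr/java/default shows the intended use; those two paths are examples only and are not the list used by the earlier patch. Note that the readlink variant may land on a JRE directory inside a JDK, which is one reason it is not a drop-in answer.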

In any case (ha), there is consensus on the OSX approach so let's just go with this for now. We can easily implement cases for other OS types in the future if there's an approach that's acceptable. Patch attached.

> Add JAVA_HOME detection to hadoop-config
> ----------------------------------------
>
>                 Key: HADOOP-6605
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6605
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Chad Metcalf
>            Assignee: Eli Collins
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: HADOOP-6605.patch, hadoop-6605-1.patch, hadoop-6605-2.patch, hadoop-6605-3.patch
>
>
> The commands that source hadoop-config.sh currently bail with an error if JAVA_HOME is not set. Let's detect JAVA_HOME (from a list of locations on various OS types) if JAVA_HOME is not already set by hadoop-env.sh or the environment. This way users don't have to manually configure it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira