Posted to common-dev@hadoop.apache.org by Sangjin Lee <sj...@apache.org> on 2015/05/28 18:36:35 UTC

use of HADOOP_HOME

Hi folks,

I noticed this while setting up a cluster based on the current trunk. It
appears that setting HADOOP_HOME is now done much later (in
hadoop_finalize) than branch-2. Importantly this is set *after*
hadoop-env.sh (or yarn-env.sh) is invoked.

In our version of hadoop-env.sh, we have used $HADOOP_HOME to define some
more variables, but it appears that we can no longer rely on the
HADOOP_HOME value in our *-env.sh customization. Is this an intended change
in the recent shell script refactoring? What is the right thing to use in
hadoop-env.sh for the location of hadoop?

Thanks,
Sangjin

Re: use of HADOOP_HOME

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Sangjin,

In the new scripts, HADOOP_PREFIX is set very early in execution.  This
happens inside the hadoop_bootstrap function, which executes before
hadoop_exec_hadoopenv, so I expect you can use that.
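
For example, a minimal hadoop-env.sh fragment along these lines should be
safe, since HADOOP_PREFIX is already populated by the time the file is
sourced (the particular variables chosen here are only illustrative):

    # hadoop-env.sh -- HADOOP_PREFIX is set by hadoop_bootstrap before this
    # file is sourced, so it can be used to derive other locations.
    export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_PREFIX}/etc/hadoop}"
    export HADOOP_LOG_DIR="${HADOOP_LOG_DIR:-${HADOOP_PREFIX}/logs}"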

However, HADOOP-11393 proposes reverting HADOOP_PREFIX and switching
things back to HADOOP_HOME, so this might be a moving target right now.  I
haven't looked at the patch in detail yet.

https://issues.apache.org/jira/browse/HADOOP-11393


--Chris Nauroth




On 5/28/15, 9:36 AM, "Sangjin Lee" <sj...@apache.org> wrote:

>Hi folks,
>
>I noticed this while setting up a cluster based on the current trunk. It
>appears that setting HADOOP_HOME is now done much later (in
>hadoop_finalize) than branch-2. Importantly this is set *after*
>hadoop-env.sh (or yarn-env.sh) is invoked.
>
>In our version of hadoop-env.sh, we have used $HADOOP_HOME to define some
>more variables, but it appears that we can no longer rely on the
>HADOOP_HOME value in our *-env.sh customization. Is this an intended
>change
>in the recent shell script refactoring? What is the right thing to use in
>hadoop-env.sh for the location of hadoop?
>
>Thanks,
>Sangjin


Re: use of HADOOP_HOME

Posted by Allen Wittenauer <aw...@altiscale.com>.
On May 28, 2015, at 11:29 AM, Sangjin Lee <sj...@gmail.com> wrote:

> Thanks Chris and Allen for the info! Yes, we can use HADOOP_PREFIX
> until/unless HADOOP-11393 is resolved.
> 
> Just to clarify, we're not setting HADOOP_HOME/HADOOP_PREFIX in our
> *-env.sh; we simply use them. I don't know that it is always feasible to
> set them at the machine level. Some setups may have multiple hadoop
> installs and want to switch between them, and so on.

	Yup.  Understood.  In fact, it’s probably worth pointing out that if you do a tar-ball style install (e.g., all the hadoop gunk is in one dir), trunk will figure all these vars out based upon the hadoop/yarn/etc bin in your path. :)  … and HADOOP_PREFIX should be set to something by the time *-env.sh gets sourced, so it should be safe to use there.  It’s just HADOOP_HOME that’s problematic.  If HADOOP-11393 gets committed, then the rules will be a bit different….
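
	A rough sketch of what that looks like in practice (the version string and install path here are just examples, not anything the scripts require):

    # Illustrative tar-ball style install; adjust version and location.
    tar xzf hadoop-3.0.0-SNAPSHOT.tar.gz -C /opt
    export PATH=/opt/hadoop-3.0.0-SNAPSHOT/bin:$PATH
    hadoop version    # the scripts work out HADOOP_PREFIX etc. from the
                      # location of the hadoop script on the PATH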

	trunk’s most powerful env var is probably HADOOP_LIBEXEC_DIR, actually.  But I’ll leave it as an exercise for the reader as to why.
	

Re: use of HADOOP_HOME

Posted by Sangjin Lee <sj...@gmail.com>.
Thanks Chris and Allen for the info! Yes, we can use HADOOP_PREFIX
until/unless HADOOP-11393 is resolved.

Just to clarify, we're not setting HADOOP_HOME/HADOOP_PREFIX in our
*-env.sh; we simply use them. I don't know that it is always feasible to
set them at the machine level. Some setups may have multiple hadoop
installs and want to switch between them, and so on.

On Thu, May 28, 2015 at 10:13 AM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> On May 28, 2015, at 9:36 AM, Sangjin Lee <sj...@apache.org> wrote:
>
> > Hi folks,
> >
> > I noticed this while setting up a cluster based on the current trunk. It
> > appears that setting HADOOP_HOME is now done much later (in
> > hadoop_finalize) than branch-2. Importantly this is set *after*
> > hadoop-env.sh (or yarn-env.sh) is invoked.
> >
> > In our version of hadoop-env.sh, we have used $HADOOP_HOME to define some
> > more variables, but it appears that we can no longer rely on the
> > HADOOP_HOME value in our *-env.sh customization. Is this an intended
> change
> > in the recent shell script refactoring? What is the right thing to use in
> > hadoop-env.sh for the location of hadoop?
>
>         a) HADOOP_HOME was deprecated on Unix systems as part of (IIRC)
> 0.21.  HADOOP_PREFIX was its replacement.  (No, I never understood the
> reasoning for this either.)  Past 0.21, it was never safe to rely upon
> HADOOP_HOME in *-env.sh files unless it is set prior to running the shell
> commands.
>
>         b) That said, functionality-wise, HADOOP_HOME is being set in
> pretty much the same place in the code flow.  *-env.sh has already been
> processed in both branch-2 and trunk by the time HADOOP_HOME is
> configured.  trunk only configures HADOOP_HOME for backward compatibility.
> The rest of the code uses HADOOP_PREFIX as expected, and very, very early
> in the lifecycle.
>
>         What you are likely seeing is the result of a bug fix:  trunk
> doesn't reprocess *-env.sh files when using the sbin commands whereas
> branch-2 does it several times over. (This is also one of the reasons why
> Java command line options get duplicated.)  So it likely worked for you
> because of this broken behavior.
>
>         In my mind, it is a better practice to configure
> HADOOP_HOME/HADOOP_PREFIX outside of the *-env.sh files (e.g.,
> /etc/profile.d on Linux) so that one can use them for PATH, etc.  That
> should guarantee expected behavior.
>
>
>
>
>

Re: use of HADOOP_HOME

Posted by Allen Wittenauer <aw...@altiscale.com>.
On May 28, 2015, at 9:36 AM, Sangjin Lee <sj...@apache.org> wrote:

> Hi folks,
> 
> I noticed this while setting up a cluster based on the current trunk. It
> appears that setting HADOOP_HOME is now done much later (in
> hadoop_finalize) than branch-2. Importantly this is set *after*
> hadoop-env.sh (or yarn-env.sh) is invoked.
> 
> In our version of hadoop-env.sh, we have used $HADOOP_HOME to define some
> more variables, but it appears that we can no longer rely on the
> HADOOP_HOME value in our *-env.sh customization. Is this an intended change
> in the recent shell script refactoring? What is the right thing to use in
> hadoop-env.sh for the location of hadoop?

	a) HADOOP_HOME was deprecated on Unix systems as part of (IIRC) 0.21.  HADOOP_PREFIX was its replacement.  (No, I never understood the reasoning for this either.)  Past 0.21, it was never safe to rely upon HADOOP_HOME in *-env.sh files unless it is set prior to running the shell commands.

	b) That said, functionality-wise, HADOOP_HOME is being set in pretty much the same place in the code flow.  *-env.sh has already been processed in both branch-2 and trunk by the time HADOOP_HOME is configured.  trunk only configures HADOOP_HOME for backward compatibility.  The rest of the code uses HADOOP_PREFIX as expected, and very, very early in the lifecycle.

	What you are likely seeing is the result of a bug fix:  trunk doesn’t reprocess *-env.sh files when using the sbin commands whereas branch-2 does it several times over. (This is also one of the reasons why Java command line options get duplicated.)  So it likely worked for you because of this broken behavior.
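
	A quick sketch of how that duplication shows up (illustrative only, not the exact branch-2 code path): if hadoop-env.sh appends to HADOOP_OPTS and the file ends up being sourced more than once, the flags simply pile up:

    # hadoop-env.sh (typical append pattern)
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

    # sourced once:   ... -Djava.net.preferIPv4Stack=true
    # sourced twice:  ... -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true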

	In my mind, it is a better practice to configure HADOOP_HOME/HADOOP_PREFIX outside of the *-env.sh files (e.g., /etc/profile.d on Linux) so that one can use them for PATH, etc.  That should guarantee expected behavior.
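
	For instance, a small profile.d fragment along these lines (the path and filename are only an example) makes the variables available both to login shells and to anything the *-env.sh files want to derive from them:

    # /etc/profile.d/hadoop.sh -- example only; adjust the install path
    export HADOOP_PREFIX=/opt/hadoop
    export HADOOP_HOME="$HADOOP_PREFIX"
    export PATH="$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH"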