Posted to hdfs-user@hadoop.apache.org by Tom Brown <to...@gmail.com> on 2013/03/12 20:50:00 UTC

HADOOP_CLIENT_OPTS getting set multiple times (is this a bug?)

I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK).

I noticed that my task tracker processes have multiple "-Xmx" configs
attached, and that the later ones (128m) were overriding the ones I
had intended to be used (500m).

After digging through the various scripts, I found that the problem is
happening because "hadoop-env.sh" is getting invoked multiple times.
The deb file created a link from "/etc/profile.d/" to hadoop-env.sh,
so this file is run whenever I log in. The "hadoop" script also
invokes hadoop-env.sh (via "hadoop-config.sh"). The following sequence
is causing the problem:

1. The first time hadoop-env.sh is invoked (when the user logs in),
HADOOP_CLIENT_OPTS is set to "-Xmx128m ...".

2. The second time hadoop-env.sh is invoked (when a Hadoop process is
started), HADOOP_OPTS is set to "... $HADOOP_CLIENT_OPTS" (thereby
including the memory setting for all Hadoop processes in general)

3. Also during the second execution, HADOOP_CLIENT_OPTS is recursively
set to "-Xmx128m $HADOOP_CLIENT_OPTS" (so it now contains "-Xmx128m
-Xmx128m").

4. When the actual hadoop process is started, its command line always
includes both JAVA_HEAP_SIZE and HADOOP_OPTS (in that order), but since
HADOOP_OPTS also carries a memory setting and appears later on the
command line, it takes precedence. (A small repro sketch follows this
list.)
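
A minimal, hypothetical repro of the sequence above (illustrative only:
the file /tmp/fake-hadoop-env.sh and its one-line contents are stand-ins
modeled on step 3, not the stock scripts shipped in the .deb):

    # Stand-in for the self-referencing export described in step 3,
    # written to a temp file so it can be sourced twice:
    echo 'export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"' \
        > /tmp/fake-hadoop-env.sh

    . /tmp/fake-hadoop-env.sh    # 1st time: at login, via /etc/profile.d
    . /tmp/fake-hadoop-env.sh    # 2nd time: via hadoop-config.sh
    echo "$HADOOP_CLIENT_OPTS"   # prints "-Xmx128m -Xmx128m"

    # Step 4: when a java command line carries several -Xmx flags, the
    # HotSpot JVM honors the last one, so a later -Xmx128m overrides an
    # earlier -Xmx500m:
    java -Xmx500m -Xmx128m -XX:+PrintFlagsFinal -version | grep MaxHeapSize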

I couldn't find any bug that matched this, so I thought I'd reach out
to the community: Is this a known bug? Do the scripts and deb file
belong to Hadoop in general, or is this the responsibility of a
specific distribution?

Thanks in advance!

--Tom
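
One possible mitigation (a sketch only, not something proposed in this
thread): guard the self-referencing exports in hadoop-env.sh so that
sourcing the file a second time is a no-op. The guard variable name
HADOOP_ENV_SOURCED below is hypothetical, not a real Hadoop setting, and
-Xmx500m is simply the heap size intended in the post above.

    # Hypothetical guard around the relevant hadoop-env.sh lines:
    if [ -z "$HADOOP_ENV_SOURCED" ]; then
      export HADOOP_ENV_SOURCED=1
      export HADOOP_CLIENT_OPTS="-Xmx500m $HADOOP_CLIENT_OPTS"
      export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    fi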

Re: HADOOP_CLIENT_OPTS getting set multiple times (is this a bug?)

Posted by Matt Foley <mf...@hortonworks.com>.
>> I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK).
...
>> Do the scripts and deb file belong to Hadoop in general, or is this the
responsibility of a specific distribution?

Hi Tom,
Good description.  I searched in Jira for "HADOOP_CLIENT_OPTS", and it
appears there are at least two bugs open on this issue (although in the
later context of 2.0.2 and 3.0): HADOOP-9211
<https://issues.apache.org/jira/browse/HADOOP-9211> and HADOOP-9351
<https://issues.apache.org/jira/browse/HADOOP-9351>.  I encourage you to
follow and/or contribute to those JIRAs if you are interested in getting
this usability issue fixed.

Regarding whether you are looking at Apache stuff or something specific to
a distro:
I'm going to get a little pedantic here; sorry, but there's no other way
to explain it, and the differences are actually important from a
legalistic standpoint.

As members of this community, we wear multiple "hats".  I'm a committer and
PMC member for the Apache Hadoop project, and wearing that "hat" I was also
the release manager for Hadoop-1.0.2.  I think you found those deb packages in
the Apache artifact repositories.  If so, they were compiled by me, as Release
Manager for that release of Hadoop-1.  But they weren't compiled by
Hortonworks -- even though I am also an employee of Hortonworks and
Hortonworks supports my work on behalf of the community.

Hortonworks makes releases of HDP, their supported product which includes
or is "powered by" Apache Hadoop and related projects.  Other companies
also publish distributions powered by Hadoop.  But those distros are
available from their respective companies' web sites.  Anything you
download from Apache is provided by members of the Apache Hadoop community
on a non-commercial basis.  All of our companies are proud to support this
work, as it is part of the open-source "virtuous circle" between the
community and the companies, the technology and the commerce.

Hope that helps.  Feel free to contact me off-list if you want to discuss
more.
Regards,
--Matt


On Tue, Mar 12, 2013 at 12:50 PM, Tom Brown <to...@gmail.com> wrote:

> I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK).
>
> I noticed that my task tracker processes have multiple "-Xmx" configs
> attached, and that the later ones (128m) were overriding the ones I
> had intended to be used (500m).
>
> After digging through the various scripts, I found that the problem is
> happening because "hadoop-env.sh" is getting invoked multiple times.
> The deb file created a link from "/etc/profile.d/" to hadoop-env.sh,
> so this file is run whenever I log in. The "hadoop" script also
> invokes hadoop-env.sh (via "hadoop-config.sh"). The following sequence
> is causing the problem:
>
> 1. The first time hadoop-env.sh is invoked (when the user logs in),
> HADOOP_CLIENT_OPTS is set to "-Xmx128m ...".
>
> 2. The second time hadoop-env.sh is invoked (when a Hadoop process is
> started), HADOOP_OPTS is set to "... $HADOOP_CLIENT_OPTS" (thereby
> including the memory setting for all Hadoop processes in general)
>
> 3. Also during the second execution, HADOOP_CLIENT_OPTS is recursively
> set to "-Xmx128m $HADOOP_CLIENT_OPTS" (so it now contains "-Xmx128m
> -Xmx128m").
>
> 4. When the actual hadoop process is started, it always includes both
> JAVA_HEAP_SIZE and HADOOP_OPTS (in that order), but since HADOOP_OPTS
> also has a memory setting and is later in the command line, it takes
> precedence.
>
> I couldn't find any bug that matched this, so I thought I'd reach out
> to the community: Is this a known bug? Do the scripts and deb file
> belong to Hadoop in general, or is this the responsibility of a
> specific distribution?
>
> Thanks in advance!
>
> --Tom
>
