Posted to common-user@hadoop.apache.org by ke xie <oe...@gmail.com> on 2011/03/30 12:08:50 UTC

how to set a different hadoop.tmp.dir for each machine?

Hey guys, I'm new here, and recently I've been working on configuring a cluster
with 32 nodes.

However, there are some problems, which I describe below.

The cluster consists of nodes on which I don't have "root", so I can't
configure them however I wish. We only have the /localhost_name/local space
to use on each machine. Thus, we only have

/machine_a/local
/machine_b/local
...

So I guessed that setting hadoop.tmp.dir=/${HOSTNAME}/local would work, but
sadly it didn't...

Almost all the tutorials online set hadoop.tmp.dir to a single path, which
assumes the path is the same on every machine... but in my case it's not...

I did some googling... for things like "hadoop.tmp.dir different"... but no
results...

Can anybody help? I'd really appreciate it... I've been working on this
problem for more than 30 hours...

-- 
Name: Ke Xie   Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University

Re: how to set a different hadoop.tmp.dir for each machine?

Posted by rishi pathak <ma...@gmail.com>.
This might help:
http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/24351

See the last comment there. It was done for mapred.local.dir, but I guess it
will work for hadoop.tmp.dir as well.
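
For what it's worth, the reason ${HOSTNAME} didn't expand is that Hadoop
resolves ${...} in property values against its own configuration properties
and Java system properties, not shell environment variables. A minimal sketch
of one way to feed a per-host value in (node.local.dir is a made-up property
name, and this assumes `hostname` prints the same name used in the
/machine_name/local layout):

# conf/hadoop-env.sh on every node -- the file is identical everywhere,
# but the value it computes differs per host:
export HADOOP_OPTS="$HADOOP_OPTS -Dnode.local.dir=/$(hostname)/local"

<!-- conf/core-site.xml, also identical on every node; Hadoop substitutes
     ${node.local.dir} with the system property set above -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>${node.local.dir}/hadoop-tmp-data</value>
</property>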

-- 
---
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India

Re: how to set a different hadoop.tmp.dir for each machine?

Posted by modemide <mo...@gmail.com>.
I'm a little confused as to why you're putting
/pseg/local/...
as the location.

Are you sure that you've been given a folder called /pseg/ at the root
of the drive?
Maybe try to ssh to your server and navigate to your datastore folder,
then do "pwd".

That should give you the working directory of the datastore.  Use that
as the value for the tmp datastore location.
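
For example (hostname and path taken from your mails; just a sketch):

ssh pseg
cd /pseg/local/xieke-cluster/hadoop-tmp-data
pwd    # prints the exact absolute path to paste into core-site.xml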

Sorry if that seems like a stupid suggestion.  Just trying to get a
handle on your actual problem.  My Linux skill set is limited to the
basics, so I'm troubleshooting by looking for the type of mistake that
I would make.

If the above is not the issue, then I'm not sure what the issue could
be.  But, I'd be glad to continue trying to help (with my limited
knowledge) :-)


Re: how to set a different hadoop.tmp.dir for each machine?

Posted by ke xie <oe...@gmail.com>.
Thank you modemide for your quick response.

Sorry for not being clear... your understanding is right.
I have one machine called grande and another called pseg. I'm using grande
as the master (by putting "grande" in the masters file) and pseg as the slave.

The configuration of grande (core-site.xml) is:

<property>
  <name>fs.default.name</name>
  <value>hdfs://grande:8500</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/grande/local/xieke-cluster/hadoop-tmp-data/</value>
  <description>A base for other temporary directories.</description>
</property>

And the configuration of pseg is:
<property>
  <name>fs.default.name</name>
  <value>hdfs://grande:8500</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/pseg/local/xieke-cluster/hadoop-tmp-data/</value>
  <description>A base for other temporary directories.</description>
</property>


Just the same as yours, I think?

Then I ran ./bin/hadoop namenode -format to format the namenode, and
./bin/start-all.sh to start the daemons. But now:

grande% ./bin/start-all.sh
starting namenode, logging to /grande/local/hadoop/bin/../logs/hadoop-kx19-namenode-grande.out
*pseg: /grande/local/hadoop/bin/..: No such file or directory.*
grande: starting datanode, logging to /grande/local/hadoop/bin/../logs/hadoop-kx19-datanode-grande.out
grande: starting secondarynamenode, logging to /grande/local/hadoop/bin/../logs/hadoop-kx19-secondarynamenode-grande.out
starting jobtracker, logging to /grande/local/hadoop/bin/../logs/hadoop-kx19-jobtracker-grande.out
pseg: /grande/local/hadoop/bin/..: No such file or directory.
grande: starting tasktracker, logging to /grande/local/hadoop/bin/../logs/hadoop-kx19-tasktracker-grande.out


Any ideas?
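
For anyone reading along: the "pseg: /grande/local/hadoop/bin/..: No such
file or directory" lines suggest that start-all.sh ssh-es into each slave
and runs the daemons from the same absolute path as the master's install,
which exists only on grande. A sketch of a workaround, assuming each node
has its own copy of Hadoop under /<hostname>/local/hadoop: skip start-all.sh
and start every daemon locally on its own node with hadoop-daemon.sh.

# on grande (master, also a slave per the output above):
cd /grande/local/hadoop
./bin/hadoop-daemon.sh start namenode
./bin/hadoop-daemon.sh start secondarynamenode
./bin/hadoop-daemon.sh start jobtracker
./bin/hadoop-daemon.sh start datanode
./bin/hadoop-daemon.sh start tasktracker

# on pseg (slave):
cd /pseg/local/hadoop
./bin/hadoop-daemon.sh start datanode
./bin/hadoop-daemon.sh start tasktracker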


-- 
Name: Ke Xie   Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University

Re: how to set a different hadoop.tmp.dir for each machine?

Posted by modemide <mo...@gmail.com>.
Ok, so if I understand correctly, you want to change the location of
the datastore on individual computers.

I've tested it on my cluster, and it seems to work.  Just for the sake
of troubleshooting, you didn't mention the following:
1) Which computer were you editing the files on?
2) Which file were you editing?

******************************************************************************
Here's my typical DataNode configuration:
Computer: DataNode
FileName: core-site.xml
Contents:
....
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/datastore/hadoop-${user.name}</value>
...
******************************************************************************
Here's the configuration of another DataNode I modified to test what
you were asking:
Computer: DataNode2
FileName: core-site.xml
Contents:
....
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/ANOTHERDATASTORE/hadoop-${user.name}</value>
....
******************************************************************************
Then, I moved datastore to ANOTHERDATASTORE on DataNode2.

I started my cluster back up, and it worked perfectly.
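
In shell terms, the test was roughly the following (a sketch; the exact
commands weren't part of the original mail):

# on DataNode2, after editing conf/core-site.xml as shown above:
mv /usr/local/hadoop/datastore /usr/local/hadoop/ANOTHERDATASTORE

# then, from the master:
./bin/stop-all.sh
./bin/start-all.sh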


>