Posted to user@hbase.apache.org by rajgopalv <ra...@gmail.com> on 2011/01/17 16:01:27 UTC

Multiple directories for Hadoop

I have a question about configuring dfs.data.dir.

One of my slaves has four 500 GB hard disks. They are mounted on different
mount points: /data1 /data2 /data3 /data4
How can I make use of all four hard disks for HDFS data and the local
jobcache?

If I give comma-separated values for dfs.data.dir, will the total data be
replicated on all four disks, or will the total data be shared across the
four disks (without replication)?

And how do I increase the space for the local jobcache?
http://www.mail-archive.com/core-user@hadoop.apache.org/msg04346.html
says hadoop.tmp.dir cannot be comma-separated, but my MapReduce jobs will
eat a lot of local jobcache space.

What should the configuration be for the above scenario?

-- 
View this message in context: http://old.nabble.com/Multiple-directories-for-hadoop-tp30676207p30676207.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Multiple directories for Hadoop

Posted by Eric <er...@gmail.com>.
Hadoop knows that these are separate disks (don't ask me how).
BTW, I would mount them under something like /mnt/hadoop/disk[0...n].
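
With that layout, the comma-separated form shown further down the thread
would come out something like this (just a sketch; the dfs/data
subdirectories are my own example, adjust them to your setup):

<property>
  <name>dfs.data.dir</name>
  <!-- hdfs-site.xml; the dfs/data subpaths below are illustrative -->
  <value>/mnt/hadoop/disk0/dfs/data,/mnt/hadoop/disk1/dfs/data,/mnt/hadoop/disk2/dfs/data,/mnt/hadoop/disk3/dfs/data</value>
</property>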
In the latest Hadoop release from Cloudera, for example, they say this:

"In CDH3 Beta 3, the mapred.system.dir directory must be located inside a
directory that is owned by mapred. For
example, if mapred.system.dir is specified as /mapred/system, then /mapred
must be owned by mapred. Don't,
for example, specify /mrsystem as mapred.system.dir because you don't want /
owned by mapred."


2011/1/18 shixing <pa...@gmail.com>

> Your configuration is right, something like:
> <property>
>  <name>dfs.data.dir</name>
>  <value>/data1,/data2,/data3,/data4</value>
> </property>
>
> So the HDFS block data will be spread (round-robin) across the dirs above,
> rather than each disk holding a full copy; replication happens across
> datanodes, not across these directories.
>
> If you want data cached locally for your jobs, you can use the
> DistributedCache to distribute archived data as a cache.
> See details:
>
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html#DistributedCache
>
> By the way, this is a Hadoop question, not an HBase one :)
>
> On Mon, Jan 17, 2011 at 11:01 PM, rajgopalv <ra...@gmail.com> wrote:
>
> >
> > I have a question about configuring dfs.data.dir.
> >
> > One of my slaves has four 500 GB hard disks. They are mounted on
> > different mount points: /data1 /data2 /data3 /data4
> > How can I make use of all four hard disks for HDFS data and the local
> > jobcache?
> >
> > If I give comma-separated values for dfs.data.dir, will the total data
> > be replicated on all four disks, or will the total data be shared
> > across the four disks (without replication)?
> >
> > And how do I increase the space for the local jobcache?
> > http://www.mail-archive.com/core-user@hadoop.apache.org/msg04346.html
> > says hadoop.tmp.dir cannot be comma-separated, but my MapReduce jobs
> > will eat a lot of local jobcache space.
> >
> > What should the configuration be for the above scenario?
>
>
> --
> Best wishes!
> My Friend~
>

Re: Multiple directories for Hadoop

Posted by shixing <pa...@gmail.com>.
Your configuration is right, something like:
<property>
  <name>dfs.data.dir</name>
  <value>/data1,/data2,/data3,/data4</value>
</property>

So the HDFS block data will be spread (round-robin) across the dirs above,
rather than each disk holding a full copy; replication happens across
datanodes, not across these directories.

If you want data cached locally for your jobs, you can use the
DistributedCache to distribute archived data as a cache.
See details:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html#DistributedCache
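
For the jobcache space question specifically, my own suggestion (not from
the tutorial linked above): mapred.local.dir, which holds the tasktrackers'
local working data including the jobcache, does accept a comma-separated
list, unlike hadoop.tmp.dir. So something like:

<property>
  <name>mapred.local.dir</name>
  <!-- mapred-site.xml; the mapred/local subdirectories are just an
       example layout, pick whatever suits your mounts -->
  <value>/data1/mapred/local,/data2/mapred/local,/data3/mapred/local,/data4/mapred/local</value>
</property>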

By the way, this is a Hadoop question, not an HBase one :)

On Mon, Jan 17, 2011 at 11:01 PM, rajgopalv <ra...@gmail.com> wrote:

>
> I have a question about configuring dfs.data.dir.
>
> One of my slaves has four 500 GB hard disks. They are mounted on different
> mount points: /data1 /data2 /data3 /data4
> How can I make use of all four hard disks for HDFS data and the local
> jobcache?
>
> If I give comma-separated values for dfs.data.dir, will the total data be
> replicated on all four disks, or will the total data be shared across the
> four disks (without replication)?
>
> And how do I increase the space for the local jobcache?
> http://www.mail-archive.com/core-user@hadoop.apache.org/msg04346.html
> says hadoop.tmp.dir cannot be comma-separated, but my MapReduce jobs will
> eat a lot of local jobcache space.
>
> What should the configuration be for the above scenario?
>


-- 
Best wishes!
My Friend~