Posted to hdfs-user@hadoop.apache.org by Sean Bigdatafun <se...@gmail.com> on 2010/06/24 08:53:30 UTC

dfs.data.dir and "hadoop namenode -format"

Can someone tell me what "hadoop namenode -format" does under the hood?

I have started my HDFS cell with the following configuration.
-------------------
 dfs.data.dir
    /opt/hadoop/data
--------------------
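(As a full hdfs-site.xml property element, this configuration is:)

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/data</value>
</property>
```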

Over time, I want to add another directory to the data.dir; how can I achieve
it?

1) Can I simply edit "dfs.data.dir" in the hdfs-site.xml without stopping my
cell?

2) If 1) is not legitimate, can I run "stop-dfs.sh", then do 1) and then
"start-dfs.sh"?

3) My last question here is what "hadoop namenode -format" does. If I run it
on my Namenode, does it clean up the data.dir, and do I need to manually
clean up the data.dir on the Datanodes?

Thanks,
Sean

Re: dfs.data.dir and "hadoop namenode -format"

Posted by Harsh J <qw...@gmail.com>.
On Thu, Jun 24, 2010 at 9:09 PM, Sean Bigdatafun
<se...@gmail.com> wrote:
>
>
> On Thu, Jun 24, 2010 at 8:20 AM, Jeff Whiting <je...@qualtrics.com> wrote:
>>
>> 1) No, you have to stop and restart.
>> 2) Yes that is how it would get the new directory.  However changing the
>> directory means it won't be able to find any of your old data.  If you don't
>> want to start over from scratch you will want to stop dfs then copy the
>> files over to the new data directory and then restart it.
>
> What if I simply add an additional directory, i.e., leave the previous one
> in the list? Will that cause the old data to be lost?
No.
> From:
> -------------------
>  dfs.data.dir
>     /opt/hadoop/data
> --------------------
>
> To:
> -------------------
>  dfs.data.dir
>     /opt/hadoop/data, /home/sean/hadoopdata
> --------------------
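Spelled out as a full hdfs-site.xml property (same paths as above), the new
value is simply a comma-separated list:

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/data,/home/sean/hadoopdata</value>
</property>
```

Each datanode will then spread its blocks across both directories.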
>
>>
>> 3) Doing the namenode -format will clean up the directory on the name node
>> and make everything good to go with no data in dfs.  However you'll have to
>> go to the datanodes and clean out their data directories.   If you leave
>> data in the directory on the datanodes they will be unable to join with the
>> namenode.
>>
>> This isn't the most technical explanation but hopefully it helps.
>> ~Jeff
>>
>> Sean Bigdatafun wrote:
>>>
>>> Can someone tell me what "hadoop namenode -format" does under the hood?
>>>
>>> I have started my HDFS cell with the following configuration.
>>> -------------------
>>>  dfs.data.dir
>>>    /opt/hadoop/data
>>> --------------------
>>>
>>> Over time, I want to add another directory to the data.dir; how can I
>>> achieve it?
>>> 1) Can I simply edit "dfs.data.dir" in the hdfs-site.xml without stopping
>>> my cell?
>>>
>>> 2) If 1) is not legitimate, can I run "stop-dfs.sh", then do 1) and then
>>> "start-dfs.sh"?
>>>
>>> 3) My last question here is what "hadoop namenode -format" does. If I run
>>> it on my Namenode, does it clean up the data.dir? and do I need to manually
>>> clean up the data.dir on Datanode?
>>>
>>> Thanks,
>>> Sean
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Jeff Whiting
>> Qualtrics Senior Software Engineer
>> jeffw@qualtrics.com
>>
>
>



-- 
Harsh J
www.harshj.com

Re: dfs.data.dir and "hadoop namenode -format"

Posted by Sean Bigdatafun <se...@gmail.com>.
On Thu, Jun 24, 2010 at 8:20 AM, Jeff Whiting <je...@qualtrics.com> wrote:

> 1) No, you have to stop and restart.
> 2) Yes that is how it would get the new directory.  However changing the
> directory means it won't be able to find any of your old data.  If you don't
> want to start over from scratch you will want to stop dfs then copy the
> files over to the new data directory and then restart it.
>
What if I simply add an additional directory, i.e., leave the previous one
in the list? Will that cause the old data to be lost?

From:
-------------------
 dfs.data.dir
    /opt/hadoop/data
--------------------


To:
-------------------
 dfs.data.dir
    /opt/hadoop/data, /home/sean/hadoopdata
--------------------



> 3) Doing the namenode -format will clean up the directory on the name node
> and make everything good to go with no data in dfs.  However you'll have to
> go to the datanodes and clean out their data directories.   If you leave
> data in the directory on the datanodes they will be unable to join with the
> namenode.
>
> This isn't the most technical explanation but hopefully it helps.
> ~Jeff
>
>
> Sean Bigdatafun wrote:
>
>> Can someone tell me what "hadoop namenode -format" does under the hood?
>>
>> I have started my HDFS cell with the following configuration.
>> -------------------
>>  dfs.data.dir
>>    /opt/hadoop/data
>> --------------------
>>
>> Over time, I want to add another directory to the data.dir; how can I
>> achieve it?
>> 1) Can I simply edit "dfs.data.dir" in the hdfs-site.xml without stopping
>> my cell?
>>
>> 2) If 1) is not legitimate, can I run "stop-dfs.sh", then do 1) and then
>> "start-dfs.sh"?
>>
>> 3) My last question here is what "hadoop namenode -format" does. If I run
>> it on my Namenode, does it clean up the data.dir? and do I need to manually
>> clean up the data.dir on Datanode?
>>
>> Thanks,
>> Sean
>>
>>
>>
>>
>>
>>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
>

Re: dfs.data.dir and "hadoop namenode -format"

Posted by Jeff Whiting <je...@qualtrics.com>.
1) No, you have to stop and restart.
2) Yes, that is how it would get the new directory.  However, changing the 
directory means it won't be able to find any of your old data.  If you 
don't want to start over from scratch, you will want to stop dfs, then 
copy the files over to the new data directory, and then restart it.
3) Running "hadoop namenode -format" will clean up the directory on the name 
node and make everything good to go with no data in dfs.  However, you'll 
have to go to the datanodes and clean out their data directories.  If 
you leave data in the directory on the datanodes, they will be unable to 
join with the namenode.
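As a rough shell sketch of that procedure (paths are examples, not from the
thread; the config edit has to happen on every node before restarting):

```shell
bin/stop-dfs.sh                      # stop HDFS first (run on the namenode)
mkdir -p /home/sean/hadoopdata       # create the new storage dir on each datanode
# on every node, edit conf/hdfs-site.xml so dfs.data.dir lists both:
#   /opt/hadoop/data,/home/sean/hadoopdata
bin/start-dfs.sh                     # datanodes come back using both directories
```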

This isn't the most technical explanation but hopefully it helps.
~Jeff

Sean Bigdatafun wrote:
> Can someone tell me what "hadoop namenode -format" does under the hood?
>
> I have started my HDFS cell with the following configuration.
> -------------------
>  dfs.data.dir
>     /opt/hadoop/data
> --------------------
>
> Over time, I want to add another directory to the data.dir; how can I 
> achieve it? 
>
> 1) Can I simply edit "dfs.data.dir" in the hdfs-site.xml without 
> stopping my cell?
>
> 2) If 1) is not legitimate, can I run "stop-dfs.sh", then do 1) and 
> then "start-dfs.sh"?
>
> 3) My last question here is what "hadoop namenode -format" does. If I 
> run it on my Namenode, does it clean up the data.dir? and do I need to 
> manually clean up the data.dir on Datanode?
>
> Thanks,
> Sean
>
>
>
>  
>
>

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


Re: Newbie to HDFS compression

Posted by Harsh J <qw...@gmail.com>.
On Fri, Jun 25, 2010 at 8:20 AM, James Seigel <ja...@tynt.com> wrote:
> Oops.  Replied to the wrong email.
>
> Well I should add something useful to the conversation now.
>
> I think LZO has all the right features.  However, support in Pig is not great, if that is what you are using.
There's elephant-bird from Twitter that provides Pig extensions for
LZO store/load operations! It *almost* works out of the box (err, git
repository) :)
>
> It is good to have something splittable.  LZO - check
>
> Compress intermediate files...this is a no brainer.
>
> Stick with it...it is complicated ( a bit )  to install
>
> Cheers
> J
>
> On 2010-06-24, at 8:45 PM, James Seigel wrote:
>
>> Cool.  Maybe we should start a page.
>>
>> J
>> On 2010-06-24, at 8:16 PM, Harsh J wrote:
>>
>>> On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III
>>> <ra...@yahoo.com> wrote:
>>>> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that "the compression" is not included with the hadoop binary and that you have to get and install it separately due to license incompatibilities.  Looking at the config xml files it's not clear what I need to do.  Thanks.
>>>>
>>> LZO Compression is the one you probably read about. Otherwise
>>> available CompressionCodecs are BZip2 and GZip, and you should be able
>>> to use those files just fine.
>>>
>>> Something like FileOutputFormat.setCompressOutput(conf, true);
>>>
>>> (Also look at mapred.output.compress configuration var for
>>> map-output-compression)
>>>>
>>>>
>>>> ----- Original Message ----
>>>> From: Eric Sammer <es...@cloudera.com>
>>>> To: common-user@hadoop.apache.org
>>>> Sent: Thu, June 24, 2010 5:09:33 PM
>>>> Subject: Re: Newbie to HDFS compression
>>>>
>>>> There is no file system level compression in HDFS. You can store
>>>> compressed files in HDFS, however.
>>>>
>>>> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
>>>> <ra...@yahoo.com> wrote:
>>>>> Are there instructions on how to enable (which type?) of compression on hdfs?  Does this have to be done during installation or can it be added to a running cluster?
>>>>>
>>>>> Thanks,
>>>>> Ray
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Eric Sammer
>>>> twitter: esammer
>>>> data: www.cloudera.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>> www.harshj.com
>>
>
>



-- 
Harsh J
www.harshj.com

Re: Newbie to HDFS compression

Posted by James Seigel <ja...@tynt.com>.
Oops.  Replied to the wrong email.

Well, I should add something useful to the conversation now.

I think LZO has all the right features.  However, support in Pig is not great, if that is what you are using.

It is good to have something splittable.  LZO - check.

Compress intermediate files... this is a no-brainer.

Stick with it... it is a bit complicated to install.

Cheers
J

On 2010-06-24, at 8:45 PM, James Seigel wrote:

> Cool.  Maybe we should start a page.
> 
> J
> On 2010-06-24, at 8:16 PM, Harsh J wrote:
> 
>> On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III
>> <ra...@yahoo.com> wrote:
>>> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that "the compression" is not included with the hadoop binary and that you have to get and install it separately due to license incompatibilities.  Looking at the config xml files it's not clear what I need to do.  Thanks.
>>> 
>> LZO Compression is the one you probably read about. Otherwise
>> available CompressionCodecs are BZip2 and GZip, and you should be able
>> to use those files just fine.
>> 
>> Something like FileOutputFormat.setCompressOutput(conf, true);
>> 
>> (Also look at mapred.output.compress configuration var for
>> map-output-compression)
>>> 
>>> 
>>> ----- Original Message ----
>>> From: Eric Sammer <es...@cloudera.com>
>>> To: common-user@hadoop.apache.org
>>> Sent: Thu, June 24, 2010 5:09:33 PM
>>> Subject: Re: Newbie to HDFS compression
>>> 
>>> There is no file system level compression in HDFS. You can store
>>> compressed files in HDFS, however.
>>> 
>>> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
>>> <ra...@yahoo.com> wrote:
>>>> Are there instructions on how to enable (which type?) of compression on hdfs?  Does this have to be done during installation or can it be added to a running cluster?
>>>> 
>>>> Thanks,
>>>> Ray
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Eric Sammer
>>> twitter: esammer
>>> data: www.cloudera.com
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Harsh J
>> www.harshj.com
> 


Re: Newbie to HDFS compression

Posted by James Seigel <ja...@tynt.com>.
Cool.  Maybe we should start a page.

J
On 2010-06-24, at 8:16 PM, Harsh J wrote:

> On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III
> <ra...@yahoo.com> wrote:
>> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that "the compression" is not included with the hadoop binary and that you have to get and install it separately due to license incompatibilities.  Looking at the config xml files it's not clear what I need to do.  Thanks.
>> 
> LZO Compression is the one you probably read about. Otherwise
> available CompressionCodecs are BZip2 and GZip, and you should be able
> to use those files just fine.
> 
> Something like FileOutputFormat.setCompressOutput(conf, true);
> 
> (Also look at mapred.output.compress configuration var for
> map-output-compression)
>> 
>> 
>> ----- Original Message ----
>> From: Eric Sammer <es...@cloudera.com>
>> To: common-user@hadoop.apache.org
>> Sent: Thu, June 24, 2010 5:09:33 PM
>> Subject: Re: Newbie to HDFS compression
>> 
>> There is no file system level compression in HDFS. You can store
>> compressed files in HDFS, however.
>> 
>> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
>> <ra...@yahoo.com> wrote:
>>> Are there instructions on how to enable (which type?) of compression on hdfs?  Does this have to be done during installation or can it be added to a running cluster?
>>> 
>>> Thanks,
>>> Ray
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com


Re: Newbie to HDFS compression

Posted by Harsh J <qw...@gmail.com>.
On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that "the compression" is not included with the hadoop binary and that you have to get and install it separately due to license incompatibilities.  Looking at the config xml files it's not clear what I need to do.  Thanks.
>
LZO compression is the one you probably read about. The other
CompressionCodecs available out of the box are BZip2 and GZip, and you
should be able to use those just fine.

Something like FileOutputFormat.setCompressOutput(conf, true);

(Also look at the mapred.output.compress configuration var for compressing
job output, and mapred.compress.map.output for intermediate map output.)
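The same defaults can also be set cluster-wide in mapred-site.xml; a sketch
using the built-in gzip codec (old-style property names, matching this era of
Hadoop):

```xml
<!-- compress the final job output -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>

<!-- compress intermediate map output -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
```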
>
>
> ----- Original Message ----
> From: Eric Sammer <es...@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, June 24, 2010 5:09:33 PM
> Subject: Re: Newbie to HDFS compression
>
> There is no file system level compression in HDFS. You can store
> compressed files in HDFS, however.
>
> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
> <ra...@yahoo.com> wrote:
>> Are there instructions on how to enable (which type?) of compression on hdfs?  Does this have to be done during installation or can it be added to a running cluster?
>>
>> Thanks,
>> Ray
>>
>>
>>
>>
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>
>
>
>
>



-- 
Harsh J
www.harshj.com

Re: Newbie to HDFS compression

Posted by Josh Patterson <jo...@cloudera.com>.
Raymond,

LZO installation can be daunting, even with the more recent
developments out there.

Most of this information is up at:

http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ

My quick guide: Installation for RedHat / Centos

- watch out for the various RPMs needed for lzo/2/devel support
- get the native libs in the hadoop/lib subdir from:
http://code.google.com/p/hadoop-gpl-compression/
- double check the permissions on these files; typically a set of "rw-rw-r--"
permissions works well. Also check the owner.
- get ant 1.8 to build the git repository if you are building any of the source
- move the lzo.jar into the hadoop/lib subdir


Changes to config: mapred-site.xml (add the following entries)

  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <property>
    <name>mapred.child.env</name>
    <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>


Changes to Config: core-site.xml

Add these entries:

<property>
    <name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>



hadoop-env.sh

export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32 (or
the 64-bit Linux-amd64-64 directory)

Usage

For the older (deprecated, then un-deprecated) API, to use LZO files as input to an MR job:

conf.setInputFormat(DeprecatedLzoTextInputFormat.class);

Use "lzop" to compress the file

http://www.lzop.org/

To index the file for splitting on input:

In-process, locally:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo

On the cluster, in MR:

hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /hdfs/dir/big_file.lzo

To compress the output of the entire job so that the output file in
HDFS is an LZO-compressed file:

TextOutputFormat.setCompressOutput(conf, true);
TextOutputFormat.setOutputCompressorClass(conf, com.hadoop.compression.lzo.LzopCodec.class);


Josh Patterson

Solutions Architect
Cloudera

On Thu, Jun 24, 2010 at 5:12 PM, Raymond Jennings III
<ra...@yahoo.com> wrote:
>
> Oh, maybe that's what I meant :-)  I recall reading something on this mail group that "the compression" is not included with the hadoop binary and that you have to get and install it separately due to license incompatibilities.  Looking at the config xml files it's not clear what I need to do.  Thanks.
>
>
>
> ----- Original Message ----
> From: Eric Sammer <es...@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, June 24, 2010 5:09:33 PM
> Subject: Re: Newbie to HDFS compression
>
> There is no file system level compression in HDFS. You can store
> compressed files in HDFS, however.
>
> On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
> <ra...@yahoo.com> wrote:
> > Are there instructions on how to enable (which type?) of compression on hdfs?  Does this have to be done during installation or can it be added to a running cluster?
> >
> > Thanks,
> > Ray
> >
> >
> >
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>
>
>
>

Re: Newbie to HDFS compression

Posted by Raymond Jennings III <ra...@yahoo.com>.
Oh, maybe that's what I meant :-)  I recall reading something on this mail group that "the compression" is not included with the hadoop binary and that you have to get and install it separately due to license incompatibilities.  Looking at the config xml files it's not clear what I need to do.  Thanks.



----- Original Message ----
From: Eric Sammer <es...@cloudera.com>
To: common-user@hadoop.apache.org
Sent: Thu, June 24, 2010 5:09:33 PM
Subject: Re: Newbie to HDFS compression

There is no file system level compression in HDFS. You can store
compressed files in HDFS, however.

On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> Are there instructions on how to enable compression (and which type?) on HDFS?  Does this have to be done during installation, or can it be added to a running cluster?
>
> Thanks,
> Ray
>
>
>
>



-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com



      

Re: Newbie to HDFS compression

Posted by Eric Sammer <es...@cloudera.com>.
There is no file system level compression in HDFS. You can store
compressed files in HDFS, however.

On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> Are there instructions on how to enable compression (and which type?) on HDFS?  Does this have to be done during installation, or can it be added to a running cluster?
>
> Thanks,
> Ray
>
>
>
>



-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Newbie to HDFS compression

Posted by Raymond Jennings III <ra...@yahoo.com>.
Are there instructions on how to enable compression (and which type?) on HDFS?  Does this have to be done during installation, or can it be added to a running cluster?

Thanks,
Ray


      

dfs.data.dir and "hadoop namenode -format"

Posted by Sean Bigdatafun <se...@gmail.com>.
---------- Forwarded message ----------
From: Sean Bigdatafun <se...@gmail.com>
Date: Wed, Jun 23, 2010 at 11:53 PM
Subject: dfs.data.dir and "hadoop namenode -format"
To: hdfs-user@hadoop.apache.org


Can someone tell me what "hadoop namenode -format" does under the hood?

I have started my HDFS cell with the following configuration.
-------------------
 dfs.data.dir
    /opt/hadoop/data
--------------------

Over time, I want to add another directory to the data.dir; how can I achieve
it?

1) Can I simply edit "dfs.data.dir" in the hdfs-site.xml without stopping my
cell?

2) If 1) is not legitimate, can I run "stop-dfs.sh", then do 1) and then
"start-dfs.sh"?

3) My last question here is what "hadoop namenode -format" does. If I run it
on my Namenode, does it clean up the data.dir, and do I need to manually
clean up the data.dir on the Datanodes?

Thanks,
Sean