Posted to common-user@hadoop.apache.org by Raymond Jennings III <ra...@yahoo.com> on 2010/06/24 17:26:20 UTC
Newbie to HDFS compression
Are there instructions on how to enable compression on HDFS, and which type should I use? Does this have to be done during installation, or can it be added to a running cluster?
Thanks,
Ray
Re: Newbie to HDFS compression
Posted by Harsh J <qw...@gmail.com>.
On Fri, Jun 25, 2010 at 8:20 AM, James Seigel <ja...@tynt.com> wrote:
> Oops. Replied to the wrong email.
>
> Well, I should add something useful to the conversation now.
>
> I think LZO has all the right features. However, it does not have great support in Pig, if that is what you are using.
There's elephant-bird from Twitter that provides Pig extensions for
LZO store/load operations! It *almost* works out of the box (err, git
repository) :)
>
> It is good to have something splittable. LZO - check
>
> Compress intermediate files... this is a no-brainer.
>
> Stick with it... it is complicated (a bit) to install.
>
> Cheers
> J
>
--
Harsh J
www.harshj.com
Re: Newbie to HDFS compression
Posted by James Seigel <ja...@tynt.com>.
Oops. Replied to the wrong email.
Well, I should add something useful to the conversation now.
I think LZO has all the right features. However, it does not have great support in Pig, if that is what you are using.
It is good to have something splittable. LZO - check
Compress intermediate files... this is a no-brainer.
Stick with it... it is complicated (a bit) to install.
Cheers
J
Re: Newbie to HDFS compression
Posted by James Seigel <ja...@tynt.com>.
Cool. Maybe we should start a page.
J
Re: Newbie to HDFS compression
Posted by Harsh J <qw...@gmail.com>.
On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> Oh, maybe that's what I meant :-) I recall reading something on this mailing list that "the compression" is not included with the Hadoop binary and that you have to get and install it separately due to license incompatibilities. Looking at the config XML files, it's not clear what I need to do. Thanks.
>
LZO compression is the one you probably read about. The other
available CompressionCodecs are BZip2 and GZip, and you should be able
to use files compressed with those just fine.
Something like FileOutputFormat.setCompressOutput(conf, true);
(Also look at the mapred.output.compress configuration variable for
job output compression, and mapred.compress.map.output for
map output compression.)
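A minimal sketch of that call in context, assuming the older org.apache.hadoop.mapred API and the bundled GzipCodec. The class name CompressedOutputSketch and the choice of gzip are illustrative assumptions, not something prescribed above, and the Hadoop jars need to be on the classpath:

```java
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper class; only the Hadoop calls inside it are real API.
final class CompressedOutputSketch {

    /** Turns on compression for the job's final output and for its map output. */
    static JobConf configure(JobConf conf) {
        // Compress the files the job writes to HDFS, here with gzip.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
        // Separately, compress the intermediate map output.
        conf.setCompressMapOutput(true);
        return conf;
    }

    public static void main(String[] args) {
        JobConf conf = configure(new JobConf());
        System.out.println("job output compressed: " + FileOutputFormat.getCompressOutput(conf));
        System.out.println("map output compressed: " + conf.getCompressMapOutput());
    }
}
```

Both settings are per-job, so nothing here needs a cluster reinstall.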
--
Harsh J
www.harshj.com
Re: Newbie to HDFS compression
Posted by Josh Patterson <jo...@cloudera.com>.
Raymond,
LZO installation can be daunting, even with the more recent
developments out there. Most of this information is up at:
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
My quick guide: installation for RedHat / CentOS
- watch out for the various RPMs needed for lzo/2/devel support
- get the native libs in the hadoop/lib subdir from:
http://code.google.com/p/hadoop-gpl-compression/
- double check the permissions on these files; typically a set of "rw
rw r" permissions (rw-rw-r--) works well. Also check the owner.
- get ant 1.8 to build the git repository if you are building any of the source
- move the lzo.jar into the hadoop/lib subdir
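The permission and owner check from the list above can be sketched with plain java.nio. This is only an illustration: the class name NativeLibPermissionCheck and the default directory are assumptions, and it needs a POSIX file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper; not part of Hadoop or hadoop-gpl-compression.
final class NativeLibPermissionCheck {

    // "rw rw r" from the list above, in POSIX notation.
    static final Set<PosixFilePermission> EXPECTED =
            PosixFilePermissions.fromString("rw-rw-r--");

    /** Returns true when the file grants at least the expected permissions. */
    static boolean hasExpectedPermissions(Path file) {
        try {
            return Files.getPosixFilePermissions(file).containsAll(EXPECTED);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats one file as "permissions owner path" for a quick visual check. */
    static String describe(Path file) {
        try {
            return PosixFilePermissions.toString(Files.getPosixFilePermissions(file))
                    + " " + Files.getOwner(file).getName()
                    + " " + file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real native lib directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/lib/hadoop/lib/native");
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(Files::isRegularFile).forEach(file ->
                    System.out.println((hasExpectedPermissions(file) ? "ok      " : "CHECK   ")
                            + describe(file)));
        }
    }
}
```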
Changes to config: mapred-site.xml
Add these entries:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
Changes to config: core-site.xml
Add these entries:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
Changes to config: hadoop-env.sh
export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.3.jar
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
(or the 64-bit version of that directory)
Usage
For the older (deprecated, then un-deprecated) API, to use LZO files as input to an MR job:
conf.setInputFormat(DeprecatedLzoTextInputFormat.class);
Use "lzop" to compress the file:
http://www.lzop.org/
To index the file for splitting on input:
In-process, locally:
hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo
On the cluster, as an MR job:
hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /hdfs/dir/big_file.lzo
To compress the output of the entire job, so that the output file in
HDFS is an LZO-compressed file:
TextOutputFormat.setOutputCompressorClass(conf, com.hadoop.compression.lzo.LzopCodec.class);
TextOutputFormat.setCompressOutput(conf, true);
Josh Patterson
Solutions Architect
Cloudera
Re: Newbie to HDFS compression
Posted by Raymond Jennings III <ra...@yahoo.com>.
Oh, maybe that's what I meant :-) I recall reading something on this mailing list that "the compression" is not included with the Hadoop binary and that you have to get and install it separately due to license incompatibilities. Looking at the config XML files, it's not clear what I need to do. Thanks.
Re: Newbie to HDFS compression
Posted by Eric Sammer <es...@cloudera.com>.
There is no file-system-level compression in HDFS. You can store
compressed files in HDFS, however.
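As a rough sketch of what storing compressed files means in practice: the client compresses the data itself, and HDFS stores the resulting bytes unchanged. The example uses only java.util.zip; the class name GzipBeforeUpload is made up for illustration, and the upload step (for example with hadoop fs -put) is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; compression happens on the client, not inside HDFS.
final class GzipBeforeUpload {

    /** Gzip-compresses the given bytes in memory. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buffer.toByteArray();
    }

    /** Reverses compress(), as a reader of the stored file would. */
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes in, " + packed.length + " bytes compressed");
        System.out.print(new String(decompress(packed), StandardCharsets.UTF_8));
    }
}
```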
On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> Are there instructions on how to enable (which type?) of compression on hdfs? Does this have to be done during installation or can it be added to a running cluster?
>
> Thanks,
> Ray
>
>
>
>
--
Eric Sammer
twitter: esammer
data: www.cloudera.com