Posted to user@hive.apache.org by Vikas Srivastava <vi...@one97.net> on 2011/07/25 13:04:43 UTC

Lzo Compression

Hey,

I want to start using compression in Hadoop, and I have heard that LZO is
among the best codecs available (after Snappy).

Could anyone who is already using compression with Hadoop 0.20.2 please
share their experience?



-- 
With Regards
Vikas Srivastava

DWH & Analytics Team
Mob:+91 9560885900
One97 | Let's get talking !

Re: Lzo Compression

Posted by Ankit Jain <an...@gmail.com>.
I have used Hadoop version 0.20.2.


Re: Lzo Compression

Posted by Koert Kuipers <ko...@tresata.com>.
I just tried your steps 10 and 11 on my setup and they work. See my earlier
message about my setup.


Re: Lzo Compression

Posted by Ankit Jain <an...@gmail.com>.
Hi all,
I tried to index an LZO file but got the following error while indexing:

java.lang.ClassCastException:
com.hadoop.compression.lzo.LzopCodec$LzopDecompressor cannot be cast to
com.hadoop.compression.lzo.LzopDecompressor

I performed the following steps:

1. $ sudo apt-get install liblzo2-dev
2. Clone the hadoop-lzo repo from GitHub (
https://github.com/kevinweil/hadoop-lzo )
3. Build the hadoop-lzo project.
4. Copy the hadoop-lzo-*.jar file into the $HADOOP_HOME/lib dir on all
cluster nodes.
5. Copy the hadoop-lzo-install-dir/build/hadoop-lzo-*/native library into
the $HADOOP_HOME/lib dir on all cluster nodes.
6. core-site.xml:

      <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            com.hadoop.compression.lzo.LzoCodec,
            com.hadoop.compression.lzo.LzopCodec,
            org.apache.hadoop.io.compress.BZip2Codec
            </value>
      </property>
      <property>
            <name>io.compression.codec.lzo.class</name>
            <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
7. mapred-site.xml:

      <property>
            <name>mapred.child.env</name>
            <value>JAVA_LIBRARY_PATH=/opt/ladap/common/hadoop-0.20.2/lib/native/Linux-i386-32/*</value>
      </property>

      <property>
            <name>mapred.map.output.compression.codec</name>
            <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
8. hadoop-env.sh

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/ankit/hadoop-0.20.1/lib/hadoop-lzo-0.4.12.jar
    export JAVA_LIBRARY_PATH=/home/ankit/hadoop-0.20.1/lib/native/Linux-i386-32/

9. Restart the cluster.

10. Upload the LZO file into HDFS.

11. Run the following command for indexing:
bin/hadoop jar path/to/hadoop-lzo-*.jar
com.hadoop.compression.lzo.LzoIndexer lzofile.lzo
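One thing worth knowing here: Hadoop resolves a codec for an input file by its file-name suffix (via CompressionCodecFactory), which is why both LzoCodec and LzopCodec appear in io.compression.codecs above: a .lzo file is handed to LzopCodec. A simplified Python sketch of that lookup (the suffix table mirrors what I believe each codec's default extension is; the lookup logic is an illustration, not Hadoop's actual code):

```python
# Simplified illustration of how Hadoop's CompressionCodecFactory
# resolves a codec from a file-name suffix. The class names come from
# the io.compression.codecs list in core-site.xml above; the lookup
# itself is a sketch, not the real Hadoop implementation.

# Suffix -> codec class, mirroring each codec's default extension
# (assumed: LzoCodec writes .lzo_deflate, LzopCodec writes .lzo).
CODEC_BY_SUFFIX = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
}

def resolve_codec(path):
    """Return the codec class for a path, longest suffix first, or None."""
    for suffix in sorted(CODEC_BY_SUFFIX, key=len, reverse=True):
        if path.endswith(suffix):
            return CODEC_BY_SUFFIX[suffix]
    return None

print(resolve_codec("lzofile.lzo"))             # the lzop container codec
print(resolve_codec("part-00000.lzo_deflate"))  # the raw LZO stream codec
```

A mismatch between the codec that wrote a file and the one resolved when reading it back is one way a cast error like the one above can show up.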




Re: Lzo Compression

Posted by Koert Kuipers <ko...@tresata.com>.
My installation notes for lzo-hadoop (might be wrong or incomplete):

We run CentOS 5.6 and CDH3.

yum -y install lzo
git clone https://github.com/toddlipcon/hadoop-lzo.git
cd hadoop-lzo
ant
cd build
cp hadoop-lzo-0.4.10/hadoop-lzo-0.4.10.jar /usr/lib/hadoop/lib
cp -r hadoop-lzo-0.4.10/lib/native /usr/lib/hadoop/lib


in core-site.xml:
  <property>
    <name>io.compression.codecs</name>

<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
    <final>true</final>
  </property>

  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <final>true</final>
  </property>


in mapred-site.xml:
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <final>false</final>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <final>false</final>
  </property>

  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
    <final>false</final>
  </property>

  <property>
    <name>mapred.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <final>false</final>
  </property>

  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
    <final>false</final>
  </property>
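The mapred.compress.map.output setting above trades CPU for shuffle bandwidth: intermediate map output is usually repetitive text that compresses well. A rough standard-library sketch of the effect, with gzip standing in for LZO since Python has no built-in LZO codec (the sample data and ratio are illustrative only; LZO typically gives a somewhat worse ratio than gzip but much faster compression):

```python
import gzip

# Fake intermediate map output: repetitive key/value text, the kind of
# data a shuffle typically moves. gzip stands in for LZO here because
# the Python standard library has no LZO codec.
records = "".join(f"key{i % 100}\tvalue{i % 100}\n" for i in range(10000))
raw = records.encode("utf-8")
packed = gzip.compress(raw)

print(f"raw bytes:        {len(raw):,}")
print(f"compressed bytes: {len(packed):,}")
print(f"ratio:            {len(packed) / len(raw):.1%}")
```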


Re: Lzo Compression

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Vikas,

You should be able to use the Snappy codec with some minor tweaks
from http://code.google.com/p/hadoop-snappy/ until a Hadoop release
ships with Snappy support.

Thxs.

Alejandro.


Re: Lzo Compression

Posted by Koert Kuipers <ko...@tresata.com>.
I have LZO compression enabled by default in Hadoop 0.20.2 and Hive 0.7.0,
and it works well so far.
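Since this involves pointing Hive at LZO-compressed data: it can help to confirm that files landing in the warehouse really are lzop archives before loading them. The lzop container starts with a fixed nine-byte magic sequence, so a header check needs no LZO library at all (the magic constant is taken from the lzop file-format documentation; the temp-file demo is illustrative, not a real .lzo file):

```python
import os
import tempfile

# Magic bytes that begin every lzop archive (the container format that
# the lzop tool and hadoop-lzo's LzopCodec write). Taken from the lzop
# file-format documentation; only the header is read, nothing is
# decompressed.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"

def is_lzop_file(path):
    """Return True if the file starts with the lzop magic sequence."""
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC

# Quick demonstration with a fabricated header, since no real .lzo
# file is assumed to be present.
with tempfile.NamedTemporaryFile(suffix=".lzo", delete=False) as f:
    f.write(LZOP_MAGIC + b"fake compressed payload")
    fake = f.name
print(is_lzop_file(fake))  # True
os.unlink(fake)
```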
