Posted to common-user@hadoop.apache.org by Hong Tang <ht...@yahoo-inc.com> on 2010/05/18 20:11:42 UTC

Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

Stan,

See my comments inline.

Thanks, Hong

On May 18, 2010, at 8:44 AM, stan lee wrote:

> Hi Guys,
>
> I am trying to use compression to reduce the IO workload when running
> a job, but it failed. I have several questions which need your help.
>
> For lzo compression, I found a guide at
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it
> say "Note that you must have both 32-bit and 64-bit liblzo2
> installed"? I am not sure whether it means that we also need 32-bit
> liblzo2 installed even when we are on a 64-bit system. If so, why?

The answer on the wiki page addresses how to set up the native
libraries so that both 32-bit AND 64-bit Java work. If you run the
same flavor of Java (all 32-bit or all 64-bit) across the whole
cluster, that requirement does not apply to you: you only need the
liblzo2 build that matches your JVM.
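
To check which flavor of Java a node is running, something like the
following may help. It is only a sketch: the class and method names are
made up for illustration, "sun.arch.data.model" is a HotSpot-specific
property that other JVMs may not set, and the fallback on "os.arch" is
a rough heuristic rather than a reliable test.

```java
public class JvmBitness {
    /** Returns "32", "64", or "unknown" for the JVM running this code. */
    static String dataModel() {
        // HotSpot-specific property; may be absent on other JVMs (assumption).
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model) || "64".equals(model)) {
            return model;
        }
        // Heuristic fallback on the standard os.arch property.
        String arch = System.getProperty("os.arch", "");
        if (arch.contains("64")) {
            return "64";
        }
        if (arch.equals("x86") || arch.equals("i386") || arch.equals("i686")) {
            return "32";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel() + "-bit");
    }
}
```

Running this with the same java binary that Hadoop uses on each node
should show whether the cluster is uniform.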

> Also, if I don't use lzo compression and try to use gzip to compress
> the final reduce output file, I just set the value below in
> mapred-site.xml, but it doesn't seem to work (how can I find the final
> compressed .gz file? I used "hadoop dfs -l <dir>" and didn't find
> it). My question: can we use gzip to compress the final result when
> it's not a streaming job? How can we ensure that compression has been
> enabled during a job execution?
>
> <property>
>       <name>mapred.output.compress</name>
>       <value>true</value>
> </property>
>

This option is honored by the OutputFormat implementation, not by the
framework itself. If you use TextOutputFormat with the gzip codec
selected (mapred.output.compression.codec set to
org.apache.hadoop.io.compress.GzipCodec), then you should see files
like "part-xxxx.gz" in the output directory. If you write your own
output format class, then you should follow the implementation of
TextOutputFormat or SequenceFileOutputFormat to set up compression
properly.
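
To illustrate what "honoring the option" means for a custom output
format, here is a stand-alone sketch of the pattern. It deliberately
uses only the Java standard library, not Hadoop classes: the class
CompressAwareWriter and its methods are hypothetical, and a
java.util.Properties object stands in for the job configuration. The
idea is that the output format reads the flag, appends the codec's
extension to the file name, and wraps the raw stream in a compressing
stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressAwareWriter {
    /** Reads the flag; Properties stands in for the job configuration. */
    static boolean isCompressed(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty("mapred.output.compress", "false"));
    }

    /** Adds ".gz" to the part file name only when compression is on. */
    static String outputName(Properties conf, String baseName) {
        return isCompressed(conf) ? baseName + ".gz" : baseName;
    }

    /** Writes text to an in-memory sink, gzip-wrapped only if the flag is set. */
    static byte[] write(Properties conf, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = isCompressed(conf) ? new GZIPOutputStream(sink) : sink) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses gzip data back to text, to verify the round trip. */
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.output.compress", "true");
        byte[] bytes = write(conf, "key\tvalue\n");
        System.out.println(outputName(conf, "part-00000"));
        System.out.println(gunzip(bytes).equals("key\tvalue\n"));
    }
}
```

On a real cluster, the quickest check that compression took effect is
to list the job's output directory (for example with "hadoop fs -ls
<dir>") and look at the extensions of the part files.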