Posted to common-user@hadoop.apache.org by Kumar Jayapal <kj...@gmail.com> on 2015/08/05 22:20:07 UTC

compress folder in hadoop

Hi All,

How do I compress a folder in Hadoop?

I want to compress a folder that holds old data and is not frequently used.
How can I do that?

When I searched the web I found some ideas for compressing the files. Can
someone please help me understand why the output files are not in .lzo or
.gz format?


I am testing the commands below with two types of compression, LZO and
gzip. When I check the output files, they are the same size in both cases.
How do I check whether the compression was successful? When I cat the files
I can see the data.

The MR job was successful and created these files.

# hadoop jar hadoop-streaming.jar \
    "-Dmapreduce.compress.map.output=true" \
    "-Dmapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec" \
    "-Dmapreduce.output.compress=true" \
    "-Dmapreduce.output.compression.codec=com.hadoop.compression.lzo.LzopCodec" \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfslzo


# hadoop jar hadoop-streaming.jar \
    "-Dmapreduce.compress.map.output=true" \
    "-Dmapreduce.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    "-Dmapreduce.output.compress=true" \
    "-Dmapreduce.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfsgzip
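
For reference, Hadoop 2.x documents the job output compression settings as
mapreduce.output.fileoutputformat.compress and
mapreduce.output.fileoutputformat.compress.codec, which differ from the
property names used in the commands above. Below is a minimal sketch of how
the options could be assembled with those names. The helper name
compress_opts and the output path in the usage comment are hypothetical, and
whether switching names changes the result on this cluster is an assumption
that has not been verified here.

```shell
# Print the -D options for job output compression, one per line.
# $1 is the fully qualified codec class name.
compress_opts() {
  codec=$1
  printf '%s\n' \
    "-Dmapreduce.output.fileoutputformat.compress=true" \
    "-Dmapreduce.output.fileoutputformat.compress.codec=$codec"
}

# Usage (needs a cluster and the streaming jar; the output path is hypothetical):
# hadoop jar hadoop-streaming.jar \
#     $(compress_opts org.apache.hadoop.io.compress.GzipCodec) \
#     -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfsgzip2
```

The generic -D options are kept ahead of -input and -output, as in the
original commands.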



The output part files are listed below.


15/08/05 18:36:07 INFO streaming.StreamJob: Output directory: /tmp/hdfs/hdfsgzip
# hadoop fs -ls /tmp/hdfs/hdfsgzip
Found 5 items
-rw-r--r--   3 hdfs supergroup          0 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/_SUCCESS
-rw-r--r--   3 hdfs supergroup 6061954911 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/part-00000
-rw-r--r--   3 hdfs supergroup 6062727606 2015-08-05 18:35 /tmp/hdfs/hdfsgzip/part-00001
-rw-r--r--   3 hdfs supergroup 6064932250 2015-08-05 18:35 /tmp/hdfs/hdfsgzip/part-00002
-rw-r--r--   3 hdfs supergroup 6062737354 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/part-00003
# hadoop fs -ls /tmp/hdfs/hdfslzo
Found 5 items
-rw-r--r--   3 hdfs supergroup          0 2015-08-05 18:28 /tmp/hdfs/hdfslzo/_SUCCESS
-rw-r--r--   3 hdfs supergroup 6061954911 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00000
-rw-r--r--   3 hdfs supergroup 6062727606 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00001
-rw-r--r--   3 hdfs supergroup 6064932250 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00002
-rw-r--r--   3 hdfs supergroup 6062737354 2015-08-05 18:28 /tmp/hdfs/hdfslzo/part-00003
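
One way to check whether part files like those listed above are really
compressed is to look at their leading bytes: a gzip stream starts with the
bytes 1f 8b, and an lzop file starts with the nine-byte signature
89 4c 5a 4f 00 0d 0a 1a 0a. The following is a minimal sketch of such a
check. The function name detect_codec is hypothetical, and the usage line
assumes the part file can be streamed with hadoop fs -cat.

```shell
# Print the codec suggested by the first bytes read from stdin.
detect_codec() {
  # Hex-encode the first 9 bytes, with spaces and newlines removed.
  magic=$(head -c 9 | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    1f8b*)              echo "gzip" ;;
    894c5a4f000d0a1a0a) echo "lzop" ;;
    *)                  echo "uncompressed or unknown" ;;
  esac
}

# Usage (needs a cluster):
# hadoop fs -cat /tmp/hdfs/hdfsgzip/part-00000 | detect_codec
```

Because head stops reading after nine bytes, hadoop fs -cat may report a
broken pipe on stderr; that does not affect the result. Byte-identical sizes
in both output directories, and part files without a .gz or .lzo suffix,
would also be consistent with uncompressed output.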

It would be a great help if you could point me to any links about compression.



Thanks
Jay