Posted to common-user@hadoop.apache.org by Michael Harris <Mi...@Telespree.com> on 2007/12/07 01:55:10 UTC

RE: Is a file compressed before it is stored in Hadoop?

I am not completely sure I understand this response; I was wondering the exact same thing as Ryan. So I followed what you said here and did the following:

import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
FileSystem fs = FileSystem.get(conf); // yields the DistributedFileSystem for an HDFS URI
Path p = new Path("/user/hadoop/indexweek/" + startOfWeek.getTime()
        + "_" + endOfWeek.getTime() + ".db");
OutputStream dos = new GZIPOutputStream(fs.create(p));

Clearly the input is now compressed in the DFS, but how does Hadoop recognize that the input is compressed? For example, when I browse to that file through the DFS web interface I get the compressed version. Will only map/reduce jobs see the uncompressed version, or should it also appear uncompressed in the DFS web interface? And how can Hadoop correctly decompress part of a file (a single block) when the file spans multiple blocks, without decompressing the file as a whole?
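
For reference, here is my current understanding as a minimal sketch, assuming the 0.15-era CompressionCodecFactory API (the path below is hypothetical): MapReduce matches codecs to files by extension, so ".gz" resolves to the gzip codec, which would mean my ".db" file above is not recognized as compressed at all.

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/user/hadoop/indexweek/sample.db"); // hypothetical name
CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(p);
InputStream in = (codec == null)
        ? fs.open(p)                           // unknown extension: raw bytes
        : codec.createInputStream(fs.open(p)); // known extension: decompressed

If that is right, simply giving the file a ".gz" suffix should be enough for map/reduce jobs to decompress it on read.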

Is automatic block compression on the roadmap for Hadoop? It seems like it would be very useful. In my case the compression ratio is 9:1, which should translate into significant space and I/O savings.
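
In the meantime, block-compressed SequenceFiles look like the closest existing feature. A minimal sketch, assuming the SequenceFile.createWriter overload that takes a CompressionType (the path and record below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/user/hadoop/indexweek/sample.seq"); // hypothetical name
SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, p, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK); // compress batches of records
writer.append(new LongWritable(1), new Text("stored block-compressed"));
writer.close();

Because each block is compressed independently, such a file can still be split across map tasks, unlike a single gzipped stream.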

I haven't had a chance to run it with Pig yet to see whether map/reduce handles it properly. Should I just expect it to work correctly, or did I miss something?

Any clarification / correction would be appreciated.

Thanks,
Michael

-----Original Message-----
From: Stu Hood [mailto:stuhood@webmail.us] 
Sent: Monday, November 26, 2007 6:16 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Is a file compressed before it is stored in Hadoop?

Hadoop will not automatically compress a file that you place into it.

If you compress a file before placing it in Hadoop, MapReduce jobs will use the compression package to transparently decompress your gzipped files when reading them as input.
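
As a minimal sketch (assuming the circa-0.15 JobConf API, where setInputPath was later replaced by FileInputFormat.setInputPaths; the class name and paths are hypothetical), note that nothing compression-specific is configured: TextInputFormat recognizes the ".gz" extension and hands the mapper already-decompressed lines.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class GzipInputJob {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(GzipInputJob.class);
        job.setInputFormat(TextInputFormat.class);  // detects .gz by extension
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);  // byte offsets from TextInputFormat
        job.setOutputValueClass(Text.class);        // decompressed lines
        job.setMapperClass(IdentityMapper.class);
        job.setReducerClass(IdentityReducer.class);
        job.setInputPath(new Path("/user/hadoop/input/data.gz")); // hypothetical
        job.setOutputPath(new Path("/user/hadoop/output"));       // hypothetical
        JobClient.runJob(job);
    }
}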

Thanks,
Stu

-----Original Message-----
From: Ryan <sa...@live.com>
Sent: Monday, November 26, 2007 8:16pm
To: hadoop-user@lucene.apache.org
Subject: Is a file compressed before it is stored in Hadoop?

Hi,
 I'm new to Hadoop and I'm confused about how files are stored. I found a zlib implementation in the package org.apache.hadoop.io.compress, so I wonder whether a file is compressed before it is actually stored in Hadoop; that is, does Hadoop store the file in its compressed form?

Sincerely,
Ryan
