Posted to common-user@hadoop.apache.org by Michael Harris <Mi...@Telespree.com> on 2007/12/07 01:55:10 UTC
RE: Whether the file is compressed before store it to hadoop?
I am not completely sure I understand this response. I was wondering the exact same thing as Ryan, so I followed what you said here and did the following:
// conf is an initialized org.apache.hadoop.conf.Configuration
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/user/hadoop/indexweek/" + startOfWeek.getTime()
        + "_" + endOfWeek.getTime() + ".db");
// Wrap the HDFS output stream so bytes are gzipped as they are written
OutputStream dos = new GZIPOutputStream(fs.create(p));
Clearly the data is now compressed in the DFS, but how does Hadoop recognize that the input is compressed? For example, when I browse to that file using the DFS web interface, I get the compressed version. Will only map/reduce jobs see the uncompressed version, or should it also be uncompressed when viewed through the DFS web interface? And how can Hadoop correctly decompress part of a file (a single block) when the file spans multiple blocks, without decompressing the file as a whole?
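[Editor's note: part of the read-back question can be demonstrated with nothing but java.util.zip. HDFS stores the gzipped bytes opaquely, so the web interface (or any direct reader) sees compressed data; the caller must wrap the stream again on the way out, just as fs.create(p) was wrapped on the way in. A minimal round trip, using in-memory streams in place of fs.create/fs.open:]

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress s the way the fs.create(p) wrapper above does, then
    // decompress it again by wrapping the read side in GZIPInputStream.
    static String roundTrip(String s) throws IOException {
        // Write side: the bytes left in 'raw' are what HDFS would store.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        OutputStream gz = new GZIPOutputStream(raw);
        gz.write(s.getBytes("UTF-8"));
        gz.close();

        // Read side: anyone reading the stored bytes directly (e.g. the
        // DFS web interface) sees gzip data; you must wrap the stream
        // yourself to get the original text back.
        BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(raw.toByteArray())), "UTF-8"));
        String line = r.readLine();
        r.close();
        return line;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("some index data"));
    }
}
```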
Is automatic block compression on the roadmap for Hadoop? It seems like it would be very useful. In my case the compression ratio is 9:1, which should translate to significant space/IO savings.
I haven’t had a chance to run it with Pig to see whether map/reduce handles it properly. Should I just expect it to work correctly, or did I miss something?
Any clarification / correction would be appreciated.
Thanks,
Michael
-----Original Message-----
From: Stu Hood [mailto:stuhood@webmail.us]
Sent: Monday, November 26, 2007 6:16 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Whether the file is compressed before store it to hadoop?
Hadoop will not automatically compress a file that you place into it.
If you compress a file before placing it in Hadoop, MapReduce jobs use the compression package (org.apache.hadoop.io.compress) to transparently decompress your gzipped files when reading them as input.
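[Editor's note: the "transparently" part works by file-name suffix. The input layer looks the extension up via CompressionCodecFactory in org.apache.hadoop.io.compress and wraps the input stream in the matching codec, so the job code never sees compressed bytes. One caveat relevant to the block question above: gzip is not splittable, so a whole .gz file goes to a single map task rather than being decompressed block by block. The suffix-dispatch idea can be sketched in plain Java; the class and method names below are invented for illustration, not Hadoop's:]

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Illustrative sketch only: the real lookup is CompressionCodecFactory
// in org.apache.hadoop.io.compress; these names are invented.
public class CodecBySuffix {

    // Choose a decompressing wrapper based on the file name suffix.
    static InputStream open(String name, InputStream raw) throws IOException {
        if (name.endsWith(".gz")) {
            return new GZIPInputStream(raw);      // gzip-compressed input
        } else if (name.endsWith(".deflate")) {
            return new InflaterInputStream(raw);  // zlib/deflate input
        }
        return raw;                               // no known suffix: pass through
    }

    // Read the first record of a (possibly compressed) input file.
    static String readFirstLine(String name, InputStream raw) throws IOException {
        BufferedReader r = new BufferedReader(
                new InputStreamReader(open(name, raw), "UTF-8"));
        String line = r.readLine();
        r.close();
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Compress one record, then read it back purely by suffix dispatch.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write("key\tvalue".getBytes("UTF-8"));
        gz.close();
        System.out.println(readFirstLine("part-00000.gz",
                new ByteArrayInputStream(buf.toByteArray())));
    }
}
```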
Thanks,
Stu
-----Original Message-----
From: Ryan <sa...@live.com>
Sent: Monday, November 26, 2007 8:16pm
To: hadoop-user@lucene.apache.org
Subject: Whether the file is compressed before store it to hadoop?
Hi,
I'm new to Hadoop, and I'm confused by the storage procedure. I found a zlib implementation in the package org.apache.hadoop.io.compress, so I wonder whether a file is compressed before it is actually stored in Hadoop. That is, does Hadoop store the compressed form of the file?
Sincerely,
Ryan