Posted to mapreduce-user@hadoop.apache.org by Mapred Learn <ma...@gmail.com> on 2011/06/23 08:21:32 UTC
can we split a big gzipped file on HDFS ?
Hi,
If I have a big gzipped text file (~ 60 GB) in HDFS, can I split it into
smaller chunks (~ 1 GB) so that I can run a map-red job on those files
and finish faster than running the job on one big file?
Thanks,
-JJ
Re: can we split a big gzipped file on HDFS ?
Posted by Robert Evans <ev...@yahoo-inc.com>.
JJ,
If it is just gzipped then no. gzip does not allow for splitting, because you cannot seek to an arbitrary point in the file and then, after possibly advancing to a sync point, start reading out the data. If it is a sequence file with gzip compression then yes, because the sequence file format compresses the data in blocks rather than the entire file at once.
--Bobby Evans
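[Editor's note: the distinction Bobby draws can be demonstrated with a short sketch. Python's standard gzip module is used purely for illustration (Hadoop itself is Java, and SequenceFile's actual on-disk format is more involved than this): a single gzip stream can only be decompressed from the start, while independently compressed blocks, which is the idea behind SequenceFile's block compression, can each be read on their own.]

```python
import gzip

data = b"some log line\n" * 10000

# A plain gzip file: one DEFLATE stream over the whole payload.
whole = gzip.compress(data)
assert gzip.decompress(whole) == data  # reading from byte 0 works

# Starting at an arbitrary midpoint fails: there is no sync point a
# mapper could seek to, which is why Hadoop cannot split such a file.
try:
    gzip.decompress(whole[len(whole) // 2:])
    splittable = True
except Exception:
    splittable = False
assert not splittable

# Block-style compression (sketched here by gzipping fixed-size chunks
# independently): each block decompresses on its own, so separate
# mappers can process separate blocks in parallel.
chunk = 16 * 1024
blocks = [gzip.compress(data[i:i + chunk])
          for i in range(0, len(data), chunk)]
assert b"".join(gzip.decompress(b) for b in blocks) == data
```

The trade-off is that per-block compression gives a slightly worse ratio than compressing the whole file at once, in exchange for parallel reads.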