Posted to common-user@hadoop.apache.org by Ryan LeCompte <le...@gmail.com> on 2008/12/01 16:46:45 UTC

Hadoop and .tgz files

Hello all,

I'm using Hadoop 0.19 and just discovered that it has no problems
processing .tgz files that contain text files. I was under the
impression that it wouldn't be able to break a .tgz file up into
multiple maps, but instead just treat it as 1 map per .tgz file. Was
this a recent change or enhancement? I'm noticing that it is breaking
up the .tgz file into multiple maps.

Thanks,
Ryan

Re: Hadoop and .tgz files

Posted by John Heidemann <jo...@isi.edu>.
On Mon, 01 Dec 2008 12:16:28 EST, "Ryan LeCompte" wrote: 
>I believe I spoke a little too soon. Looks like Hadoop supports .gz
>files, not .tgz. :-)

Work is in progress to support splitting of .bz2 files.
See http://issues.apache.org/jira/browse/HADOOP-4012

I don't believe splitting .tgz files is possible: something
compressed with gzip can only be decompressed from the beginning.

   -John Heidemann
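
The point about gzip can be checked outside Hadoop. The following is a
minimal Python sketch, not Hadoop code: it compresses some made-up
records (the data and the helper name try_decompress are assumptions
for illustration), then tries to decompress once from byte 0 and once
from the midpoint. The midpoint attempt fails right away because no
gzip header is found there. Even past that check, deflate blocks are
not byte-aligned and back-references point into earlier output, so a
reader dropped at an arbitrary offset has nothing to resynchronize on.

```python
import gzip
import zlib

# Hypothetical sample data: many short text records, one per line.
lines = b"".join(b"record %d\n" % i for i in range(10000))
compressed = gzip.compress(lines)


def try_decompress(data: bytes) -> str:
    """Attempt to gunzip `data`; report success or the failure reason."""
    try:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header/trailer.
        out = zlib.decompress(data, wbits=16 + zlib.MAX_WBITS)
        return f"ok, {len(out)} bytes"
    except zlib.error as exc:
        return f"failed: {exc}"


# Reading from the first byte works.
print("from start :", try_decompress(compressed))

# Reading from an arbitrary offset, as a second map task would have to, does not.
midpoint = len(compressed) // 2
print("from middle:", try_decompress(compressed[midpoint:]))
```

This is why a single gzip file generally ends up as one map: only a
reader that starts at the first byte can make sense of the stream.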


Re: Hadoop and .tgz files

Posted by Ryan LeCompte <le...@gmail.com>.
I believe I spoke a little too soon. Looks like Hadoop supports .gz
files, not .tgz. :-)
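
The difference between the two formats can be illustrated with a short
Python sketch (not Hadoop code; the file name sample.txt and its
contents are made up for the example). A .tgz is a tar archive wrapped
in gzip, so gunzipping it yields the tar container, with header and
padding blocks around the text, rather than the text itself. A
line-oriented reader that only knows gzip would therefore see tar
metadata mixed into its records.

```python
import gzip
import io
import tarfile

# Build a small .tgz in memory holding one text file (hypothetical name/contents).
text = b"line one\nline two\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="sample.txt")
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))
tgz_bytes = buf.getvalue()

# Plain gunzip, which is all a gzip decoder does, returns the tar container.
gunzipped = gzip.decompress(tgz_bytes)
print("original text bytes :", len(text))
print("bytes after gunzip  :", len(gunzipped))
print("gunzip output == text?", gunzipped == text)

# Getting the records back requires a tar-aware step on top of gzip.
with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
    extracted = tar.extractfile("sample.txt").read()
print("tar-aware read == text?", extracted == text)
```

One workaround under these assumptions is to unpack the archive first
and store the text files individually gzipped (or uncompressed).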

