Posted to common-user@hadoop.apache.org by Arv Mistry <ar...@kindsight.net> on 2010/06/01 16:28:01 UTC
Writing compressed data to HDFS
Hi,
I have a Java process that writes compressed data to HDFS. The way I
am doing that is wrapping the FSDataOutputStream with a GZIPOutputStream
and calling the write() method, i.e. something like
FSDataOutputStream out = fs.create(file);
gzip = new GZIPOutputStream(out);
gzip.write("sss".getBytes("UTF-8"));
The file seems to get written ok.
However, when I get the file out of HDFS and try to unzip it, it
complains:
gunzip: cs_1_20100601_120000_1275396891183.cgz: unknown suffix -- ignored
When I run 'file' on it, it is recognized as 'gzip compressed data, from FAT
filesystem (MS-DOS, OS/2, NT)'
Any ideas? Appreciate any help.
Cheers Arv
Re: Writing compressed data to HDFS
Posted by Eric Sammer <es...@cloudera.com>.
This isn't really a Hadoop issue: gunzip refuses to decompress files
that don't have a well-known suffix. Rename the file so that it ends in
.gz and try again, or use the -S option to specify an alternate
suffix.
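
As a rough way to confirm that the data itself is valid gzip regardless of the file name, the sketch below compresses the same "sss" payload in memory, checks for the gzip magic bytes (0x1f 0x8b), and decompresses it again. It uses only java.util.zip and leaves HDFS out entirely; the class and method names are hypothetical and not part of the original code. It also assumes the writer closes the GZIPOutputStream, since the gzip trailer is only written on finish() or close(). The "from FAT filesystem" note in the 'file' output most likely just reflects the OS byte that some JDK versions write into the gzip header, and is probably harmless.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration; not taken from the original post.
public class GzipRoundTrip {

    // Compress text into a gzip byte array. try-with-resources closes the
    // stream, which flushes the deflater and writes the gzip trailer.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    // Decompress a gzip byte array back into text.
    static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    // A gzip stream starts with the two magic bytes 0x1f 0x8b.
    static boolean hasGzipMagic(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = compress("sss");
        System.out.println("gzip magic present: " + hasGzipMagic(compressed));
        System.out.println("round trip: " + decompress(compressed));
    }
}
```

If the round trip works, the .cgz file should decompress with something like gunzip -S .cgz, or after renaming it to end in .gz, as suggested above.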
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com