Posted to hdfs-user@hadoop.apache.org by Jonathan Bishop <jb...@gmail.com> on 2012/10/23 20:10:57 UTC

zlib does not uncompress gzip during MR run

Hi,

My input files are gzipped, and I am using the built-in Java codecs
successfully to uncompress them in a standalone Java run...

        fileIn = fs.open(fsplit.getPath());
        codec = compressionCodecs.getCodec(fsplit.getPath());
        in = new LineReader(codec != null ? codec.createInputStream(fileIn) : fileIn, config);

But when I use the same piece of code in an MR job I am getting...

12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process : 3
12/10/23 11:02:27 INFO mapred.JobClient: Running job: job_201210221549_0014
12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 11:02:49 INFO mapred.JobClient: Task Id : attempt_201210221549_0014_m_000003_0, Status : FAILED
java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)

So I am thinking there is some incompatibility between zlib and my gzip
files. Is there a way to force Hadoop to use the Java built-in compression
codecs?
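For what it's worth, "incorrect header check" is the message zlib raises when the two-byte zlib header it expects is missing; a gzip stream instead starts with the magic bytes 1f 8b. A small self-contained sketch using only the JDK's java.util.zip (no Hadoop involved) reproduces the mismatch:

```java
import java.io.*;
import java.util.zip.*;

public class HeaderCheck {
    public static void main(String[] args) throws IOException {
        // Compress a sample with gzip framing (magic bytes 0x1f 0x8b).
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("hello".getBytes("UTF-8"));
        }
        byte[] gzipped = buf.toByteArray();
        System.out.printf("magic: %02x %02x%n", gzipped[0] & 0xff, gzipped[1] & 0xff); // 1f 8b

        // Feeding gzip bytes to a raw zlib Inflater fails its header check --
        // the same mismatch the native ZlibDecompressor reports in the task logs.
        Inflater inf = new Inflater(); // expects zlib framing, not gzip framing
        inf.setInput(gzipped);
        try {
            inf.inflate(new byte[64]);
        } catch (DataFormatException e) {
            System.out.println("zlib: " + e.getMessage());
        }

        // GZIPInputStream understands the gzip header and round-trips fine.
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            System.out.println("gzip: " + r.readLine());
        }
    }
}
```

So the error usually means a raw-zlib (DEFLATE) decompressor was handed gzip-framed bytes, i.e. the wrong codec was chosen, rather than the data being corrupt.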

Also, I would like to try LZO, which I recall reading allows splitting of
the input files. Can someone point me to the best way to set this up?
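On the LZO question: plain gzip is not splittable, but LZO files can be made splittable once a per-file index is built (the hadoop-lzo project ships an LzoIndexer for this). As a rough sketch, not verified against any particular release, the codecs are typically registered in core-site.xml along these lines (class and property names follow the hadoop-lzo project's README; check them against the version you install):

```xml
<!-- core-site.xml fragment: register the LZO codecs from the hadoop-lzo
     project. Verify property and class names against your installed
     hadoop-lzo version. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

Even then, .lzo inputs only split after indexing (the indexer writes a .index file next to each .lzo file) and when read through the LZO-aware input format from the same project.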

Thanks,

Jon

Re: zlib does not uncompress gzip during MR run

Posted by Harsh J <ha...@cloudera.com>.
Hi,

Do your files carry the .gz extension? They shouldn't have been split if so.

However, if I look at your code, you do not seem to be using the
fileSplit object's offset and length attributes anywhere. So, this
doesn't look like a problem of splits to me.

You may be loading the wrong codec unintentionally. Only the .gz
filename suffix maps to GzipCodec when you use
"compressionCodecs.getCodec(fsplit.getPath());". If your files do not
carry that suffix, instantiate GzipCodec directly and use it instead
of the getCodec helper call.
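To illustrate the suffix-to-codec mapping, here is a toy stand-in for the getCodec lookup built only on java.util.zip (the method and file names are invented for the example; Hadoop's real CompressionCodecFactory consults its configured codec list):

```java
import java.io.*;
import java.util.zip.*;

public class CodecBySuffix {
    // Toy version of CompressionCodecFactory.getCodec(path): pick a
    // decompressor from the filename suffix alone. Names here are
    // illustrative, not Hadoop's actual classes.
    static InputStream open(String name, InputStream raw) throws IOException {
        if (name.endsWith(".gz")) {
            return new GZIPInputStream(raw);     // gzip framing
        } else if (name.endsWith(".deflate")) {
            return new InflaterInputStream(raw); // bare zlib framing
        }
        return raw;                              // no codec matched
    }

    public static void main(String[] args) throws IOException {
        // Build some gzip-framed test bytes in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("line one".getBytes("UTF-8"));
        }
        byte[] data = buf.toByteArray();

        // Correct suffix -> gzip codec -> decompresses fine.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                open("input.gz", new ByteArrayInputStream(data)), "UTF-8"))) {
            System.out.println("as .gz: " + r.readLine());
        }

        // Wrong suffix -> zlib codec applied to gzip bytes -> header check
        // fails, the same failure mode as in the MR task logs.
        try (InputStream in = open("input.deflate", new ByteArrayInputStream(data))) {
            in.read(new byte[16]);
            System.out.println("as .deflate: read ok");
        } catch (IOException e) {
            System.out.println("as .deflate: " + e.getMessage());
        }
    }
}
```

With the wrong suffix the zlib-style decompressor is selected and fails with exactly the header-check error from the task logs; instantiating the gzip codec directly (here, GZIPInputStream) sidesteps the lookup entirely.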

On Wed, Oct 24, 2012 at 12:11 AM, Jonathan Bishop <jb...@gmail.com> wrote:
> Just to follow up on my own question...
>
> I believe the problem is caused by the input split during MR. So my real
> question is how to handle input splits when the input is gzipped.
>
> Is it even possible to have splits of a gzipped file?
>
> Thanks,
>
> Jon
>
>
> On Tue, Oct 23, 2012 at 11:10 AM, Jonathan Bishop <jb...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> My input files are gzipped, and I am using the builtin java codecs
>> successfully to uncompress them in a normal java run...
>>
>>         fileIn = fs.open(fsplit.getPath());
>>         codec = compressionCodecs.getCodec(fsplit.getPath());
>>         in = new LineReader(codec != null ?
>> codec.createInputStream(fileIn) : fileIn, config);
>>
>> But when I use the same piece of code in a MR job I am getting...
>>
>>
>>
>> 12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop
>> library
>> 12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized
>> native-zlib library
>> 12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
>> 12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table
>> output configured.
>> 12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process
>> : 3
>> 12/10/23 11:02:27 INFO mapred.JobClient: Running job:
>> job_201210221549_0014
>> 12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/10/23 11:02:49 INFO mapred.JobClient: Task Id :
>> attempt_201210221549_0014_m_000003_0, Status : FAILED
>> java.io.IOException: incorrect header check
>>     at
>> org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native
>> Method)
>>     at
>> org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
>>     at
>> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>>     at
>> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>>     at java.io.InputStream.read(InputStream.java:101)
>>
>> So I am thinking that there is some incompatibility of zlib and my gzip.
>> Is there a way to force hadoop to use the java built-in compression codecs?
>>
>> Also, I would like to try lzo which I hope will allow splitting of the
>> input files (I recall reading this somewhere). Can someone point me to the
>> best way to do this?
>>
>> Thanks,
>>
>> Jon
>
>



-- 
Harsh J


Re: zlib does not uncompress gzip during MR run

Posted by Jonathan Bishop <jb...@gmail.com>.
Just to follow up on my own question...

I believe the problem is caused by the input split during MR. So my real
question is how to handle input splits when the input is gzipped.

Is it even possible to have splits of a gzipped file?

Thanks,

Jon

On Tue, Oct 23, 2012 at 11:10 AM, Jonathan Bishop <jb...@gmail.com> wrote:

> Hi,
>
> My input files are gzipped, and I am using the builtin java codecs
> successfully to uncompress them in a normal java run...
>
>         fileIn = fs.open(fsplit.getPath());
>         codec = compressionCodecs.getCodec(fsplit.getPath());
>         in = new LineReader(codec != null ?
> codec.createInputStream(fileIn) : fileIn, config);
>
> But when I use the same piece of code in a MR job I am getting...
>
>
>
> 12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
> 12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table
> output configured.
> 12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process
> : 3
> 12/10/23 11:02:27 INFO mapred.JobClient: Running job: job_201210221549_0014
> 12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
> 12/10/23 11:02:49 INFO mapred.JobClient: Task Id :
> attempt_201210221549_0014_m_000003_0, Status : FAILED
> java.io.IOException: incorrect header check
>     at
> org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native
> Method)
>     at
> org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
>     at
> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>     at
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>     at java.io.InputStream.read(InputStream.java:101)
>
> So I am thinking that there is some incompatibility of zlib and my gzip.
> Is there a way to force hadoop to use the java built-in compression codecs?
>
> Also, I would like to try lzo which I hope will allow splitting of the
> input files (I recall reading this somewhere). Can someone point me to the
> best way to do this?
>
> Thanks,
>
> Jon
>
