Posted to mapreduce-issues@hadoop.apache.org by Kévin Poupon <po...@gmail.com> on 2014/07/11 16:02:46 UTC

Custom compression codec integration

Hello,

I developed a custom compression codec for a two-node Hadoop cluster. Let's
call it my.custom.ComprCodec.
It works well in my Eclipse test environment, but I am having trouble
integrating it into Hadoop.

I updated the following properties via Cloudera Manager:
io.compression.codecs: added my.custom.ComprCodec
mapreduce.output.fileoutputformat.compress: set to true (checked)
mapreduce.output.fileoutputformat.compress.codec: set to
my.custom.ComprCodec
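
For reference, this is a sketch of the equivalent client-side configuration
(standard Hadoop property names; my.custom.ComprCodec is the placeholder class
name from above, and the list of built-in codecs may differ on your cluster):

```xml
<!-- core-site.xml: register the custom codec alongside the built-in ones -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,my.custom.ComprCodec</value>
</property>

<!-- mapred-site.xml: compress job output using the custom codec -->
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>my.custom.ComprCodec</value>
</property>
```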
I then placed codec.jar on the HDFS NameNode machine in
/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/lib/, next to
the Snappy codec jar.
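
One thing worth double-checking: task JVMs load codec classes from the local
classpath of whichever node they run on, so the jar needs to be on every node
that runs tasks, not only the NameNode machine. A deployment sketch (the
hostnames node1 and node2 are hypothetical; adjust the parcel path to your
install):

```shell
# Copy the codec jar next to the Snappy codec jar on each cluster node.
JAR=codec.jar
DEST=/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/lib/
for host in node1 node2; do    # hypothetical hostnames
  scp "$JAR" "$host:$DEST"
done
# Restart the YARN NodeManagers afterwards so the new jar is picked up.
```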

Then, to test the codec, I launch a streaming job:

hadoop jar
/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop-mapreduce/hadoop-streaming.jar
-input /user/hadoop/compr/ -output /user/hadoop/decompr/ -mapper cat
-reducer cat

but the job fails:

14/07/09 14:43:46 INFO mapreduce.Job:  map 0% reduce 0%
14/07/09 14:43:55 INFO mapreduce.Job:  map 50% reduce 0%
14/07/09 14:44:02 INFO mapreduce.Job:  map 100% reduce 0%
14/07/09 14:44:09 INFO mapreduce.Job: Task Id :
attempt_1404901272360_0003_r_000000_0, Status : FAILED
14/07/09 14:44:15 INFO mapreduce.Job: Task Id :
attempt_1404901272360_0003_r_000001_0, Status : FAILED
14/07/09 14:44:21 INFO mapreduce.Job: Task Id :
attempt_1404901272360_0003_r_000000_1, Status : FAILED
14/07/09 14:44:27 INFO mapreduce.Job: Task Id :
attempt_1404901272360_0003_r_000001_1, Status : FAILED
14/07/09 14:44:33 INFO mapreduce.Job: Task Id :
attempt_1404901272360_0003_r_000000_2, Status : FAILED
14/07/09 14:44:38 INFO mapreduce.Job: Task Id :
attempt_1404901272360_0003_r_000001_2, Status : FAILED
14/07/09 14:44:44 INFO mapreduce.Job:  map 100% reduce 100%
14/07/09 14:44:44 INFO mapreduce.Job: Job job_1404901272360_0003 failed
with state FAILED due to: Task failed task_1404901272360_0003_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

Note that the same job runs fine when the compression codec is set to
Snappy.
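
For reference, the same compression properties can also be set per job on the
streaming command line instead of cluster-wide (a sketch, same jar path as
above; the -D generic options must come before -input/-output):

```shell
hadoop jar /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=my.custom.ComprCodec \
  -input /user/hadoop/compr/ -output /user/hadoop/decompr/ \
  -mapper cat -reducer cat
```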

How do I integrate a custom codec into Hadoop? What did I forget?

Thank you