Posted to mapreduce-issues@hadoop.apache.org by Kévin Poupon <po...@gmail.com> on 2014/07/11 16:02:46 UTC
Custom compression codec integration
Hello,
I developed a custom compression codec for a two-node Hadoop cluster. Let's
call it my.custom.ComprCodec.
It works well in my Eclipse test environment, but I am having trouble
integrating it into Hadoop.
I updated the following properties via Cloudera:
io.compression.codecs: added my.custom.ComprCodec to the list
mapreduce.output.fileoutputformat.compress: set to true (checked)
mapreduce.output.fileoutputformat.compress.codec: set to my.custom.ComprCodec
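For reference, here is what I believe the equivalent XML would look like if
set directly in core-site.xml and mapred-site.xml instead of through Cloudera
Manager (property names copied from above; the list of pre-existing codecs in
the first value is my assumption of a typical default):

```xml
<!-- core-site.xml: register the codec (existing codecs kept, mine appended) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,my.custom.ComprCodec</value>
</property>

<!-- mapred-site.xml: compress job output with my codec -->
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>my.custom.ComprCodec</value>
</property>
```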
I then placed codec.jar on the HDFS NameNode machine in
/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/lib/, next to the
Snappy codec.
Then, to test the codec, I launch a streaming job:

hadoop jar /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input /user/hadoop/compr/ -output /user/hadoop/decompr/ \
  -mapper cat -reducer cat
but the job fails:
14/07/09 14:43:46 INFO mapreduce.Job: map 0% reduce 0%
14/07/09 14:43:55 INFO mapreduce.Job: map 50% reduce 0%
14/07/09 14:44:02 INFO mapreduce.Job: map 100% reduce 0%
14/07/09 14:44:09 INFO mapreduce.Job: Task Id : attempt_1404901272360_0003_r_000000_0, Status : FAILED
14/07/09 14:44:15 INFO mapreduce.Job: Task Id : attempt_1404901272360_0003_r_000001_0, Status : FAILED
14/07/09 14:44:21 INFO mapreduce.Job: Task Id : attempt_1404901272360_0003_r_000000_1, Status : FAILED
14/07/09 14:44:27 INFO mapreduce.Job: Task Id : attempt_1404901272360_0003_r_000001_1, Status : FAILED
14/07/09 14:44:33 INFO mapreduce.Job: Task Id : attempt_1404901272360_0003_r_000000_2, Status : FAILED
14/07/09 14:44:38 INFO mapreduce.Job: Task Id : attempt_1404901272360_0003_r_000001_2, Status : FAILED
14/07/09 14:44:44 INFO mapreduce.Job: map 100% reduce 100%
14/07/09 14:44:44 INFO mapreduce.Job: Job job_1404901272360_0003 failed with state FAILED due to: Task failed task_1404901272360_0003_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
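The INFO lines above do not show the actual exception, so to see why the
reduce attempts failed I pull the aggregated task logs (assuming YARN log
aggregation is enabled on the cluster; the application id is the job id with
the job_ prefix replaced by application_):

```shell
# Fetch all container logs for the failed job; the reducer's stderr
# should contain the real stack trace behind the FAILED attempts.
yarn logs -applicationId application_1404901272360_0003 | less
```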
Note that the job works fine when the output compression codec is set to
Snappy.
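One variant I plan to try (not yet verified) is to ship the codec jar with
the job via -libjars and set the codec through generic -D options, so the
class reaches every task's classpath regardless of where the jar sits on
each node's disk:

```shell
# Generic options (-libjars, -D) must come before the streaming-specific
# options (-input, -mapper, ...). Paths are the same as in my setup above.
hadoop jar /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -libjars /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/lib/codec.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=my.custom.ComprCodec \
  -input /user/hadoop/compr/ -output /user/hadoop/decompr/ \
  -mapper cat -reducer cat
```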
How do I integrate a custom codec into Hadoop? What did I forget?
Thank you