Posted to hdfs-user@hadoop.apache.org by Aleksandr Elbakyan <ra...@yahoo.com> on 2013/01/15 01:23:57 UTC
Issue with partitioning using streaming
Hello All,
I am trying to partition and sort data in Hadoop streaming.
Most of the time the data is partitioned and sorted correctly, but when I run the job multiple times, some records occasionally end up in a different partition.
The data looks like this:
asdas 0 ada
asdas 1 asd
12123 1 ccc
12123 0 xxx
hadoop jar ${HADOOP_HOME}/contrib/streaming/hadoop-*streaming.jar \
-D mapred.task.timeout=3600000 \
-D mapred.map.tasks=${GD_NUM_MAP_TASKS} \
-D mapred.reduce.tasks=${GD_NUM_REDUCE_TASKS} \
-D stream.non.zero.exit.is.failure=true \
-D stream.num.map.output.key.fields=2 \
-D mapred.text.key.partitioner.options="-k1,1" \
-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
-D mapred.text.key.comparator.options=-k1,2n \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-input input \
-output output \
-mapper " cat" \
-reducer " cat" \
-verbose
In the reducer code I have some logic that depends on correct partitioning and sorting.
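To check whether a key really ends up in more than one partition, a small script along these lines can be run over the reducer output. It is only a sketch: find_split_keys is a made-up helper name (not part of Hadoop), and it assumes the output files have been copied to the local file system and that the key is the first whitespace-separated field of each line.

```shell
# Hypothetical helper (not part of Hadoop): print every key (first field)
# that appears in more than one of the given reducer output files.
find_split_keys() {
  awk '
    # skip empty lines, they carry no key
    NF == 0 { next }
    # first time this key is seen: remember which file it came from
    !($1 in first_file) { first_file[$1] = FILENAME; next }
    # key seen again in a different file: it was split across partitions
    first_file[$1] != FILENAME { print $1 }
  ' "$@" | sort -u
}
```

For example, after copying the job output locally with hadoop fs -get output local_output, calling find_split_keys local_output/part-* should print nothing if every key stayed within a single partition, and otherwise list the keys that were split.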
Regards.