Posted to common-user@hadoop.apache.org by 黄 山 <th...@gmail.com> on 2012/04/26 11:08:08 UTC
AutoInputFormat
I use org.apache.hadoop.streaming.AutoInputFormat to handle sequence file input for streaming, but I found that it presents each <key, value> pair in the format below (the key is a string, the value is binary):
"keystring\tvalue\n"
Since the value is binary, it contains many '\n' bytes, so my mapper cannot tell where one record ends and the next begins.
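To make the problem concrete, here is a small Python sketch. The sample key and value are made up for illustration; it only shows how a single record in the "key\tvalue\n" layout looks to a mapper that reads its input line by line.

```python
# Hypothetical record: a string key and a binary value that contains b"\n".
key = b"keystring"
value = b"\x00\x01\nbinary\npayload\xff"

# The "key\tvalue\n" layout described above.
record = key + b"\t" + value + b"\n"

# A line-oriented mapper effectively splits the stream at every newline byte.
lines = record.split(b"\n")[:-1]  # drop the empty piece after the final "\n"

# One record went in, but the mapper sees several "lines",
# and only the first of them carries the key.
print(len(lines))
for line in lines:
    print(line)
```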
In other words, I need the value presented as length + raw bytes, or as typed bytes.
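To illustrate the kind of "length + raw bytes" framing meant here, a minimal Python sketch follows. The layout (a 4-byte big-endian length in front of the key and in front of the value) and the helper names write_record, read_field and read_records are assumptions made up for this example; this is not a format that Hadoop streaming is confirmed to emit.

```python
import io
import struct

def write_record(stream, key, value):
    """Write key and value, each prefixed by a 4-byte big-endian length (assumed layout)."""
    for field in (key, value):
        stream.write(struct.pack(">I", len(field)))
        stream.write(field)

def read_field(stream):
    """Read one length-prefixed field; return None on a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("truncated length header")
    (length,) = struct.unpack(">I", header)
    data = stream.read(length)
    if len(data) < length:
        raise EOFError("truncated field")
    return data

def read_records(stream):
    """Yield (key, value) pairs until the stream ends."""
    while True:
        key = read_field(stream)
        if key is None:
            return
        value = read_field(stream)
        if value is None:
            raise EOFError("key without value")
        yield key, value

# Round trip through an in-memory buffer; newline bytes in the value are harmless.
buf = io.BytesIO()
write_record(buf, b"keystring", b"\x00\nbinary\nvalue\xff")
write_record(buf, b"k2", b"")
buf.seek(0)
records = list(read_records(buf))
```

With framing like this, a mapper would read from a binary stream such as sys.stdin.buffer instead of iterating over lines, so the content of the value no longer affects where a record ends.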
I invoked streaming as follows:
$HADOOP_HOME/bin/hadoop jar \
$HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.2.jar \
-input data.seq \
-output output \
-mapper mapper \
-reducer reducer \
-inputformat org.apache.hadoop.streaming.AutoInputFormat \
-file mapper \
-file reducer
huangs
thuhuangs09@gmail.com