Posted to common-user@hadoop.apache.org by "Periya.Data" <pe...@gmail.com> on 2012/08/28 19:17:04 UTC
Hadoop Streaming question
Hi all,
I am using Python on CDH3u3 for streaming, and I do not know how to provide
command-line arguments. My Python mapper takes three arguments: two input
files and one placeholder for an output file. I am doing something like
the following, but it fails. Where am I going wrong? What other options do I
have? Any best practices? I am using -cmdenv, but I do not know exactly how
to use it. I have seen this question on the net, but I have not found a
working answer.
HDFS_INPUT_1=/user/kk/book/eccfile.txt
HDFS_INPUT_2=/user/kk/book/calist.txt
LOCAL_INPUT_1=$KK_HOME/eccfile.txt
LOCAL_INPUT_2=$KK_HOME/calist.txt
HDFS_OUTPUT=/user/kk/book/eccoutput
LOCAL_OUTPUT=$KK_HOME/
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.*-cdh*.jar \
-D mapred.job.name="CM" \
-D mapred.reduce.tasks=0 \
-files $LOCAL_INPUT_1,$LOCAL_INPUT_2 \
-input $HDFS_INPUT_1 \
-output $HDFS_OUTPUT \
-file $KK_HOME/ec_ca.py \
-cmdenv arg1=$LOCAL_INPUT_1 \
-cmdenv arg2=$LOCAL_INPUT_2 \
-cmdenv arg3=$LOCAL_OUTPUT \
-mapper "$KK_HOME/ec_ca.py $arg1 $arg2 $arg3"
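(For reference: -cmdenv exports key=value pairs into each task's environment rather than passing them as command-line arguments, so a streaming mapper would read them with os.environ instead of sys.argv. A minimal sketch, where the name "arg1" is assumed to match the -cmdenv name used above:)

```python
import os

def get_cmdenv(name, default=""):
    # Hadoop Streaming exports each "-cmdenv key=value" pair into the
    # task's environment, so the mapper reads it via os.environ rather
    # than expecting it on the command line.
    return os.environ.get(name, default)
```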
======================================================================
Some more related questions:
1. What is the option for sending a file to all the nodes (say, arg2)?
This file is a "reference" input file that is needed for processing. Should
I use the "-files" option, like DistributedCache?
2. I really do not know what happens if I specify an output file in a
local directory. I understand that specifying an HDFS location for output
will nicely place the output in that directory. My Python script writes its
output to a local directory, which I tested and which worked fine locally.
But what really happens when I try to run it on Hadoop? This is my $arg3.
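(For context on question 2: a streaming mapper's results are whatever it writes to stdout; the framework collects that into the HDFS -output directory, while anything written to a local path stays on whichever task node happened to run that mapper. A minimal sketch of a stdout-emitting mapper; the tab-separated key/value format shown is just illustrative:)

```python
import sys

def map_lines(lines):
    # Emit one tab-separated key/value record per input line; Hadoop
    # Streaming gathers the mapper's stdout into the HDFS -output
    # directory, unlike writes to a local filesystem path.
    return ["%s\t%d" % (line.strip(), len(line.strip())) for line in lines]

if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        print(record)
```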
Thanks, and I appreciate your help,
PD.