Posted to common-issues@hadoop.apache.org by "Jonathan Poon (JIRA)" <ji...@apache.org> on 2011/08/11 02:33:27 UTC

[jira] [Created] (HADOOP-7535) Hadoop Streaming map_input_file

Hadoop Streaming map_input_file
-------------------------------

                 Key: HADOOP-7535
                 URL: https://issues.apache.org/jira/browse/HADOOP-7535
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.20.203.0
         Environment: Debian Squeeze
            Reporter: Jonathan Poon


I'm trying to use the map_input_file environment variable to determine which input file the stdin stream is coming from in a mapper I've written.  I ran the following command to print the environment variables and see what value map_input_file was given:

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar -input A -input S -input F -output B -mapper "bash -c \"export\""

I get the following output for the map_input_file:
declare -x map_input_file="hdfs://localhost:54310/user/poonj/A/A.txt"
declare -x map_input_file="hdfs://localhost:54310/user/poonj/A/A.txt"
declare -x map_input_file="hdfs://localhost:54310/user/poonj/F/F.txt"
declare -x map_input_file="hdfs://localhost:54310/user/poonj/S/S.txt"

I was under the assumption that the variable is set only once and should hold the file currently being processed.
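
For context, here is a minimal sketch of the kind of mapper that would consume this variable. It is not the actual mapper from this report: the function name tag_lines and the "unknown" fallback are hypothetical, and it only assumes what the output above shows, namely that map_input_file holds a full HDFS path. It prefixes each stdin line with the last path component of that value.

```shell
# Hypothetical mapper body: tag every input line with its source file name.
tag_lines() {
    # map_input_file is set by Hadoop Streaming; fall back to "unknown"
    # when it is absent (for example when testing outside Hadoop).
    local src="${map_input_file:-unknown}"
    # Keep only the last path component, e.g. A.txt
    local name="${src##*/}"
    local line
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\t%s\n' "$name" "$line"
    done
}

# Local dry run, using a value copied from the output above.
printf 'line one\nline two\n' |
    map_input_file="hdfs://localhost:54310/user/poonj/A/A.txt" tag_lines
```

Saved as a script that ends by calling tag_lines on stdin, something like this could be passed to -mapper in place of the "export" command, which would make it easy to see which file each map task believes it is reading.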

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira