Posted to common-dev@hadoop.apache.org by Steve Gao <st...@yahoo.com> on 2008/10/23 03:55:42 UTC

Is there a way to know the input filename at Hadoop Streaming?

I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?

For example:
$HADOOP_HOME/bin/hadoop  \
jar $HADOOP_HOME/hadoop-streaming.jar \
    -input file1 \
    -input file2 \
    -output myOutputDir \
    -mapper mapper \
    -reducer reducer

In the mapper:
while (<STDIN>) {
  # How can I tell whether the current line is from file1 or file2?
}

Re: Is there a way to know the input filename at Hadoop Streaming?

Posted by Rick Cox <ri...@gmail.com>.
On Wed, Oct 22, 2008 at 18:55, Steve Gao <st...@yahoo.com> wrote:
> I am using Hadoop Streaming. The input are multiple files.
> Is there a way to get the current filename in mapper?
>

Streaming map tasks should have a "map_input_file" environment
variable like the following:

map_input_file=hdfs://HOST/path/to/file
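
As a minimal sketch of how a mapper might use this variable (an illustration, not part of the original reply): the shell function below, under the hypothetical name tag_lines, prefixes each input line with the base name of the file it came from. It assumes the task environment sets map_input_file as described above. The fallback to mapreduce_map_input_file is an assumption about newer Hadoop releases, and "unknown" is only a placeholder for runs outside Hadoop.

```shell
#!/bin/sh
# tag_lines (hypothetical name): prefix every stdin line with its source file name.
tag_lines() {
    # map_input_file is the variable named in the reply above; the second name
    # is an assumed fallback for newer Hadoop releases.
    src="${map_input_file:-${mapreduce_map_input_file:-unknown}}"
    # Strip everything up to the last "/": hdfs://HOST/path/to/file1 -> file1
    name="${src##*/}"
    while IFS= read -r line; do
        printf '%s\t%s\n' "$name" "$line"
    done
}

# Local simulation of one map task; in a real mapper script this line would
# be replaced by a bare call to tag_lines so that it reads the task's stdin.
printf 'a\nb\n' | map_input_file=hdfs://HOST/path/to/file1 tag_lines
```

In the Perl mapper from the question, the same value should be readable as $ENV{map_input_file} before entering the while (<STDIN>) loop.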

rick

> For example:
> $HADOOP_HOME/bin/hadoop  \
> jar $HADOOP_HOME/hadoop-streaming.jar \
>    -input file1 \
>    -input file2 \
>    -output myOutputDir \
>    -mapper mapper \
>    -reducer reducer
>
> In mapper:
> while (<STDIN>){
>  //how to tell the current line is from file1 or file2?
> }

[Help needed] Is there a way to know the input filename at Hadoop Streaming?

Posted by Steve Gao <st...@yahoo.com>.
Sorry for the email. Thanks for any help or hint.

    I am using Hadoop Streaming. The input consists of multiple files.
    Is there a way to get the current filename in the mapper?

    For example:
    $HADOOP_HOME/bin/hadoop  \
    jar $HADOOP_HOME/hadoop-streaming.jar \
        -input file1 \
        -input file2 \
        -output myOutputDir \
        -mapper mapper \
        -reducer reducer

    In the mapper:
    while (<STDIN>) {
      # How can I tell whether the current line is from file1 or file2?
    }