Posted to common-dev@hadoop.apache.org by Steve Gao <st...@yahoo.com> on 2008/10/23 03:55:42 UTC
Is there a way to know the input filename at Hadoop Streaming?
I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?
For example:
$HADOOP_HOME/bin/hadoop \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer
In mapper:
while (<STDIN>) {
    # How can I tell whether the current line comes from file1 or file2?
}
Re: Is there a way to know the input filename at Hadoop Streaming?
Posted by Rick Cox <ri...@gmail.com>.
On Wed, Oct 22, 2008 at 18:55, Steve Gao <st...@yahoo.com> wrote:
> I am using Hadoop Streaming. The input are multiple files.
> Is there a way to get the current filename in mapper?
>
Streaming map tasks should have a "map_input_file" environment
variable like the following:
map_input_file=hdfs://HOST/path/to/file
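
As a minimal sketch of how a mapper might use this variable (written in Python rather than the Perl of the original question), the script below reads "map_input_file" from the environment and prefixes every input line with the file's base name. The helper names input_filename and tag_lines are hypothetical, as are the "unknown" fallback and the tab-separated output format. The second variable name, "mapreduce_map_input_file", is an assumption about newer Hadoop releases and is not mentioned in this thread.

```python
#!/usr/bin/env python3
import os
import sys


def input_filename(environ=os.environ):
    """Return the path of the file behind the current split, or "unknown"."""
    # "map_input_file" is the name given in this thread. The second name is
    # an assumption about newer Hadoop releases, kept only as a fallback.
    for name in ("map_input_file", "mapreduce_map_input_file"):
        value = environ.get(name)
        if value:
            return value
    return "unknown"


def tag_lines(lines, filename):
    """Yield "basename<TAB>line" for every input line."""
    base = os.path.basename(filename)
    for line in lines:
        yield base + "\t" + line.rstrip("\n")


if __name__ == "__main__":
    # Streaming feeds the records of one split on standard input.
    for record in tag_lines(sys.stdin, input_filename()):
        print(record)
```

If a script along these lines were used as the mapper program in the command above, each record reaching the reducer would start with file1 or file2, so the reducer could tell the two sources apart.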
rick
[Help needed] Is there a way to know the input filename at Hadoop Streaming?
Posted by Steve Gao <st...@yahoo.com>.
Sorry for the email. Thanks for any help or hint.
I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?
For example:
$HADOOP_HOME/bin/hadoop \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer
In mapper:
while (<STDIN>) {
    # How can I tell whether the current line comes from file1 or file2?
}