Posted to common-dev@hadoop.apache.org by Steve Gao <st...@yahoo.com> on 2009/05/16 01:52:52 UTC
How to get jobconf variables in streaming's mapper/reducer?
I am using Hadoop streaming with Perl, and I want to read jobconf variable values. Many tutorials say they are available in the environment, but I cannot get them.
For example, in reducer:
while (<STDIN>){
my $part = $ENV{"mapred.task.partition"};
print ("$part\n");
}
It turns out that $ENV{"mapred.task.partition"} is not defined.
However, I can read the value of a variable that I define myself. For example:
$HADOOP_HOME/bin/hadoop \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-output myOutputDir \
-mapper mapper \
-reducer reducer \
-jobconf arg=test
In reducer:
while (<STDIN>){
my $part2 = $ENV{"arg"};
print ("$part2\n");
}
It works.
Does anybody know why that is? How can I get the built-in jobconf variables in streaming? Thanks a lot!
Re: How to get jobconf variables in streaming's mapper/reducer?
Posted by Peter Skomoroch <pe...@gmail.com>.
It took me a while to track this down. Todd is half right (at least for
Hadoop 0.18.3): the name is transformed, but it is not upper-cased.
mapred.task.partition actually turns into $mapred_task_partition (note that
it is lowercase).
For example, to get the input filename in the mapper of a Python streaming job:
----------
import sys, os
# jobconf keys appear in the environment with "." replaced by "_"
filename = os.environ["map_input_file"]
taskpartition = os.environ["mapred_task_partition"]
----------
filename will have the form:
hdfs://domU-12-31-38-01-6C-F1.compute-1.internal:9000/user/root/myinputs/gzpagecounts/pagecounts-20090501-030001.gz
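The renaming can be sketched in Python. This is only my reading of the behaviour reported in this thread and of PipeMapRed.java, not an official API: I am assuming that every character other than an ASCII letter or digit becomes an underscore and that letter case is left alone. The function name streaming_env_name is made up for illustration.

```python
import re


def streaming_env_name(jobconf_key):
    """Guess the environment-variable name a streaming task sees for a jobconf key.

    Assumption (not confirmed for every Hadoop version): characters other than
    ASCII letters and digits are replaced with "_" and case is unchanged.
    """
    return re.sub(r"[^0-9A-Za-z]", "_", jobconf_key)


if __name__ == "__main__":
    for key in ("mapred.task.partition", "map.input.file", "arg"):
        print(key, "->", streaming_env_name(key))
```

If that assumption holds, it would also explain why a user-defined key such as arg can be read under its original name: it contains no dots, so there is nothing to rename.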
See:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200904.mbox/%3C49E13557.7090504@domaintools.com%3E
and
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/PipeMapRed.java
-Pete
On Fri, May 15, 2009 at 8:01 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hi Steve,
>
> The variables are transformed before going to the mappers.
> mapred.task.partition turns into $MAPRED_TASK_PARTITION to be more unix-y
>
> -Todd
--
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
Re: How to get jobconf variables in streaming's mapper/reducer?
Posted by Todd Lipcon <to...@cloudera.com>.
Hi Steve,
The variables are transformed before going to the mappers.
mapred.task.partition turns into $MAPRED_TASK_PARTITION to be more unix-y
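Because the exact spelling may depend on the Hadoop version, a streaming script can check more than one candidate name. Here is a minimal Python sketch under that assumption. The helper jobconf_from_env is hypothetical, not part of Hadoop, and it tries the underscored name as-is first, then upper-cased, then the raw key.

```python
import os


def jobconf_from_env(key, environ=None, default=None):
    """Look up a jobconf key in the environment under several candidate spellings.

    Hypothetical helper: the candidate order is a guess, not documented behaviour.
    """
    env = os.environ if environ is None else environ
    underscored = key.replace(".", "_")
    for name in (underscored, underscored.upper(), key):
        if name in env:
            return env[name]
    return default


if __name__ == "__main__":
    # Prints None when run outside a streaming task.
    print(jobconf_from_env("mapred.task.partition"))
```

Inside a mapper or reducer you would call jobconf_from_env("mapred.task.partition") once, before the loop over standard input, rather than on every line.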
-Todd