Posted to common-user@hadoop.apache.org by Erik Test <er...@gmail.com> on 2010/08/02 18:17:03 UTC

Set variables in mapper

Hi,

I'm trying to set a variable in my mapper class by reading an argument from
the command line and then passing the entry to the mapper from main. Is this
possible?

  public static void main(String[] args) throws Exception
  {
    JobConf conf = new JobConf(DistanceCalc2.class);
    conf.setJobName("Calculate Distances");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(DoubleWritable.class);

    conf.setMapperClass(Map.class);
    //conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    Map.setN(args[2]);

    JobClient.runJob(conf);
  }//main


  public static class Map extends MapReduceBase
    implements Mapper<LongWritable, Text,
      Text, DoubleWritable>
        {
               ...
               private static int N;

               ...

               public void map(LongWritable key, Text value,
                 OutputCollector<Text, DoubleWritable> output,
                  Reporter reporter) throws IOException
                {
                    ....
                    dim = tokens.length / N;
                    ...
                }

               public static void setN(String newN)
               {
                  N = Integer.parseInt(newN);
               }
        }

I've tried the code above but I get an error saying that I'm dividing by
zero. Obviously, the argument I enter for N isn't being set as specified.
Erik

Re: Set variables in mapper

Posted by Hemanth Yamijala <yh...@gmail.com>.
Hi,

It would also be worthwhile to look at the Tool interface
(http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Tool),
which the bundled MapReduce example programs use as well. When the driver
implements Tool and is launched through ToolRunner, any argument can be
passed on the command line using the -Dvar.name=var.value convention and
then read back from the job configuration.

Thanks
Hemanth
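
To make the -Dvar.name=var.value convention concrete, here is a minimal plain-Java sketch of the idea: options of that form are peeled off the argument list into a key/value configuration, and whatever remains is left for the job as ordinary arguments. In Hadoop itself this parsing is done for you when the driver implements Tool and is launched through ToolRunner. The class DashDArgs and the property name calc.n are hypothetical stand-ins for illustration, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in showing the -Dname=value convention; not Hadoop's parser. */
final class DashDArgs {
    /** Properties collected from -Dname=value options. */
    final Map<String, String> properties = new LinkedHashMap<>();
    /** Arguments left over once the -D options are removed. */
    final List<String> remaining = new ArrayList<>();

    DashDArgs(String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // "-Dcalc.n=4" becomes key "calc.n" with value "4"
                properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);
            }
        }
    }

    /** Typed lookup with a default, in the spirit of Configuration.getInt. */
    int getInt(String name, int defaultValue) {
        String value = properties.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}
```

Given the arguments "-Dcalc.n=4 in out", the property calc.n ends up in the configuration while "in" and "out" stay in their usual positions as the input and output paths.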

Re: Set variables in mapper

Posted by Harsh J <qw...@gmail.com>.
And since it is an integer you're looking for, use the utility methods
setInt and getInt on JobConf (inherited from Configuration). In main,
before calling JobClient.runJob(conf):

int n = Integer.parseInt(args[2]);
conf.setInt("your.pack.some.name", n);

And in the Mapper's "@Override public void configure(JobConf conf)", do:
N = conf.getInt("your.pack.some.name", 1 /* Or other default value */);
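
Why the configuration works where the static setter did not: map tasks typically run in separate JVMs on the cluster, so a static field assigned in main never reaches them, whereas the job configuration is written out and reloaded on the task side. The sketch below imitates that round trip in plain Java with java.util.Properties. ShippedConf is a hypothetical stand-in, not the real JobConf, and the byte array merely plays the role of whatever the framework uses to ship the configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical stand-in for a job configuration that is shipped to tasks. */
final class ShippedConf {
    private final Properties props = new Properties();

    void setInt(String name, int value) {
        props.setProperty(name, Integer.toString(value));
    }

    int getInt(String name, int defaultValue) {
        String value = props.getProperty(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    /** Driver side: turn the configuration into bytes that can leave this JVM. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Task side: rebuild the configuration from the shipped bytes. */
    static ShippedConf fromBytes(byte[] bytes) {
        try {
            ShippedConf conf = new ShippedConf();
            conf.props.load(new ByteArrayInputStream(bytes));
            return conf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A non-zero default such as 1 also guards the division in map if the property is ever missing.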

-- 
Harsh J
www.harshj.com

Re: Set variables in mapper

Posted by Edward Capriolo <ed...@gmail.com>.
You can pass variables to the Job using the JobConf class.

In your Driver class:
jobConf.set("clone_path", clonePath);

Then in your mapper / reducer, override configure to keep the JobConf, and
read the value back with jobConf.get("clone_path"):

  private JobConf jobConf;
  public void configure(JobConf jobConf) {
        super.configure(jobConf);
        this.jobConf=jobConf;
  }
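
The pattern above relies on configure being called once on each mapper instance before any records are passed to map. A minimal plain-Java imitation of that lifecycle follows. SketchMapper, runTask, and the key calc.n are hypothetical stand-ins (the real types are JobConf and the Mapper interface), and the sketch parses the value once in configure rather than on every record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical mapper stand-in: configure once, then map many records. */
final class SketchMapper {
    private int n = 1; // instance field, filled in by configure

    /** Called once per task, before the first record. */
    void configure(Map<String, String> conf) {
        n = Integer.parseInt(conf.getOrDefault("calc.n", "1"));
    }

    /** Called per record; uses the cached value instead of re-reading the conf. */
    int map(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens.length / n;
    }

    /** Imitates the framework driving one task over its input lines. */
    static List<Integer> runTask(Map<String, String> conf, List<String> lines) {
        SketchMapper mapper = new SketchMapper();
        mapper.configure(conf);
        List<Integer> dims = new ArrayList<>();
        for (String line : lines) {
            dims.add(mapper.map(line));
        }
        return dims;
    }
}
```

Because the value lives in an instance field set from the configuration, each task gets it regardless of which JVM it runs in.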

Re: Set variables in mapper

Posted by Erik Test <er...@gmail.com>.
Oh, OK. Yes, this is clear now. Thanks for the explanation.
Erik


Re: Set variables in mapper

Posted by Owen O'Malley <om...@apache.org>.
On Aug 3, 2010, at 6:12 AM, Erik Test wrote:

> Really? This seems pretty nice.
>
> In the future, with your implementation, would the value always have to be
> wrapped in a MyMapper instance? How would parameters be removed if
> necessary?

Sorry, I wasn't clear. I mean that if you make the sub-classes of  
Mapper serializable, the framework will serialize them for you and  
deserialize them on the cluster.

So a fuller example would look like:

public class MyMapper extends Mapper<IntWritable,Text,IntWritable,Text>
    implements Writable {
   int param;

   public MyMapper() { param = 0; }
   public MyMapper(int param) { this.param = param; }

   public void map(IntWritable key, Text value, Context context) {...}

   // Writable declares these methods with DataInput and DataOutput.
   public void readFields(DataInput in) throws IOException {
     param = in.readInt();
   }

   public void write(DataOutput out) throws IOException {
      out.writeInt(param);
   }
}
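
The write/readFields pair in the example above can be tried out in isolation. The following plain-Java sketch round-trips an object's state through those two methods using only java.io. ParamState is a hypothetical stand-in that mirrors the shape of Hadoop's Writable methods, and it says nothing about how MAPREDUCE-1183 would actually ship mapper instances.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in with the same method shapes as Hadoop's Writable. */
final class ParamState {
    int param;

    ParamState() { this(0); }
    ParamState(int param) { this.param = param; }

    void write(DataOutput out) throws IOException {
        out.writeInt(param);
    }

    void readFields(DataInput in) throws IOException {
        param = in.readInt();
    }

    /** Serializes the state, then rebuilds a fresh instance from the bytes. */
    static ParamState roundTrip(ParamState original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            ParamState copy = new ParamState();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the copy is created with the no-argument constructor and then filled in by readFields, so the parameter passed to the original constructor is carried across as bytes.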

You won't need to use Writable; you could use Protocol Buffers, Thrift,
or Avro instead. Where this comes in really handy is in places like the
InputFormats and OutputFormats. It enables you to replace the current:

job.setInputFormatClass(SequenceFileInputFormat.class);
FileInputFormat.setInputPaths(job, inDir);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
FileOutputFormat.setOutputPath(job, outDir);

with the more natural:

job.setInputFormat(new SequenceFileInputFormat(inDir));
job.setOutputFormat(new SequenceFileOutputFormat(outDir));

Is that clearer now?

-- Owen

Re: Set variables in mapper

Posted by Erik Test <er...@gmail.com>.
Really? This seems pretty nice.

In the future, with your implementation, would the value always have to be
wrapped in a MyMapper instance? How would parameters be removed if
necessary?

Erik


Re: Set variables in mapper

Posted by Owen O'Malley <om...@apache.org>.
On Aug 2, 2010, at 9:17 AM, Erik Test wrote:

> I'm trying to set a variable in my mapper class by reading an argument from
> the command line and then passing the entry to the mapper from main. Is this
> possible?

Others have already answered with the current solution of using  
JobConf to store the value. I should also note that I plan to  
implement MAPREDUCE-1183 for 0.22. It will allow you to do this  
directly like:

job.setMapper(new MyMapper(someIntegerParameter));

which will serialize MyMapper's state, including the integer  
parameter, and store it as part of your job.

-- Owen