Posted to mapreduce-user@hadoop.apache.org by Mike Spreitzer <ms...@us.ibm.com> on 2013/02/28 22:16:27 UTC

How to make a MapReduce job with no input?

I am using the mapred API of Hadoop 1.0.  I want to make a job that does 
not really depend on any input (the job conf supplies all the info needed 
in Mapper).  What is a good way to do this?

What I have done so far is write a job in which MyMapper.configure(..) 
reads all the real input from the JobConf, and MyMapper.map(..) ignores 
the given key and value, writing the output implied by the JobConf.  I set 
the InputFormat to TextInputFormat and the input paths to be a list of one 
filename; the named file contains one line of text (the word "one"), 
terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I 
find it has two map tasks --- one reads the first two bytes of my 
non-input file, and the other reads the last two bytes of my non-input file! 
How can I make a job with just one map task?

Thanks,
Mike

Re: How to make a MapReduce job with no input?

Posted by Harsh J <ha...@cloudera.com>.
The default # of map tasks is set to 2 (via mapred.map.tasks from
mapred-default.xml) - which explains your 2-map run for even one line
of text.

For running with no inputs, take a look at Sleep Job's EmptySplits
technique on trunk:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/SleepJob.java?view=markup
(~line 70)
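If the goal is just to cap the job at a single map task (rather than eliminating input entirely), the conf-side fix is to override that default of 2. A minimal sketch for the hadoop-1.x mapred API (the job class name here is a placeholder):

```java
// Sketch: override mapred.map.tasks (which defaults to 2) so only one
// mapper runs. Note this is only a hint to the framework; an InputFormat
// is still free to return more splits than this.
JobConf conf = new JobConf(MyJob.class);   // MyJob is hypothetical
conf.setNumMapTasks(1);                    // same as conf.setInt("mapred.map.tasks", 1)
```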

On Fri, Mar 1, 2013 at 2:46 AM, Mike Spreitzer <ms...@us.ibm.com> wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does not
> really depend on any input (the job conf supplies all the info needed in
> Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..) reads
> all the real input from the JobConf, and MyMapper.map(..) ignores the given
> key and value, writing the output implied by the JobConf.  I set the
> InputFormat to TextInputFormat and the input paths to be a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks --- one reads the first two bytes of my non-input
> file, and other reads the last two bytes of my non-input file!  How can I
> make a job with just one map task?
>
> Thanks,
> Mike



--
Harsh J

Re: How to make a MapReduce job with no input?

Posted by David Boyd <db...@lorenzresearch.com>.
Below is some code I use.  Basically, the number of iterations
is the number of fake records to supply to each mapper.  You control
the number of mappers via the jobconf.
>  public static class EmptySplit implements InputSplit {
>     public void write(DataOutput out) throws IOException { }
>     public void readFields(DataInput in) throws IOException { }
>     public long getLength() { return 0L; }
>     public String[] getLocations() { return new String[0]; }
>   }
>
>   public static class FFTBenchInputFormat extends Configured
>       implements InputFormat<IntWritable,IntWritable> {
>     public InputSplit[] getSplits(JobConf conf, int numSplits) {
>       InputSplit[] ret = new InputSplit[numSplits];
>       for (int i = 0; i < numSplits; ++i) {
>         ret[i] = new EmptySplit();
>       }
>       return ret;
>     }
>     public RecordReader<IntWritable,IntWritable> getRecordReader(
>         InputSplit ignored, JobConf conf, Reporter reporter)
>         throws IOException {
>       final int size = conf.getInt("fftbench.map.size", 1);
>       if (size < 0) throw new IOException("Invalid map size: " + size);
>       final int iterations = conf.getInt("fftbench.map.iterations", 1);
>       if (iterations < 0) throw new IOException("Invalid map iterations: " + iterations);
>       return new RecordReader<IntWritable,IntWritable>() {
>         private int records = 0;
>         private int emitCount = 0;
>
>         public boolean next(IntWritable key, IntWritable value)
>             throws IOException {
>           key.set(size);
>           int emit = emitCount++;
>           value.set(emit);
>           return records++ < iterations;
>         }
>         public IntWritable createKey() { return new IntWritable(); }
>         public IntWritable createValue() { return new IntWritable(); }
>         public long getPos() throws IOException { return records; }
>         public void close() throws IOException { }
>         public float getProgress() throws IOException {
>           return records / ((float)iterations);
>         }
>       };
>     }
>   }
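Wiring this into a job might look like the following (a hedged sketch: the driver class name and the particular task count and values are made up, while the fftbench.* keys come from the code above):

```java
// Sketch of a driver for the InputFormat above; FFTBench is a placeholder.
JobConf conf = new JobConf(FFTBench.class);
conf.setInputFormat(FFTBenchInputFormat.class);
conf.setNumMapTasks(4);                      // becomes numSplits in getSplits()
conf.setInt("fftbench.map.iterations", 10);  // fake records fed to each mapper
conf.setInt("fftbench.map.size", 1024);      // value each mapper sees as its key
JobClient.runJob(conf);
```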


On 2/28/2013 4:16 PM, Mike Spreitzer wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
> not really depend on any input (the job conf supplies all the info
> needed in Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores
> the given key and value, writing the output implied by the JobConf.  I
> set the InputFormat to TextInputFormat and the input paths to be a list
> of one filename; the named file contains one line of text (the word
> "one"), terminated by a newline.  When I run this job (on Linux,
> hadoop-1.0.0), I find it has two map tasks --- one reads the first two
> bytes of my non-input file, and other reads the last two bytes of my
> non-input file!  How can I make a job with just one map task?
>
> Thanks,
> Mike

-- 
========= mailto:dboyd@lorenzresearch.com ============
David W. Boyd
Vice President, Operations
Lorenz Research, a Data Tactics corporation
7901 Jones Branch, Suite 610
Mclean, VA 22102
office:   +1-703-506-3735, ext 308
fax:     +1-703-506-6703
cell:     +1-703-402-7908
============== http://www.lorenzresearch.com/ ============


The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited.  If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.



Re: How to make a MapReduce job with no input?

Posted by Edward Capriolo <ed...@gmail.com>.
I made https://github.com/edwardcapriolo/DualInputFormat for Hive.
It always returns one split with one record. You can write the same
type of thing to create N splits.
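A minimal sketch of that idea against the hadoop-1.x mapred API (not Edward's actual code; it reuses an empty split like the one in David Boyd's post above, and feeds each mapper exactly one empty record):

```java
// Sketch only: one split regardless of numSplits, one NullWritable record.
public class SingleSplitInputFormat implements InputFormat<NullWritable, NullWritable> {
  public InputSplit[] getSplits(JobConf conf, int numSplits) {
    // Ignore numSplits: always exactly one (empty) split, hence one map task.
    return new InputSplit[] { new EmptySplit() };
  }
  public RecordReader<NullWritable, NullWritable> getRecordReader(
      InputSplit ignored, JobConf conf, Reporter reporter) {
    return new RecordReader<NullWritable, NullWritable>() {
      private boolean done = false;
      public boolean next(NullWritable key, NullWritable value) {
        if (done) return false;   // deliver exactly one record, then stop
        done = true;
        return true;
      }
      public NullWritable createKey() { return NullWritable.get(); }
      public NullWritable createValue() { return NullWritable.get(); }
      public long getPos() { return done ? 1L : 0L; }
      public void close() { }
      public float getProgress() { return done ? 1.0f : 0.0f; }
    };
  }
}
```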

On Thu, Feb 28, 2013 at 8:41 PM, Jeff Kubina <je...@gmail.com> wrote:
> Mike,
>
> To do this for the more general case of creating N map jobs with each job
> receiving the one record <i, n>, where i ranges from 0 to n-1, I wrote an
> InputFormat, InputSplit, and RecordReader Hadoop class. The sample code is
> here. I think I wrote those for Hadoop 0.19, so they may need some tweaking
> for subsequent versions.
>
> Jeff
>
> On Thu, Feb 28, 2013 at 4:25 PM, Mike Spreitzer <ms...@us.ibm.com> wrote:
>>
>> On closer inspection, I see that of my two tasks: the first processes 1
>> input record and the other processes 0 input records.  So I think this
>> solution is correct.  But perhaps it is not the most direct way to get the
>> job done?
>>
>>
>>
>>
>> From:        Mike Spreitzer/Watson/IBM@IBMUS
>> To:        user@hadoop.apache.org,
>> Date:        02/28/2013 04:18 PM
>> Subject:        How to make a MapReduce job with no input?
>> ________________________________
>>
>>
>>
>> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
>> not really depend on any input (the job conf supplies all the info needed in
>> Mapper).  What is a good way to do this?
>>
>> What I have done so far is write a job in which MyMapper.configure(..)
>> reads all the real input from the JobConf, and MyMapper.map(..) ignores the
>> given key and value, writing the output implied by the JobConf.  I set the
>> InputFormat to TextInputFormat and the input paths to be a list of one
>> filename; the named file contains one line of text (the word "one"),
>> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
>> find it has two map tasks --- one reads the first two bytes of my non-input
>> file, and other reads the last two bytes of my non-input file!  How can I
>> make a job with just one map task?
>>
>> Thanks,
>> Mike
>
>

Re: How to make a MapReduce job with no input?

Posted by Jeff Kubina <je...@gmail.com>.
Mike,

To do this for the more general case of creating N map tasks, with each task
receiving the one record <i, n>, where i ranges from 0 to n-1, I wrote
InputFormat, InputSplit, and RecordReader Hadoop classes. The sample code
is here <http://goo.gl/npKfP>. I think I wrote those for Hadoop 0.19, so
they may need some tweaking for subsequent versions.
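Since the short link may rot, here is a hedged sketch of the split half of that approach (not Jeff's actual code): a split that serializes its index, so the reader for split i can emit the single record <i, n>.

```java
// Sketch (hadoop-1.x mapred API): a split carrying (i, n). getSplits would
// return new IndexSplit(i, n) for i = 0 .. n-1, and the RecordReader for a
// given split would emit exactly one <i, n> record.
public class IndexSplit implements InputSplit {
  private int i, n;
  public IndexSplit() { }                        // no-arg ctor needed for readFields
  public IndexSplit(int i, int n) { this.i = i; this.n = n; }
  public int getIndex() { return i; }
  public int getTotal() { return n; }
  public void write(DataOutput out) throws IOException { out.writeInt(i); out.writeInt(n); }
  public void readFields(DataInput in) throws IOException { i = in.readInt(); n = in.readInt(); }
  public long getLength() { return 0L; }         // no bytes back this split
  public String[] getLocations() { return new String[0]; }  // no locality preference
}
```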

Jeff

On Thu, Feb 28, 2013 at 4:25 PM, Mike Spreitzer <ms...@us.ibm.com> wrote:

> On closer inspection, I see that of my two tasks: the first processes 1
> input record and the other processes 0 input records.  So I think this
> solution is correct.  But perhaps it is not the most direct way to get the
> job done?
>
>
>
>
> From:        Mike Spreitzer/Watson/IBM@IBMUS
> To:        user@hadoop.apache.org,
> Date:        02/28/2013 04:18 PM
> Subject:        How to make a MapReduce job with no input?
> ------------------------------
>
>
>
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
> not really depend on any input (the job conf supplies all the info needed
> in Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores the
> given key and value, writing the output implied by the JobConf.  I set the
> InputFormat to TextInputFormat and the input paths to be a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks --- one reads the first two bytes of my non-input
> file, and other reads the last two bytes of my non-input file!  How can I
> make a job with just one map task?
>
> Thanks,
> Mike
>

Re: How to make a MapReduce job with no input?

Posted by Mike Spreitzer <ms...@us.ibm.com>.
On closer inspection, I see that, of my two tasks, the first processes 1 
input record and the other processes 0 input records.  So I think this 
solution is correct.  But perhaps it is not the most direct way to get the 
job done?
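That 1-record/0-record outcome is exactly what TextInputFormat guarantees: a reader that does not start at byte 0 first skips past the end of the line in progress (that partial line belongs to the previous split), and a line that begins inside a split is read to its end even if it crosses the split boundary. A self-contained plain-Java sketch of that assignment rule (a simplification, with no Hadoop types):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Return the lines whose first byte falls inside [start, start+length),
    // mimicking (in simplified form) how TextInputFormat's LineRecordReader
    // assigns lines to a byte-range split.
    static List<String> readSplit(byte[] data, int start, int length) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Skip the partial line: advance until just past a newline.
            while (pos < data.length && data[pos - 1] != '\n') pos++;
        }
        while (pos < start + length && pos < data.length) {
            int end = pos;
            while (end < data.length && data[end] != '\n') end++;
            lines.add(new String(data, pos, end - pos));
            pos = end + 1;   // a line may extend past the split's end
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "one\n".getBytes();           // the 4-byte "non-input" file
        System.out.println(readSplit(file, 0, 2));  // first 2-byte split: [one]
        System.out.println(readSplit(file, 2, 2));  // last 2-byte split: []
    }
}
```

So the first split owns the whole line "one" and the second owns nothing, which is why the solution behaves correctly despite the two tasks.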




From:   Mike Spreitzer/Watson/IBM@IBMUS
To:     user@hadoop.apache.org, 
Date:   02/28/2013 04:18 PM
Subject:        How to make a MapReduce job with no input?



I am using the mapred API of Hadoop 1.0.  I want to make a job that does 
not really depend on any input (the job conf supplies all the info needed 
in Mapper).  What is a good way to do this? 

What I have done so far is write a job in which MyMapper.configure(..) 
reads all the real input from the JobConf, and MyMapper.map(..) ignores 
the given key and value, writing the output implied by the JobConf.  I set 
the InputFormat to TextInputFormat and the input paths to be a list of one 
filename; the named file contains one line of text (the word "one"), 
terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I 
find it has two map tasks --- one reads the first two bytes of my 
non-input file, and other reads the last two bytes of my non-input file! 
How can I make a job with just one map task? 

Thanks, 
Mike

Re: How to make a MapReduce job with no input?

Posted by Harsh J <ha...@cloudera.com>.
The default # of map tasks is set to 2 (via mapred.map.tasks from
mapred-default.xml) - which explains your 2-map run for even one line
of text.

For running with no inputs, take a look at Sleep Job's EmptySplits
technique on trunk:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/SleepJob.java?view=markup
(~line 70)

On Fri, Mar 1, 2013 at 2:46 AM, Mike Spreitzer <ms...@us.ibm.com> wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does not
> really depend on any input (the job conf supplies all the info needed in
> Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..) reads
> all the real input from the JobConf, and MyMapper.map(..) ignores the given
> key and value, writing the output implied by the JobConf.  I set the
> InputFormat to TextInputFormat and the input paths to be a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks --- one reads the first two bytes of my non-input
> file, and other reads the last two bytes of my non-input file!  How can I
> make a job with just one map task?
>
> Thanks,
> Mike



--
Harsh J

Re: How to make a MapReduce job with no input?

Posted by David Boyd <db...@lorenzresearch.com>.
Below is some code I use.  Basically, the number of iterations
is the number of fake records to supply to each mapper.  You control
the number of mappers via the jobconf.
>  public static class EmptySplit implements InputSplit {
>     public void write(DataOutput out) throws IOException { }
>     public void readFields(DataInput in) throws IOException { }
>     public long getLength() { return 0L; }
>     public String[] getLocations() { return new String[0]; }
>   }
>
>   public static class FFTBenchInputFormat extends Configured
>       implements InputFormat<IntWritable,IntWritable> {
>     public InputSplit[] getSplits(JobConf conf, int numSplits) {
>       InputSplit[] ret = new InputSplit[numSplits];
>       for (int i = 0; i < numSplits; ++i) {
>         ret[i] = new EmptySplit();
>       }
>       return ret;
>     }
>     public RecordReader<IntWritable,IntWritable> getRecordReader(
>         InputSplit ignored, JobConf conf, Reporter reporter)
>         throws IOException {
>       final int size = conf.getInt("fftbench.map.size", 1);
>       if (size < 0) throw new IOException("Invalid map size: " + size);
>       final int iterations = conf.getInt("fftbench.map.iterations", 1);
>       if (iterations < 0) throw new IOException("Invalid map iterations: " + size);
>     return new RecordReader<IntWritable,IntWritable>() {
>         private int records = 0;
>         private int emitCount = 0;
>
>         public boolean next(IntWritable key, IntWritable value)
>             throws IOException {
>           key.set(size);
>           int emit = emitCount++;
>           value.set(emit);
>           return records++ < iterations;
>         }
>         public IntWritable createKey() { return new IntWritable(); }
>         public IntWritable createValue() { return new IntWritable(); }
>         public long getPos() throws IOException { return records; }
>         public void close() throws IOException { }
>         public float getProgress() throws IOException {
>           return records / ((float)iterations);
>         }
>       };
>     }
>   }


On 2/28/2013 4:16 PM, Mike Spreitzer wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
> not really depend on any input (the job conf supplies all the info
> needed in Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores
> the given key and value, writing the output implied by the JobConf.  I
> set the InputFormat to TextInputFormat and the input paths to be a list
> of one filename; the named file contains one line of text (the word
> "one"), terminated by a newline.  When I run this job (on Linux,
> hadoop-1.0.0), I find it has two map tasks --- one reads the first two
> bytes of my non-input file, and other reads the last two bytes of my
> non-input file!  How can I make a job with just one map task?
>
> Thanks,
> Mike

-- 
========= mailto:dboyd@lorenzresearch.com ============
David W. Boyd
Vice President, Operations
Lorenz Research, a Data Tactics corporation
7901 Jones Branch, Suite 610
Mclean, VA 22102
office:   +1-703-506-3735, ext 308
fax:     +1-703-506-6703
cell:     +1-703-402-7908
============== http://www.lorenzresearch.com/ ============


The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited.  If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.



Re: How to make a MapReduce job with no input?

Posted by Harsh J <ha...@cloudera.com>.
The default # of map tasks is set to 2 (via mapred.map.tasks from
mapred-default.xml) - which explains your 2-map run for even one line
of text.

For running with no inputs, take a look at Sleep Job's EmptySplits
technique on trunk:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/SleepJob.java?view=markup
(~line 70)

On Fri, Mar 1, 2013 at 2:46 AM, Mike Spreitzer <ms...@us.ibm.com> wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does not
> really depend on any input (the job conf supplies all the info needed in
> Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..) reads
> all the real input from the JobConf, and MyMapper.map(..) ignores the given
> key and value, writing the output implied by the JobConf.  I set the
> InputFormat to TextInputFormat and the input paths to be a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks --- one reads the first two bytes of my non-input
> file, and other reads the last two bytes of my non-input file!  How can I
> make a job with just one map task?
>
> Thanks,
> Mike



--
Harsh J

Re: How to make a MapReduce job with no input?

Posted by Mike Spreitzer <ms...@us.ibm.com>.
On closer inspection, I see that of my two tasks, the first processes 1 
input record and the other processes 0 input records.  So I think this 
solution is correct, but perhaps it is not the most direct way to get the 
job done?




From:   Mike Spreitzer/Watson/IBM@IBMUS
To:     user@hadoop.apache.org, 
Date:   02/28/2013 04:18 PM
Subject:        How to make a MapReduce job with no input?



I am using the mapred API of Hadoop 1.0.  I want to make a job that does 
not really depend on any input (the job conf supplies all the info needed 
in Mapper).  What is a good way to do this? 

What I have done so far is write a job in which MyMapper.configure(..) 
reads all the real input from the JobConf, and MyMapper.map(..) ignores 
the given key and value, writing the output implied by the JobConf.  I set 
the InputFormat to TextInputFormat and the input paths to be a list of one 
filename; the named file contains one line of text (the word "one"), 
terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I 
find it has two map tasks --- one reads the first two bytes of my 
non-input file, and the other reads the last two bytes of my non-input file! 
How can I make a job with just one map task? 

Thanks, 
Mike

Re: How to make a MapReduce job with no input?

Posted by David Boyd <db...@lorenzresearch.com>.
Below is some code I use.  Basically, fftbench.map.iterations is the
number of fake records to supply to each mapper.  You control the
number of mappers via the JobConf: its configured number of map tasks
becomes the numSplits argument passed to getSplits(..).
>  public static class EmptySplit implements InputSplit {
>     public void write(DataOutput out) throws IOException { }
>     public void readFields(DataInput in) throws IOException { }
>     public long getLength() { return 0L; }
>     public String[] getLocations() { return new String[0]; }
>   }
>
>   public static class FFTBenchInputFormat extends Configured
>       implements InputFormat<IntWritable,IntWritable> {
>     public InputSplit[] getSplits(JobConf conf, int numSplits) {
>       InputSplit[] ret = new InputSplit[numSplits];
>       for (int i = 0; i < numSplits; ++i) {
>         ret[i] = new EmptySplit();
>       }
>       return ret;
>     }
>     public RecordReader<IntWritable,IntWritable> getRecordReader(
>         InputSplit ignored, JobConf conf, Reporter reporter)
>         throws IOException {
>       final int size = conf.getInt("fftbench.map.size", 1);
>       if (size < 0) throw new IOException("Invalid map size: " + size);
>       final int iterations = conf.getInt("fftbench.map.iterations", 1);
>       if (iterations < 0) throw new IOException("Invalid map iterations: " + iterations);
>       return new RecordReader<IntWritable,IntWritable>() {
>         private int records = 0;
>         private int emitCount = 0;
>
>         public boolean next(IntWritable key, IntWritable value)
>             throws IOException {
>           key.set(size);
>           int emit = emitCount++;
>           value.set(emit);
>           return records++ < iterations;
>         }
>         public IntWritable createKey() { return new IntWritable(); }
>         public IntWritable createValue() { return new IntWritable(); }
>         public long getPos() throws IOException { return records; }
>         public void close() throws IOException { }
>         public float getProgress() throws IOException {
>           return records / ((float)iterations);
>         }
>       };
>     }
>   }
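The next(..) logic above means each map task sees exactly
fftbench.map.iterations fake records.  That can be checked with a
plain-Java simulation of the loop the mapred framework runs over a
RecordReader (no Hadoop on the classpath needed; FakeRecordCount is a
name I made up for illustration):

```java
public class FakeRecordCount {
    // Mirrors the framework loop: while (reader.next(key, value)) map(...).
    // next() returns records++ < iterations, so it returns true exactly
    // 'iterations' times before signalling end-of-input.
    static int countRecords(int iterations) {
        int records = 0;   // same role as the RecordReader's 'records' field
        int processed = 0; // how many times the mapper would be invoked
        while (records++ < iterations) {
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(countRecords(0)); // prints 0
        System.out.println(countRecords(1)); // prints 1
        System.out.println(countRecords(5)); // prints 5
    }
}
```

So iterations=0 gives a mapper that is never invoked, and iterations=1
reproduces Mike's "one record per task" setup without any input file.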


On 2/28/2013 4:16 PM, Mike Spreitzer wrote:
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
> not really depend on any input (the job conf supplies all the info
> needed in Mapper).  What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores
> the given key and value, writing the output implied by the JobConf.  I
> set the InputFormat to TextInputFormat and the input paths to be a list
> of one filename; the named file contains one line of text (the word
> "one"), terminated by a newline.  When I run this job (on Linux,
> hadoop-1.0.0), I find it has two map tasks --- one reads the first two
> bytes of my non-input file, and the other reads the last two bytes of my
> non-input file!  How can I make a job with just one map task?
>
> Thanks,
> Mike

-- 
========= mailto:dboyd@lorenzresearch.com ============
David W. Boyd
Vice President, Operations
Lorenz Research, a Data Tactics corporation
7901 Jones Branch, Suite 610
Mclean, VA 22102
office:   +1-703-506-3735, ext 308
fax:     +1-703-506-6703
cell:     +1-703-402-7908
============== http://www.lorenzresearch.com/ ============


The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited.  If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.



