Posted to hdfs-user@hadoop.apache.org by kumudu harshani <ku...@gmail.com> on 2012/09/23 05:40:05 UTC
How to set 2mappers on 1 job
Hi,
Could someone help me with the following scenario?
I want to implement a job that takes the outputs of 2 mappers and sends them
to 1 reducer. The attached image shows the flow I want.
The normal flow looks like this:
JobConf conf2 = new JobConf(WordCount.class);
Job job2 = new Job(conf2);
conf2.setOutputKeyClass(IntWritable.class);
conf2.setOutputValueClass(Text.class);
conf2.setMapperClass(Map1.class);
conf2.setReducerClass(Reduce1.class);
This takes 1 mapper and 1 reducer. What I want is to set 2 mappers
(mapper1a, mapper1b) and 1 reducer.
Is that possible? If so, could someone please help?
thanks
kumudu
--
*Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
Partner | Software Engineer I | m: +94 719 258 242 |
www.microsoft.com/enterprisesearch
Re: How to set 2mappers on 1 job
Posted by Bertrand Dechoux <de...@gmail.com>.
Harsh's solution is indeed cleaner and is probably what you were looking for
(there is a version of MultipleInputs for both the mapred and mapreduce packages).
If you are curious, see:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.3/src/mapred/org/apache/hadoop/mapred/lib/MultipleInputs.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.3/src/mapred/org/apache/hadoop/mapred/lib/DelegatingMapper.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.3/src/mapred/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.3/src/mapred/org/apache/hadoop/mapreduce/lib/input/DelegatingMapper.java
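For reference, here is a minimal sketch of how the mapreduce-package MultipleInputs linked above might be wired up so that two mappers feed one reducer. It is an illustration under assumptions, not code from this thread: the class name TwoMapperJob, the record layouts ("id<TAB>name" for the first input, "id,city" for the second), the "A:"/"B:" tags, and the use of args[0..2] as paths are all hypothetical. It uses the Hadoop 1.x style `new Job(Configuration, String)` constructor and needs the Hadoop jars on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoMapperJob {

    // Hypothetical mapper for the first input; assumes lines of "id<TAB>name".
    public static class Mapper1a extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t", 2);
            if (fields.length < 2) {
                return; // skip malformed lines
            }
            // Integer.parseInt throws on a non-numeric id; a real job should handle that.
            context.write(new IntWritable(Integer.parseInt(fields[0])), new Text("A:" + fields[1]));
        }
    }

    // Hypothetical mapper for the second input; assumes lines of "id,city".
    public static class Mapper1b extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",", 2);
            if (fields.length < 2) {
                return; // skip malformed lines
            }
            context.write(new IntWritable(Integer.parseInt(fields[0])), new Text("B:" + fields[1]));
        }
    }

    // The single reducer sees the values of both mappers for each key.
    public static class Reduce1 extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(key, value); // the "A:"/"B:" prefix tells the sources apart
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "two mappers, one reducer");
        job.setJarByClass(TwoMapperJob.class);

        // One call per input path; each path gets its own mapper class.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Mapper1a.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Mapper1b.class);

        job.setReducerClass(Reduce1.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the sketch configures everything through the Job object and calls neither setMapperClass nor setInputFormatClass, since MultipleInputs sets up the delegating mapper and input format itself.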
Regards
Bertrand
On Sun, Sep 23, 2012 at 7:37 AM, Harsh J <ha...@cloudera.com> wrote:
> Hi,
>
> There's an easier way to do what Bertrand has suggested. Look at
> MultipleInputs class:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
> and see this blog post for an example on how to use it:
> http://kickstarthadoop.blogspot.in/2011/09/joins-with-plain-map-reduce.html
>
> Note though that the reducer input key and value types are singular, and
> you need to ensure that. There's no easy way around that aside of using
> generic containers.
>
>
> On Sun, Sep 23, 2012 at 9:34 AM, kumudu harshani <kumuduharshani@gmail.com
> > wrote:
>
>> I am sorry.. I didn't get you.. shouldn't i handle that with jobconf code.
>>
>> The confusion i have is, if i put like:
>>
>> JobConf conf2 = new JobConf(WordCount.class);
>> Job job2 = new Job(conf2);
>> conf2.setOutputKeyClass(IntWritable.class);
>> conf2.setOutputValueClass(Text.class);
>>
>> conf2.setMapperClass(Map1.class);
>> conf2.setReducerClass(Reduce1.class);
>>
>> ---it will execute Map1.class and then Reduce1.class.
>>
>> so if i have Mapper1a.class and Mapper2a.class, how should i write the
>> code of job to execute both and then execute Reducer.class such that,
>> Reducer will take both mappers (1a, 1b) emit outputs...
>>
>> thanks
>> kumudu
>>
>> On Sun, Sep 23, 2012 at 9:23 AM, Bertrand Dechoux <de...@gmail.com>wrote:
>>
>>> You can use the map.input.file property to decide which logic your
>>> mapper should apply.
>>> Regards
>>> Bertrand
>>>
>>>
>>> On Sun, Sep 23, 2012 at 5:40 AM, kumudu harshani <
>>> kumuduharshani@gmail.com> wrote:
>>>
>>>> Hi...
>>>> Could someone help me with following scenario..
>>>>
>>>> I want implement a job which should get 2 mapper outputs and send them
>>>> to 1 reducer. Attached image show the flow I wanted....
>>>>
>>>>
>>>>
>>>>
>>>> Normal flow is like:
>>>>
>>>> JobConf conf2 = new JobConf(WordCount.class);
>>>> Job job2 = new Job(conf2);
>>>> conf2.setOutputKeyClass(IntWritable.class);
>>>> conf2.setOutputValueClass(Text.class);
>>>>
>>>> conf2.setMapperClass(Map1.class);
>>>> conf2.setReducerClass(Reduce1.class);
>>>>
>>>> --- where it takes 1 mapper and 1 reducer. What i want is to set 2
>>>> maps(mapper1a, mapper1b) and 1 reducer...
>>>> Is that possible, if so could someone please help..
>>>>
>>>> thanks
>>>> kumudu
>>>> --
>>>>
>>>> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
>>>> Partner | Software Engineer I | m: +94 719 258 242 |
>>>> www.microsoft.com/enterprisesearch
>>>>
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>>
>> --
>>
>> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
>> Partner | Software Engineer I | m: +94 719 258 242 |
>> www.microsoft.com/enterprisesearch
>>
>>
>
>
> --
> Harsh J
>
--
Bertrand Dechoux
Re: How to set 2mappers on 1 job
Posted by Harsh J <ha...@cloudera.com>.
Hi,
There's an easier way to do what Bertrand has suggested. Look at the
MultipleInputs class:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
and see this blog post for an example of how to use it:
http://kickstarthadoop.blogspot.in/2011/09/joins-with-plain-map-reduce.html
Note, though, that a reducer has a single input key type and a single input
value type, so both mappers must emit the same types. There's no easy way
around that aside from using generic containers.
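To make the point about a single value type concrete, here is a small plain-Java sketch with no Hadoop dependency. It only imitates the data flow: two "mappers" wrap their payloads in one shared container type that carries a source tag, a map stands in for the shuffle, and one "reducer" separates the values again by tag. The class TaggedValueSketch, its method names, and the "id<TAB>payload" record layout are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaggedValueSketch {

    /** The single value type both mappers emit: a source tag plus a payload. */
    public static final class Tagged {
        public final char source;
        public final String payload;

        public Tagged(char source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    /** Stand-in for the two map phases plus the shuffle: groups tagged values by key. */
    public static Map<Integer, List<Tagged>> mapAndShuffle(List<String> inputA, List<String> inputB) {
        Map<Integer, List<Tagged>> groups = new TreeMap<>();
        for (String line : inputA) {
            emit(groups, line, 'A'); // what the first mapper would write
        }
        for (String line : inputB) {
            emit(groups, line, 'B'); // what the second mapper would write
        }
        return groups;
    }

    // Parses "id<TAB>payload" and files the tagged payload under its id.
    private static void emit(Map<Integer, List<Tagged>> groups, String line, char source) {
        String[] fields = line.split("\t", 2);
        int key = Integer.parseInt(fields[0]);
        groups.computeIfAbsent(key, k -> new ArrayList<>()).add(new Tagged(source, fields[1]));
    }

    /** Stand-in for the single reducer: splits the values by tag and formats one line. */
    public static String reduce(int key, List<Tagged> values) {
        List<String> fromA = new ArrayList<>();
        List<String> fromB = new ArrayList<>();
        for (Tagged value : values) {
            (value.source == 'A' ? fromA : fromB).add(value.payload);
        }
        return key + "\tA=" + fromA + "\tB=" + fromB;
    }

    public static void main(String[] args) {
        List<String> inputA = Arrays.asList("1\talice", "2\tbob");
        List<String> inputB = Arrays.asList("1\tparis", "1\trome");
        for (Map.Entry<Integer, List<Tagged>> entry : mapAndShuffle(inputA, inputB).entrySet()) {
            System.out.println(reduce(entry.getKey(), entry.getValue()));
        }
    }
}
```

In a real job the container would have to be a Writable (or the tag could simply be a prefix on a Text value), but the idea is the same: the reducer receives one value type and uses the tag to tell the two mappers' outputs apart.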
On Sun, Sep 23, 2012 at 9:34 AM, kumudu harshani <ku...@gmail.com> wrote:
> I am sorry.. I didn't get you.. shouldn't i handle that with jobconf code.
>
> The confusion i have is, if i put like:
>
> JobConf conf2 = new JobConf(WordCount.class);
> Job job2 = new Job(conf2);
> conf2.setOutputKeyClass(IntWritable.class);
> conf2.setOutputValueClass(Text.class);
>
> conf2.setMapperClass(Map1.class);
> conf2.setReducerClass(Reduce1.class);
>
> ---it will execute Map1.class and then Reduce1.class.
>
> so if i have Mapper1a.class and Mapper2a.class, how should i write the
> code of job to execute both and then execute Reducer.class such that,
> Reducer will take both mappers (1a, 1b) emit outputs...
>
> thanks
> kumudu
>
> On Sun, Sep 23, 2012 at 9:23 AM, Bertrand Dechoux <de...@gmail.com>wrote:
>
>> You can use the map.input.file property to decide which logic your
>> mapper should apply.
>> Regards
>> Bertrand
>>
>>
>> On Sun, Sep 23, 2012 at 5:40 AM, kumudu harshani <
>> kumuduharshani@gmail.com> wrote:
>>
>>> Hi...
>>> Could someone help me with following scenario..
>>>
>>> I want implement a job which should get 2 mapper outputs and send them
>>> to 1 reducer. Attached image show the flow I wanted....
>>>
>>>
>>>
>>>
>>> Normal flow is like:
>>>
>>> JobConf conf2 = new JobConf(WordCount.class);
>>> Job job2 = new Job(conf2);
>>> conf2.setOutputKeyClass(IntWritable.class);
>>> conf2.setOutputValueClass(Text.class);
>>>
>>> conf2.setMapperClass(Map1.class);
>>> conf2.setReducerClass(Reduce1.class);
>>>
>>> --- where it takes 1 mapper and 1 reducer. What i want is to set 2
>>> maps(mapper1a, mapper1b) and 1 reducer...
>>> Is that possible, if so could someone please help..
>>>
>>> thanks
>>> kumudu
>>> --
>>>
>>> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
>>> Partner | Software Engineer I | m: +94 719 258 242 |
>>> www.microsoft.com/enterprisesearch
>>>
>>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>
>
> --
>
> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
> Partner | Software Engineer I | m: +94 719 258 242 |
> www.microsoft.com/enterprisesearch
>
>
--
Harsh J
Re: How to set 2mappers on 1 job
Posted by kumudu harshani <ku...@gmail.com>.
I am sorry, I didn't follow. Shouldn't I handle that in the JobConf code?
The confusion I have is this. If I write:
JobConf conf2 = new JobConf(WordCount.class);
Job job2 = new Job(conf2);
conf2.setOutputKeyClass(IntWritable.class);
conf2.setOutputValueClass(Text.class);
conf2.setMapperClass(Map1.class);
conf2.setReducerClass(Reduce1.class);
it will execute Map1.class and then Reduce1.class.
So if I have Mapper1a.class and Mapper1b.class, how should I write the job
code to execute both and then execute Reducer.class, such that the reducer
takes the outputs emitted by both mappers (1a, 1b)?
thanks
kumudu
On Sun, Sep 23, 2012 at 9:23 AM, Bertrand Dechoux <de...@gmail.com> wrote:
> You can use the map.input.file property to decide which logic your
> mapper should apply.
> Regards
> Bertrand
>
>
> On Sun, Sep 23, 2012 at 5:40 AM, kumudu harshani <kumuduharshani@gmail.com
> > wrote:
>
>> Hi...
>> Could someone help me with following scenario..
>>
>> I want implement a job which should get 2 mapper outputs and send them to
>> 1 reducer. Attached image show the flow I wanted....
>>
>>
>>
>>
>> Normal flow is like:
>>
>> JobConf conf2 = new JobConf(WordCount.class);
>> Job job2 = new Job(conf2);
>> conf2.setOutputKeyClass(IntWritable.class);
>> conf2.setOutputValueClass(Text.class);
>>
>> conf2.setMapperClass(Map1.class);
>> conf2.setReducerClass(Reduce1.class);
>>
>> --- where it takes 1 mapper and 1 reducer. What i want is to set 2
>> maps(mapper1a, mapper1b) and 1 reducer...
>> Is that possible, if so could someone please help..
>>
>> thanks
>> kumudu
>> --
>>
>> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
>> Partner | Software Engineer I | m: +94 719 258 242 |
>> www.microsoft.com/enterprisesearch
>>
>>
>
>
> --
> Bertrand Dechoux
>
--
*Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
Partner | Software Engineer I | m: +94 719 258 242 |
www.microsoft.com/enterprisesearch
Re: How to set 2mappers on 1 job
Posted by kumudu harshani <ku...@gmail.com>.
I am sorry.. I didn't get you.. shouldn't i handle that with jobconf code.
The confusion i have is, if i put like:
JobConf conf2 = new JobConf(WordCount.class);
Job job2 = new Job(conf2);
conf2.setOutputKeyClass(IntWritable.class);
conf2.setOutputValueClass(Text.class);
conf2.setMapperClass(Map1.class);
conf2.setReducerClass(Reduce1.class);
---it will execute Map1.class and then Reduce1.class.
so if i have Mapper1a.class and Mapper2a.class, how should i write the code
of job to execute both and then execute Reducer.class such that, Reducer
will take both mappers (1a, 1b) emit outputs...
thanks
kumudu
On Sun, Sep 23, 2012 at 9:23 AM, Bertrand Dechoux <de...@gmail.com>wrote:
> You can use the map.input.file property to decide which logic should your
> mapper apply.
> Regards
> Bertrand
>
>
> On Sun, Sep 23, 2012 at 5:40 AM, kumudu harshani <kumuduharshani@gmail.com
> > wrote:
>
>> Hi...
>> Could someone help me with following scenario..
>>
>> I want implement a job which should get 2 mapper outputs and send them to
>> 1 reducer. Attached image show the flow I wanted....
>>
>>
>>
>>
>> Normal flow is like:
>>
>> JobConf conf2 = new JobConf(WordCount.class);
>> Job job2 = new Job(conf2);
>> conf2.setOutputKeyClass(IntWritable.class);
>> conf2.setOutputValueClass(Text.class);
>>
>> conf2.setMapperClass(Map1.class);
>> conf2.setReducerClass(Reduce1.class);
>>
>> --- where it takes 1 mapper and 1 reducer. What i want is to set 2
>> maps(mapper1a, mapper1b) and 1 reducer...
>> Is that possible, if so could someone please help..
>>
>> thanks
>> kumudu
>> --
>>
>> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
>> Partner | Software Engineer I | m: +94 719 258 242 |
>> www.microsoft.com/enterprisesearch
>>
>>
>
>
> --
> Bertrand Dechoux
>
--
*Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
Partner | Software Engineer I | m: +94 719 258 242 |
www.microsoft.com/enterprisesearch
Re: How to set 2mappers on 1 job
Posted by kumudu harshani <ku...@gmail.com>.
I am sorry.. I didn't get you.. shouldn't i handle that with jobconf code.
The confusion i have is, if i put like:
JobConf conf2 = new JobConf(WordCount.class);
Job job2 = new Job(conf2);
conf2.setOutputKeyClass(IntWritable.class);
conf2.setOutputValueClass(Text.class);
conf2.setMapperClass(Map1.class);
conf2.setReducerClass(Reduce1.class);
---it will execute Map1.class and then Reduce1.class.
so if i have Mapper1a.class and Mapper2a.class, how should i write the code
of job to execute both and then execute Reducer.class such that, Reducer
will take both mappers (1a, 1b) emit outputs...
thanks
kumudu
On Sun, Sep 23, 2012 at 9:23 AM, Bertrand Dechoux <de...@gmail.com>wrote:
> You can use the map.input.file property to decide which logic should your
> mapper apply.
> Regards
> Bertrand
>
>
> On Sun, Sep 23, 2012 at 5:40 AM, kumudu harshani <kumuduharshani@gmail.com
> > wrote:
>
>> Hi...
>> Could someone help me with following scenario..
>>
>> I want implement a job which should get 2 mapper outputs and send them to
>> 1 reducer. Attached image show the flow I wanted....
>>
>>
>>
>>
>> Normal flow is like:
>>
>> JobConf conf2 = new JobConf(WordCount.class);
>> Job job2 = new Job(conf2);
>> conf2.setOutputKeyClass(IntWritable.class);
>> conf2.setOutputValueClass(Text.class);
>>
>> conf2.setMapperClass(Map1.class);
>> conf2.setReducerClass(Reduce1.class);
>>
>> --- where it takes 1 mapper and 1 reducer. What i want is to set 2
>> maps(mapper1a, mapper1b) and 1 reducer...
>> Is that possible, if so could someone please help..
>>
>> thanks
>> kumudu
>> --
>>
>> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
>> Partner | Software Engineer I | m: +94 719 258 242 |
>> www.microsoft.com/enterprisesearch
>>
>>
>
>
> --
> Bertrand Dechoux
>
--
*Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
Partner | Software Engineer I | m: +94 719 258 242 |
www.microsoft.com/enterprisesearch
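A note on the "both mappers, one reducer" part of the question: the reducer does not know which mapper produced a record. It only sees values grouped by key, so both mappers have to emit the same key and value types. The sketch below simulates that flow in plain Java with no Hadoop dependency. The class name, the two input formats and the "name="/"city=" tags are all hypothetical, chosen only for illustration; in a real job the per-input wiring would be done with the MultipleInputs class mentioned elsewhere in this thread.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoMappersOneReducer {

    // Hypothetical mapper1a: input line "id<TAB>name", emits (id, "name=<name>").
    static Map.Entry<Integer, String> mapper1a(String line) {
        String[] f = line.split("\t");
        return new SimpleEntry<>(Integer.parseInt(f[0]), "name=" + f[1]);
    }

    // Hypothetical mapper1b: input line "city,id", emits (id, "city=<city>").
    static Map.Entry<Integer, String> mapper1b(String line) {
        String[] f = line.split(",");
        return new SimpleEntry<>(Integer.parseInt(f[1]), "city=" + f[0]);
    }

    // Stand-in for the shuffle: groups every emitted value under its key,
    // regardless of which mapper emitted it.
    static Map<Integer, List<String>> shuffle(List<Map.Entry<Integer, String>> emitted) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, String> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Stand-in for the single reducer: joins all values that share a key.
    static String reduce(int key, List<String> values) {
        return key + "\t" + String.join(";", values);
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> emitted = new ArrayList<>();
        emitted.add(mapper1a("7\tkumudu"));
        emitted.add(mapper1b("colombo,7"));
        for (Map.Entry<Integer, List<String>> e : shuffle(emitted).entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

The value tags are what let the reducer tell the two sources apart. In a real job the order of values within a key is not guaranteed, so the reducer should not rely on it the way this simulation does.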
Re: How to set 2mappers on 1 job
Posted by Bertrand Dechoux <de...@gmail.com>.
You can use the map.input.file property to decide which logic your
mapper should apply.
Regards
Bertrand
On Sun, Sep 23, 2012 at 5:40 AM, kumudu harshani
<ku...@gmail.com> wrote:
> Hi...
> Could someone help me with following scenario..
>
> I want implement a job which should get 2 mapper outputs and send them to
> 1 reducer. Attached image show the flow I wanted....
>
>
>
>
> Normal flow is like:
>
> JobConf conf2 = new JobConf(WordCount.class);
> Job job2 = new Job(conf2);
> conf2.setOutputKeyClass(IntWritable.class);
> conf2.setOutputValueClass(Text.class);
>
> conf2.setMapperClass(Map1.class);
> conf2.setReducerClass(Reduce1.class);
>
> --- where it takes 1 mapper and 1 reducer. What i want is to set 2
> maps(mapper1a, mapper1b) and 1 reducer...
> Is that possible, if so could someone please help..
>
> thanks
> kumudu
> --
>
> *Kumudu Samarappuli* | Creative Search Technologies, a Microsoft IEG
> Partner | Software Engineer I | m: +94 719 258 242 |
> www.microsoft.com/enterprisesearch
>
>
--
Bertrand Dechoux
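To make the map.input.file suggestion concrete, here is a minimal plain-Java sketch of the dispatch logic only, with no Hadoop dependency. It assumes the two inputs live in two separate directories; the directory names, the class name and the comma-separated record layout are hypothetical. In a mapper written against the old mapred API, the path string would typically be read from the JobConf inside configure(), for example with job.get("map.input.file"), and the branch would then run inside map().

```java
public class InputDispatch {

    // Hypothetical input directories, one per logical mapper.
    static final String DIR_1A = "/data/input1a/";
    static final String DIR_1B = "/data/input1b/";

    // Decides which logic applies, given the value of map.input.file.
    static String chooseLogic(String inputFile) {
        if (inputFile.contains(DIR_1A)) {
            return "1a";
        }
        if (inputFile.contains(DIR_1B)) {
            return "1b";
        }
        throw new IllegalArgumentException("Unexpected input file: " + inputFile);
    }

    // Stand-in for map(): returns {key, value}, with the value tagged by source.
    static String[] map(String inputFile, String line) {
        String[] fields = line.split(",");
        if (chooseLogic(inputFile).equals("1a")) {
            // mapper1a logic (assumed layout "id,value"): key is the first field.
            return new String[] { fields[0], "1a:" + fields[1] };
        }
        // mapper1b logic (assumed layout "value,id"): key is the second field.
        return new String[] { fields[1], "1b:" + fields[0] };
    }

    public static void main(String[] args) {
        String[] a = map("hdfs://nn/data/input1a/part-00000", "42,apple");
        String[] b = map("hdfs://nn/data/input1b/part-00000", "pear,42");
        System.out.println(a[0] + " -> " + a[1]);
        System.out.println(b[0] + " -> " + b[1]);
    }
}
```

Both branches emit the same key for related records, so a single reducer would receive them together. The drawback of this approach is that one mapper class carries the logic for every input, which is why MultipleInputs is described in this thread as the cleaner option.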