Posted to user@hadoop.apache.org by Jason Yang <li...@gmail.com> on 2012/09/11 05:33:02 UTC

How to make different mappers execute different processing on a same data ?

Hi, all

I have a question: how can I make different mappers apply different
processing to the same data?

Here is my scenario:
I have a dataset to process, but there are several possible ways to process
it and I have no idea which one is better. So I was thinking that maybe I
could run multiple mappers, each applying a different processing approach,
and eventually choose the best result according to some evaluation
function.

But I'm not sure whether this could be done in MapReduce.

Any help would be appreciated.

-- 
YANG, Lin

Re: How to make different mappers execute different processing on a same data ?

Posted by Jason Yang <li...@gmail.com>.
All right, I got it. Thank you very much.

2012/9/11 Harsh J <ha...@cloudera.com>

> Hey Jason,
>
> While I am not sure on whats the best way to automatically "evaluate"
> during the execution of a job, the MultipleInputs class offers a way
> to run different map implementations within a single job for different
> input paths. You could perhaps leverage that with duplicated (or
> symlinked?) input paths.
>
> Otherwise, perhaps do all the N types of computation in a single map()
> call, and judge the time inside it at the end of all, before emitting?
>
> On Tue, Sep 11, 2012 at 9:03 AM, Jason Yang <li...@gmail.com>
> wrote:
> > Hi, all
> >
> > I've got a question about how to make different mappers execute different
> > processing on a same data?
> >
> > Here is my scenario:
> > I got to process a data, however, there multiple choices to process this
> > data and I have no idea which one is better, so I was thinking that maybe I
> > could execute multiple mappers, in which different processing solution is
> > applied, and eventually the best one is chosen according to some evaluation
> > functions.
> >
> > But I'm not sure whether this could be done in MapReduce.
> >
> > Any help would be appreciated.
> >
> > --
> > YANG, Lin
> >
>
>
>
> --
> Harsh J
>



-- 
YANG, Lin

Re: How to make different mappers execute different processing on a same data ?

Posted by Harsh J <ha...@cloudera.com>.
Hey Jason,

While I am not sure what the best way is to automatically "evaluate"
during the execution of a job, the MultipleInputs class offers a way
to run different map implementations within a single job for different
input paths. You could perhaps leverage that with duplicated (or
symlinked?) input paths.

Otherwise, perhaps do all N types of computation in a single map()
call, and compare their timings inside it at the end, before emitting?
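
The second suggestion could be sketched roughly as below. This is plain Java
with no Hadoop dependency, so it only illustrates the selection logic. The
names BestOfN, pickBest and exampleStrategies, and the length-based score,
are hypothetical placeholders and not part of any Hadoop API. In a real job
the call to pickBest would presumably sit inside Mapper.map(), and only the
winning output would be written to the context.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

class BestOfN {
    /** Output of the winning strategy, plus its name and score. */
    static final class Result {
        final String name;
        final String output;
        final double score;

        Result(String name, String output, double score) {
            this.name = name;
            this.output = output;
            this.score = score;
        }
    }

    /**
     * Runs every strategy on the same record and keeps the output with the
     * highest score. Returns null if no strategies are given.
     */
    static Result pickBest(String record,
                           Map<String, Function<String, String>> strategies,
                           ToDoubleFunction<String> evaluate) {
        Result best = null;
        for (Map.Entry<String, Function<String, String>> e : strategies.entrySet()) {
            String output = e.getValue().apply(record);
            double score = evaluate.applyAsDouble(output);
            if (best == null || score > best.score) {
                best = new Result(e.getKey(), output, score);
            }
        }
        return best;
    }

    /** Hypothetical candidate computations; stand-ins for the real N options. */
    static Map<String, Function<String, String>> exampleStrategies() {
        Map<String, Function<String, String>> m = new LinkedHashMap<>();
        m.put("upper", s -> s.toUpperCase());
        m.put("trim", String::trim);
        m.put("strip-spaces", s -> s.replace(" ", ""));
        return m;
    }

    public static void main(String[] args) {
        // Placeholder evaluation function: shorter output scores higher.
        Result r = pickBest("  hello world  ", exampleStrategies(),
                            out -> -out.length());
        System.out.println(r.name + " -> " + r.output);
    }
}
```

If speed is the criterion rather than output quality, the score could
instead be derived from elapsed time measured around each strategy with
System.nanoTime().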

On Tue, Sep 11, 2012 at 9:03 AM, Jason Yang <li...@gmail.com> wrote:
> Hi, all
>
> I've got a question about how to make different mappers execute different
> processing on a same data?
>
> Here is my scenario:
> I got to process a data, however, there multiple choices to process this
> data and I have no idea which one is better, so I was thinking that maybe I
> could execute multiple mappers, in which different processing solution is
> applied, and eventually the best one is chosen according to some evaluation
> functions.
>
> But I'm not sure whether this could be done in MapReduce.
>
> Any help would be appreciated.
>
> --
> YANG, Lin
>



-- 
Harsh J
