Posted to common-user@hadoop.apache.org by Boyu Zhang <bo...@gmail.com> on 2009/09/04 20:36:47 UTC

How To Run Multiple Map & Reduce Functions In One Job

Dear All,

I am using Hadoop 0.20.0. I have an application that needs to run map-reduce
functions iteratively. Right now, the way I am doing this is to create a new Job
for each pass of the map-reduce. That seems to cost a lot. Is there any way to
run map-reduce functions iteratively in one Job?

Thanks a lot for your time!

Boyu Zhang(Emma)
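The iterative pattern being asked about can be sketched outside Hadoop; the following is a minimal in-memory simulation in Python (not the Hadoop API) of one map/reduce pass run repeatedly, with each pass consuming the previous pass's output. The mapper and reducer here are illustrative placeholders:

```python
from itertools import groupby
from operator import itemgetter

def run_pass(mapper, reducer, records):
    """Simulate one map/reduce pass over in-memory (key, value) records."""
    # Map phase: each record may emit any number of (key, value) pairs.
    mapped = [pair for rec in records for pair in mapper(rec)]
    # Shuffle/sort: group values by key, as the framework would.
    mapped.sort(key=itemgetter(0))
    grouped = ((k, [v for _, v in g])
               for k, g in groupby(mapped, key=itemgetter(0)))
    # Reduce phase: one output record per key.
    return [reducer(k, vs) for k, vs in grouped]

def mapper(rec):
    key, value = rec
    yield key, value

def reducer(key, values):
    return key, sum(values)

# Each iteration's output becomes the next iteration's input.
data = [("a", 1), ("b", 2), ("a", 3)]
for _ in range(3):
    data = run_pass(mapper, reducer, data)
```

In a real driver, each pass would instead configure and submit a separate Job whose input path is the previous Job's output path.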

Re: How To Run Multiple Map & Reduce Functions In One Job

Posted by Boyu Zhang <bo...@gmail.com>.
OK. Thank you very much! That helps me a lot; I will try it.

Boyu


Re: How To Run Multiple Map & Reduce Functions In One Job

Posted by Amandeep Khurana <am...@gmail.com>.
Ah ok.. Then I think you'll have to fire separate jobs. But they can all be
fired from inside one parent program - the method I explained earlier. Try that
out...


Re: How To Run Multiple Map & Reduce Functions In One Job

Posted by Boyu Zhang <bo...@gmail.com>.
Yes, the output of the first iteration is the input of the second iteration.
Actually, I am trying the page-ranking problem. In the algorithm, you have
to run several iterations, each using the output of the previous iteration as
input and producing the output for the next.

It is not a real-life application; I just want to try some applications with
iterations. Thanks a lot!

Boyu
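The PageRank iteration being described can be sketched as follows; the toy link graph and the damping factor of 0.85 are illustrative assumptions, not details from this thread. The key point is the chaining: the ranks produced by iteration i are the input to iteration i+1:

```python
# links: page -> pages it links to (a hypothetical toy graph).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
damping = 0.85
n = len(links)
ranks = {p: 1.0 / n for p in links}  # uniform starting ranks

for _ in range(20):
    # Each page distributes its current rank evenly over its outlinks.
    contrib = {p: 0.0 for p in links}
    for page, outs in links.items():
        share = ranks[page] / len(outs)
        for out in outs:
            contrib[out] += share
    # Output of this iteration becomes input to the next.
    ranks = {p: (1 - damping) / n + damping * c for p, c in contrib.items()}
```

On a Hadoop cluster, each loop body would be one map-reduce pass: the mapper emits rank shares per outlink, and the reducer sums the shares for each page.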


Re: How To Run Multiple Map & Reduce Functions In One Job

Posted by Amandeep Khurana <am...@gmail.com>.
Wait.. Why are you using the same mapper and reducer and calling them 10
times? Is the output of the first iteration being fed into the second one?
What are these jobs doing? Tell me a bit more about that. There might be a way
by which you can club some jobs together into one job and reduce the
overheads...


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz



Re: How To Run Multiple Map & Reduce Functions In One Job

Posted by Boyu Zhang <bo...@gmail.com>.
Dear Amandeep,

Thanks for the fast reply. I will try the method you mentioned.

In my understanding, when a job is submitted, there will be a separate Java
process in the jobtracker responsible for that job, and there will be an
initialization and cleanup cost for each job. If every iteration is a new
job, the jobs will be created sequentially by the jobtracker. Say there are 10
iterations in my code; then there will be 10 jobs submitted to the jobtracker. I
am just wondering whether there is a way to submit 1 job but run 10
iterations, since they all use the same mapper and reducer classes. That
is basically why I think they are costly; maybe there is something that I
misunderstood. I hope you will correct me if I am wrong.

Again, thanks a lot for replying!

Boyu
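The sequential submission being described looks roughly like the following driver loop (a Python sketch, not the Hadoop Java API; submit_job is a hypothetical stand-in for configuring and running one job):

```python
def submit_job(input_path, output_path):
    # Hypothetical stand-in for building, submitting, and waiting on one
    # map-reduce job; in Hadoop this is where the per-job initialization
    # and cleanup cost would be paid.
    print(f"job: {input_path} -> {output_path}")
    return output_path

# 10 sequential jobs: iteration i reads what iteration i-1 wrote.
current = "input"
for i in range(10):
    current = submit_job(current, f"out/iter{i:02d}")
```

Each trip through the loop is a full job submission, which is why the overhead multiplies with the number of iterations.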


Re: How To Run Multiple Map & Reduce Functions In One Job

Posted by Amandeep Khurana <am...@gmail.com>.
You can create different mapper and reducer classes and create separate job
configs for them. You can pass these different configs to the Tool object in
the same parent class... But they will essentially be different jobs being
called together from inside the same Java parent class.

Why do you say it costs a lot? What's the issue?


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
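The "separate job configs fired from one parent class" idea can be sketched as follows (a Python stand-in, not the Hadoop Tool/ToolRunner API; the two stages and their map/reduce functions are hypothetical):

```python
def run_pipeline(stages, data):
    """Run each configured stage in order; stage i+1 reads stage i's output."""
    for mapper, reducer in stages:
        mapped = [mapper(x) for x in data]
        data = reducer(mapped)
    return data

# Two different "jobs", each with its own map/reduce logic, launched
# back to back from one parent driver.
double_stage = (lambda x: x * 2, lambda xs: sorted(xs))
total_stage = (lambda x: x + 1, lambda xs: [sum(xs)])

result = run_pipeline([double_stage, total_stage], [3, 1, 2])
```

In Hadoop 0.20 terms, each tuple would correspond to a separately configured Job that the parent class submits in sequence.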

