Posted to common-user@hadoop.apache.org by Kayla Jay <ka...@yahoo.com> on 2008/05/22 22:46:12 UTC

Can you run multiple simultaneous hadoop jobs?

Hello.

I'm trying to figure out why I would need HOD (Hadoop On Demand) rather than simply running multiple jobs at the same time on the same set of resources.  Is it possible to run multiple Hadoop jobs at the same time on the same set of input data?  I tried running different jobs on the same data simultaneously, but it takes a very long time, and it looks as if the jobs queue up, with each job waiting for the previous one to finish.

So I tried moving to HOD.  It's not apparent to me why one would use HOD to run different jobs on different nodes at the same time when they all access the same set of input data.

Can anyone share experience with running multiple jobs at the same time on the same data, or with using HOD for this?  Why would I have to use HOD to run multiple jobs at the same time on the same data, each within its own set of resources in the cluster?

Thanks

Re: Can you run multiple simultaneous hadoop jobs?

Posted by Brice Arnould <br...@vleu.net>.

Ted Dunning wrote:
> Brice,
> 
> Looks like a nice piece of work.  I just spent 20 minutes looking back for
> the old bug HADOOP-2573 only to find that you already knew about that and
> had addressed it.
> 
> This could really help people make progress on improving the scheduler and
> that progress would really improve the usability of large clusters with jobs
> that vary a lot in importance and size.
> 
> And your English is just fine.  Completely understandable.
Thanks a lot for this very kind mail *^__^*

Jaideep and I are discussing a way to describe the resources used by
jobs (network, local storage, CPU and so on) so that dissimilar tasks
(e.g. data-intensive and CPU-intensive ones) can run on the same node.
We hope to have a concrete proposal by Tuesday. When it is ready, I
would really like your feedback on whether it matches your needs and,
if not, what kind of scheduler you would need.

Regards,
Brice

Re: Can you run multiple simultaneous hadoop jobs?

Posted by Ted Dunning <td...@veoh.com>.

Brice,

Looks like a nice piece of work.  I just spent 20 minutes looking back for
the old bug HADOOP-2573 only to find that you already knew about that and
had addressed it.

This could really help people make progress on improving the scheduler and
that progress would really improve the usability of large clusters with jobs
that vary a lot in importance and size.

And your English is just fine.  Completely understandable.

On 5/23/08 12:53 AM, "Brice Arnould" <br...@vleu.net> wrote:

> Kayla Jay wrote:
>> I'm trying to figure out why I need to use HOD vs. trying to run multiple
>> jobs at the same time on the same set of resources.  Is it possible to run
>> multiple hadoop jobs at the same time on the same set of input data?
>> I tried to run different jobs on the same set of data at the same time,
>> but it takes a while (way while) and almost appears as if it queues up
>> and the next job has to wait and so forth before completing.
>> 
>> So, I tried moving onto HOD.  It's not very apparent why one would want
>> to use HOD to run on different nodes at the same time for different
>> jobs that access the same set of input data.
>> 
>> Can anyone provide any feedback on running multiple jobs at the same
>> time on the same set of data?  HOD use?  Why would I have to run HOD
>> and schedule running multiple jobs at the same time on the same
>> set of data, but within their own set of resources in the cluster?
> Hi!
> 
> I just contributed a new implementation of the scheduler that adds an
> option called "mapred.jobtracker.scheduler.maxRunningTasksPerJob",
> which lets you limit the number of nodes allocated to a job (so you do
> not need HOD).
> This limit is a hint: if some nodes have nothing to do, they will be
> allocated anyway.
> If you want to test it, the patch is attached to issue HADOOP-3412:
> http://issues.apache.org/jira/browse/HADOOP-3412
> It applies to trunk, but I can make a few modifications if you want it
> to apply to a release.
> 
> Running "ant jar" should be sufficient to build it, but please ask me
> if you have more questions.
> 
> I would really appreciate your feedback on the behavior of this
> scheduler. I'm trying to solve precisely the problems that result from
> partitioned clusters, and I'll try to build something that better
> suits your needs if you can tell me more.
> 
> Brice
> 
> PS: Please excuse my English :-P

Re: Can you run multiple simultaneous hadoop jobs?

Posted by Brice Arnould <br...@vleu.net>.

Kayla Jay wrote:
> I'm trying to figure out why I need to use HOD vs. trying to run multiple
> jobs at the same time on the same set of resources.  Is it possible to run
> multiple hadoop jobs at the same time on the same set of input data?
> I tried to run different jobs on the same set of data at the same time,
> but it takes a while (way while) and almost appears as if it queues up
> and the next job has to wait and so forth before completing.
> 
> So, I tried moving onto HOD.  It's not very apparent why one would want
> to use HOD to run on different nodes at the same time for different
> jobs that access the same set of input data.
> 
> Can anyone provide any feedback on running multiple jobs at the same
> time on the same set of data?  HOD use?  Why would I have to run HOD
> and schedule running multiple jobs at the same time on the same
> set of data, but within their own set of resources in the cluster?
Hi!

I just contributed a new implementation of the scheduler that adds an
option called "mapred.jobtracker.scheduler.maxRunningTasksPerJob",
which lets you limit the number of nodes allocated to a job (so you do
not need HOD).
This limit is a hint: if some nodes have nothing to do, they will be
allocated anyway.
If you want to test it, the patch is attached to issue HADOOP-3412:
http://issues.apache.org/jira/browse/HADOOP-3412
It applies to trunk, but I can make a few modifications if you want it
to apply to a release.
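
To illustrate what "the limit is a hint" could mean in practice, here is a
toy Python model. It is only a sketch under stated assumptions, not the code
from the HADOOP-3412 patch: the function name, the notion of a fixed pool of
slots, and the one-time-step task duration are all hypothetical
simplifications.

```python
def simulate_soft_cap(total_slots, jobs, max_running_per_job):
    """Toy model of a per-job task limit that acts as a hint.

    jobs: list of (name, num_tasks) in submission order.
    Every task is assumed to take exactly one time step.
    Returns {job_name: time step at which the job finished}.
    Assumes total_slots and max_running_per_job are positive.
    """
    remaining = dict(jobs)
    finish = {}
    step = 0
    while remaining:
        step += 1
        free = total_slots
        granted = {}
        # Pass 1: give each job at most its cap, in submission order.
        for name, left in remaining.items():
            run = min(left, max_running_per_job, free)
            granted[name] = run
            free -= run
        # Pass 2: the cap is only a hint, so slots that would otherwise
        # sit idle go to jobs that still have pending tasks.
        for name, left in remaining.items():
            extra = min(left - granted[name], free)
            granted[name] += extra
            free -= extra
        # Apply this step's work and record finished jobs.
        for name, run in granted.items():
            remaining[name] -= run
            if remaining[name] == 0:
                finish[name] = step
                del remaining[name]
    return finish


if __name__ == "__main__":
    # 4 slots, a 12-task job submitted before a 2-task job, cap of 2.
    print(simulate_soft_cap(4, [("big", 12), ("small", 2)], 2))
```

In this model the small job finishes in the first step because the big job
is held to 2 slots while another job is waiting, and the big job still
finishes at step 4 because it picks up the idle slots once the small job is
done.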

Running "ant jar" should be sufficient to build it, but please ask me
if you have more questions.

I would really appreciate your feedback on the behavior of this
scheduler. I'm trying to solve precisely the problems that result from
partitioned clusters, and I'll try to build something that better
suits your needs if you can tell me more.

Brice

PS: Please excuse my English :-P

Re: Can you run multiple simultaneous hadoop jobs?

Posted by Ted Dunning <td...@veoh.com>.

You definitely can run more than one job on a Hadoop cluster.  But if one of
the jobs asks to use all of the map or reduce nodes, then the other job will
have to wait for some of the nodes to free up before proceeding.

Try limiting the number of map nodes the first job uses and see how that
changes matters.
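
The effect described above can be sketched with a small Python simulation.
This is a hedged toy model, not Hadoop code: it assumes a fixed pool of
identical slots, tasks that each take one time step, jobs served in
submission order, and a hypothetical `map_cap` parameter standing in for
"limiting the number of map nodes".

```python
def simulate_fifo(total_slots, jobs, map_cap=None):
    """Toy first-come-first-served slot scheduler.

    jobs: list of (name, num_tasks) in submission order.
    map_cap: optional hard limit on tasks a job may run per time step
             (assumed positive when given).
    Returns {job_name: time step at which the job finished}.
    """
    remaining = dict(jobs)
    finish = {}
    step = 0
    while remaining:
        step += 1
        free = total_slots
        # Earlier jobs pick first; later jobs only get what is left over.
        for name in list(remaining):
            limit = free if map_cap is None else min(free, map_cap)
            run = min(remaining[name], limit)
            remaining[name] -= run
            free -= run
            if remaining[name] == 0:
                finish[name] = step
                del remaining[name]
    return finish


if __name__ == "__main__":
    jobs = [("big", 12), ("small", 2)]
    print(simulate_fifo(4, jobs))             # big job takes every slot
    print(simulate_fifo(4, jobs, map_cap=2))  # each job limited to 2 slots
```

Without a cap, the small job cannot start until the big job has finished
(big ends at step 3, small at step 4). With a cap of 2, the small job
finishes at step 1, but the big job now needs 6 steps because this hard cap
leaves slots idle once the small job is done.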

