Posted to mapreduce-user@hadoop.apache.org by Shaojun Zhao <sh...@gmail.com> on 2012/01/02 08:48:32 UTC

running multiple jobs, please help

Dear all,

I have many jobs (900k) to run on many machines (4k). All jobs are
independent; in particular, they all use the same algorithm, but the
inputs differ. If I could build a single cluster with 4k machines, I
could simply submit all my jobs using a shell script. Crucially, the
jobs would execute sequentially: a later job waits until all the jobs
before it have finished.

Here comes the problem: because these machines are in different
datacenters, I cannot build a single cluster and submit all the jobs
as described above.

To start simple, I built many clusters, one per machine, each running
in pseudo-distributed mode (I do not want to use standalone mode
because all the machines are multi-core).

Now I want to submit jobs to the clusters from one machine,
dynamically. That is, I first submit 4k jobs to the 4k clusters; then,
whenever a job finishes, I submit a new job to the cluster it ran on,
and I keep going this way until all 900k jobs are finished.
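
To make this concrete, here is a rough sketch of the dispatcher loop I
have in mind, using the old org.apache.hadoop.mapred API. The helpers
buildAllJobConfs() and listJobTrackers() are placeholders I made up;
each JobConf would also need its mapper class, input/output paths, and
fs.default.name filled in for the target cluster:

    import java.io.IOException;
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class Dispatcher {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            // All 900k job configurations, and the 4k JobTracker
            // addresses ("host:port"), however they get enumerated.
            Deque<JobConf> pending =
                new ArrayDeque<JobConf>(buildAllJobConfs());
            Map<String, RunningJob> running =
                new HashMap<String, RunningJob>();

            // Seed each cluster with one job.
            for (String tracker : listJobTrackers()) {
                submitNext(tracker, pending, running);
            }

            // Poll; whenever a cluster's job completes, give it the next one.
            while (!pending.isEmpty() || !running.isEmpty()) {
                for (Map.Entry<String, RunningJob> e
                        : new ArrayList<Map.Entry<String, RunningJob>>(
                              running.entrySet())) {
                    if (e.getValue().isComplete()) {
                        running.remove(e.getKey());
                        submitNext(e.getKey(), pending, running);
                    }
                }
                Thread.sleep(10000); // poll every 10s; one RPC per cluster
            }
        }

        static void submitNext(String tracker, Deque<JobConf> pending,
                               Map<String, RunningJob> running)
                throws IOException {
            if (pending.isEmpty()) return;
            JobConf conf = pending.poll();
            // Aim this submission at that cluster's JobTracker.
            conf.set("mapred.job.tracker", tracker);
            // submitJob() returns immediately, unlike JobClient.runJob().
            running.put(tracker, new JobClient(conf).submitJob(conf));
        }

        // Placeholders: fill in the real 900k configs and 4k addresses.
        static List<JobConf> buildAllJobConfs() {
            return new ArrayList<JobConf>();
        }
        static List<String> listJobTrackers() {
            return new ArrayList<String>();
        }
    }

The point of using JobClient.submitJob() rather than JobClient.runJob()
is that submitJob() does not block, so a single dispatcher process can
keep all 4k clusters busy.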

I need your advice on the following difficulty: how do I know that
job j(i) on machine m(j) has finished?
Note: since I want to submit jobs dynamically, no static job ordering
should be assumed.
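
What I have found so far in the javadoc: RunningJob.isComplete() and
RunningJob.isSuccessful() look like the right calls when the dispatcher
still holds the RunningJob handle, and JobClient.getJob() looks like it
can reattach to a job by its ID from a separate process. Something like
this (the host and job ID below are made-up examples):

    import java.io.IOException;

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobStatusCheck {
        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf();
            // The JobTracker of the cluster the job was submitted to.
            conf.set("mapred.job.tracker", "node042.example.com:9001");
            JobClient client = new JobClient(conf);

            // Reattach to a previously submitted job by its ID;
            // getJob() returns null if the JobTracker does not know it.
            RunningJob job =
                client.getJob(JobID.forName("job_201201020848_0001"));
            if (job != null && job.isComplete()) {
                System.out.println("finished, success = "
                    + job.isSuccessful());
            }
        }
    }

Would polling like this work reliably at this scale, or is there a
better mechanism I am missing?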

Thanks in advance!
-Sam