Posted to common-user@hadoop.apache.org by Paul Sutter <su...@gmail.com> on 2006/06/23 23:35:24 UTC

Foreground/background jobs

I'm looking for feedback/ideas/suggestions/warnings on the following
subject:

We really need a mechanism for foreground and background jobs. We have jobs
that take 12 to 48 hours to complete, and we have several developers who
also need to run and test reasonably sized jobs throughout the day.

Here's what we're considering:

We're thinking of having two accounts, blue and yellow, each with a
complete copy of Hadoop and each able to run its own jobtracker on a
different port. Each would run tasktrackers on all nodes (again on
different ports) and operate independently, but both would share one
instance of DFS because that's where all of our data is (maybe we'd run
DFS out of a third account, let's call it dfs).
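
To make the layout concrete, here is a rough sketch of the per-account
settings, not a tested configuration. The host names and port numbers are
made up for illustration; the only real names used are the Hadoop
properties fs.default.name and mapred.job.tracker. The tasktracker ports
would also need to differ per account, but those property names are left
out here because they vary between Hadoop versions.

```python
# Sketch only: hosts and ports below are hypothetical placeholders.
DFS_ADDRESS = "namenode.example.com:9000"  # the single shared DFS (assumed address)


def account_settings(jobtracker_port, host="master.example.com"):
    """Return the overrides for one independent MapReduce instance.

    Both accounts point at the same DFS but at different jobtrackers.
    """
    return {
        "fs.default.name": DFS_ADDRESS,
        "mapred.job.tracker": f"{host}:{jobtracker_port}",
    }


def to_site_xml(settings):
    """Render the settings in the <property> layout used by hadoop-site.xml."""
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{name}</name>")
        lines.append(f"    <value>{value}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


blue = account_settings(9001)    # foreground account (port is an assumption)
yellow = account_settings(9011)  # background account (port is an assumption)

if __name__ == "__main__":
    print(to_site_xml(blue))
    print(to_site_xml(yellow))
```

Each account would keep its own copy of the rendered file in its own
Hadoop conf directory, so the two instances never read each other's
settings.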

The only difference is that the tasks run under yellow would be nice'd (thus
the name: Ganglia shows nice CPU as yellow and normal CPU as blue).
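
As an illustration of what "nice'd" means at the process level, here is a
minimal sketch that starts a child process at a lower scheduling priority.
It is not how Hadoop launches tasks; run_niced and the increment of 10 are
assumptions chosen for the example, and it only works on Unix-like systems.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd as a child whose niceness is raised by `increment` (Unix only).

    os.nice is called in the child just before exec, so the parent's own
    priority is left unchanged.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # The child reports its own niceness; os.nice(0) reads it without changing it.
    result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("child niceness:", result.stdout.strip())
```

In the yellow account the same effect could be had by starting the
tasktracker itself under nice, since child processes inherit the niceness
of their parent.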

This would allow us to make code changes anywhere in Hadoop, or our code,
and stop and restart the cluster, independently, without blue interfering
with yellow, and vice versa.

We have enough RAM on our cluster to make this work.

Any problems with this? Any better ideas?

Thanks!

Paul Sutter