Posted to common-user@hadoop.apache.org by John Clarke <cl...@gmail.com> on 2009/05/19 15:36:03 UTC

Suspend or scale back hadoop instance

Hi,

I am working on a project that is suited to Hadoop and so want to create a
small cluster (only 5 machines!) on our servers. The servers are however
used during the day and (mostly) idle at night.

So, I want Hadoop to run at full throttle at night and either scale back or
suspend itself during certain times.

Is it possible to do this? I've found very little information on it.

Thanks for your help!
John

Re: Suspend or scale back hadoop instance

Posted by John Clarke <cl...@gmail.com>.
The jobs will be of different sizes and some may take days to complete with
only 5 machines, so yes some will run night and day.

By scale back, I mean scale back on system resources (CPU, IO, RAM) so the
machine can be used for other tasks during the day.

I understand (as you pointed out) that I can reduce the resources Hadoop uses
by editing hadoop-env.sh and hadoop-site.xml, but only at startup; there is
no way to do this on the fly, so to speak. Is that correct?

I think ideally a way to suspend and resume a job would be preferable to
scaling back on resources, i.e. write current progress to disk in the morning,
suspend processing, and then start up again at night where it left off.

Cheers,
John



2009/5/19 Kevin Weil <ke...@gmail.com>

> Will your jobs be running night and day, or just over a specified period?
> Depending on your setup, and on what you mean by "scale back" (CPU vs disk
> IO vs memory), you could potentially restart your cluster with different
> settings at different times of the day via cron.  This will kill any
> running
> jobs, so it'll only work if you can find or create a few free minutes.  But
> then you could scale back on CPU by running with HADOOP_NICENESS nonzero
> (see conf/hadoop-env.sh), you could scale back on memory by setting the
> various process memory limits low in conf/hadoop-site.xml, and you could
> scale back on datanode work entirely by setting the maximum number of
> mappers or reducers to 1 per node during the day (also in
> conf/hadoop-site.xml).
>
> Kevin
>
> On Tue, May 19, 2009 at 7:23 AM, Steve Loughran <st...@apache.org> wrote:
>
> > John Clarke wrote:
> >
> >> Hi,
> >>
> >> I am working on a project that is suited to Hadoop and so want to create
> a
> >> small cluster (only 5 machines!) on our servers. The servers are however
> >> used during the day and (mostly) idle at night.
> >>
> >> So, I want Hadoop to run at full throttle at night and either scale back
> >> or
> >> suspend itself during certain times.
> >>
> >
> > You could add/remove new task trackers on idle systems, but
> > * you don't want to take away datanodes, as there's a risk that data will
> > become unavailable.
> > * there's nothing in the scheduler to warn that machines will go away at
> a
> > certain time
> > If you only want to run the cluster at night, I'd just configure the
> entire
> > cluster to go up and down
> >
>

Re: Suspend or scale back hadoop instance

Posted by Kevin Weil <ke...@gmail.com>.
Will your jobs be running night and day, or just over a specified period?
Depending on your setup, and on what you mean by "scale back" (CPU vs disk
IO vs memory), you could potentially restart your cluster with different
settings at different times of the day via cron.  This will kill any running
jobs, so it'll only work if you can find or create a few free minutes.  But
then you could scale back on CPU by running with HADOOP_NICENESS nonzero
(see conf/hadoop-env.sh), you could scale back on memory by setting the
various process memory limits low in conf/hadoop-site.xml, and you could
scale back on datanode work entirely by setting the maximum number of
mappers or reducers to 1 per node during the day (also in
conf/hadoop-site.xml).
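Kevin's cron-driven restart could be sketched as a small shell script. The
paths, the conf directory names, and the 19:00/07:00 day/night boundary below
are all assumptions for illustration, not anything from this thread:

```shell
#!/bin/sh
# Sketch: pick a Hadoop conf directory based on the time of day.
# conf.day would hold the scaled-back settings Kevin describes (e.g.
# mapred.tasktracker.map.tasks.maximum=1 and low memory limits in
# hadoop-site.xml, nonzero HADOOP_NICENESS in hadoop-env.sh);
# conf.night would hold the full-throttle settings.

# Map an hour of the day (0-23) to a conf directory name.
pick_conf() {
  hour=$1
  if [ "$hour" -ge 19 ] || [ "$hour" -lt 7 ]; then
    echo "conf.night"
  else
    echo "conf.day"
  fi
}

HADOOP_HOME=/opt/hadoop          # assumed install location
CONF=$(pick_conf "$(date +%H)")

# Restarting kills running jobs, as Kevin notes, so this belongs in a
# window where no jobs are running:
#   "$HADOOP_HOME/bin/stop-all.sh"
#   "$HADOOP_HOME/bin/start-all.sh"   # with HADOOP_CONF_DIR pointing at $CONF
echo "would restart Hadoop with $CONF"
```

Run from cron at the boundary hours, e.g. `0 7,19 * * * /opt/hadoop/switch-mode.sh`
(the script name is hypothetical).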

Kevin

On Tue, May 19, 2009 at 7:23 AM, Steve Loughran <st...@apache.org> wrote:

> John Clarke wrote:
>
>> Hi,
>>
>> I am working on a project that is suited to Hadoop and so want to create a
>> small cluster (only 5 machines!) on our servers. The servers are however
>> used during the day and (mostly) idle at night.
>>
>> So, I want Hadoop to run at full throttle at night and either scale back
>> or
>> suspend itself during certain times.
>>
>
> You could add/remove new task trackers on idle systems, but
> * you don't want to take away datanodes, as there's a risk that data will
> become unavailable.
> * there's nothing in the scheduler to warn that machines will go away at a
> certain time
> If you only want to run the cluster at night, I'd just configure the entire
> cluster to go up and down
>

Re: Suspend or scale back hadoop instance

Posted by Steve Loughran <st...@apache.org>.
John Clarke wrote:
> Hi,
> 
> I am working on a project that is suited to Hadoop and so want to create a
> small cluster (only 5 machines!) on our servers. The servers are however
> used during the day and (mostly) idle at night.
> 
> So, I want Hadoop to run at full throttle at night and either scale back or
> suspend itself during certain times.

You could add/remove task trackers on idle systems, but:
* you don't want to take away datanodes, as there's a risk that data 
will become unavailable;
* there's nothing in the scheduler to warn that machines will go away at 
a certain time.
If you only want to run the cluster at night, I'd just configure the 
entire cluster to go up and down.
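Adding or removing a task tracker on an individual machine, while leaving its
datanode running as Steve advises, can be done with the per-node daemon script.
This assumes the stock Hadoop 0.x layout; the exact path may differ in your
install:

```shell
# In the morning, on a machine needed for daytime work: stop only the
# tasktracker, leaving the datanode up so HDFS block replicas stay available.
bin/hadoop-daemon.sh stop tasktracker

# At night, bring it back so the node accepts map/reduce tasks again.
bin/hadoop-daemon.sh start tasktracker
```

As Steve notes, the scheduler has no warning that a node will disappear, so any
tasks running on that node when it is stopped will be re-run elsewhere.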

Re: Suspend or scale back hadoop instance

Posted by John Clarke <cl...@gmail.com>.
Hi Piotr,

Thanks for the prompt reply.

If the cron script shuts down Hadoop, surely it won't pick up where it left
off when it is restarted?

All the machines will be used during the day so it is not an option to turn
Hadoop off on only some of them.

John




2009/5/19 Piotr Praczyk <pi...@gmail.com>

> Hi John
>
> I don't know if there is Hadoop support for such a thing, but you can do
> this easily by writing a crontab script. It could start Hadoop at a
> specified hour and shut it down (or disable some nodes) at another.
>
> There can be some problems with HDFS, however (if you disable all the
> nodes holding replicas of some blocks, those files will become
> inaccessible).
>
> Piotr
>
> 2009/5/19 John Clarke <cl...@gmail.com>
>
> > Hi,
> >
> > I am working on a project that is suited to Hadoop and so want to create
> a
> > small cluster (only 5 machines!) on our servers. The servers are however
> > used during the day and (mostly) idle at night.
> >
> > So, I want Hadoop to run at full throttle at night and either scale back
> or
> > suspend itself during certain times.
> >
> > Is it possible to do this? I've found very little information on it.
> >
> > Thanks for your help!
> > John
> >
>

Re: Suspend or scale back hadoop instance

Posted by Piotr Praczyk <pi...@gmail.com>.
Hi John

I don't know if there is Hadoop support for such a thing, but you can do
this easily by writing a crontab script. It could start Hadoop at a
specified hour and shut it down (or disable some nodes) at another.

There can be some problems with HDFS, however (if you disable all the
nodes holding replicas of some blocks, those files will become
inaccessible).

Piotr
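A minimal crontab for this could look like the following; the hours and the
install path are assumptions, and the start/stop scripts are the stock ones
shipped in Hadoop's bin/ directory:

```shell
# crontab fragment (times and path are illustrative)
# m h dom mon dow  command
0 19 * * *  /opt/hadoop/bin/start-all.sh   # whole cluster up at 19:00
0 7  * * *  /opt/hadoop/bin/stop-all.sh    # down again at 07:00
```

Note that stop-all.sh stops the daemons outright, so as John observes, a job
that is still running at 07:00 will not resume where it left off.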

2009/5/19 John Clarke <cl...@gmail.com>

> Hi,
>
> I am working on a project that is suited to Hadoop and so want to create a
> small cluster (only 5 machines!) on our servers. The servers are however
> used during the day and (mostly) idle at night.
>
> So, I want Hadoop to run at full throttle at night and either scale back or
> suspend itself during certain times.
>
> Is it possible to do this? I've found very little information on it.
>
> Thanks for your help!
> John
>