Posted to common-user@hadoop.apache.org by Eric Zhang <ez...@yahoo-inc.com> on 2008/01/23 02:50:48 UTC

speculative task execution and writing side-effect files

I tried to find more details on speculative task execution on the Hadoop 
site and in the mailing list archive, but it doesn't seem to be explained 
much.  I'd appreciate it if anybody could help me with the following 
related questions:
1. In what situations would speculative task execution kick in, if it's 
enabled?
2. How much performance gain can we generally expect from enabling 
this feature?
3. If I want to write out side-effect files with unique names per 
task-attempt in a directory other than 
${mapred.output.dir}/_${taskid},  would the framework discard files written 
by unsuccessful task attempts?
4. If I write files into subdirectories of 
${mapred.output.dir}/_${taskid} (e.g. 
${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework take care 
of promoting ${sub_dir} to ${mapred.output.dir}?

Thanks a lot,

Eric

RE: Larger Clusters with Different CPUs

Posted by Xavier Stevens <Xa...@fox.com>.
How exactly do you do the per-node configuration?  

Currently each machine in my cluster has an NFS mount for HADOOP_HOME, so
all of the machines use the same configuration.  I am assuming I would
need to make a particular config file like hadoop-site.xml local to each
machine, unless there is a way to specify per-machine settings in a single
config file.

-Xavier


-----Original Message-----
From: Ted Dunning 
Sent: Wednesday, January 23, 2008 12:11 PM
To: core-user@hadoop.apache.org
Subject: Re: Larger Clusters with Different CPUs


I don't know when it became effective, but you can configure the number of
tasks per node.

I would recommend slight overloads on your boxes, btw.  Something like
9-10 and 3 tasks for the two kinds of boxes.  That gives the Linux
scheduler a little slack to fill in the cracks.  This
matters most if your maps are very short ... as one exits, it is nice to
have a replacement already running.


On 1/23/08 12:02 PM, "Xavier Stevens" <Xa...@fox.com> wrote:

> Does anyone have any suggestions/best practices when configuring sets 
> of machines with varying number of CPU cores?
> 
> Basically I have two types of machines.
> 1) 8-cores
> 2) 2-cores
> 
> And I would like to make sure that the number of tasks for the 8-cores
> is 8 and for 2-cores is 2.
> 
> How are others handling this type of situation?
> 
> -Xavier
> 




Re: Larger Clusters with Different CPUs

Posted by Ted Dunning <td...@veoh.com>.
I don't know when it became effective, but you can configure the number of tasks
per node.

I would recommend slight overloads on your boxes, btw.  Something like 9-10
and 3 tasks for the two kinds of boxes.  That gives the Linux scheduler a
little slack to fill in the cracks.  This matters most if your
maps are very short ... as one exits, it is nice to have a replacement
already running.
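For what it's worth, a per-machine override can live in each node's local
hadoop-site.xml.  A sketch -- note the property name differs across Hadoop
versions (older releases use a single mapred.tasktracker.tasks.maximum,
later ones split it into map and reduce variants), so check your release:

```xml
<?xml version="1.0"?>
<!-- hadoop-site.xml on an 8-core box; a 2-core box would use 3 -->
<configuration>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>9</value>
  </property>
</configuration>
```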


On 1/23/08 12:02 PM, "Xavier Stevens" <Xa...@fox.com> wrote:

> Does anyone have any suggestions/best practices when configuring sets of
> machines with varying number of CPU cores?
> 
> Basically I have two types of machines.
> 1) 8-cores
> 2) 2-cores
> 
> And I would like to make sure that the number of tasks for the 8-cores
> is 8 and for 2-cores is 2.
> 
> How are others handling this type of situation?
> 
> -Xavier
> 


Larger Clusters with Different CPUs

Posted by Xavier Stevens <Xa...@fox.com>.
Does anyone have any suggestions/best practices for configuring sets of
machines with varying numbers of CPU cores?

Basically I have two types of machines.
1) 8-cores
2) 2-cores

And I would like to make sure that the number of tasks for the 8-cores
is 8 and for 2-cores is 2. 

How are others handling this type of situation?

-Xavier


RE: speculative task execution and writing side-effect files

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Some of the utilization issues you raised should be addressed better when
we implement some of the global scheduler ideas discussed in HADOOP-2491,
HADOOP-2510, and HADOOP-2573.

Please raise JIRAs for other issues if you see fit.

Devaraj

> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:jssarma@facebook.com] 
> Sent: Wednesday, January 23, 2008 11:37 AM
> To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
> Subject: RE: speculative task execution and writing side-effect files
> 
> while there is a willing audience - will take a moment to 
> crib about speculative execution (i did try to put these in a 
> jira as well):
> 
> - currently speculative execution is focused on reducing task 
> latency - and does not care for cluster efficiency. in a busy 
> cluster, current speculative execution causes dramatic drop 
> in efficiency as tasks are launched needlessly. To wit:
> 
> - we find reduces being speculatively executed almost all the 
> time (current settings are too aggressive)
> - speculative execution does not consider the state of the 
> cluster (busy/idle) while spawning extra tasks
> - redundant tasks are not killed aggressively enough (why 
> keep duplicate tasks running when both are progressing at 
> reasonable speed?)
> 
> i am also not terribly sure about the progress counter on 
> which speculative execution is based. with compressed map 
> outputs - the reduce progress counter goes above 100% and 
> then back to 0 (this is not fixed in 0.14.4 at least) - and i 
> don't understand what impact this has on the (progress - 
> averageProgress) criteria for launching speculative tasks.
> 
> the two biggest problems we have had with job latency (and i 
> am sure different people have different experiences) - is 
> that tasks get stuck in:
> a) 'Initializing' state with 0% progress
> b) reduce copy speeds are inexplicably slow at times
> in both these cases, restarting tasks helps - but i would much rather 
> code in special hooks for detecting these conditions rather 
> than turn on speculative execution in general. not elegant, 
> not googlish, but practical.
> 
> ironically - when people care about job latency (daytime) - 
> the cluster is really busy (and hence speculative execution 
> generally hurts) and when people don't care about job latency 
> (nighttime - batch jobs) - the cluster is relatively idle 
> (and we could afford speculative execution - but it would 
> serve no purpose).
> 
> perhaps i am totally off - would like to learn about other 
> people's experience.
> 
> 
> -----Original Message-----
> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
> Sent: Tue 1/22/2008 8:22 PM
> To: core-user@hadoop.apache.org
> Subject: RE: speculative task execution and writing side-effect files
>  
> > 1. In what situation would speculative task execution kick in if it's 
> > enabled
> 
> It would be based on tasks' progress. A speculative instance 
> of a running task is launched if the task in question is 
> lagging behind the others in terms of the progress it has made. 
> It also depends on whether there are available slots in the 
> cluster to execute speculative tasks (in addition to the 
> regular tasks).
> 
> > 2. how much performance gain we can
> > generally expect from enabling of this feature. 
> 
> This depends on the cluster. Speculative execution comes in 
> handy when, for some reason (maybe transient or permanent), 
> some nodes are slower than the others in executing tasks. 
> Without speculative execution, jobs using those nodes might 
> have a long tail. With speculative execution, there is a good 
> chance that speculative tasks will be launched on some 
> healthy nodes and run to completion faster.
> 
> > 3. If I want to write out side-effect files named with unique 
> > names per task-attempt in the directory other than 
> > ${mapred.output.dir}/_${taskid},  would the framework discard files 
> > attempted by unsuccessful task attempts?
> > 4. If I write files into subdirectories of 
> > ${mapred.output.dir}/_${taskid} (e.g.
> > ${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework take 
> > care of promoting ${sub_dir} to ${mapred.output.dir}?
> 
> Yes to both.
> 
> Devaraj
> 
> > -----Original Message-----
> > From: Eric Zhang [mailto:ezhang@yahoo-inc.com]
> > Sent: Wednesday, January 23, 2008 7:21 AM
> > To: core-user@hadoop.apache.org
> > Subject: speculative task execution and writing side-effect files
> > 
> > I tried to find more details on speculative task execution on hadoop 
> > site and mailing archive, but it didn't seem to get explained a lot.   
> > I'd appreciate if anybody can help me on following related questions:
> > 1. In what situation would speculative task execution kick in if it's 
> > enabled
> > 2. how much performance gain we can generally expect from 
> > enabling of this feature.
> > 3. If I want to write out side-effect files named with unique names per 
> > task-attempt in the directory other than 
> > ${mapred.output.dir}/_${taskid},  would the framework discard files 
> > attempted by unsuccessful task attempts?
> > 4. If I write files into subdirectories of 
> > ${mapred.output.dir}/_${taskid} (e.g.
> > ${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework take 
> > care of promoting ${sub_dir} to ${mapred.output.dir}?
> > 
> > Thanks a lot,
> > 
> > Eric
> > 
> 
> 
> 


RE: speculative task execution and writing side-effect files

Posted by Joydeep Sen Sarma <js...@facebook.com>.
while there is a willing audience - will take a moment to crib about speculative execution (i did try to put these in a jira as well):

- currently speculative execution is focused on reducing task latency - and does not care for cluster efficiency. in a busy cluster, current speculative execution causes dramatic drop in efficiency as tasks are launched needlessly. To wit:

- we find reduces being speculatively executed almost all the time (current settings are too aggressive)
- speculative execution does not consider the state of the cluster (busy/idle) while spawning extra tasks
- redundant tasks are not killed aggressively enough (why keep duplicate tasks running when both are progressing at reasonable speed?)

i am also not terribly sure about the progress counter on which speculative execution is based. with compressed map outputs - the reduce progress counter goes above 100% and then back to 0 (this is not fixed in 0.14.4 at least) - and i don't understand what impact this has on the (progress - averageProgress) criteria for launching speculative tasks.

the two biggest problems we have had with job latency (and i am sure different people have different experiences) are tasks getting stuck in:
a) 'Initializing' state with 0% progress
b) reduce copy speeds that are inexplicably slow at times
in both these cases, restarting tasks helps - but i would much rather code in special hooks for detecting these conditions than turn on speculative execution in general. not elegant, not googlish, but practical.

ironically - when people care about job latency (daytime) - the cluster is really busy (and hence speculative execution generally hurts) and when people don't care about job latency (nighttime - batch jobs) - the cluster is relatively idle (and we could afford speculative execution - but it would serve no purpose).

perhaps i am totally off - would like to learn about other people's experience.


-----Original Message-----
From: Devaraj Das [mailto:ddas@yahoo-inc.com]
Sent: Tue 1/22/2008 8:22 PM
To: core-user@hadoop.apache.org
Subject: RE: speculative task execution and writing side-effect files
 
> 1. In what situation would speculative task execution  kick 
> in if it's enabled

It would be based on tasks' progress. A speculative instance of a running
task is launched if the task in question is lagging behind the others in
terms of the progress it has made. It also depends on whether there are
available slots in the cluster to execute speculative tasks (in addition to
the regular tasks).

> 2. how much performance gain we can 
> generally expect from enabling of this feature. 

This depends on the cluster. Speculative execution comes in handy when, for
some reason (maybe transient or permanent), some nodes are slower than the
others in executing tasks. Without speculative execution, jobs using those
nodes might have a long tail. With speculative execution, there is a good
chance that speculative tasks will be launched on some healthy nodes and
run to completion faster.

> 3. If I want to write out side-effect files named with unique 
> names per task-attempt in the directory other than 
> ${mapred.output.dir}/_${taskid},  would the framework discard 
> files attempted by unsuccessful task attempts?
> 4. If I write files into subdirectories of 
> ${mapred.output.dir}/_${taskid} (e.g. 
> ${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework 
> take care of promoting ${sub_dir} to ${mapred.output.dir}?

Yes to both.

Devaraj

> -----Original Message-----
> From: Eric Zhang [mailto:ezhang@yahoo-inc.com] 
> Sent: Wednesday, January 23, 2008 7:21 AM
> To: core-user@hadoop.apache.org
> Subject: speculative task execution and writing side-effect files
> 
> I tried to find more details on speculative task execution on hadoop 
> site and mailing archive, but it didn't seem to get explained 
> a lot.   
> I'd appreciate if anybody can help me on following related questions:
> 1. In what situation would speculative task execution kick 
> in if it's enabled
> 2. how much performance gain we can 
> generally expect from enabling of this feature. 
> 3. If I want to write out side-effect files named with unique 
> names per task-attempt in the directory other than 
> ${mapred.output.dir}/_${taskid},  would the framework discard 
> files attempted by unsuccessful task attempts?
> 4. If I write files into subdirectories of 
> ${mapred.output.dir}/_${taskid} (e.g. 
> ${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework 
> take care of promoting ${sub_dir} to ${mapred.output.dir}?
> 
> Thanks a lot,
> 
> Eric
> 



RE: speculative task execution and writing side-effect files

Posted by Devaraj Das <dd...@yahoo-inc.com>.
> 1. In what situation would speculative task execution  kick 
> in if it's enabled

It would be based on tasks' progress. A speculative instance of a running
task is launched if the task in question is lagging behind the others in
terms of the progress it has made. It also depends on whether there are
available slots in the cluster to execute speculative tasks (in addition to
the regular tasks).
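Roughly, the decision boils down to comparing each task's progress with the
average across its peers, gated by the number of free slots. A toy sketch of
that idea (the threshold and names here are made up for illustration, not
the actual JobTracker code):

```python
# Toy model of progress-based speculation as described above.
# SPECULATIVE_GAP and the function name are illustrative only.
SPECULATIVE_GAP = 0.2  # how far behind the average a task must lag

def speculative_candidates(progress, free_slots):
    """Return indices of tasks lagging far enough behind the mean
    progress to warrant a speculative attempt, capped by free slots."""
    avg = sum(progress) / len(progress)
    lagging = [i for i, p in enumerate(progress) if avg - p > SPECULATIVE_GAP]
    # Only launch as many speculative attempts as there are free slots.
    return lagging[:free_slots]
```

For example, with progress values [0.9, 0.95, 0.3, 0.88] only the third task
would get a speculative attempt, and only if a slot is free.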

> 2. how much performance gain we can 
> generally expect from enabling of this feature. 

This depends on the cluster. Speculative execution comes in handy when, for
some reason (maybe transient or permanent), some nodes are slower than the
others in executing tasks. Without speculative execution, jobs using those
nodes might have a long tail. With speculative execution, there is a good
chance that speculative tasks will be launched on some healthy nodes and
run to completion faster.

> 3. If I want to write out side-effect files named with unique 
> names per task-attempt in the directory other than 
> ${mapred.output.dir}/_${taskid},  would the framework discard 
> files attempted by unsuccessful task attempts?
> 4. If I write files into subdirectories of 
> ${mapred.output.dir}/_${taskid} (e.g. 
> ${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework 
> take care of promoting ${sub_dir} to ${mapred.output.dir}?

Yes to both.
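Concretely, the commit step amounts to moving everything under
${mapred.output.dir}/_${taskid} -- subdirectories included -- up into
${mapred.output.dir} for the successful attempt, while failed attempts'
directories are simply deleted. A toy sketch of the promotion half (not the
actual framework code):

```python
import os
import shutil

def promote_attempt_dir(output_dir, task_id):
    """Toy model of the framework's commit step (not Hadoop source):
    move the contents of ${output_dir}/_${task_id}, subdirectories
    included, up into ${output_dir}, then drop the attempt directory."""
    attempt_dir = os.path.join(output_dir, "_" + task_id)
    for name in os.listdir(attempt_dir):
        shutil.move(os.path.join(attempt_dir, name),
                    os.path.join(output_dir, name))
    os.rmdir(attempt_dir)
```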

Devaraj

> -----Original Message-----
> From: Eric Zhang [mailto:ezhang@yahoo-inc.com] 
> Sent: Wednesday, January 23, 2008 7:21 AM
> To: core-user@hadoop.apache.org
> Subject: speculative task execution and writing side-effect files
> 
> I tried to find more details on speculative task execution on hadoop 
> site and mailing archive, but it didn't seem to get explained 
> a lot.   
> I'd appreciate if anybody can help me on following related questions:
> 1. In what situation would speculative task execution kick 
> in if it's enabled
> 2. how much performance gain we can 
> generally expect from enabling of this feature. 
> 3. If I want to write out side-effect files named with unique 
> names per task-attempt in the directory other than 
> ${mapred.output.dir}/_${taskid},  would the framework discard 
> files attempted by unsuccessful task attempts?
> 4. If I write files into subdirectories of 
> ${mapred.output.dir}/_${taskid} (e.g. 
> ${mapred.output.dir}/_${taskid}/${sub_dir}),  would the framework 
> take care of promoting ${sub_dir} to ${mapred.output.dir}?
> 
> Thanks a lot,
> 
> Eric
>