Posted to common-dev@hadoop.apache.org by "Bryan A. P. Pendleton" <bp...@geekdom.net> on 2007/01/11 20:57:30 UTC

What's wrong with speculative execution, again?

I know the default was changed to "off" because of some bug. What's the
nature of the problem?

I ran a job last night that stalled for a long time because a task somehow got
assigned to a tasktracker that wasn't taking tasks. The task stayed
"UNASSIGNED" in status indefinitely, and I eventually killed the tasktracker,
which let the whole job finish. Had speculative execution been on, there
would have been no problem here. I'm not sure whether this is a new bug or
somehow related to the core speculative execution bug, but it would also be
nice to have speculative execution turned back on, as it really does cut the
turnaround time on jobs.

I'm now regularly running jobs that occupy ~100 CPUs for a half day or so,
and the lack of speculative execution plus the occasional wacky machine
causes the turnaround on these jobs to go up by large fractions of the total
job time, so I'd love to see this problem go (back) away.

-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp