Posted to mapreduce-dev@hadoop.apache.org by Arun C Murthy <ac...@yahoo-inc.com> on 2010/03/11 18:47:39 UTC

Re: Why hadoop jobs need setup and cleanup phases which would consume a lot of time ?

The daemons (JobTracker / TaskTracker) should not run _any_ user code  
for their security and integrity, hence the setup/cleanup tasks.

As more jobs are submitted, they compete for the few slots available on
your 10-node cluster, hence the 'perceived' slowness. This affects jobs
the same way whether or not setup/cleanup tasks are run.

Note: There is a *single* setup task at the beginning of the job and a
*single* cleanup task at the end of the job; these are not per-map-task
or per-reduce-task.
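
To make the note above concrete, here is a minimal sketch of the task
arithmetic, assuming a job is simply its map tasks, its reduce tasks, one
job-level setup task and one job-level cleanup task. The class and method
names are hypothetical and are not part of Hadoop.

```java
// Illustrative model only; this is not Hadoop code and every name is hypothetical.
final class JobTaskCount {
    // Per the note above: one setup task and one cleanup task per job,
    // regardless of how many map or reduce tasks the job has.
    static final int SETUP_TASKS_PER_JOB = 1;
    static final int CLEANUP_TASKS_PER_JOB = 1;

    // Extra tasks a job runs beyond its map and reduce tasks.
    static int overheadTasks() {
        return SETUP_TASKS_PER_JOB + CLEANUP_TASKS_PER_JOB;
    }

    // Total tasks the job occupies slots for over its lifetime.
    static int totalTasks(int mapTasks, int reduceTasks) {
        if (mapTasks < 0 || reduceTasks < 0) {
            throw new IllegalArgumentException("task counts must be non-negative");
        }
        return mapTasks + reduceTasks + overheadTasks();
    }

    public static void main(String[] args) {
        // A job with 100 maps and 10 reduces runs 112 tasks in this model.
        System.out.println(totalTasks(100, 10)); // prints 112
    }
}
```

In this model the overhead stays at two tasks per job however large the
job is, so it does not grow with the number of map or reduce tasks.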

Arun

On Mar 11, 2010, at 5:56 AM, Guo Leitao wrote:

> From our test of hadoop-0.20.1 on 10 nodes, we find that the setup period
> gets longer as more jobs are submitted. I don't know why a map task is
> needed for setup; why can't the JobTracker or a single thread take over
> this work?
>
> 2010/3/11 Jeff Zhang <zj...@gmail.com>
>
>> Hi Zhou,
>>
>> I looked at the source code, and it seems it is the JobTracker that
>> initiates the setup and cleanup tasks.
>> And why do you think the setup and cleanup phases consume a lot of time?
>> Actually, the time cost depends on the OutputCommitter.
>>
>>
>>
>>
>> On Thu, Mar 11, 2010 at 11:04 AM, Min Zhou <co...@gmail.com>  
>> wrote:
>>
>>> Hi all,
>>>
>>> Why do hadoop jobs need setup and cleanup phases, which consume a
>>> lot of time? Why couldn't we achieve this the way a distributed RDBMS
>>> does, where a master process coordinates all slave nodes through sockets?
>>> I think it would save plenty of time if there were no setups and
>>> cleanups. What's hadoop's philosophy on this?
>>>
>>> Thanks,
>>> Min
>>> --
>>> My research interests are distributed systems, parallel computing  
>>> and
>>> bytecode based virtual machine.
>>>
>>> My profile:
>>> http://www.linkedin.com/in/coderplay
>>> My blog:
>>> http://coderplay.javaeye.com
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>