Posted to mapreduce-user@hadoop.apache.org by Arun C Murthy <ac...@yahoo-inc.com> on 2011/01/09 09:35:04 UTC

Re: How do hadoop work in details

On Dec 29, 2010, at 2:43 PM, felix gao wrote:

> Hi all,
>
> I am trying to figure out what exactly happens inside a job.
>
> 1) When the jobtracker launches a task to be run, how does it impact
> the currently running jobs if those jobs have higher, the same, or
> lower priorities, using the default queue?
>
> 2) What if a low-priority job is running that holds all the reducer
> slots while its mappers are only halfway done, and a high-priority job
> comes in and takes all the map slots but cannot complete, because all
> the reducer slots are taken by the low-priority job?
>

Both 1) and 2) really depend on the scheduler you are using: the
Default scheduler, the FairScheduler, or the CapacityScheduler.

With the Default scheduler, 2) is a real problem. The
CapacityScheduler doesn't allow priorities within the same queue for
precisely this reason, since it doesn't have preemption. I'm not sure
if the FairScheduler handles it.
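
(For reference, the scheduler is picked on the JobTracker via the
mapred.jobtracker.taskScheduler property in mapred-site.xml; a minimal
snippet for the CapacityScheduler would look something like:

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>

For the FairScheduler the value would be
org.apache.hadoop.mapred.FairScheduler.)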

> 3) When are mappers allocated on the slaves, and when are reducers
> allocated?
>

Usually, reduces are allocated only after a certain percentage of maps  
are complete (5% by default). Use  
mapred.reduce.slowstart.completed.maps to control this. Look at  
JobInProgress.java.
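
For illustration, a sketch (untested; MyJob is a placeholder class) of
a job that doesn't want its reduces scheduled until 80% of its maps
have finished, using the old org.apache.hadoop.mapred API:

    // delay reduce scheduling until 80% of this job's maps are done
    JobConf conf = new JobConf(MyJob.class);
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);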

> 4) Do mappers pass all the data to reducers using RPC, or do they
> write their output to HDFS and the reducers pick it up?
>

Maps sort/combine their output and write it to local disk. The reduces
then copy it over HTTP (we call this the 'shuffle' phase). The TT on
which a map ran serves that map's output via an embedded webserver.
Look at ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
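
To make the sort/combine step concrete: the 'combine' is just the
job's combiner class run over the sorted map output before it goes
over the wire. A sketch (class names are placeholders):

    conf.setMapperClass(MyMapper.class);
    conf.setCombinerClass(MyReducer.class); // runs on sorted map output, map-side
    conf.setReducerClass(MyReducer.class);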

> 5) Within a job, when and where does all the I/O occur?
>

Typically at the input to the maps (InputFormat) and the output of the
reduces (OutputFormat). Look at MapTask.java and ReduceTask.java.
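
To see where that I/O gets wired up, a minimal sketch of an old-API
job (input/output paths come from args; the default identity
mapper/reducer is used):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class MyJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJob.class);
        conf.setInputFormat(TextInputFormat.class);   // maps read their splits from HDFS
        conf.setOutputFormat(TextOutputFormat.class); // reduces write part files to HDFS
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);                       // blocks until the job finishes
      }
    }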

hope this helps,
Arun

>
> I know these are a lot of low-level questions; if you can point me to
> the right place to look, that should be enough.
>
> Thanks,
>
> Felix
>


Re: How do hadoop work in details

Posted by felix gao <gr...@gmail.com>.
Arun,

That explains it. I am running 0.20.2 in production right now.

I guess I will wait for it to be finalized.

Thanks,

Felix


Re: How do hadoop work in details

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Felix,

  Which version of the CS, i.e. of Hadoop, are you looking at?

  As I mentioned, you'll need to wait a bit for the release I'm
working towards to use the max-limit feature. It's present in
hadoop-0.21 and hadoop-0.22, but not in hadoop-0.20.

Arun



Re: How do hadoop work in details

Posted by felix gao <gr...@gmail.com>.
Arun,

I went to the docs for the capacity scheduler; it seems the sum of all
the capacities should be 100, but there isn't any option to specify
the max. The options are
mapred.capacity-scheduler.queue.<queue-name>.capacity,
mapred.capacity-scheduler.queue.<queue-name>.supports-priority,
and mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent.


Can you show me how to configure the scheduler like you indicated?
Also, how do I assign users to a queue? For example, all jobs
submitted by user hadoop use the production queue and everyone else
uses the default queue.

Thanks,

Felix





Re: How do hadoop work in details

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
The approach I run for Yahoo, for pretty much the same use case, is to  
use the CapacityScheduler and define two queues:
production
adhoc

Let's say you want most of the capacity guaranteed for production,
with the rest for adhoc:
production could have 70% capacity, with a max-limit of 100
adhoc could have 30% capacity, with a max-limit of 70 or 80.

This way adhoc jobs can take up to 70-80% of the cluster, while always
saving some for 'production' jobs. You get the idea?
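
A sketch of the corresponding capacity-scheduler.xml (the
maximum-capacity property name is my assumption for the max-limit
knob; as I note elsewhere in this thread, that knob only exists from
0.21 on):

    <property>
      <name>mapred.capacity-scheduler.queue.production.capacity</name>
      <value>70</value>
    </property>
    <property>
      <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
      <value>30</value>
    </property>
    <!-- assumed 0.21+ max-limit knob -->
    <property>
      <name>mapred.capacity-scheduler.queue.adhoc.maximum-capacity</name>
      <value>70</value>
    </property>

The queues themselves are declared via mapred.queue.names in
mapred-site.xml, a job picks one with mapred.job.queue.name, and
submission is restricted per-queue with the
mapred.queue.<queue-name>.acl-submit-job ACLs in mapred-queue-acls.xml
(with mapred.acls.enabled set to true).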

I'm sure there are similar tricks for the FairScheduler; I'm just not
familiar enough with it. I'll warn you that I only run Yahoo clusters,
and we use the CS everywhere.

One other note: I'm bang in the middle of releasing extensive
enhancements to the CapacityScheduler via hadoop-0.20.100, or whatever
we decide to call it:

http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html

Arun



Re: How do hadoop work in details

Posted by felix gao <gr...@gmail.com>.
Arun,

The information is very helpful. What scheduler do you suggest when we
have a mix of production and adhoc jobs running at the same time using
Pig, and we would like to guarantee the SLA for production tasks?

Thanks,

Felix
