Posted to mapreduce-user@hadoop.apache.org by Arun C Murthy <ac...@yahoo-inc.com> on 2011/01/09 09:35:04 UTC
Re: How do hadoop work in details
On Dec 29, 2010, at 2:43 PM, felix gao wrote:
> Hi all,
>
> I am trying to figure out what exactly happens inside a job.
>
> 1) When the jobtracker launches a task to be run, how does it impact
> the currently running jobs if the currently running job has a higher,
> the same, or a lower priority, using the default queue?
>
> 2) What if a low-priority job is running that is holding all the
> reducer slots, its mappers are halfway done, and a high-priority job
> comes in and takes all the map slots but cannot complete because all
> the reducer slots are taken by the low-priority job?
>
Both 1) and 2) really depend on the scheduler you are using -
Default, FairShare or CapacityScheduler.
With the Default scheduler, 2) is a real problem. The
CapacityScheduler doesn't allow priorities within the same queue for
precisely the same reason since it doesn't have preemption. I'm not
sure if FairScheduler handles it.
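As a rough sketch, the scheduler itself is chosen via
mapred.jobtracker.taskScheduler in mapred-site.xml (class names below are the
0.20-era ones, so double-check against your release):

  <!-- mapred-site.xml sketch: pick the scheduler the JobTracker uses -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <!-- default is org.apache.hadoop.mapred.JobQueueTaskScheduler;
         org.apache.hadoop.mapred.FairScheduler selects the fair scheduler -->
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>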
> 3) When are mappers allocated on the slaves, and when are reducers
> allocated?
>
Usually, reduces are allocated only after a certain percentage of maps
are complete (5% by default). Use
mapred.reduce.slowstart.completed.maps to control this. Look at
JobInProgress.java.
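For example, a minimal mapred-site.xml snippet (0.05, i.e. 5%, is the
default; the value here is only an illustration):

  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <!-- fraction of maps that must finish before reduces are scheduled;
         0.05 is the default, 0.80 is just an example value -->
    <value>0.80</value>
  </property>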
> 4) Do mappers pass all their data to reducers using RPC, or do they
> write their output to HDFS and the reducers pick it up?
>
Maps sort/combine their output and write to local-disk. The reduces
then copy them (we call it the 'shuffle' phase) over http. The TT on
which the map ran will serve the map's output via an embedded
webserver. Look at ReduceTask.java and
TaskTracker.MapOutputServlet.doGet.
> 5) Within a job, when and where does all the I/O occur?
>
Typically input to map i.e. InputFormat and output of reduce i.e.
OutputFormat. Look at MapTask.java and ReduceTask.java.
hope this helps,
Arun
>
> I know these are a lot of low-level questions; if you can point me to
> the right place to look, that should be enough.
>
> Thanks,
>
> Felix
>
Re: How do hadoop work in details
Posted by felix gao <gr...@gmail.com>.
Arun,
That explains it. I am running 0.20.2 in our production right now.
I guess I will wait for it to be finalized.
Thanks,
Felix
On Wed, Jan 12, 2011 at 6:48 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
> Felix,
>
> Which version of the CS i.e. hadoop are you looking at?
>
> As I mentioned, you'll need to wait a bit for the release I'm working
> towards to use the max-limit feature. It's present in
> hadoop-0.21/hadoop-0.22, but not in hadoop-0.20.
>
> Arun
>
> On Jan 12, 2011, at 4:55 PM, felix gao wrote:
>
> Arun,
>
> I went to the docs for the capacity scheduler; it seems the sum of all the
> capacities should be 100, but there aren't any options to specify the max.
> The options are mapred.capacity-scheduler.queue.<queue-name>.capacity, mapred.capacity-scheduler.queue.<queue-name>.supports-priority,
> and mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent.
>
>
> Can you show me how to configure the scheduler like you indicated? Also how
> do I assign users to that queue? For example, all jobs submitted by user
> hadoop use the production queue and everyone else uses the default queue.
>
> Thanks,
>
> Felix
>
>
>
>
> On Wed, Jan 12, 2011 at 1:50 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>
>> The approach I run for Yahoo, for pretty much the same use case, is to use
>> the CapacityScheduler and define two queues:
>> production
>> adhoc
>>
>> Let's say you have 30% as the capacity you want for production, rest for
>> adhoc.
>> production could have 70% capacity, max-limit of 100
>> adhoc can have 30% capacity, max-limit of 70/80.
>>
>> This way adhoc jobs can take up to 70-80% of the cluster, but save some for
>> 'production' jobs at all times. You get the idea?
>>
>> I'm sure there are similar tricks for FairScheduler, I just am not
>> familiar enough with it. I'll warn you that I only run Yahoo clusters, we
>> use the CS everywhere.
>>
>> One other note: I'm bang in the middle of releasing extensive
>> enhancements to CapacityScheduler via hadoop-0.20.100 or whatever we decide
>> to call it:
>>
>> http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html
>>
>> Arun
>>
>> On Jan 12, 2011, at 9:40 AM, felix gao wrote:
>>
>> Arun,
>>
>> The information is very helpful. What scheduler do you suggest when we
>> have a mix of production and adhoc jobs running at the same time using
>> Pig and we would like to guarantee the SLA for production tasks?
>>
>> Thanks,
>>
>> Felix
>>
>> On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>
>>>
>>> On Dec 29, 2010, at 2:43 PM, felix gao wrote:
>>>
>>> Hi all,
>>>>
>>>> I am trying to figure out what exactly happens inside a job.
>>>>
>>>> 1) When the jobtracker launches a task to be run, how does it impact the
>>>> currently running jobs if the currently running job has a higher, the
>>>> same, or a lower priority, using the default queue?
>>>>
>>>> 2) What if a low-priority job is running that is holding all the reducer
>>>> slots, its mappers are halfway done, and a high-priority job comes in and
>>>> takes all the map slots but cannot complete because all the reducer slots
>>>> are taken by the low-priority job?
>>>>
>>>>
>>> Both 1) and 2) really depend on the scheduler you are using - Default,
>>> FairShare or CapacityScheduler.
>>>
>>> With the Default scheduler, 2) is a real problem. The CapacityScheduler
>>> doesn't allow priorities within the same queue for precisely the same reason
>>> since it doesn't have preemption. I'm not sure if FairScheduler handles it.
>>>
>>>
>>> 3) When are mappers allocated on the slaves, and when are reducers
>>>> allocated?
>>>>
>>>>
>>> Usually, reduces are allocated only after a certain percentage of maps
>>> are complete (5% by default). Use mapred.reduce.slowstart.completed.maps to
>>> control this. Look at JobInProgress.java.
>>>
>>>
>>> 4) Do mappers pass all their data to reducers using RPC, or do they write
>>>> their output to HDFS and the reducers pick it up?
>>>>
>>>>
>>> Maps sort/combine their output and write to local-disk. The reduces then
>>> copy them (we call it the 'shuffle' phase) over http. The TT on which the
>>> map ran will serve the map's output via an embedded webserver. Look at
>>> ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
>>>
>>>
>>> 5) Within a job, when and where does all the I/O occur?
>>>>
>>>>
>>> Typically input to map i.e. InputFormat and output of reduce i.e.
>>> OutputFormat. Look at MapTask.java and ReduceTask.java.
>>>
>>> hope this helps,
>>> Arun
>>>
>>>
>>>
>>>> I know these are a lot of low-level questions; if you can point me to
>>>> the right place to look, that should be enough.
>>>>
>>>> Thanks,
>>>>
>>>> Felix
>>>>
>>>>
>>>
>>
>>
>
>
Re: How do hadoop work in details
Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Felix,
Which version of the CS i.e. hadoop are you looking at?
As I mentioned, you'll need to wait a bit for the release I'm
working towards to use the max-limit feature. It's present in
hadoop-0.21/hadoop-0.22, but not in hadoop-0.20.
Arun
On Jan 12, 2011, at 4:55 PM, felix gao wrote:
> Arun,
>
> I went to the docs for the capacity scheduler; it seems the sum of all
> the capacities should be 100, but there aren't any options to specify
> the max. The options are mapred.capacity-scheduler.queue.<queue-
> name>.capacity, mapred.capacity-scheduler.queue.<queue-
> name>.supports-priority, and mapred.capacity-scheduler.queue.<queue-
> name>.minimum-user-limit-percent.
>
> Can you show me how to configure the scheduler like you indicated?
> Also how do I assign users to that queue? For example, all jobs
> submitted by user hadoop use the production queue and everyone else
> uses the default queue.
>
> Thanks,
>
> Felix
>
>
>
>
> On Wed, Jan 12, 2011 at 1:50 PM, Arun C Murthy <ac...@yahoo-inc.com>
> wrote:
> The approach I run for Yahoo, for pretty much the same use case, is
> to use the CapacityScheduler and define two queues:
> production
> adhoc
>
> Let's say you have 30% as the capacity you want for production, rest
> for adhoc.
> production could have 70% capacity, max-limit of 100
> adhoc can have 30% capacity, max-limit of 70/80.
>
> This way adhoc jobs can take up to 70-80% of the cluster, but save
> some for 'production' jobs at all times. You get the idea?
>
> I'm sure there are similar tricks for FairScheduler, I just am not
> familiar enough with it. I'll warn you that I only run Yahoo
> clusters, we use the CS everywhere.
>
> One other note: I'm bang in the middle of releasing extensive
> enhancements to CapacityScheduler via hadoop-0.20.100 or whatever we
> decide to call it:
>
> http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html
>
> Arun
>
> On Jan 12, 2011, at 9:40 AM, felix gao wrote:
>
>> Arun,
>>
>> The information is very helpful. What scheduler do you suggest
>> when we have a mix of production and adhoc jobs running at the
>> same time using Pig and we would like to guarantee the SLA for
>> production tasks?
>>
>> Thanks,
>>
>> Felix
>>
>> On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <ac...@yahoo-inc.com>
>> wrote:
>>
>> On Dec 29, 2010, at 2:43 PM, felix gao wrote:
>>
>> Hi all,
>>
>> I am trying to figure out what exactly happens inside a job.
>>
>> 1) When the jobtracker launches a task to be run, how does it
>> impact the currently running jobs if the currently running job
>> has a higher, the same, or a lower priority, using the default queue?
>>
>> 2) What if a low-priority job is running that is holding all the
>> reducer slots, its mappers are halfway done, and a high-priority
>> job comes in and takes all the map slots but cannot complete
>> because all the reducer slots are taken by the low-priority job?
>>
>>
>> Both 1) and 2) really depend on the scheduler you are using -
>> Default, FairShare or CapacityScheduler.
>>
>> With the Default scheduler, 2) is a real problem. The
>> CapacityScheduler doesn't allow priorities within the same queue
>> for precisely the same reason since it doesn't have preemption. I'm
>> not sure if FairScheduler handles it.
>>
>>
>> 3) When are mappers allocated on the slaves, and when are reducers
>> allocated?
>>
>>
>> Usually, reduces are allocated only after a certain percentage of
>> maps are complete (5% by default). Use
>> mapred.reduce.slowstart.completed.maps to control this. Look at
>> JobInProgress.java.
>>
>>
>> 4) Do mappers pass all their data to reducers using RPC, or do they
>> write their output to HDFS and the reducers pick it up?
>>
>>
>> Maps sort/combine their output and write to local-disk. The reduces
>> then copy them (we call it the 'shuffle' phase) over http. The TT
>> on which the map ran will serve the map's output via an embedded
>> webserver. Look at ReduceTask.java and
>> TaskTracker.MapOutputServlet.doGet.
>>
>>
>> 5) Within a job, when and where does all the I/O occur?
>>
>>
>> Typically input to map i.e. InputFormat and output of reduce i.e.
>> OutputFormat. Look at MapTask.java and ReduceTask.java.
>>
>> hope this helps,
>> Arun
>>
>>
>>
>> I know these are a lot of low-level questions; if you can point me
>> to the right place to look, that should be enough.
>>
>> Thanks,
>>
>> Felix
>>
>>
>>
>
>
Re: How do hadoop work in details
Posted by felix gao <gr...@gmail.com>.
Arun,
I went to the docs for the capacity scheduler; it seems the sum of all the
capacities should be 100, but there aren't any options to specify the max.
The options are
mapred.capacity-scheduler.queue.<queue-name>.capacity,
mapred.capacity-scheduler.queue.<queue-name>.supports-priority,
and mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent.
Can you show me how to configure the scheduler like you indicated? Also how
do I assign users to that queue? For example, all jobs submitted by user
hadoop use the production queue and everyone else uses the default queue.
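(For reference, a rough sketch of the pieces this touches, using 0.20-era
property names - worth verifying against the release in use. As far as I know
a job has to name its queue explicitly; ACLs then control who may submit:)

  <!-- mapred-site.xml sketch: declare the queues and enable ACL checks -->
  <property>
    <name>mapred.queue.names</name>
    <value>production,default</value>
  </property>
  <property>
    <name>mapred.acls.enabled</name>
    <value>true</value>
  </property>

  <!-- mapred-queue-acls.xml sketch: only user hadoop may submit to 'production' -->
  <property>
    <name>mapred.queue.production.acl-submit-job</name>
    <value>hadoop</value>
  </property>

  A job then picks its queue with -Dmapred.job.queue.name=production on the
  command line, or with "set mapred.job.queue.name 'production';" in a Pig
  script.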
Thanks,
Felix
On Wed, Jan 12, 2011 at 1:50 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
> The approach I run for Yahoo, for pretty much the same use case, is to use
> the CapacityScheduler and define two queues:
> production
> adhoc
>
> Let's say you have 30% as the capacity you want for production, rest for
> adhoc.
> production could have 70% capacity, max-limit of 100
> adhoc can have 30% capacity, max-limit of 70/80.
>
> This way adhoc jobs can take up to 70-80% of the cluster, but save some for
> 'production' jobs at all times. You get the idea?
>
> I'm sure there are similar tricks for FairScheduler, I just am not familiar
> enough with it. I'll warn you that I only run Yahoo clusters, we use the CS
> everywhere.
>
> One other note: I'm bang in the middle of releasing extensive
> enhancements to CapacityScheduler via hadoop-0.20.100 or whatever we decide
> to call it:
>
> http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html
>
> Arun
>
> On Jan 12, 2011, at 9:40 AM, felix gao wrote:
>
> Arun,
>
> The information is very helpful. What scheduler do you suggest when we
> have a mix of production and adhoc jobs running at the same time using
> Pig and we would like to guarantee the SLA for production tasks?
>
> Thanks,
>
> Felix
>
> On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>
>>
>> On Dec 29, 2010, at 2:43 PM, felix gao wrote:
>>
>> Hi all,
>>>
>>> I am trying to figure out what exactly happens inside a job.
>>>
>>> 1) When the jobtracker launches a task to be run, how does it impact the
>>> currently running jobs if the currently running job has a higher, the
>>> same, or a lower priority, using the default queue?
>>>
>>> 2) What if a low-priority job is running that is holding all the reducer
>>> slots, its mappers are halfway done, and a high-priority job comes in and
>>> takes all the map slots but cannot complete because all the reducer slots
>>> are taken by the low-priority job?
>>>
>>>
>> Both 1) and 2) really depend on the scheduler you are using - Default,
>> FairShare or CapacityScheduler.
>>
>> With the Default scheduler, 2) is a real problem. The CapacityScheduler
>> doesn't allow priorities within the same queue for precisely the same reason
>> since it doesn't have preemption. I'm not sure if FairScheduler handles it.
>>
>>
>> 3) When are mappers allocated on the slaves, and when are reducers
>>> allocated?
>>>
>>>
>> Usually, reduces are allocated only after a certain percentage of maps are
>> complete (5% by default). Use mapred.reduce.slowstart.completed.maps to
>> control this. Look at JobInProgress.java.
>>
>>
>> 4) Do mappers pass all their data to reducers using RPC, or do they write
>>> their output to HDFS and the reducers pick it up?
>>>
>>>
>> Maps sort/combine their output and write to local-disk. The reduces then
>> copy them (we call it the 'shuffle' phase) over http. The TT on which the
>> map ran will serve the map's output via an embedded webserver. Look at
>> ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
>>
>>
>> 5) Within a job, when and where does all the I/O occur?
>>>
>>>
>> Typically input to map i.e. InputFormat and output of reduce i.e.
>> OutputFormat. Look at MapTask.java and ReduceTask.java.
>>
>> hope this helps,
>> Arun
>>
>>
>>
>>> I know these are a lot of low-level questions; if you can point me to
>>> the right place to look, that should be enough.
>>>
>>> Thanks,
>>>
>>> Felix
>>>
>>>
>>
>
>
Re: How do hadoop work in details
Posted by Arun C Murthy <ac...@yahoo-inc.com>.
The approach I run for Yahoo, for pretty much the same use case, is to
use the CapacityScheduler and define two queues:
production
adhoc
Let's say you have 30% as the capacity you want for production, rest
for adhoc.
production could have 70% capacity, max-limit of 100
adhoc can have 30% capacity, max-limit of 70/80.
This way adhoc jobs can take up to 70-80% of the cluster, but save some
for 'production' jobs at all times. You get the idea?
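Roughly, in capacity-scheduler.xml that looks like the following (the
maximum-capacity knob is the max-limit feature referred to elsewhere in this
thread; it is in the 0.21/0.22 CapacityScheduler but not in 0.20, so treat
this as a sketch and check the docs for your release):

  <!-- capacity-scheduler.xml sketch; guaranteed capacities must sum to 100 -->
  <property>
    <name>mapred.capacity-scheduler.queue.production.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
    <value>30</value>
  </property>

  <!-- max-limits: how far a queue may grow when the other queue is idle -->
  <property>
    <name>mapred.capacity-scheduler.queue.production.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.maximum-capacity</name>
    <value>80</value>
  </property>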
I'm sure there are similar tricks for FairScheduler, I just am not
familiar enough with it. I'll warn you that I only run Yahoo clusters,
we use the CS everywhere.
One other note: I'm bang in the middle of releasing extensive
enhancements to CapacityScheduler via hadoop-0.20.100 or whatever we
decide to call it:
http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html
Arun
On Jan 12, 2011, at 9:40 AM, felix gao wrote:
> Arun,
>
> The information is very helpful. What scheduler do you suggest
> when we have a mix of production and adhoc jobs running at the same
> time using Pig and we would like to guarantee the SLA for production
> tasks?
>
> Thanks,
>
> Felix
>
> On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <ac...@yahoo-inc.com>
> wrote:
>
> On Dec 29, 2010, at 2:43 PM, felix gao wrote:
>
> Hi all,
>
> I am trying to figure out what exactly happens inside a job.
>
> 1) When the jobtracker launches a task to be run, how does it impact
> the currently running jobs if the currently running job has a higher,
> the same, or a lower priority, using the default queue?
>
> 2) What if a low-priority job is running that is holding all the
> reducer slots, its mappers are halfway done, and a high-priority job
> comes in and takes all the map slots but cannot complete because all
> the reducer slots are taken by the low-priority job?
>
>
> Both 1) and 2) really depend on the scheduler you are using -
> Default, FairShare or CapacityScheduler.
>
> With the Default scheduler, 2) is a real problem. The
> CapacityScheduler doesn't allow priorities within the same queue for
> precisely the same reason since it doesn't have preemption. I'm not
> sure if FairScheduler handles it.
>
>
> 3) When are mappers allocated on the slaves, and when are reducers
> allocated?
>
>
> Usually, reduces are allocated only after a certain percentage of
> maps are complete (5% by default). Use
> mapred.reduce.slowstart.completed.maps to control this. Look at
> JobInProgress.java.
>
>
> 4) Do mappers pass all their data to reducers using RPC, or do they
> write their output to HDFS and the reducers pick it up?
>
>
> Maps sort/combine their output and write to local-disk. The reduces
> then copy them (we call it the 'shuffle' phase) over http. The TT on
> which the map ran will serve the map's output via an embedded
> webserver. Look at ReduceTask.java and
> TaskTracker.MapOutputServlet.doGet.
>
>
> 5) Within a job, when and where does all the I/O occur?
>
>
> Typically input to map i.e. InputFormat and output of reduce i.e.
> OutputFormat. Look at MapTask.java and ReduceTask.java.
>
> hope this helps,
> Arun
>
>
>
> I know these are a lot of low-level questions; if you can point me to
> the right place to look, that should be enough.
>
> Thanks,
>
> Felix
>
>
>
Re: How do hadoop work in details
Posted by felix gao <gr...@gmail.com>.
Arun,
The information is very helpful. What scheduler do you suggest when we
have a mix of production and adhoc jobs running at the same time using
Pig and we would like to guarantee the SLA for production tasks?
Thanks,
Felix
On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>
> On Dec 29, 2010, at 2:43 PM, felix gao wrote:
>
> Hi all,
>>
>> I am trying to figure out what exactly happens inside a job.
>>
>> 1) When the jobtracker launches a task to be run, how does it impact the
>> currently running jobs if the currently running job has a higher, the
>> same, or a lower priority, using the default queue?
>>
>> 2) What if a low-priority job is running that is holding all the reducer
>> slots, its mappers are halfway done, and a high-priority job comes in and
>> takes all the map slots but cannot complete because all the reducer slots
>> are taken by the low-priority job?
>>
>>
> Both 1) and 2) really depend on the scheduler you are using - Default,
> FairShare or CapacityScheduler.
>
> With the Default scheduler, 2) is a real problem. The CapacityScheduler
> doesn't allow priorities within the same queue for precisely the same reason
> since it doesn't have preemption. I'm not sure if FairScheduler handles it.
>
>
> 3) When are mappers allocated on the slaves, and when are reducers
>> allocated?
>>
>>
> Usually, reduces are allocated only after a certain percentage of maps are
> complete (5% by default). Use mapred.reduce.slowstart.completed.maps to
> control this. Look at JobInProgress.java.
>
>
> 4) Do mappers pass all their data to reducers using RPC, or do they write
>> their output to HDFS and the reducers pick it up?
>>
>>
> Maps sort/combine their output and write to local-disk. The reduces then
> copy them (we call it the 'shuffle' phase) over http. The TT on which the
> map ran will serve the map's output via an embedded webserver. Look at
> ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
>
>
> 5) Within a job, when and where does all the I/O occur?
>>
>>
> Typically input to map i.e. InputFormat and output of reduce i.e.
> OutputFormat. Look at MapTask.java and ReduceTask.java.
>
> hope this helps,
> Arun
>
>
>
>> I know these are a lot of low-level questions; if you can point me to
>> the right place to look, that should be enough.
>>
>> Thanks,
>>
>> Felix
>>
>>
>