Posted to mapreduce-user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2013/05/11 17:01:28 UTC

Hadoop schedulers!

Hi,

I was going through the Hadoop job schedulers and could not see any
major operational difference between the capacity scheduler and the fair
share scheduler, apart from the fact that the fair share scheduler supports
preemption and the capacity scheduler doesn't.

Another difference is that the fair share scheduler creates logical pools
based on certain attributes such as username or user group, while the capacity
scheduler has a notion of job queues. Can someone point me to any other major
differences between these two types of schedulers?

Another question in this regard: the capacity scheduler uses a FIFO
queue, so it is still possible for a high-priority, long-running job that uses
all the capacity allocated to the queue to block all the other jobs behind it
in the queue. I think this is the expected behavior, but I wanted to confirm.

Thanks,
Rahul

Re: Hadoop schedulers!

Posted by Rahul Bhattacharjee <ra...@gmail.com>.

Thanks a lot for the replies; they were really helpful.


On Tue, May 14, 2013 at 1:02 AM, Alok Kumar <al...@gmail.com> wrote:

> Hi,
>
> As the name suggests, the fair scheduler allocates slots fairly among
> jobs.
> Let's say you have 10 map slots in your cluster and they are all occupied by
> job-1, which requires 30 map slots to finish. At the same time, another job,
> job-2, requires only 2 map slots to finish. Here, slots will be given to
> job-2 so that it finishes quickly while job-1 keeps running.

Re: Hadoop schedulers!

Posted by Alok Kumar <al...@gmail.com>.

Hi,

As the name suggests, the fair scheduler allocates slots fairly among
jobs.
Let's say you have 10 map slots in your cluster and they are all occupied by
job-1, which requires 30 map slots to finish. At the same time, another job,
job-2, requires only 2 map slots to finish. Here, slots will be given to
job-2 so that it finishes quickly while job-1 keeps running.
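
The 10-slot example above can be sketched as a max-min fair split. This is only an illustration under simplifying assumptions: the function name `fair_share` is hypothetical, pools, weights, minimum shares and preemption are ignored, and the real fair scheduler hands out slots one at a time as they free up rather than computing a split in one pass.

```python
def fair_share(total_slots, demands):
    """Max-min fair split of total_slots across jobs.

    demands maps a job name to the number of slots it could use right now.
    Hypothetical helper for illustration; not part of Hadoop.
    """
    allocation = {job: 0 for job in demands}
    remaining = total_slots
    unsatisfied = {job for job, need in demands.items() if need > 0}
    while remaining > 0 and unsatisfied:
        # Offer every unsatisfied job an equal share of what is left.
        share = max(remaining // len(unsatisfied), 1)
        for job in sorted(unsatisfied):
            grant = min(share, demands[job] - allocation[job], remaining)
            allocation[job] += grant
            remaining -= grant
            if remaining == 0:
                break
        # Jobs whose demand is met drop out; their unused share is re-offered.
        unsatisfied = {j for j in unsatisfied if allocation[j] < demands[j]}
    return allocation


print(fair_share(10, {"job-1": 30, "job-2": 2}))
# {'job-1': 8, 'job-2': 2}
```

With these numbers job-2 gets the 2 slots it needs and job-1 keeps the other 8, which matches the description: the small job finishes quickly while the large one keeps running.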



On Tue, May 14, 2013 at 12:02 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Any pointers on my question?
>
> There is another question, kind of dumb, but I just wanted to clarify.
>
> Say, in a FIFO scheduler or a capacity scheduler, if there are slots
> available and the first job doesn't need all of them, is the next job
> in the queue scheduled for execution, or does it still wait
> for the first job to finish?
>

- Jobs don't wait for all the slots to be freed. A job starts executing as
soon as it gets a slot. However, Hadoop does its best to allot a slot where
the job can achieve data locality.
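
As a rough sketch of the data-locality preference mentioned above: when a slot frees up on a node, prefer a pending task whose input lives on that node, then one whose input is on the same rack, and otherwise take any task. The names `pick_task` and `rack_of` and the node/rack layout are made up for illustration; this is not Hadoop's actual code.

```python
def pick_task(free_node, rack_of, pending_tasks):
    """Pick a pending task for a free slot on free_node, or None if no tasks.

    pending_tasks is a list of (task_id, nodes_holding_input) pairs.
    Hypothetical helper for illustration; not part of Hadoop.
    """
    def locality_rank(task):
        _, input_nodes = task
        if free_node in input_nodes:
            return 0  # node-local: input is on the node with the free slot
        if any(rack_of[n] == rack_of[free_node] for n in input_nodes):
            return 1  # rack-local: input is on another node in the same rack
        return 2      # off-rack: input must cross racks

    if not pending_tasks:
        return None
    # min() keeps the earliest task among equally ranked ones.
    return min(pending_tasks, key=locality_rank)[0]


rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
tasks = [("t1", {"n3"}), ("t2", {"n2"}), ("t3", {"n1"})]
print(pick_task("n1", rack_of, tasks))      # t3 (node-local)
print(pick_task("n1", rack_of, tasks[:2]))  # t2 (rack-local)
```

Note that a task is still handed out when nothing local is available, so the slot is never left idle just for the sake of locality in this sketch.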



Thanks
-- 
Alok

Re: Hadoop schedulers!

Posted by Sandy Ryza <sa...@cloudera.com>.

Hi Rahul,

You're right that the schedulers have evolved to support many of the same
features.  To your second question, I haven't looked in detail at the FIFO
or capacity schedulers, but for the FIFO mode in the fair scheduler, jobs
next in the queue will get slots if the first one isn't using them all.  As
Harsh says, the fair scheduler queues can also work in the way that
capacity scheduler queues do.

-Sandy


On Mon, May 13, 2013 at 11:32 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Any pointers on my question?
>
> There is another question, kind of dumb, but I just wanted to clarify.
>
> Say, in a FIFO scheduler or a capacity scheduler, if there are slots
> available and the first job doesn't need all of them, is the next job
> in the queue scheduled for execution, or does it still wait
> for the first job to finish?
>
> Thanks,
> Rahul

Re: Hadoop schedulers!

Posted by Rahul Bhattacharjee <ra...@gmail.com>.

Any pointers on my question?

There is another question, kind of dumb, but I just wanted to clarify.

Say, in a FIFO scheduler or a capacity scheduler, if there are slots
available and the first job doesn't need all of them, is the next job
in the queue scheduled for execution, or does it still wait
for the first job to finish?

Thanks,
Rahul


Re: Hadoop schedulers!

Posted by Harsh J <ha...@cloudera.com>.

Hi,

On Sat, May 11, 2013 at 8:31 PM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Hi,
>
> I was going through the Hadoop job schedulers and could not see any major
> operational difference between the capacity scheduler and the fair share
> scheduler, apart from the fact that the fair share scheduler supports
> preemption and the capacity scheduler doesn't.

I'd suggest reading the design of both schedulers. The preemption feature
is not the only difference: there are also differences in how the
queues behave and how tasks from the various lined-up jobs are picked for
scheduling (i.e. the base algorithm).

> Another difference is that the fair share scheduler creates logical pools
> based on certain attributes such as username or user group, while the
> capacity scheduler has a notion of job queues. Can someone point me to any
> other major differences between these two types of schedulers?

Note that FairScheduler can also reuse the queues concept if you point
the pool name property at the queue name property config.
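
To illustrate the idea, here is a small sketch of how a pool could be derived from a job's configuration. It assumes the Hadoop 1.x property names `mapred.fairscheduler.poolnameproperty` (defaulting to `user.name`) and `mapred.job.queue.name`; the message above does not name them, so check them against the documentation for your version. The function `resolve_pool` and the sample values are hypothetical.

```python
def resolve_pool(job_conf, pool_name_property="user.name", default_pool="default"):
    """Return the pool for a job, read from the configured job property.

    pool_name_property stands in for the value of the (assumed)
    mapred.fairscheduler.poolnameproperty setting. Hypothetical helper.
    """
    return job_conf.get(pool_name_property) or default_pool


job_conf = {"user.name": "rahul", "mapred.job.queue.name": "reports"}

# Default behaviour in this sketch: one pool per user.
print(resolve_pool(job_conf))  # rahul

# Pointing the pool name property at the queue name makes pools follow queues.
print(resolve_pool(job_conf, pool_name_property="mapred.job.queue.name"))  # reports
```

The second call shows the effect described above: jobs submitted to the same queue land in the same pool.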

> Another question in this regard: the capacity scheduler uses a FIFO
> queue, so it is still possible for a high-priority, long-running job that uses
> all the capacity allocated to the queue to block all the other jobs behind it
> in the queue. I think this is the expected behavior, but I wanted to confirm.

I think this is the case, yes, if all the capacity is currently soaked up.
However, the capacity scheduler (CS) doesn't wait for jobs to complete before
scheduling the next jobs if slots are free (say, during a job's last wave of
tasks).
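
A minimal sketch of that behaviour, assuming a single queue with a fixed number of slots and jobs served strictly in submission order. The function `assign_slots_fifo` and the job names are hypothetical, and priorities, user limits and reduce slots are ignored.

```python
def assign_slots_fifo(queue_capacity, jobs):
    """Hand out a queue's slots to jobs in FIFO order.

    jobs is a list of (name, pending_tasks) in submission order.
    Hypothetical helper for illustration; not part of Hadoop.
    """
    free = queue_capacity
    assignment = {}
    for name, pending in jobs:
        # A job takes only what it can use; leftovers go to the next job.
        grant = min(pending, free)
        assignment[name] = grant
        free -= grant
    return assignment


# The first job soaks up the whole queue, so the second one waits.
print(assign_slots_fifo(10, [("long-job", 30), ("short-job", 2)]))
# {'long-job': 10, 'short-job': 0}

# In its last wave the first job needs only 7 slots, so the second one starts.
print(assign_slots_fifo(10, [("long-job", 7), ("short-job", 2)]))
# {'long-job': 7, 'short-job': 2}
```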

> Thanks,
> Rahul
>
>

--
Harsh J
