Posted to user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/09/09 02:21:51 UTC

Scheduler question

Do the Hadoop 2.0 YARN scheduler(s) deal with situations like the following?
Hadoop cluster of 10 nodes, with 8GB each available for containers.  There is only one queue.
Application A requests 100 4GB containers.  It initially, or after a little while, gets 20 containers.
Later, application B requests 1 8GB container.
Suppose that App-A's containers each take a few minutes.  At some point one will complete.  When that happens, will the scheduler immediately allocate another 4GB container to App-A?  If so, will App-B get its container at all before App-A is almost done?
Thanks
John


RE: Scheduler question

Posted by John Lilley <jo...@redpoint.net>.
Thanks!  That makes perfect sense.
john

From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Monday, September 09, 2013 4:17 AM
To: user@hadoop.apache.org
Subject: Re: Scheduler question

Hi John,

YARN schedulers handle this with the concept of "reservations".  Scheduling decisions occur on node heartbeats.  When a node that is full heartbeats, the next application that should be able to place a container on it gets to place a "reservation" on it.  Each node has space for a single reservation.  Containers for other applications will not be placed on the node until a reservation is fulfilled.

If you are using the Fair Scheduler (Capacity Scheduler works similarly, but I'm not sure of the specifics), this means that app B would get its container well before app A completes, but not right away either.  After app A gets its 20 containers, it would also get reservations on the nodes.  After one of app A's containers finishes on a node, app A would get to place another container on that node to fulfill its reservation.  Then app B would get a reservation on that node.  Then no containers would be placed on that node until app B is able to place one, which would be after both of app A's containers on that node finish.

It's also possible to configure the schedulers to use preemption to make this kind of thing go a lot faster.

Does that make some sense?

-Sandy

On Mon, Sep 9, 2013 at 7:21 AM, John Lilley <jo...@redpoint.net> wrote:
Do the Hadoop 2.0 YARN scheduler(s) deal with situations like the following?
Hadoop cluster of 10 nodes, with 8GB each available for containers.  There is only one queue.
Application A requests 100 4GB containers.  It initially, or after a little while, gets 20 containers.
Later, application B requests 1 8GB container.
Suppose that App-A's containers each take a few minutes.  At some point one will complete.  When that happens, will the scheduler immediately allocate another 4GB container to App-A?  If so, will App-B get its container at all before App-A is almost done?
Thanks
John



Re: Scheduler question

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi John,

YARN schedulers handle this with the concept of "reservations".  Scheduling
decisions occur on node heartbeats.  When a node that is full heartbeats,
the next application that should be able to place a container on it gets to
place a "reservation" on it.  Each node has space for a single reservation.
Containers for other applications will not be placed on the node until a
reservation is fulfilled.

If you are using the Fair Scheduler (Capacity Scheduler works similarly,
but I'm not sure of the specifics), this means that app B would get its
container well before app A completes, but not right away either.  After
app A gets its 20 containers, it would also get reservations on the nodes.
After one of app A's containers finishes on a node, app A would get to
place another container on that node to fulfill its reservation.  Then app
B would get a reservation on that node.  Then no containers would be placed
on that node until app B is able to place one, which would be after both of
app A's containers on that node finish.
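
To make that sequence concrete, here is a toy model in Python.  It is only
a sketch under stated assumptions, not the actual Fair Scheduler code: the
App, Node, heartbeat, place and finish names are made up for illustration,
apps are ordered by memory currently held as a crude stand-in for fair-share
ordering, and a node may place several containers in one heartbeat.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class App:
    name: str
    container_mb: int     # size of every container this app asks for
    pending: int          # containers still wanted
    running_mb: int = 0   # memory currently held across the cluster


@dataclass
class Node:
    capacity_mb: int = 8192
    running: list = field(default_factory=list)  # one App entry per container
    reservation: Optional[App] = None            # a single reservation slot

    def free_mb(self) -> int:
        return self.capacity_mb - sum(a.container_mb for a in self.running)


def place(node: Node, app: App) -> None:
    node.running.append(app)
    app.pending -= 1
    app.running_mb += app.container_mb


def finish(node: Node, app: App) -> None:
    node.running.remove(app)
    app.running_mb -= app.container_mb


def heartbeat(node: Node, apps: list) -> None:
    """Hypothetical scheduling pass for one node heartbeat."""
    app = node.reservation
    if app is not None and app.pending == 0:
        node.reservation = app = None      # stale reservation, drop it
    if app is not None:
        if app.container_mb > node.free_mb():
            return                         # reserved node: nobody else may use it
        node.reservation = None
        place(node, app)                   # reservation fulfilled
    # Offer the node to the app holding the least memory first (assumption).
    for app in sorted(apps, key=lambda a: a.running_mb):
        if app.pending == 0:
            continue
        while app.pending > 0 and app.container_mb <= node.free_mb():
            place(node, app)
        if app.pending > 0:
            node.reservation = app         # node is full for this app: reserve
            return


a = App("A", container_mb=4096, pending=100)
b = App("B", container_mb=8192, pending=1)
nodes = [Node() for _ in range(10)]

# Only A exists at first: it gets 20 containers and a reservation on each node.
for n in nodes:
    heartbeat(n, [a])

# B arrives.  One of A's containers on node 0 finishes.
finish(nodes[0], a)
heartbeat(nodes[0], [a, b])   # A fulfills its reservation, then B reserves

finish(nodes[0], a)
heartbeat(nodes[0], [a, b])   # 4GB free, B needs 8GB: nothing is placed

finish(nodes[0], a)
heartbeat(nodes[0], [a, b])   # 8GB free: B's container is placed

print([app.name for app in nodes[0].running], "B pending:", b.pending)
```

In this model app B waits for three container completions on one node
(one that app A refills, then the two that drain the node), rather than for
app A to work through most of its 100 containers.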

It's also possible to configure the schedulers to use preemption to make
this kind of thing go a lot faster.

Does that make some sense?

-Sandy


On Mon, Sep 9, 2013 at 7:21 AM, John Lilley <jo...@redpoint.net> wrote:

> Do the Hadoop 2.0 YARN scheduler(s) deal with situations like the
> following?
>
> Hadoop cluster of 10 nodes, with 8GB each available for containers.  There
> is only one queue.
>
> Application A requests 100 4GB containers.  It initially, or after a
> little while, gets 20 containers.
>
> Later, application B requests 1 8GB container.
>
> Suppose that App-A’s containers each take a few minutes.  At some point
> one will complete.  When that happens, will the scheduler immediately
> allocate another 4GB container to App-A?  If so, will App-B get its
> container at all before App-A is almost done?
>
> Thanks
>
> John
