Posted to dev@spark.apache.org by Jun Feng Liu <li...@cn.ibm.com> on 2014/08/08 04:50:45 UTC

Re: Fine-Grained Scheduler on Yarn

Does anyone know the answer?
 
Best Regards
 
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liujunf@cn.ibm.com


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China

Jun Feng Liu/China/IBM, 2014/08/07 15:37
To: dev@spark.apache.org
Subject: Fine-Grained Scheduler on Yarn

Hi there,

I just noticed that Spark currently supports fine-grained scheduling only on
Mesos, via MesosSchedulerBackend. The YARN scheduler seems to work only in
coarse-grained mode. Is there any plan to implement a fine-grained scheduler
for YARN? Or is there a technical issue blocking us from doing that?
 
Best Regards
 
Jun Feng Liu

Re: Fine-Grained Scheduler on Yarn

Posted by Sandy Ryza <sa...@cloudera.com>.
I think that would be useful work.  I don't know the minute details of this
code, but in general TaskSchedulerImpl keeps track of pending tasks.  Tasks
are organized into TaskSets, each of which corresponds to a particular
stage.  Each TaskSet has a TaskSetManager, which directly tracks the
pending tasks for that stage.
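A toy Python model can make this hierarchy concrete. It is only a sketch under stated assumptions, not Spark's code (Spark's scheduler is written in Scala and its classes have different fields): `TaskScheduler`, `TaskSetManager`, `pending`, `submit_task_set` and `backlog` are hypothetical names standing in for TaskSchedulerImpl, a TaskSet and its TaskSetManager. Summing the pending tasks of every manager is one way a policy could see the whole backlog rather than only the most recently submitted TaskSet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskSetManager:
    """Hypothetical stand-in for a TaskSetManager: tracks one stage's pending tasks."""
    stage_id: int
    pending: List[int] = field(default_factory=list)  # indices of tasks not yet launched

    def launch_one(self) -> int:
        """Mark the next pending task as launched and return its index."""
        return self.pending.pop(0)


class TaskScheduler:
    """Hypothetical stand-in for TaskSchedulerImpl: one manager per TaskSet (stage)."""

    def __init__(self) -> None:
        self.managers: Dict[int, TaskSetManager] = {}

    def submit_task_set(self, stage_id: int, num_tasks: int) -> None:
        # Each submitted TaskSet gets its own manager holding all of its tasks.
        self.managers[stage_id] = TaskSetManager(stage_id, list(range(num_tasks)))

    def backlog(self) -> int:
        """Total pending tasks across every active stage."""
        return sum(len(m.pending) for m in self.managers.values())


if __name__ == "__main__":
    sched = TaskScheduler()
    sched.submit_task_set(stage_id=0, num_tasks=4)
    sched.submit_task_set(stage_id=1, num_tasks=3)
    sched.managers[0].launch_one()
    print(sched.backlog())  # 6
```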

-Sandy


On Fri, Aug 8, 2014 at 12:37 AM, Jun Feng Liu <li...@cn.ibm.com> wrote:

> Yes, I think we need both levels of resource control (the number of
> containers, and dynamically changing each container's resources). That
> would make resource utilization much more effective, especially when
> several types of workload share the same infrastructure.
>
> Is there any way I can observe the task backlog in the SchedulerBackend?
> It seems the scheduler backend is triggered when a new TaskSet is
> submitted, but I haven't figured out whether there is a way to check the
> whole task backlog from inside it. I am interested in implementing some
> policies in the SchedulerBackend and testing how useful they turn out to be.

Re: Fine-Grained Scheduler on Yarn

Posted by Jun Feng Liu <li...@cn.ibm.com>.
Yes, I think we need both levels of resource control (the number of
containers, and dynamically changing each container's resources). That
would make resource utilization much more effective, especially when
several types of workload share the same infrastructure.

Is there any way I can observe the task backlog in the SchedulerBackend?
It seems the scheduler backend is triggered when a new TaskSet is
submitted, but I haven't figured out whether there is a way to check the
whole task backlog from inside it. I am interested in implementing some
policies in the SchedulerBackend and testing how useful they turn out to be.
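The two levels of control described above can be sketched as a small planning function. This is a hypothetical policy, not anything in Spark or YARN: `plan_resources` and `Plan` are invented names, each pending task is assumed to need `cores_per_task` cores, all currently allocated cores are assumed busy, and in-place container resizing is assumed to exist (at the time of this thread it did not; see Sandy's reply about YARN-1197). The sketch grows existing containers first and requests new containers only for whatever demand is left.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    resized: List[int]   # new core count for each existing container
    new_containers: int  # number of additional containers to request


def plan_resources(pending_tasks: int,
                   container_cores: List[int],
                   max_cores_per_container: int,
                   cores_per_task: int = 1) -> Plan:
    """Hypothetical two-level policy: resize existing containers, then add new ones.

    Assumes every currently allocated core is busy, so the backlog needs
    pending_tasks * cores_per_task extra cores.
    """
    needed = pending_tasks * cores_per_task
    resized = list(container_cores)

    # Level 2: grow existing containers in place, up to the per-container cap.
    for i, cores in enumerate(resized):
        if needed <= 0:
            break
        grow = max(0, min(max_cores_per_container - cores, needed))
        resized[i] = cores + grow
        needed -= grow

    # Level 1: cover whatever is left with whole new containers (ceiling division).
    new_containers = 0
    if needed > 0:
        new_containers = (needed + max_cores_per_container - 1) // max_cores_per_container
    return Plan(resized, new_containers)
```

For example, `plan_resources(10, [2, 4], 4)` grows the first container from 2 to 4 cores and then asks for two more 4-core containers to cover the remaining 8 cores.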
 
Best Regards
 
Jun Feng Liu

Sandy Ryza <sa...@cloudera.com>, 2014/08/08 15:14
To: Jun Feng Liu/China/IBM@IBMCN
Cc: Patrick Wendell <pw...@gmail.com>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: Fine-Grained Scheduler on Yarn

Hi Jun,

Spark currently doesn't have that feature, i.e. it aims for a fixed number
of executors per application regardless of resource usage, but it's
definitely worth considering.  We could start more executors when we have a
large backlog of tasks and shut some down when we're underutilized.

The fine-grained task scheduling is blocked on work from YARN that will
allow changing the CPU allocation of a YARN container dynamically.  The
relevant JIRA for this dependency is YARN-1197, though YARN-1488 might
serve this purpose as well if it comes first.

-Sandy


Re: Fine-Grained Scheduler on Yarn

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Jun,

Spark currently doesn't have that feature, i.e. it aims for a fixed number
of executors per application regardless of resource usage, but it's
definitely worth considering.  We could start more executors when we have a
large backlog of tasks and shut some down when we're underutilized.
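A backlog-driven sizing rule of the kind described here might look like the following minimal sketch. The assumptions are not from the thread: one core per task, a target of enough executors to run every pending and running task at once, and a clamp to a configured range. `target_executors`, `scaling_action`, `min_executors` and `max_executors` are hypothetical names.

```python
def target_executors(pending_tasks: int,
                     running_tasks: int,
                     cores_per_executor: int,
                     min_executors: int = 1,
                     max_executors: int = 50) -> int:
    """Hypothetical sizing rule: enough executors to run every task at once.

    Assumes one core per task; the result is clamped to
    [min_executors, max_executors].
    """
    slots_wanted = pending_tasks + running_tasks
    wanted = (slots_wanted + cores_per_executor - 1) // cores_per_executor  # ceiling
    return max(min_executors, min(max_executors, wanted))


def scaling_action(current: int, target: int) -> str:
    """Describe what a scheduler backend would ask the cluster manager for."""
    if target > current:
        return f"request {target - current} more executor(s)"
    if target < current:
        # A real backend should release only executors with no running tasks.
        return f"release {current - target} idle executor(s)"
    return "no change"
```

A real policy would probably re-evaluate the target periodically and wait a little before scaling down, so that a short lull between stages does not release executors that are about to be needed again.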

The fine-grained task scheduling is blocked on work from YARN that will
allow changing the CPU allocation of a YARN container dynamically.  The
relevant JIRA for this dependency is YARN-1197, though YARN-1488 might
serve this purpose as well if it comes first.

-Sandy


On Thu, Aug 7, 2014 at 10:56 PM, Jun Feng Liu <li...@cn.ibm.com> wrote:

> Thanks for the response. Would it be possible to adjust resources by
> changing the number of containers? E.g., allocate more containers when the
> driver needs more resources, and return resources by releasing some
> containers when the remaining ones already have enough cores/memory.

Re: Fine-Grained Scheduler on Yarn

Posted by Jun Feng Liu <li...@cn.ibm.com>.
Thanks for the response. Would it be possible to adjust resources by
changing the number of containers? E.g., allocate more containers when the
driver needs more resources, and return resources by releasing some
containers when the remaining ones already have enough cores/memory.
 
Best Regards
 
Jun Feng Liu

Patrick Wendell <pw...@gmail.com>, 2014/08/08 13:10
To: Jun Feng Liu/China/IBM@IBMCN
Cc: "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: Fine-Grained Scheduler on Yarn

Hey, sorry about that: what I said was the opposite of what is true.

The current YARN mode is equivalent to "coarse-grained" Mesos mode. There is
no fine-grained scheduling on YARN at the moment. I'm not sure YARN supports
scheduling in units other than containers. Fine-grained scheduling requires
scheduling at the granularity of individual cores.


Re: Fine-Grained Scheduler on Yarn

Posted by Patrick Wendell <pw...@gmail.com>.
Hey, sorry about that: what I said was the opposite of what is true.

The current YARN mode is equivalent to "coarse-grained" Mesos mode. There is
no fine-grained scheduling on YARN at the moment. I'm not sure YARN supports
scheduling in units other than containers. Fine-grained scheduling requires
scheduling at the granularity of individual cores.
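To see why this granularity matters, a toy core-second accounting can compare the two modes. The assumptions are not from the thread: each task uses exactly one core, coarse-grained mode holds a fixed number of cores from the first task's start to the last task's end, and fine-grained mode holds a core only while a task runs. Launch overheads, which fine-grained scheduling pays more often, are ignored. The function names are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (start_time, end_time) in seconds; one core per task


def coarse_grained_core_seconds(tasks: List[Task], reserved_cores: int) -> float:
    """Core-seconds held when a fixed set of cores is kept for the whole job."""
    start = min(s for s, _ in tasks)
    end = max(e for _, e in tasks)
    return reserved_cores * (end - start)


def fine_grained_core_seconds(tasks: List[Task]) -> float:
    """Core-seconds held when each core is acquired at task start and released at task end."""
    return sum(e - s for s, e in tasks)


if __name__ == "__main__":
    # Two long tasks and two short ones, all starting together on 4 reserved cores.
    tasks = [(0.0, 10.0), (0.0, 10.0), (0.0, 2.0), (0.0, 2.0)]
    coarse = coarse_grained_core_seconds(tasks, reserved_cores=4)
    fine = fine_grained_core_seconds(tasks)
    print(coarse, fine, coarse - fine)  # 40.0 24.0 16.0
```

In this toy example, 16 of the 40 reserved core-seconds sit idle under the coarse-grained model once the two short tasks finish.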


On Thu, Aug 7, 2014 at 9:43 PM, Patrick Wendell <pw...@gmail.com> wrote:

> The current YARN mode is equivalent to what is called "fine-grained" mode in
> Mesos. The scheduling of tasks happens entirely inside the Spark driver.

Re: Fine-Grained Scheduler on Yarn

Posted by Patrick Wendell <pw...@gmail.com>.
The current YARN mode is equivalent to what is called "fine-grained" mode in
Mesos. The scheduling of tasks happens entirely inside the Spark driver.
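Setting aside which Mesos mode this corresponds to (Patrick corrects the mode comparison in his follow-up), the point that task scheduling happens inside the driver can be illustrated with a toy assignment loop: the driver keeps a queue of pending tasks and places them onto free cores of executors it already holds. `assign_tasks`, the executor IDs and the round-robin order are assumptions for illustration only; Spark's real scheduler also takes data locality into account, which this sketch ignores.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def assign_tasks(pending: Deque[str],
                 free_cores: Dict[str, int],
                 cpus_per_task: int = 1) -> List[Tuple[str, str]]:
    """Toy driver-side scheduler: place pending tasks on free executor cores.

    Returns (task_id, executor_id) pairs and updates `pending` and
    `free_cores` in place. Tasks that do not fit stay queued.
    """
    assignments: List[Tuple[str, str]] = []
    launched = True
    # Round-robin over executors so tasks are spread out rather than packed.
    while pending and launched:
        launched = False
        for executor_id in sorted(free_cores):
            if pending and free_cores[executor_id] >= cpus_per_task:
                task_id = pending.popleft()
                free_cores[executor_id] -= cpus_per_task
                assignments.append((task_id, executor_id))
                launched = True
    return assignments


if __name__ == "__main__":
    queue = deque(["t0", "t1", "t2", "t3", "t4"])
    cores = {"exec-1": 2, "exec-2": 1}
    print(assign_tasks(queue, cores))  # [('t0', 'exec-1'), ('t1', 'exec-2'), ('t2', 'exec-1')]
    print(list(queue))                 # ['t3', 't4'] wait for a core to free up
```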


On Thu, Aug 7, 2014 at 7:50 PM, Jun Feng Liu <li...@cn.ibm.com> wrote:

> Does anyone know the answer?