Posted to dev@spark.apache.org by "Shao, Saisai" <sa...@intel.com> on 2015/02/02 09:24:18 UTC

Questions about Spark standalone resource scheduler

Hi all,

I have some questions about the future development of Spark's standalone resource scheduler. We've heard that some users need multi-tenant support in standalone mode, such as multi-user management, resource management and isolation, and user whitelisting. Spark standalone does not currently seem to support this kind of functionality, while resource schedulers like YARN do offer such advanced management. I'm not sure what the future direction of the standalone resource scheduler is: will it stay a simple implementation, with advanced usage shifting to YARN? Or is there a plan to add some simple multi-tenancy features?

Thanks a lot for your comments.

BR
Jerry

RE: Questions about Spark standalone resource scheduler

Posted by "Shao, Saisai" <sa...@intel.com>.
Hi Patrick,

Thanks a lot for your detailed explanation. For now we have the following requirements in Spark standalone mode: whitelisting of application submitters, per-user resource (CPU, memory) quotas, and resource allocation. These are quite specific requirements for production use. More generally, the question becomes whether we need to offer a more advanced resource scheduler than the current simple FIFO one. I think our aim is not to provide a general resource scheduler like Mesos/YARN; we would only support Spark, but we hope to add some Mesos/YARN functionality to make better use of Spark standalone mode.
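
To make the requirements above concrete, here is a minimal Python sketch of an admission check that combines a submitter whitelist with per-user CPU and memory quotas. It is purely illustrative: the names Quota, AdmissionController, try_admit and release are hypothetical and do not correspond to any existing Spark API, and the standalone master is not claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    cores: int
    memory_mb: int


@dataclass
class AdmissionController:
    """Hypothetical admission check; not part of Spark."""

    # user -> Quota; a user without an entry is treated as not whitelisted
    quotas: dict
    # user -> Quota describing the resources that user currently holds
    used: dict = field(default_factory=dict)

    def try_admit(self, user, cores, memory_mb):
        """Return True and record the usage if the request fits the user's quota."""
        quota = self.quotas.get(user)
        if quota is None:
            return False  # submitter is not on the whitelist
        held = self.used.get(user, Quota(0, 0))
        if (held.cores + cores > quota.cores
                or held.memory_mb + memory_mb > quota.memory_mb):
            return False  # request would exceed the per-user quota
        self.used[user] = Quota(held.cores + cores, held.memory_mb + memory_mb)
        return True

    def release(self, user, cores, memory_mb):
        """Give resources back when an application finishes."""
        held = self.used.get(user, Quota(0, 0))
        self.used[user] = Quota(max(0, held.cores - cores),
                                max(0, held.memory_mb - memory_mb))
```

In this sketch a request from an unknown user is rejected outright, and a known user is rejected only when the new request would push that user's total past the quota. Ordering among admitted applications (FIFO or otherwise) is deliberately left out.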

I admit that a resource scheduler may overlap somewhat with a cloud manager; whether to offer a powerful scheduler or rely on a cloud manager is really a dilemma.

I think we can break this down into some small features to improve standalone mode. What's your opinion?

Thanks
Jerry

-----Original Message-----
From: Patrick Wendell [mailto:pwendell@gmail.com] 
Sent: Monday, February 2, 2015 4:49 PM
To: Shao, Saisai
Cc: dev@spark.apache.org; user@spark.apache.org
Subject: Re: Questions about Spark standalone resource scheduler

Hey Jerry,

I think standalone mode will still add more features over time, but the goal isn't really for it to become equivalent to what Mesos/YARN are today. Or at least, I doubt Spark Standalone will ever attempt to manage _other_ frameworks outside of Spark and become a general purpose resource manager.

In terms of having better support for multi tenancy, meaning multiple *Spark* instances, this is something I think could be in scope in the future. For instance, we added H/A to the standalone scheduler a while back, because it let us support H/A streaming apps in a totally native way. It's a trade off of adding new features and keeping the scheduler very simple and easy to use. We've tended to bias towards simplicity as the main goal, since this is something we want to be really easy "out of the box".

One thing to point out, a lot of people use the standalone mode with some coarser grained scheduler, such as running in a cloud service. In this case they really just want a simple "inner" cluster manager. This may even be the majority of all Spark installations. This is slightly different than Hadoop environments, where they might just want nice integration into the existing Hadoop stack via something like YARN.

- Patrick

On Mon, Feb 2, 2015 at 12:24 AM, Shao, Saisai <sa...@intel.com> wrote:
> Hi all,
>
>
>
> I have some questions about the future development of Spark's 
> standalone resource scheduler. We've heard some users have the 
> requirements to have multi-tenant support in standalone mode, like 
> multi-user management, resource management and isolation, whitelist of 
> users. Seems current Spark standalone do not support such kind of 
> functionalities, while resource schedulers like Yarn offers such kind 
> of advanced managements, I'm not sure what's the future target of 
> standalone resource scheduler, will it only target on simple 
> implementation, and for advanced usage shift to YARN? Or will it plan to add some simple multi-tenant related functionalities?
>
>
>
> Thanks a lot for your comments.
>
>
>
> BR
>
> Jerry

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Questions about Spark standalone resource scheduler

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Jerry,

I think standalone mode will still add more features over time, but
the goal isn't really for it to become equivalent to what Mesos/YARN
are today. Or at least, I doubt Spark Standalone will ever attempt to
manage _other_ frameworks outside of Spark and become a general
purpose resource manager.

In terms of having better support for multi tenancy, meaning multiple
*Spark* instances, this is something I think could be in scope in the
future. For instance, we added H/A to the standalone scheduler a while
back, because it let us support H/A streaming apps in a totally native
way. It's a trade off of adding new features and keeping the scheduler
very simple and easy to use. We've tended to bias towards simplicity
as the main goal, since this is something we want to be really easy
"out of the box".

One thing to point out, a lot of people use the standalone mode with
some coarser grained scheduler, such as running in a cloud service. In
this case they really just want a simple "inner" cluster manager. This
may even be the majority of all Spark installations. This is slightly
different than Hadoop environments, where they might just want nice
integration into the existing Hadoop stack via something like YARN.

- Patrick

On Mon, Feb 2, 2015 at 12:24 AM, Shao, Saisai <sa...@intel.com> wrote:
> Hi all,
>
>
>
> I have some questions about the future development of Spark's standalone
> resource scheduler. We've heard some users have the requirements to have
> multi-tenant support in standalone mode, like multi-user management,
> resource management and isolation, whitelist of users. Seems current Spark
> standalone do not support such kind of functionalities, while resource
> schedulers like Yarn offers such kind of advanced managements, I'm not sure
> what's the future target of standalone resource scheduler, will it only
> target on simple implementation, and for advanced usage shift to YARN? Or
> will it plan to add some simple multi-tenant related functionalities?
>
>
>
> Thanks a lot for your comments.
>
>
>
> BR
>
> Jerry

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

