Posted to user@spark.apache.org by Romi Kuntsman <ro...@totango.com> on 2015/08/24 08:41:32 UTC

How to remove worker node but let it finish first?

Hi,
I have a Spark standalone cluster with hundreds of applications per day, and it
changes size (more or fewer workers) at various hours. The driver runs on a
separate machine outside the Spark cluster.

When a job is running and its worker is killed (because the number of workers
is reduced at that hour), the job sometimes fails instead of
redistributing the work to other workers.

How is it possible to decommission a worker, so that it doesn't receive any
new work but finishes all existing work before shutting down?

Thanks!
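
The standalone master (as of the Spark versions in this thread) has no built-in
drain mode, but a rough workaround is to watch the master's status endpoint and
only stop the worker once it reports no cores in use. Below is a minimal sketch
in Python, assuming the master's JSON endpoint at http://<master>:8080/json/
exposes per-worker "host" and "coresused" fields (field names may differ
between Spark versions); the host names and polling interval are placeholders.

# Rough sketch (not a built-in Spark feature): drain a standalone worker by
# waiting until the master reports it has no cores in use, then stop it
# (e.g. by running sbin/stop-slave.sh on that host).
import json
import time
import urllib.request

MASTER_JSON = "http://spark-master:8080/json/"   # placeholder master host
WORKER_HOST = "worker-07.example.com"            # placeholder worker being retired
POLL_SECONDS = 30

def cores_used_on_worker():
    # Ask the master for cluster state and find the worker we want to retire.
    with urllib.request.urlopen(MASTER_JSON) as resp:
        state = json.load(resp)
    for worker in state.get("workers", []):
        if worker.get("host") == WORKER_HOST:
            return worker.get("coresused", 0)
    return 0  # the master no longer knows this worker

# Wait until no executors are assigned to the worker, then it can be stopped.
# Note the race: the master could still schedule a new executor onto it while
# we poll, so new application submissions should be paused during the drain.
while cores_used_on_worker() > 0:
    time.sleep(POLL_SECONDS)

print("Worker %s is idle; safe to stop it now." % WORKER_HOST)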

Re: How to remove worker node but let it finish first?

Posted by Romi Kuntsman <ro...@totango.com>.
Is that only available in Mesos?
I'm using a Spark standalone cluster; is there anything like it there?

On Fri, Aug 28, 2015 at 8:51 AM Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> You can create a custom Mesos framework for your requirement. To get you
> started, you can check this out:
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/
>
> Thanks
> Best Regards
>
> On Mon, Aug 24, 2015 at 12:11 PM, Romi Kuntsman <ro...@totango.com> wrote:
>
>> Hi,
>> I have a Spark standalone cluster with hundreds of applications per day, and
>> it changes size (more or fewer workers) at various hours. The driver runs on
>> a separate machine outside the Spark cluster.
>>
>> When a job is running and its worker is killed (because the number of workers
>> is reduced at that hour), the job sometimes fails instead of
>> redistributing the work to other workers.
>>
>> How is it possible to decommission a worker, so that it doesn't receive
>> any new work but finishes all existing work before shutting down?
>>
>> Thanks!
>>
>
>

Re: How to remove worker node but let it finish first?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can create a custom Mesos framework for your requirement. To get you
started, you can check this out:
http://mesos.apache.org/documentation/latest/app-framework-development-guide/

Thanks
Best Regards

On Mon, Aug 24, 2015 at 12:11 PM, Romi Kuntsman <ro...@totango.com> wrote:

> Hi,
> I have a Spark standalone cluster with hundreds of applications per day, and
> it changes size (more or fewer workers) at various hours. The driver runs on
> a separate machine outside the Spark cluster.
>
> When a job is running and its worker is killed (because the number of workers
> is reduced at that hour), the job sometimes fails instead of
> redistributing the work to other workers.
>
> How is it possible to decommission a worker, so that it doesn't receive any
> new work but finishes all existing work before shutting down?
>
> Thanks!
>
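
For what it's worth, the Mesos route described above would look roughly like
this: a framework's scheduler can decline resource offers from agents that are
being drained, so no new tasks land on them while running tasks finish. This is
a minimal sketch against the 2015-era Mesos Python bindings, following the
structure of the Mesos example frameworks; DRAINING_HOSTS, the framework name,
and the master URL are placeholders, and it only applies if the cluster runs on
Mesos rather than in standalone mode.

# Rough sketch of the "custom framework" idea, not a complete framework: decline
# offers from hosts being decommissioned so no new work is placed on them.
from mesos.interface import Scheduler, mesos_pb2
from mesos.native import MesosSchedulerDriver

DRAINING_HOSTS = {"worker-07.example.com"}  # placeholder: hosts being retired

class DrainAwareScheduler(Scheduler):
    def resourceOffers(self, driver, offers):
        for offer in offers:
            if offer.hostname in DRAINING_HOSTS:
                # Never place new work on a host that is being decommissioned.
                driver.declineOffer(offer.id)
            else:
                # ...normal logic: build TaskInfos and call driver.launchTasks(...)
                driver.declineOffer(offer.id)  # placeholder: nothing to launch here

if __name__ == "__main__":
    framework = mesos_pb2.FrameworkInfo()
    framework.user = ""  # let Mesos fill in the current user
    framework.name = "drain-aware-framework"  # hypothetical name
    driver = MesosSchedulerDriver(DrainAwareScheduler(), framework,
                                  "zk://localhost:2181/mesos")  # placeholder master
    driver.run()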