Posted to dev@spark.apache.org by Sung Hwan Chung <co...@gmail.com> on 2016/04/06 21:24:18 UTC

Executor shutdown hooks?

Hi,

I'm looking for ways to add shutdown hooks to executors: i.e., when a Job
is forcefully terminated before it finishes.

The scenario goes like this: executors are running a long-running job
within a 'map' function. If the user decides to terminate the job, the
mappers should perform some cleanup before going offline.

What would be the best way to do this?

Re: Executor shutdown hooks?

Posted by Hemant Bhanawat <he...@gmail.com>.
As part of PR https://github.com/apache/spark/pull/11723, I have added a
killAllTasks function that can be used to kill (rather, interrupt)
individual tasks before an executor exits. If this PR is accepted, we can
add a call to this function before the executor exits to do task-level
cleanup. The exit thread will wait for a certain period of time before
the executor JVM exits to allow proper cleanup of the tasks.

Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
www.snappydata.io

On Thu, Apr 7, 2016 at 6:08 AM, Reynold Xin <rx...@databricks.com> wrote:

>
> On Wed, Apr 6, 2016 at 4:39 PM, Sung Hwan Chung <co...@cs.stanford.edu>
> wrote:
>
>> My option so far seems to be using JVM's shutdown hook, but I was
>> wondering if Spark itself had an API for tasks.
>>
>
> Spark would be using that under the hood anyway, so you might as well just
> use the JVM shutdown hook directly.
>
>

Re: Executor shutdown hooks?

Posted by Reynold Xin <rx...@databricks.com>.
On Wed, Apr 6, 2016 at 4:39 PM, Sung Hwan Chung <co...@cs.stanford.edu>
wrote:

> My option so far seems to be using JVM's shutdown hook, but I was
> wondering if Spark itself had an API for tasks.
>

Spark would be using that under the hood anyway, so you might as well just
use the JVM shutdown hook directly.
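
Something like this, for example (a rough sketch; the object name and the
cleanup body are just placeholders, not an established Spark API). A hook
registered from inside a task runs when the executor JVM exits normally or
on SIGTERM, but not on SIGKILL:

  import org.apache.spark.{SparkConf, SparkContext}

  object ExecutorShutdownHookDemo {
    // Registered at most once per executor JVM: the lazy val is evaluated only once.
    lazy val registerHook: Unit = {
      sys.addShutdownHook {
        // executor-side cleanup: flush buffers, close connections, remove temp files ...
        println("executor JVM shutting down, running cleanup")
      }
      ()
    }

    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("executor-hook-demo"))
      sc.parallelize(1 to 100, 4).mapPartitions { iter =>
        registerHook          // forces the lazy val on whichever executor runs this task
        iter.map(_ * 2)
      }.count()
      sc.stop()
    }
  }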

Re: Executor shutdown hooks?

Posted by Sung Hwan Chung <co...@cs.stanford.edu>.
What I meant is 'application', i.e., when we manually terminate an
application that was submitted via spark-submit.
When we manually kill an application, it seems that individual tasks do not
receive the InterruptedException.

That InterruptedException seems to arrive only if we cancel the job through
sc.cancelJob or cancelAllJobs while the application is still alive.

My option so far seems to be using JVM's shutdown hook, but I was wondering
if Spark itself had an API for tasks.
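
For example, a driver-side variant of that (a rough sketch; the group id and
the cleanup are just placeholders): a JVM shutdown hook on the driver that
calls sc.cancelAllJobs() on SIGTERM, so tasks still get the interrupt before
the application dies. It won't help against SIGKILL, and it races with
Spark's own shutdown sequence:

  import org.apache.spark.{SparkConf, SparkContext}

  object GracefulCancelDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("graceful-cancel-demo"))

      // interruptOnCancel = true lets the cancel below interrupt the task threads.
      sc.setJobGroup("long-job", "long running map", interruptOnCancel = true)

      sys.addShutdownHook {
        // Runs on normal exit and SIGTERM on the driver, not on SIGKILL.
        sc.cancelAllJobs()   // give running tasks a chance to see the interrupt
      }

      sc.parallelize(1 to 1000, 8).map { i => Thread.sleep(100); i }.count()
      sc.stop()
    }
  }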

On Wed, Apr 6, 2016 at 7:36 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> Why would the Executors shut down when the Job is terminated?  Executors
> are bound to Applications, not Jobs.  Furthermore,
> unless spark.job.interruptOnCancel is set to true, canceling the Job at the
> Application and DAGScheduler level won't actually interrupt the Tasks
> running on the Executors.  If you do have interruptOnCancel set, then you
> can catch the interrupt exception within the Task.
>
> On Wed, Apr 6, 2016 at 12:24 PM, Sung Hwan Chung <co...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm looking for ways to add shutdown hooks to executors: i.e., when a
>> Job is forcefully terminated before it finishes.
>>
>> The scenario goes like this: executors are running a long-running job
>> within a 'map' function. If the user decides to terminate the job, the
>> mappers should perform some cleanup before going offline.
>>
>> What would be the best way to do this?
>>
>
>

Re: Executor shutdown hooks?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Why would the Executors shut down when the Job is terminated?  Executors are
bound to Applications, not Jobs.  Furthermore,
unless spark.job.interruptOnCancel is set to true, canceling the Job at the
Application and DAGScheduler level won't actually interrupt the Tasks
running on the Executors.  If you do have interruptOnCancel set, then you
can catch the interrupt exception within the Task.
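
Roughly like this (a minimal sketch; the group id, timing, and cleanup are
just placeholders): set interruptOnCancel via setJobGroup, cancel the group
(or use sc.cancelJob / cancelAllJobs) while the tasks run, and handle
InterruptedException inside the task code:

  import org.apache.spark.{SparkConf, SparkContext}

  object InterruptOnCancelDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("interrupt-on-cancel-demo"))

      // interruptOnCancel = true makes cancelJobGroup() interrupt the task threads.
      sc.setJobGroup("long-job", "long running map", interruptOnCancel = true)

      // Cancel the job group a few seconds after it starts, from another thread.
      new Thread(new Runnable {
        def run(): Unit = { Thread.sleep(5000); sc.cancelJobGroup("long-job") }
      }).start()

      try {
        // The driver-side count() will fail with a "job cancelled" error; expected here.
        sc.parallelize(1 to 100, 4).mapPartitions { iter =>
          try {
            // toList forces the work inside the try block so the interrupt is caught here
            iter.map { i => Thread.sleep(1000); i * 2 }.toList.iterator
          } catch {
            case e: InterruptedException =>
              // task-level cleanup: close files, release external resources, ...
              throw e   // rethrow so Spark records the task as killed
          }
        }.count()
      } finally {
        sc.stop()
      }
    }
  }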

On Wed, Apr 6, 2016 at 12:24 PM, Sung Hwan Chung <co...@gmail.com> wrote:

> Hi,
>
> I'm looking for ways to add shutdown hooks to executors: i.e., when a Job
> is forcefully terminated before it finishes.
>
> The scenario goes like this: executors are running a long-running job
> within a 'map' function. If the user decides to terminate the job, the
> mappers should perform some cleanup before going offline.
>
> What would be the best way to do this?
>
