Posted to user@spark.apache.org by Walter rakoff <wa...@gmail.com> on 2016/10/26 19:26:44 UTC

Executor shutdown hook and initialization

Hello,

Is there a way I can add an init() call when an executor is created? I'd
like to initialize a few connections that are part of my singleton object.
Preferably this should happen before it runs the first task.
Along the same lines, how can I provide a shutdown hook that cleans up these
connections on termination.

Thanks
Walt

Re: Executor shutdown hook and initialization

Posted by "Chawla, Sumit" <su...@gmail.com>.
Hi Sean

Could you please elaborate on how this can be done on a per-partition
basis?

Regards
Sumit Chawla


On Thu, Oct 27, 2016 at 7:44 AM, Walter rakoff <wa...@gmail.com>
wrote:

> Thanks for the info Sean.
>
> I'm initializing them in a singleton, but Scala objects are evaluated
> lazily.
> So it gets initialized only when the first task runs (and makes use of
> the object).
> The plan is to start a background thread in the object that does periodic
> cache refresh too.
> I'm trying to see if this init can be done right when the executor is created.
>
> Btw, this is for a Spark streaming app, so doing this per partition during
> each batch isn't ideal.
> I'd like to keep them (connections & cache) across batches.
>
> Finally, how do I set up a shutdown hook on an executor? Except for
> operations on RDDs, everything else is executed in the driver.
> All I can think of is something like this:
>     sc.makeRDD(1 to sc.defaultParallelism, sc.defaultParallelism)
>        .foreachPartition(_ => sys.addShutdownHook { Singleton.DoCleanup() })
>
> Walt
>
> On Thu, Oct 27, 2016 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Init is easy -- initialize them in your singleton.
>> Shutdown is harder; a shutdown hook is probably the only reliable way to
>> go.
>> Global state is not ideal in Spark. Consider initializing things like
>> connections per partition, and open/close them with the lifecycle of a
>> computation on a partition instead.
>>
>> On Wed, Oct 26, 2016 at 9:27 PM Walter rakoff <wa...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Is there a way I can add an init() call when an executor is created? I'd
>>> like to initialize a few connections that are part of my singleton object.
>>> Preferably this should happen before it runs the first task.
>>> Along the same lines, how can I provide a shutdown hook that cleans up
>>> these connections on termination.
>>>
>>> Thanks
>>> Walt
>>>
>>
>

Re: Executor shutdown hook and initialization

Posted by Sean Owen <so...@cloudera.com>.
Have a look at this ancient JIRA for a lot more discussion about this:
https://issues.apache.org/jira/browse/SPARK-650 You have exactly the same
issue described there by another user. For your context, your approach is sound.

You can set a shutdown hook using the normal Java Runtime API. You may not
even need one; if your only resource is some data in memory or a daemon
thread, it will take care of itself.

You can also consider rearchitecting to avoid needing global state.

Per-partition resource management is easy: use mapPartitions, open
resources at the start of the function, close them in a finally block, and
do your work on the iterator over the data in between.
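
That pattern can be sketched without a cluster; Conn below is a hypothetical
stand-in for a real client, and perPartition is the kind of function you
would hand to rdd.mapPartitions:

```scala
// Hypothetical resource standing in for a DB or socket client.
final class Conn {
  var open = true
  def lookup(x: Int): Int = { require(open, "connection closed"); x * 2 }
  def close(): Unit = { open = false }
}

// Open once per partition, work through the iterator, close in finally.
// Note: Iterator.map is lazy, so the results must be materialized BEFORE
// the finally block runs, or downstream code would read a closed resource.
def perPartition(iter: Iterator[Int]): Iterator[Int] = {
  val conn = new Conn
  try iter.map(conn.lookup).toVector.iterator
  finally conn.close()
}

// With Spark this becomes: rdd.mapPartitions(perPartition)
```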

On Thu, Oct 27, 2016 at 3:44 PM Walter rakoff <wa...@gmail.com>
wrote:

> Thanks for the info Sean.
>
> I'm initializing them in a singleton, but Scala objects are evaluated
> lazily.
> So it gets initialized only when the first task runs (and makes use of
> the object).
> The plan is to start a background thread in the object that does periodic
> cache refresh too.
> I'm trying to see if this init can be done right when the executor is created.
>
> Btw, this is for a Spark streaming app, so doing this per partition during
> each batch isn't ideal.
> I'd like to keep them (connections & cache) across batches.
>
> Finally, how do I set up a shutdown hook on an executor? Except for
> operations on RDDs, everything else is executed in the driver.
> All I can think of is something like this:
>     sc.makeRDD(1 to sc.defaultParallelism, sc.defaultParallelism)
>        .foreachPartition(_ => sys.addShutdownHook { Singleton.DoCleanup() })
>
> Walt
>
>

Re: Executor shutdown hook and initialization

Posted by Walter rakoff <wa...@gmail.com>.
Thanks for the info Sean.

I'm initializing them in a singleton, but Scala objects are evaluated lazily.
So it gets initialized only when the first task runs (and makes use of the
object).
The plan is to start a background thread in the object that does periodic cache
refresh too.
I'm trying to see if this init can be done right when the executor is created.
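
The laziness is easy to demonstrate in plain Scala (a standalone sketch;
Demo and Singleton here are hypothetical): an object's body runs only when
the object is first referenced, which on an executor means the first task
that touches it.

```scala
object Demo {
  @volatile var initialized = false

  object Singleton {
    // This body runs only on the first reference to Demo.Singleton --
    // i.e. when the first task touches it on an executor.
    initialized = true
    val conn: String = "connected" // stand-in for opening real connections
  }
}
```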

Btw, this is for a Spark streaming app, so doing this per partition during
each batch isn't ideal.
I'd like to keep them (connections & cache) across batches.

Finally, how do I set up a shutdown hook on an executor? Except for
operations on RDDs, everything else is executed in the driver.
All I can think of is something like this:
    sc.makeRDD(1 to sc.defaultParallelism, sc.defaultParallelism)
       .foreachPartition(_ => sys.addShutdownHook { Singleton.DoCleanup() })

Walt

On Thu, Oct 27, 2016 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:

> Init is easy -- initialize them in your singleton.
> Shutdown is harder; a shutdown hook is probably the only reliable way to
> go.
> Global state is not ideal in Spark. Consider initializing things like
> connections per partition, and open/close them with the lifecycle of a
> computation on a partition instead.
>
> On Wed, Oct 26, 2016 at 9:27 PM Walter rakoff <wa...@gmail.com>
> wrote:
>
>> Hello,
>>
>> Is there a way I can add an init() call when an executor is created? I'd
>> like to initialize a few connections that are part of my singleton object.
>> Preferably this should happen before it runs the first task.
>> Along the same lines, how can I provide a shutdown hook that cleans up these
>> connections on termination.
>>
>> Thanks
>> Walt
>>
>

Re: Executor shutdown hook and initialization

Posted by Sean Owen <so...@cloudera.com>.
Init is easy -- initialize them in your singleton.
Shutdown is harder; a shutdown hook is probably the only reliable way to go.
Global state is not ideal in Spark. Consider initializing things like
connections per partition, and open/close them with the lifecycle of a
computation on a partition instead.

On Wed, Oct 26, 2016 at 9:27 PM Walter rakoff <wa...@gmail.com>
wrote:

> Hello,
>
> Is there a way I can add an init() call when an executor is created? I'd
> like to initialize a few connections that are part of my singleton object.
> Preferably this should happen before it runs the first task.
> Along the same lines, how can I provide a shutdown hook that cleans up these
> connections on termination.
>
> Thanks
> Walt
>