Posted to user@spark.apache.org by Ashic Mahtab <as...@live.com> on 2014/12/03 18:59:38 UTC

Best way to have some singleton per worker



Hello,
I was wondering what the best way is to have some form of singleton per worker: something that's instantiated once at the start of a job on each worker, and shut down when all the work on that node is completed. For instance, say I have a client library that opens a single session to a central server; you're meant to reuse that session for all the individual bits of work, and finally close it when all the work items are done. Opening and closing a session for each data item would be wasteful (and in some cases, cause havoc for the client).

I've been doing this with foreachPartition (i.e. have the parameters for creating the singleton outside the loop, do a foreachPartition, create the instance, loop over the entries in the partition, and close the instance at the end), but it's quite kludgy. Is there a pattern by which I can have an instance of something non-serializable on each worker?
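
Concretely, what I've been doing looks roughly like this (SessionClient and its methods are stand-ins for the actual client library, stubbed out here only so the sketch is self-contained):

// SessionClient is a stand-in for the real client library; this stub
// exists only so the sketch compiles.
class Session(url: String) {
  def send(item: String): Unit = ()   // placeholder for real work
  def close(): Unit = ()              // placeholder for real teardown
}
object SessionClient {
  def connect(url: String): Session = new Session(url)
}

import org.apache.spark.rdd.RDD

def process(rdd: RDD[String], serverUrl: String): Unit = {
  // Only serverUrl (a serializable String) is captured by the closure;
  // the non-serializable session is created on the worker itself.
  rdd.foreachPartition { partition =>
    val session = SessionClient.connect(serverUrl)  // once per partition
    try {
      partition.foreach(session.send)               // reuse for every item
    } finally {
      session.close()                               // close once at the end
    }
  }
}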

Regards,
Ashic.

Re: Best way to have some singleton per worker

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,

On Thu, Dec 4, 2014 at 2:59 AM, Ashic Mahtab <as...@live.com> wrote:
>
> I've been doing this with foreachPartition (i.e. have the parameters for
> creating the singleton outside the loop, do a foreachPartition, create the
> instance, loop over the entries in the partition, and close the instance at
> the end), but it's quite kludgy. Is there a pattern by which I can have an
> instance of something non-serializable on each worker?
>

I think the pattern you describe is the standard way of doing this; several
people on this list (including me) have used it for database access etc.
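
For example, for writing out a partition over plain JDBC it looks something like this (the URL, table, and column names are of course placeholders, and a suitable JDBC driver is assumed to be on the workers' classpath):

import java.sql.DriverManager
import org.apache.spark.rdd.RDD

def save(rdd: RDD[(Int, String)], jdbcUrl: String): Unit = {
  rdd.foreachPartition { partition =>
    // One connection per partition, opened on the worker.
    val conn = DriverManager.getConnection(jdbcUrl)
    val stmt = conn.prepareStatement(
      "INSERT INTO items (id, name) VALUES (?, ?)")
    try {
      partition.foreach { case (id, name) =>
        stmt.setInt(1, id)
        stmt.setString(2, name)
        stmt.executeUpdate()
      }
    } finally {
      stmt.close()
      conn.close()  // closed once, after the whole partition
    }
  }
}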

Tobias