Posted to user@spark.apache.org by Ashic Mahtab <as...@live.com> on 2014/12/03 18:59:38 UTC
Best way to have some singleton per worker
Hello,
I was wondering what the best way is to have some form of
singleton per worker...something that'll be instantiated once at the
start of a job for each worker, and shut down when all the work on that
node is completed. For instance, say I have a client library that
initiates a single session to a central server, and you're meant to
reuse that for all little bits of work, and finally close the session
when all the work items are done. Opening and closing sessions for each
data item would be wasteful (and in some cases, cause havoc for the
client).
I've been doing this with foreachPartition (i.e. have
the parameters for creating the singleton outside the loop, do a
foreachPartition, create the instance, loop over the entries in the
partition, then close the instance), but it's quite kludgy. Is there a
pattern by which I can have an instance of something non-serializable on
each worker?
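In PySpark terms, the pattern described above can be sketched roughly as follows. SessionClient here is a hypothetical stand-in for the client library (the real class and its open/close API will differ), and the server URL is invented for illustration:

```python
# Hypothetical stand-in for the client library's session object.
class SessionClient:
    def __init__(self, server_url):
        self.server_url = server_url
        self.closed = False

    def send(self, item):
        # Pretend to do a little bit of work against the central server.
        return "sent:%s" % item

    def close(self):
        self.closed = True


def process_partition(items, server_url="http://central-server:9999"):
    # The parameters for creating the singleton live outside the loop;
    # the non-serializable session itself is created inside the partition.
    session = SessionClient(server_url)
    try:
        return [session.send(item) for item in items]
    finally:
        # Close once, after all items in this partition are done.
        session.close()


# With a real RDD this would be driven by something like:
#   rdd.foreachPartition(process_partition)
```

This opens one session per partition rather than per data item, which is what makes it cheaper than opening and closing around each record.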
Regards,
Ashic.
Re: Best way to have some singleton per worker
Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,
On Thu, Dec 4, 2014 at 2:59 AM, Ashic Mahtab <as...@live.com> wrote:
>
> I've been doing this with foreachPartition (i.e. have the parameters for
> creating the singleton outside the loop, do a foreachPartition, create the
> instance, loop over the entries in the partition, then close the instance),
> but it's quite kludgy. Is there a pattern by which I can have an instance of
> something non-serializable on each worker?
>
>
I think the pattern you describe is the standard way of doing this; several
people on this list (including me) have used it for database access etc.
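One commonly used variant that avoids the explicit open/close in every foreachPartition is a lazily initialized, process-level singleton: each worker process creates the instance the first time a task needs it, and later tasks on the same worker reuse it. (In Scala the same idea is usually an `object` holding a lazy val, which the JVM initializes once per executor.) A minimal Python sketch, again with a hypothetical SessionClient standing in for the real client library:

```python
# Hypothetical stand-in for the non-serializable client session.
class SessionClient:
    def __init__(self, server_url):
        self.server_url = server_url


_session = None  # one per worker process; never serialized or shipped


def get_session(server_url="http://central-server:9999"):
    # Lazily create the session on first use in this process;
    # subsequent calls on the same worker return the same instance.
    global _session
    if _session is None:
        _session = SessionClient(server_url)
    return _session


def handle(item):
    # Called per record, but get_session() only constructs once per process.
    return (get_session().server_url, item)


# Inside a Spark job this would be used as, e.g.:
#   rdd.map(handle)
```

The trade-off is that nothing closes the session when the job ends; if a clean shutdown matters, an atexit hook or the foreachPartition pattern above is safer.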
Tobias