Posted to user@spark.apache.org by "Starch, Michael D (398M)" <Mi...@jpl.nasa.gov> on 2015/10/14 19:18:31 UTC

Reusing Spark Functions

All,

Is a Function object in Spark reused on a given executor, or is it sent and deserialized with each new task?

On my project, we have functions that incur a very large setup cost but can then be called many times.  Currently, I am using object deserialization to run this intensive setup.  I am wondering whether the function is reused (within the context of the executor), or whether I am deserializing this object over and over again for each task sent to a given worker.
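
For concreteness, the pattern looks roughly like this (HeavyModel and ExpensiveFunction are placeholder names, not our real classes):

import java.io.ObjectInputStream

// Placeholder for the resource that is expensive to construct.
class HeavyModel extends Serializable {
  def score(record: String): String = record.toUpperCase
}

class ExpensiveFunction extends (String => String) with Serializable {
  // Not serialized; rebuilt on the receiving side.
  @transient private var model: HeavyModel = buildModel()

  // The very large setup cost lives here.
  private def buildModel(): HeavyModel = new HeavyModel

  // Java serialization calls this each time a copy of the function is
  // deserialized, so the setup runs again for every such copy.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    model = buildModel()
  }

  def apply(record: String): String = model.score(record)
}

Something like rdd.map(new ExpensiveFunction) then ships a serialized copy of the function with the tasks, and my question is how often that copy gets deserialized on the executor.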

Are there other ways to share objects between tasks on the same executor?

Many thanks,

Michael


Re: Reusing Spark Functions

Posted by Michael Armbrust <mi...@databricks.com>.
Unless it's a broadcast variable, a new copy will be deserialized for every task.
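
A minimal sketch of the broadcast approach, assuming the expensive object can be built once on the driver and is serializable (ExpensiveResource and the input path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder for whatever the costly setup produces.
class ExpensiveResource extends Serializable {
  def score(record: String): String = record.toUpperCase
}

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastExample"))

    // Built once on the driver; shipped to each executor at most once.
    val resource = sc.broadcast(new ExpensiveResource)

    val count = sc.textFile("hdfs:///path/to/input")
      .map(line => resource.value.score(line))  // reuses the executor's cached copy
      .count()

    println(count)
    sc.stop()
  }
}

Each executor fetches the broadcast value once and caches it, so every task running on that executor reuses the same deserialized copy instead of paying the setup cost again.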
