You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Aniket Bhatnagar <an...@gmail.com> on 2015/01/12 14:35:07 UTC

Discussion | SparkContext 's setJobGroup and clearJobGroup should return a new instance of SparkContext

Hi spark committers

I would like to discuss the possibility of changing the signature
of SparkContext 's setJobGroup and clearJobGroup functions to return a
replica of SparkContext with the job group set/unset instead of mutating
the original context. I am building a spark job server and I am assigning
job groups before passing control to user provided logic that uses spark
context to define and execute a job (very much like job-server). The issue
is that I can't reliably know when to clear the job group as user defined
code can use futures to submit multiple tasks in parallel. In fact, I am
even allowing users to return a future from their function on which spark
server can register callbacks to know when the user defined job is
complete. Now, if I set the job group before passing control to user
function and wait on future to complete so that I can clear the job group,
I can no longer use that SparkContext for any other job. This means I will
have to lock on the SparkContext which seems like a bad idea. Therefore, my
proposal would be to return new instance of SparkContext (a replica with
just job group set/unset) that can further be used in concurrent
environment safely. I am also happy mutating the original SparkContext just
not break backward compatibility as long as the returned SparkContext is
not affected by set/unset of job groups on original SparkContext.

Thoughts please?

Thanks,
Aniket

Re: Discussion | SparkContext 's setJobGroup and clearJobGroup should return a new instance of SparkContext

Posted by Erik Erlandson <ej...@redhat.com>.
setJobGroup needs fixing:
https://issues.apache.org/jira/browse/SPARK-4514

I'm interested in any community input on what the semantics or design "ought" to be changed to.


----- Original Message -----
> Hi spark committers
> 
> I would like to discuss the possibility of changing the signature
> of SparkContext 's setJobGroup and clearJobGroup functions to return a
> replica of SparkContext with the job group set/unset instead of mutating
> the original context. I am building a spark job server and I am assigning
> job groups before passing control to user provided logic that uses spark
> context to define and execute a job (very much like job-server). The issue
> is that I can't reliably know when to clear the job group as user defined
> code can use futures to submit multiple tasks in parallel. In fact, I am
> even allowing users to return a future from their function on which spark
> server can register callbacks to know when the user defined job is
> complete. Now, if I set the job group before passing control to user
> function and wait on future to complete so that I can clear the job group,
> I can no longer use that SparkContext for any other job. This means I will
> have to lock on the SparkContext which seems like a bad idea. Therefore, my
> proposal would be to return new instance of SparkContext (a replica with
> just job group set/unset) that can further be used in concurrent
> environment safely. I am also happy mutating the original SparkContext just
> not break backward compatibility as long as the returned SparkContext is
> not affected by set/unset of job groups on original SparkContext.
> 
> Thoughts please?
> 
> Thanks,
> Aniket
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org