Posted to issues@spark.apache.org by "Michal Klos (JIRA)" <ji...@apache.org> on 2015/04/07 19:58:12 UTC

[jira] [Commented] (SPARK-4783) System.exit() calls in SparkContext disrupt applications embedding Spark

    [ https://issues.apache.org/jira/browse/SPARK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483625#comment-14483625 ] 

Michal Klos commented on SPARK-4783:
------------------------------------

We are running into this exact issue. We have a driver application with responsibilities beyond submitting Spark work, and we don't want it to die whenever there is an issue with the cluster. The cluster can be recovered, or a new one can be spun up under the same DNS name, and we can then start fresh with a new context. In the meantime, we want the driver app to keep running and carry on its other business.
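
For context, the recovery pattern we are after looks roughly like this (a minimal sketch; the app name, master URL, and submitWork are hypothetical stand-ins for our driver's real logic):

    import org.apache.spark.{SparkConf, SparkContext, SparkException}

    object ResilientDriver {
      // Hypothetical stand-in for the rest of our driver's Spark workload.
      def submitWork(sc: SparkContext): Unit =
        sc.parallelize(1 to 100).count()

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("gateway").setMaster("spark://master:7077")
        var retry = true
        while (retry) {
          val sc = new SparkContext(conf)
          try {
            submitWork(sc)
            retry = false // finished cleanly
          } catch {
            case e: SparkException =>
              // Only reachable if Spark throws instead of calling System.exit():
              // log, let the cluster recover (or be replaced under the same DNS),
              // then loop around with a fresh context.
              println(s"Spark work failed, will retry with a new context: ${e.getMessage}")
          } finally {
            sc.stop()
          }
        }
      }
    }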

Specifically, the exit giving us trouble is this one:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L409-L411

We are considering patching it out, but we are not sure whether that would cause other problems, or whether there is a good reason this exit hasn't been removed yet.
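
Concretely, the patch we have in mind would look something like this (a sketch only, assuming the exit lives in TaskSchedulerImpl.error(); the actual method body in master differs):

    import org.apache.spark.SparkException

    // Approximate shape of TaskSchedulerImpl.error() after the patch.
    def error(message: String): Unit = {
      // Current behavior (roughly): logError(...) followed by System.exit(1),
      // which tears down the entire embedding JVM.
      // Proposed: propagate the failure so the embedding application can catch it.
      throw new SparkException("Exiting due to error from cluster scheduler: " + message)
    }

That way a driver loop like the one sketched above can catch the exception, stop the context, and decide for itself whether to exit.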

> System.exit() calls in SparkContext disrupt applications embedding Spark
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4783
>                 URL: https://issues.apache.org/jira/browse/SPARK-4783
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: David Semeria
>
> A common architectural choice for integrating Spark within a larger application is to employ a gateway to handle Spark jobs. The gateway is a server that contains one or more long-running SparkContexts.
> A typical server loop looks like the following (pseudo code, roughly Scala):
>
>     var keepRunning = true
>     while (keepRunning) {
>       try {
>         server.run()
>       } catch {
>         case e: Exception =>
>           keepRunning = logAndExamineError(e)
>       }
>     }
> The problem is that SparkContext frequently calls System.exit() when it encounters a problem, which means the server can only be re-spawned at the process level. That is much messier than the simple loop above.
> Therefore, I believe it makes sense to replace all System.exit() calls in SparkContext with the throwing of a fatal error.


