Posted to issues@spark.apache.org by "Mark Hamstra (JIRA)" <ji...@apache.org> on 2014/05/01 00:35:15 UTC

[jira] [Commented] (SPARK-1620) Uncaught exception from Akka scheduler

    [ https://issues.apache.org/jira/browse/SPARK-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986175#comment-13986175 ] 

Mark Hamstra commented on SPARK-1620:
-------------------------------------

Two more instances of the problem, neither of which actually causes trouble at the moment: in deploy.worker.Worker and deploy.client.AppClient, tryRegisterAllMasters() can throw exceptions (e.g., from Master.toAkkaUrl(masterUrl)), and those exceptions would go unhandled in the calls made from within the Akka scheduler -- i.e. within an invocation of registerWithMaster, all but the first call to tryRegisterAllMasters().  Right now, any later call to tryRegisterAllMasters() that would throw should already have thrown in the first call, which occurs outside the scheduled thread, so we should never reach the problem case.  If that behavior changes in the future, however, so that tryRegisterAllMasters() can succeed on the first call but throw within the later, scheduled calls (or if code added within the scheduled retryTimer can throw an exception), then the exception thrown from the scheduler thread will not be caught. 
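
One way to guard against that future case is to handle failures inside the scheduled body itself rather than relying on the calling thread. The following is only a minimal sketch of that idea, using the JDK's ScheduledExecutorService as a stand-in for the Akka scheduler; it is not Spark or Akka code. The names GuardedRetry, guarded, and demo are hypothetical, and the tryRegister lambda is an invented stand-in for tryRegisterAllMasters() that succeeds on its first call and throws on later ones.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class GuardedRetry {
    /** Wraps a task so that any exception it throws is handled on the scheduler thread itself. */
    public static Runnable guarded(Runnable body, Consumer<Exception> onFailure) {
        return () -> {
            try {
                body.run();
            } catch (Exception e) {
                onFailure.accept(e);
            }
        };
    }

    /** Runs a retry loop whose later attempts throw; returns the message seen by the handler. */
    public static String demo() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        AtomicReference<String> handled = new AtomicReference<>();
        CountDownLatch failureSeen = new CountDownLatch(1);

        // Hypothetical stand-in for tryRegisterAllMasters():
        // the first call succeeds, every later (scheduled) call throws.
        Runnable tryRegister = () -> {
            if (attempts.incrementAndGet() >= 2) {
                throw new IllegalArgumentException("bad master URL");
            }
        };

        // The guard records the failure instead of letting it escape the scheduled task.
        Runnable retry = guarded(tryRegister, e -> {
            handled.set(e.getMessage());
            failureSeen.countDown();
        });
        scheduler.scheduleAtFixedRate(retry, 0, 10, TimeUnit.MILLISECONDS);

        failureSeen.await(2, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        return handled.get();
    }
}
```

Whether the handler should log, stop retrying, or shut the process down is a separate design decision; the sketch only shows where the try/catch has to live for the exception to be seen at all.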

> Uncaught exception from Akka scheduler
> --------------------------------------
>
>                 Key: SPARK-1620
>                 URL: https://issues.apache.org/jira/browse/SPARK-1620
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Mark Hamstra
>            Priority: Blocker
>
> I've been looking at this one in the context of a BlockManagerMaster that OOMs and doesn't respond to heartBeat(), but I suspect that there may be problems elsewhere where we use Akka's scheduler.
> The basic nature of the problem is that we are expecting exceptions thrown from a scheduled function to be caught in the thread where _ActorSystem_.scheduler.schedule() or scheduleOnce() has been called.  In fact, the scheduled function runs on its own thread, so any exceptions that it throws are not caught in the thread that called schedule() -- e.g., unanswered BlockManager heartBeats (scheduled in BlockManager#initialize) that end up throwing exceptions in BlockManagerMaster#askDriverWithReply do not cause those exceptions to be handled by the Executor thread's UncaughtExceptionHandler. 
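
The behavior described above can be reproduced in miniature without Akka. The sketch below is an analogy only: it uses the JDK's ScheduledExecutorService, whose handling of a failed task differs in detail from Akka's (here the exception ends up stored in the future returned by schedule()), but it shows the same basic point that a try/catch around the schedule() call never sees an exception thrown by the scheduled function. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerExceptionDemo {
    /** Returns true only if the caller's catch block observed the scheduled task's exception. */
    public static boolean callerCatchesScheduledFailure() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);
        boolean caughtByCaller = false;

        // Stand-in for a heartbeat that gets no reply and throws.
        Runnable heartBeat = () -> {
            taskStarted.countDown();
            throw new IllegalStateException("no reply to heartBeat");
        };

        try {
            // schedule() returns immediately; the task runs later on the scheduler's own thread.
            scheduler.schedule(heartBeat, 10, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            // Never reached: the exception is thrown on a different thread, after this block exits.
            caughtByCaller = true;
        }

        // Make sure the task really ran before reporting the result.
        if (!taskStarted.await(2, TimeUnit.SECONDS)) {
            throw new IllegalStateException("scheduled task did not run");
        }
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return caughtByCaller;
    }
}
```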


