You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by CodingCat <gi...@git.apache.org> on 2014/05/05 03:37:38 UTC

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/639

    SPARK-1686: keep schedule() calling in the main thread

    https://issues.apache.org/jira/browse/SPARK-1686
    
    moved from original JIRA (by @markhamstra):
    
    In deploy.master.Master, the completeRecovery method is the last thing to be called when a standalone Master is recovering from failure. It is responsible for resetting some state, relaunching drivers, and eventually resuming its scheduling duties.
    
    There are currently four places in Master.scala where completeRecovery is called. Three of them are from within the actor's receive method, and aren't problems. The last starts from within receive when the ElectedLeader message is received, but the actual completeRecovery() call is made from the Akka scheduler. That means that it will execute on a different scheduler thread, and Master itself will end up running (i.e., schedule() ) from that Akka scheduler thread. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-1686

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #639
    
----
commit 02b37ca7e9256628e31cd56ba21d7e579c1233d6
Author: CodingCat <zh...@gmail.com>
Date:   2014-05-05T01:43:03Z

    keep schedule() calling in the main thread

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/639


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42707168
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42695051
  
    Is it true that the uncaught exception handler set within the ActorSystem doesn't catch exceptions in scheduled functions? I was just reading the Akka source code for this very reason, and it seems like the uncaughtExceptionHandler should be used by the scheduler: https://github.com/akka/akka/blob/master/akka-actor/src/main/scala/akka/actor/ActorSystem.scala#L676 (this threadFactory is created with the uncaughtExceptionHandler).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42156144
  
    I was thinking about this JIRA earlier, but the solution I had in mind was to change the schedule to send a "CompleteRecovery" message instead of having the completeRecovery() method send a TriggerSchedule method. Incidentally, the CompleteRecovery message already exists, because I totally meant to use it...
    
    I'm not sure I fully understand how schedule behaves in the face of restart, though. Really, we don't want CompleteRecovery coming from a prior version of the actor (i.e., between restarts). Would cancelling the timer in the postStop() method accomplish this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42727777
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42187762
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14664/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/639#discussion_r12490622
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -171,10 +177,13 @@ private[spark] class Master(
           logInfo("I have been elected leader! New state: " + state)
           if (state == RecoveryState.RECOVERING) {
             beginRecovery(storedApps, storedDrivers, storedWorkers)
    -        context.system.scheduler.scheduleOnce(WORKER_TIMEOUT millis) { completeRecovery() }
    +        recoverCallable = context.system.scheduler.scheduleOnce(WORKER_TIMEOUT millis)
    --- End diff --
    
    There's actually an alternate syntax for just sending a message, I think it's something like
    scheduleOnce(WORKER_TIMEOUT millis, self, CompleteRecovery).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42195672
  
    @CodingCat The documentation on Akka's handling of exceptions thrown from scheduled functions isn't entirely clear, so before re-opening SPARK-1620 I wrote various pieces of test code to discover exactly what the behavior is.  From what I can tell, a default uncaught exception handler set in the thread that calls scheduler.schedule won't catch exceptions from a scheduled function, but it will catch at least some exceptions thrown by Akka itself (e.g. if the scheduled function closes over a reference that becomes unreachable in the scheduled function.)  Similarly, an uncaught exception handler set within the ActorSystem (as we do in IndestructibleActorSystem) won't catch exceptions thrown by scheduled functions.
    
    What will work is an explicit try-catch in the scheduled code (which can allow the scheduled event to happen again) or setting an uncaught exception handler explicitly on the scheduled thread (which will not allow any further scheduled events after the first thrown exception.)  The latter approach is about as generalizable as we can get, I think, and is the approach I took in https://github.com/apache/spark/pull/622.  That PR could be extended to cover the scheduling of completeRecovery() as well, but maybe the fix for the rest of SPARK-1686 will make that unnecessary.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42707308
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14852/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42182222
  
    for the future update of the PR
    
    I think for this specific case, what @aarondav proposed is a pretty good, as we can keep the running of CompleteRecovery in the main thread
    
    for schedule(), I think we also need to ensure schedule is always called in the main thread


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42707178
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42184222
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42711073
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14853/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42187761
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42711071
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42707606
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42726702
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42183982
  
    I discarded the idea of doing real scheduling work in akka thread for now, as I found that it will change the current behaviour of Master...(in some places, user may expect that the scheduling work is done immediately after they registerWorker or something else)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/639#discussion_r12496483
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -104,6 +104,8 @@ private[spark] class Master(
     
       var leaderElectionAgent: ActorRef = _
     
    +  private var recoverCallable: Cancellable = _
    --- End diff --
    
    nit: Perhaps this would be better named "recoveryCompletionTask" (we use the term "task" in a couple other places in the code base)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42707590
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42731800
  
    LGTM, merging into master and branch-1.0. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42726700
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42726685
  
    @aarondav thanks for the comments, just addressed it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/639#discussion_r12490657
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterMessages.scala ---
    @@ -39,4 +39,6 @@ private[master] object MasterMessages {
       case object RequestWebUIPort
     
       case class WebUIPortResponse(webUIBoundPort: Int)
    +
    +  case object TriggerSchedule
    --- End diff --
    
    I think this is no longer used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42181908
  
    @markhamstra , thank you very much for the comments and I'm sorry that I didn't explain clearly in the description of the PR, let me try to explain why I come up with this solution, 
    
    first, I went through the instances you mentioned in SPARK-1620 before I submit this patch, also I read the API document of Akka...I didn't find a general way (maybe I missed something) to capture exception in the caller thread...so I think we need to have a case-specific solution for each instance there, or we should say that we need to be careful when use Akka.scheduler.schedule() to do some timer task...but I'm not sure if I'm correct on this point, so I didn't address exception handle issue in this PR
    
    second, I'm aware of the change of the behaviour in my solution (the scheduling is called asynchronously). Actually in my opinion, it is even safe to remove the call (I guess, this line is here is just for more frequent scheduling?), because schedule() is also called in  other places (relaunchDriver()). (But now, I realized that that call of schedule() in relaunchDriver() is also a potential problem, as it's called in completeRecovery, maybe we should change that call to TriggerSchedule also) Or, we should refactor schedule() method to be just send TriggerSchedule to the actor, and update the TriggerSchedule handle to do real schedule work, in this way we can prevent the future risk of calling schedule() in another thread
    
    
    @aarondav , I think sending a completeRecovery message is simpler than what I proposed above... I guess we can capture the return value of akka.scheduler.schedule() and cancel that in postStop as you proposed....
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42695854
  
    @aarondav That's what I thought too when looking at the Akka code, and that's why I closed the JIRA for a while.  After writing some test code, though, it really didn't look to me like the uncaught exception handler was effective for the scheduled functions, so I re-opened 1620 and created https://github.com/apache/spark/pull/622.  However, I'm still not clear on just why I'm seeing the behavior (or lack thereof) that I am, so I'd certainly welcome any insights you can arrive at by looking deeper.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42153755
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42153762
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42715116
  
    @markhamstra Absolutely agree.
    
    @CodingCat The test failure is unrelated, I submitted #716 to fix it. Had one last minor comment, other than that LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42155235
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14653/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42184209
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42155233
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42707307
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42705008
  
    @markhamstra  Further testing (test code [here](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d)) seems to indicate that the uncaughtExceptionHandler is **not** called in any of these cases:
    
    - Error in actor receive (causes a LocalActorRefProvider guardian failure and shuts down the system)
    - Error in scheduler (special TaskInvocation logic seems to catch this)
    - Error in future using actor's context.dispatcher (not propagated anywhere I can see)
    
    The last one was especially surprising, but further investigation indicates that the ExecutionContext in context.dispatcher is unrelated to the ThreadFactory. Custom specification of a default ExecutionContext was added in Akka 2.3.0 version (https://github.com/akka/akka/commit/4b2d98c5cbf4bccb2de34e26f009510a98f725a0).
    
    TL;DR It's pretty hard to get a custom exception handler that logs any errors in Akka-related threads.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42727778
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14865/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42709359
  
    @aarondav Your test code is similar to mine, as are your conclusions.  Somebody really needs to do a systematic inventory of Akka exception handling throughout Spark, including actor restart-ability.  This PR, https://github.com/apache/spark/pull/622, https://github.com/apache/spark/pull/186 and https://github.com/apache/spark/pull/637 are a start, but I don't think that thery are the end of the matter (cf. SPARK-1769, SPARK-1771, SPARK-1772.)  Most of my Spark work right now, though, will be focused on SPARK-1021.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42154867
  
    Okay, first, I didn't start work on this JIRA myself because I haven't had the time to come to a complete understanding of the current and intended functionality, nor of the scope of the problem from running completeRecovery() on an Akka scheduler thread, so you shouldn't assume that the JIRA fully describes the problem or that solving just what it describes constitutes a complete fix.  For example, any exception currently thrown by completeRecovery() when run by the scheduler will not be caught because of SPARK-1620, so a complete fix to the use of the scheduler from `case ElectedLeader` will also have to incorporate whatever we end up doing to fix that issue.
    
    Second, what you've done in this PR significantly changes the results of calling completeRecovery(), and there is at least a minor problem in doing so.  Previously, schedule() would be called synchronously after setting RecoveryState.ALIVE and before logging the completion of recovery.  Now you're causing recovery() to be called asynchronously when the TriggerSchedule event makes its way to the head of the message queue.  That will potentially be long after RecoveryState.ALIVE has been set, the completion of recovery has been logged, and other events are processed with the new RecoveryState even though schedule() has not yet been run.  The out-of-order logging is at least a minor problem.  The processing of other events before schedule() has been run is potentially a major problem -- but like I said, I haven't yet worked through exactly how things are supposed to work, so I'm also not certain how big of a problem this PR would create.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---