You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by nirandaperera <gi...@git.apache.org> on 2016/04/19 23:28:21 UTC

[GitHub] spark pull request: fixing SPARK-14736 Deadlock in registering app...

GitHub user nirandaperera opened a pull request:

    https://github.com/apache/spark/pull/12506

    fixing SPARK-14736 Deadlock in registering applications while the Mas…

    …ter is in the RECOVERING mode
    
    ## What changes were proposed in this pull request?
    
    this PR fixes the issue SPARK-14736 Deadlock in registering applications while the Master is in the RECOVERING mode. 
    Proposed solution is to keep the registering apps in a separate list when the Master is in the RECOVERING mode and once the recovery is complete, these apps will be registered back. Pls refer the JIRA for more information 
    
    ## How was this patch tested?
    
    I have tested the patch manually
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nirandaperera/spark SPARK-14736

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12506
    
----
commit 17e7949b427e89984ed8dee19ea2804503feba66
Author: niranda perera <ni...@gmail.com>
Date:   2016-04-19T21:07:23Z

    fixing SPARK-14736 Deadlock in registering applications while the Master is in the RECOVERING mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by nirandaperera <gi...@git.apache.org>.
Github user nirandaperera commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-213277995
  
    Great! can we get this PR merged then? Is there anything else I should do in order to get this merged?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12506: [SPARK-14736][core] Deadlock in registering applications...

Posted by nirandaperera <gi...@git.apache.org>.
Github user nirandaperera commented on the issue:

    https://github.com/apache/spark/pull/12506
  
    @squito hey. Thanks for the heads up. let me check on that and try to come up with a test 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-213046545
  
    No, you are right - this is called only from the event loop - which should ensure thread safety.
    I misread where the re-registeration was happening as outside of the event loop.
    Please ignore my comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12506: [SPARK-14736][core] Deadlock in registering applications...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/12506
  
    @aarondav might be the right person to look at this -- looks like he wrote most of the original code around the "RECOVERING" state way back in 2013.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-217956345
  
    **[Test build #58162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58162/consoleFull)** for PR 12506 at commit [`17e7949`](https://github.com/apache/spark/commit/17e7949b427e89984ed8dee19ea2804503feba66).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by nirandaperera <gi...@git.apache.org>.
Github user nirandaperera commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-212841502
  
    @mridulm You are correct. But if we check the register application method, 
    
    `  private def registerApplication(app: ApplicationInfo): Unit = {
        val appAddress = app.driver.address
        if (addressToApp.contains(appAddress)) {
          logInfo("Attempted to re-register application at same address: " + appAddress)
          return
        }
    
        applicationMetricsSystem.registerSource(app.appSource)
        apps += app
        idToApp(app.id) = app
        endpointToApp(app.driver) = app
        addressToApp(appAddress) = app
        waitingApps += app
      }`
    
    there are similar array buffers which are getting updated without being synchronized. That is why I omitted making the waitingAppsWhileRecovering synchronized. 
    am I doing anything wrong there?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12506: [SPARK-14736][core] Deadlock in registering applications...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/12506
  
    Are you still working on this? @nirandaperera 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-217987643
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #12506: [SPARK-14736][core] Deadlock in registering appli...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12506


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: fixing SPARK-14736 Deadlock in registering app...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-212145151
  
    You will need to make this thread safe - the applications are added/re-registered from separate threads, right ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-217987644
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58162/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-217955523
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-213601332
  
    Hi @nirandaperera , I'm not too sure about the Master recovery process so I can't really comment on your code, but it would make a much stronger case for this PR if you could include a test that fails without this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-222049425
  
    @andrewor14 did you review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12506: [SPARK-14736][core] Deadlock in registering applications...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12506
  
    Looks reasonable to me but I don't know this code very well. Also @kayousterhout maybe or @squito 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12506: [SPARK-14736][core] Deadlock in registering applications...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/12506
  
    unfortunately I'm not very knowledgeable here either.  I agree that this change looks reasonable, but also wish there was a test case for it.  I don't think there is any good integration test framework, and it seems there aren't any tests for recovery state now, so you'd have to build that out yourself.  One possibility -- "local-cluster" mode is closely related to a standalone cluster -- maybe that could be used to create a test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by nirandaperera <gi...@git.apache.org>.
Github user nirandaperera commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-212257535
  
    @mridulm do you mean to say, the method `private def registerApplication(app: ApplicationInfo): Unit = {` needs to synchronized? 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: fixing SPARK-14736 Deadlock in registering app...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-212179447
  
    Maybe we should correct the title just like the others. Also, the title looks truncated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-217987382
  
    **[Test build #58162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58162/consoleFull)** for PR 12506 at commit [`17e7949`](https://github.com/apache/spark/commit/17e7949b427e89984ed8dee19ea2804503feba66).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: fixing SPARK-14736 Deadlock in registering app...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-212138821
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by nirandaperera <gi...@git.apache.org>.
Github user nirandaperera commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-214120146
  
    @BryanCutler I was looking for some unit tests which could simulate this scenario in the Master.scala class, but I couldn't find any. But i think I can reproduce this in an integration test environment. can you point me to the spark integration tests?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/12506#issuecomment-212546285
  
    No.
    Updates to waitingAppsWhileRecovering happens from different threads right ? Hence you will need to protect it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org