You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by zenglinxi0615 <gi...@git.apache.org> on 2015/08/31 09:12:30 UTC

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

GitHub user zenglinxi0615 opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/267

    fix remoteinterpreterserver deadlock problem

    After using zeppelin server for some days, we often found that paragraph can't run while the zeppelin server process is on, restart interpreter is just not work. Then we find out that the RemoteInterpreterServer is in deadlock, and can't be stopped by zeppelin server. Unfortunately, this deadlocked RemoteInterpreterServer will still hold the hadoop resources, we need to kill the process manually each time. The detail Info shows in: https://issues.apache.org/jira/browse/ZEPPELIN-271
    We figure out that the deadlock is caused by the lack of synchronized (intpGroup) in SparkSqlInterpreter.java

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenglinxi0615/incubator-zeppelin fix-remoteinterpreterserver-deadlock

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #267
    
----
commit 4ec953b67b287946899a2f165ab4898dbaae6df2
Author: 曾林西 <ze...@meituan.com>
Date:   2015-08-31T06:58:49Z

    fix remoteinterpreterserver deadlock problem

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136933855
  
    @Leemoonsoo 
    any suggestion about this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-137323378
  
    Thanks for merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136642793
  
    hi, moon:
     thanks for your advice. 
    Actually, add synchronized (intpGroup) in LazyOpenInterpreter.open() will block running open() concurrently in the same interpreter group, which only happens at the first time that the interpreter be opened. But moving ((LazyOpenInterpreter) p).open() outside of synchronized(intpGroup) block is a better idea, and i have already do some test to make sure this idea works.  
    Thanks a lot!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136759969
  
    @Leemoonsoo 
    the last commit is used to fix a problem (which makes it unable to pass the UT) in previous  commit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by djoelz <gi...@git.apache.org>.
Github user djoelz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136520481
  
    Also please add unit tests for both methods that show the thread safety. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136938492
  
    @Leemoonsoo 
    can you merge these patchs?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136942517
  
    Merging if there're no more discussions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136511380
  
    Thanks for fix and procedure to reproduce the problem. 
    
    Changing SparkSqlInterpreter.getSparkInterpreter() totally make sense.
    
    But i'm not quite get change of LazyOpenInterpreter.open(). It blocks running open() concurrently in the same interpreter group. 
    
    Instead of blocking open(), if `((LazyOpenInterpreter) p).open();` can be moved to outside of `synchronized(intpGroup)` block, i guess it'll also solve the problem without restriction of blocking open() from concurrent execution. for example
    
    ```
      private SparkInterpreter getSparkInterpreter() {
        InterpreterGroup intpGroup = getInterpreterGroup();
        LazyOpenInterpreter lazy = null;
        SparkInterpreter spark = null;
        synchronized (intpGroup) {
          for (Interpreter intp : getInterpreterGroup()){
            if (intp.getClassName().equals(SparkInterpreter.class.getName())) {
              Interpreter p = intp;
              while (p instanceof WrappedInterpreter) {
                if (p instanceof LazyOpenInterpreter) {
                  lazy = (LazyOpenInterpreter) p;
                }
                if (p instanceof SparkInterpreter) {
                  spark = (SparkInterpreter) p;
                }
                p = ((WrappedInterpreter) p).getInnerInterpreter();
              }
            }
          }
        }
    
        if (lazy != null) {
          lazy.open();
        }
    
        return spark;
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/267


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136934480
  
    @zenglinxi0615 Thanks for the fix! 
    Change looks good to me. (CI result https://travis-ci.org/apache/incubator-zeppelin/builds/78241201)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136373818
  
    I found a way to make RemoteInterpreterServer deadlock recurrently:
    (1)Patch these codes and recompile zeppelin:
    ```
    diff --git a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
    index 852dd33..17e877a 100644
    --- a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
    +++ b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
    @@ -345,6 +345,11 @@ public class PySparkInterpreter extends Interpreter implements ExecuteResultHand
       private SparkInterpreter getSparkInterpreter() {
         InterpreterGroup intpGroup = getInterpreterGroup();
         synchronized (intpGroup) {
    +      try {
    +        Thread.sleep(20000);
    +      } catch (InterruptedException e) {
    +        e.printStackTrace();
    +      }
           for (Interpreter intp : getInterpreterGroup()){
             if (intp.getClassName().equals(SparkInterpreter.class.getName())) {
               Interpreter p = intp;
    diff --git a/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java b/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
    index a4ff494..7b191aa 100644
    --- a/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
    +++ b/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
    @@ -329,6 +329,11 @@ public class SparkInterpreter extends Interpreter {
     
       @Override
       public void open() {
    +    try {
    +      Thread.sleep(20000);
    +    } catch (InterruptedException e) {
    +      e.printStackTrace();
    +    }
         URL[] urls = getClassloaderUrls();
     
         // Very nice discussion about how scala compiler handle classpath
    ```
    
    (2)set zeppelin.spark.concurrentSQL=true
    
    (3)restart zeppeplin server
    
    (4)run %pyspark and %sql paragraph in the same time (a few seconds Interval is just ok)
    
    (5)monitor RemoteInterpreterServer with jstack, then you will get a deadlock


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---