You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by zenglinxi0615 <gi...@git.apache.org> on 2015/08/31 09:12:30 UTC
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
GitHub user zenglinxi0615 opened a pull request:
https://github.com/apache/incubator-zeppelin/pull/267
fix remoteinterpreterserver deadlock problem
After using zeppelin server for some days, we often found that paragraph can't run while the zeppelin server process is on, restart interpreter is just not work. Then we find out that the RemoteInterpreterServer is in deadlock, and can't be stopped by zeppelin server. Unfortunately, this deadlocked RemoteInterpreterServer will still hold the hadoop resources, we need to kill the process manually each time. The detail Info shows in: https://issues.apache.org/jira/browse/ZEPPELIN-271
We figure out that the deadlock is caused by the lack of synchronized (intpGroup) in SparkSqlInterpreter.java
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zenglinxi0615/incubator-zeppelin fix-remoteinterpreterserver-deadlock
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-zeppelin/pull/267.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #267
----
commit 4ec953b67b287946899a2f165ab4898dbaae6df2
Author: 曾林西 <ze...@meituan.com>
Date: 2015-08-31T06:58:49Z
fix remoteinterpreterserver deadlock problem
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136933855
@Leemoonsoo
any suggestion about this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-137323378
Thanks for merging.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136642793
hi, moon:
thanks for your advice.
Actually, add synchronized (intpGroup) in LazyOpenInterpreter.open() will block running open() concurrently in the same interpreter group, which only happens at the first time that the interpreter be opened. But moving ((LazyOpenInterpreter) p).open() outside of synchronized(intpGroup) block is a better idea, and i have already do some test to make sure this idea works.
Thanks a lot!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136759969
@Leemoonsoo
the last commit is used to fix a problem (which makes it unable to pass the UT) in previous commit
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by djoelz <gi...@git.apache.org>.
Github user djoelz commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136520481
Also please add unit tests for both methods that show the thread safety. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136938492
@Leemoonsoo
can you merge these patchs?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136942517
Merging if there're no more discussions.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136511380
Thanks for fix and procedure to reproduce the problem.
Changing SparkSqlInterpreter.getSparkInterpreter() totally make sense.
But i'm not quite get change of LazyOpenInterpreter.open(). It blocks running open() concurrently in the same interpreter group.
Instead of blocking open(), if `((LazyOpenInterpreter) p).open();` can be moved to outside of `synchronized(intpGroup)` block, i guess it'll also solve the problem without restriction of blocking open() from concurrent execution. for example
```
private SparkInterpreter getSparkInterpreter() {
InterpreterGroup intpGroup = getInterpreterGroup();
LazyOpenInterpreter lazy = null;
SparkInterpreter spark = null;
synchronized (intpGroup) {
for (Interpreter intp : getInterpreterGroup()){
if (intp.getClassName().equals(SparkInterpreter.class.getName())) {
Interpreter p = intp;
while (p instanceof WrappedInterpreter) {
if (p instanceof LazyOpenInterpreter) {
lazy = (LazyOpenInterpreter) p;
}
if (p instanceof SparkInterpreter) {
spark = (SparkInterpreter) p;
}
p = ((WrappedInterpreter) p).getInnerInterpreter();
}
}
}
}
if (lazy != null) {
lazy.open();
}
return spark;
}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-zeppelin/pull/267
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136934480
@zenglinxi0615 Thanks for the fix!
Change looks good to me. (CI result https://travis-ci.org/apache/incubator-zeppelin/builds/78241201)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-zeppelin pull request: fix remoteinterpreterserver deadl...
Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/267#issuecomment-136373818
I found a way to make RemoteInterpreterServer deadlock recurrently:
(1)Patch these codes and recompile zeppelin:
```
diff --git a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
index 852dd33..17e877a 100644
--- a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
+++ b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
@@ -345,6 +345,11 @@ public class PySparkInterpreter extends Interpreter implements ExecuteResultHand
private SparkInterpreter getSparkInterpreter() {
InterpreterGroup intpGroup = getInterpreterGroup();
synchronized (intpGroup) {
+ try {
+ Thread.sleep(20000);
+ } catch (InterruptedException e) {
+ e.printStackTrace();
+ }
for (Interpreter intp : getInterpreterGroup()){
if (intp.getClassName().equals(SparkInterpreter.class.getName())) {
Interpreter p = intp;
diff --git a/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java b/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
index a4ff494..7b191aa 100644
--- a/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
+++ b/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
@@ -329,6 +329,11 @@ public class SparkInterpreter extends Interpreter {
@Override
public void open() {
+ try {
+ Thread.sleep(20000);
+ } catch (InterruptedException e) {
+ e.printStackTrace();
+ }
URL[] urls = getClassloaderUrls();
// Very nice discussion about how scala compiler handle classpath
```
(2)set zeppelin.spark.concurrentSQL=true
(3)restart zeppeplin server
(4)run %pyspark and %sql paragraph in the same time (a few seconds Interval is just ok)
(5)monitor RemoteInterpreterServer with jstack, then you will get a deadlock
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---