Posted to dev@zeppelin.apache.org by "Shulei Zheng (Jira)" <ji...@apache.org> on 2020/11/21 07:04:00 UTC
[jira] [Created] (ZEPPELIN-5140) After Spark Interpreter timeout,
there will be no progress when the paragraph rerun again
Shulei Zheng created ZEPPELIN-5140:
--------------------------------------
Summary: After Spark Interpreter timeout, there will be no progress when the paragraph rerun again
Key: ZEPPELIN-5140
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5140
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-interpreter, zeppelin-zengine
Affects Versions: 0.9.0
Environment: zeppelin-0.9.0-SNAPSHOT build from the Master
Spark-2.4
Reporter: Shulei Zheng
h1. Step 1:
Set:
{code:java}
zeppelin.interpreter.lifecyclemanager.class = org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager
zeppelin.interpreter.lifecyclemanager.timeout.threshold = 300000{code}
At first it works well: the paragraph bound to the Spark interpreter runs normally and the _*Progressbar*_ shows the percentage.
h1. Step 2:
More than 5 minutes later, rerun the same paragraph. This time the paragraph's status stays *PENDING* and the _*Progressbar*_ is missing.
h1. The cause of this issue:
# When the RemoteInterpreter expires, _*TimeoutLifecycleManager*_ calls *_RemoteInterpreterEventServer.unRegisterInterpreterProcess_*, which only removes the _*RemoteInterpreterGroup*_ without closing it.
# When the paragraph runs again, a new *_RemoteInterpreterGroup_* is instantiated, which asks the _*SchedulerFactory*_ for a _*RemoteScheduler*_ to submit the paragraph to.
# _*SchedulerFactory*_ always finds the existing _*RemoteScheduler*_, so the previous _*RemoteScheduler*_, which holds the old _*RemoteInterpreter*_, is returned.
# The *_JobStatusPoller_* started by the _*RemoteScheduler*_ uses the old _*RemoteInterpreter*_ to get the status, so an exception is thrown and it fails.
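The four steps above can be reproduced with a small toy model. This is only a sketch: ToyInterpreter, ToyScheduler and ToySchedulerFactory are hypothetical stand-ins and not Zeppelin's real classes. The only assumption carried over from the description is that the factory caches schedulers by name and that a scheduler keeps the interpreter it was created with.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the failure. All class names are hypothetical stand-ins,
// not Zeppelin's real classes.
class StaleSchedulerDemo {

    /** Stand-in for a remote interpreter whose process may have been stopped. */
    static final class ToyInterpreter {
        boolean processAlive = true;
    }

    /** Stand-in for a scheduler that keeps the interpreter it was created with. */
    static final class ToyScheduler {
        final ToyInterpreter interpreter;

        ToyScheduler(ToyInterpreter interpreter) {
            this.interpreter = interpreter;
        }

        /** Mimics a status poll: fails if the interpreter process is gone. */
        String pollStatus() {
            if (!interpreter.processAlive) {
                throw new IllegalStateException("interpreter process is gone");
            }
            return "RUNNING";
        }
    }

    /** Stand-in for a factory that caches schedulers by name (assumption). */
    static final class ToySchedulerFactory {
        private final Map<String, ToyScheduler> cache = new HashMap<>();

        ToyScheduler getOrCreate(String name, ToyInterpreter interpreter) {
            // An existing entry wins; the new interpreter is ignored.
            return cache.computeIfAbsent(name, n -> new ToyScheduler(interpreter));
        }
    }

    /** Returns true if the rerun ends up polling the old, dead interpreter. */
    static boolean rerunPollsDeadInterpreter() {
        ToySchedulerFactory factory = new ToySchedulerFactory();

        // First run: scheduler is created and polling works.
        ToyInterpreter first = new ToyInterpreter();
        factory.getOrCreate("spark", first).pollStatus();

        // Timeout: the process stops, but the cached scheduler is not removed.
        first.processAlive = false;

        // Rerun: a new interpreter exists, yet the factory returns the old scheduler.
        ToyInterpreter second = new ToyInterpreter();
        ToyScheduler scheduler = factory.getOrCreate("spark", second);
        try {
            scheduler.pollStatus();
            return false;
        } catch (IllegalStateException e) {
            return scheduler.interpreter == first;
        }
    }

    public static void main(String[] args) {
        System.out.println("rerun polls dead interpreter: " + rerunPollsDeadInterpreter());
    }
}
```

In this model the rerun never reaches the second interpreter, which matches the paragraph staying in *PENDING* without a progress bar.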
h1. How to fix:
The fix is simple: add the following code to *_RemoteInterpreterEventServer.unRegisterInterpreterProcess_*:
{code:java}
// Close the RemoteInterpreter when the RemoteInterpreterServer has already timed out.
// Otherwise the progress bar is missing on rerun after the RemoteInterpreterServer timeout,
// and old RemoteInterpreterGroups always stay alive after GC.
interpreterGroup.close();{code}
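To illustrate why closing the group on unregister helps, here is a minimal sketch of the intended effect. ToyGroup, ToyEventServer and ToySchedulerFactory are hypothetical stand-ins, and the sketch assumes that closing a group evicts its cached scheduler; it does not reproduce Zeppelin's actual close() implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the fix. All class names are hypothetical stand-ins,
// not Zeppelin's real classes.
class CloseOnUnregisterDemo {

    /** Stand-in for a factory that caches schedulers by name. */
    static final class ToySchedulerFactory {
        private final Map<String, Object> cache = new HashMap<>();

        Object getOrCreate(String name) {
            return cache.computeIfAbsent(name, n -> new Object());
        }

        void remove(String name) {
            cache.remove(name);
        }
    }

    /** Stand-in for an interpreter group; close() is assumed to evict its scheduler. */
    static final class ToyGroup {
        private final String schedulerName;
        private final ToySchedulerFactory factory;

        ToyGroup(String schedulerName, ToySchedulerFactory factory) {
            this.schedulerName = schedulerName;
            this.factory = factory;
        }

        void close() {
            factory.remove(schedulerName);
        }
    }

    /** Stand-in for the event server that unregisters timed-out groups. */
    static final class ToyEventServer {
        private final Map<String, ToyGroup> groups = new HashMap<>();
        private final boolean closeOnUnregister;

        ToyEventServer(boolean closeOnUnregister) {
            this.closeOnUnregister = closeOnUnregister;
        }

        void register(String id, ToyGroup group) {
            groups.put(id, group);
        }

        void unRegister(String id) {
            ToyGroup group = groups.remove(id);
            // The proposed fix corresponds to this close() call.
            if (group != null && closeOnUnregister) {
                group.close();
            }
        }
    }

    /** Returns true if the rerun after a timeout gets a newly created scheduler. */
    static boolean rerunGetsFreshScheduler(boolean closeOnUnregister) {
        ToySchedulerFactory factory = new ToySchedulerFactory();
        ToyEventServer server = new ToyEventServer(closeOnUnregister);
        server.register("spark-group", new ToyGroup("spark", factory));

        Object before = factory.getOrCreate("spark");
        server.unRegister("spark-group"); // simulated timeout
        Object after = factory.getOrCreate("spark");
        return before != after;
    }

    public static void main(String[] args) {
        System.out.println("without close: " + rerunGetsFreshScheduler(false));
        System.out.println("with close: " + rerunGetsFreshScheduler(true));
    }
}
```

Without the close() call the stale scheduler is handed out again; with it, the next lookup creates a fresh one.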
--
This message was sent by Atlassian Jira
(v8.3.4#803005)