Posted to dev@zeppelin.apache.org by "Shulei Zheng (Jira)" <ji...@apache.org> on 2020/11/21 07:04:00 UTC

[jira] [Created] (ZEPPELIN-5140) After Spark Interpreter timeout, there will be no progress when the paragraph rerun again

Shulei Zheng created ZEPPELIN-5140:
--------------------------------------

             Summary: After Spark Interpreter timeout, there will be no progress when the paragraph rerun again 
                 Key: ZEPPELIN-5140
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5140
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter, zeppelin-zengine
    Affects Versions: 0.9.0
         Environment: zeppelin-0.9.0-SNAPSHOT build from the Master

Spark-2.4
            Reporter: Shulei Zheng


h1. Step1:

Set the following properties:

 
{code:java}
zeppelin.interpreter.lifecyclemanager.class = org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager
zeppelin.interpreter.lifecyclemanager.timeout.threshold	 = 300000{code}
At this point everything works: the paragraph bound to the Spark interpreter runs normally and the _*Progressbar*_ shows the percentage.
h1. Step2:

Wait more than 5 minutes (the configured timeout threshold of 300000 ms), then rerun the same paragraph. This time the paragraph's status stays *PENDING* and the _*Progressbar*_ is missing.

 
h1. The cause of this issue:
 # When the RemoteInterpreter expires, _*TimeoutLifecycleManager*_ calls *_RemoteInterpreterEventServer.unRegisterInterpreterProcess_*, which only removes the _*RemoteInterpreterGroup*_ without closing it.
 # When the paragraph runs again, a new *_RemoteInterpreterGroup_* is instantiated, which asks the _*SchedulerFactory*_ for a _*RemoteScheduler*_ to submit the paragraph to.
 # _*SchedulerFactory*_ always returns an existing _*RemoteScheduler*_ if it finds one, so the previous _*RemoteScheduler*_, which still holds the old _*RemoteInterpreter*_, is returned.
 # The *_JobStatusPoller_* started by that _*RemoteScheduler*_ uses the old _*RemoteInterpreter*_ to get the status, so an exception is thrown and the poll fails.
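
The four steps above can be reproduced in a small stand-alone model. This is only a sketch: the classes below are hypothetical stand-ins that borrow the names from this report, not Zeppelin's real classes, and the cache-by-name lookup is an assumption about how the factory finds the existing scheduler.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the bug; hypothetical stand-ins, not Zeppelin's real classes.
class StaleSchedulerDemo {

    static class Interpreter {
        final String id;
        boolean alive = true;
        Interpreter(String id) { this.id = id; }
    }

    static class Scheduler {
        final Interpreter interpreter;
        Scheduler(Interpreter interpreter) { this.interpreter = interpreter; }

        // Stands in for the status poller: it asks the interpreter captured at creation time.
        String pollStatus() {
            if (!interpreter.alive) {
                throw new IllegalStateException("interpreter " + interpreter.id + " is gone");
            }
            return "RUNNING";
        }
    }

    static class SchedulerFactory {
        private final Map<String, Scheduler> schedulers = new HashMap<>();

        // Returns the cached scheduler if one already exists under this name (step 3).
        Scheduler getOrCreate(String name, Interpreter interpreter) {
            return schedulers.computeIfAbsent(name, n -> new Scheduler(interpreter));
        }
    }

    // Returns true when the rerun ends up polling the expired interpreter.
    static boolean rerunPollsExpiredInterpreter() {
        SchedulerFactory factory = new SchedulerFactory();

        Interpreter first = new Interpreter("first");
        factory.getOrCreate("spark", first);
        first.alive = false; // timeout: the process is gone, but the scheduler stays cached

        Interpreter second = new Interpreter("second");
        Scheduler scheduler = factory.getOrCreate("spark", second); // rerun after the timeout

        try {
            scheduler.pollStatus();
            return false;
        } catch (IllegalStateException e) {
            return scheduler.interpreter == first;
        }
    }

    public static void main(String[] args) {
        System.out.println("rerun polls expired interpreter: " + rerunPollsExpiredInterpreter());
    }
}
```

In this model the second lookup returns the scheduler created for the first interpreter, so the status poll throws, which matches the paragraph staying *PENDING* with no progress.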

 


h1. How to fix:

The fix is simple: add the following code to *_RemoteInterpreterEventServer.unRegisterInterpreterProcess_*:
{code:java}
// Close the RemoteInterpreter when the RemoteInterpreterServer has already timed out.
// Otherwise the ProgressBar will be missing on a rerun after the RemoteInterpreterServer timeout,
// and old RemoteInterpreterGroups will stay alive and survive GC.
interpreterGroup.close();{code}
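
To illustrate why closing the group helps, here is a minimal sketch under one explicit assumption: that closing a group also drops its scheduler from the factory's cache. The class, the two maps, and the `closeOnUnregister` flag are hypothetical and exist only for this illustration; they are not Zeppelin's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the fix; not Zeppelin's real classes.
class CloseOnUnregisterDemo {
    // scheduler name -> id of the interpreter that scheduler is bound to
    private final Map<String, String> schedulerCache = new HashMap<>();
    private final Map<String, Group> registeredGroups = new HashMap<>();

    class Group {
        final String name;

        Group(String name, String interpreterId) {
            this.name = name;
            registeredGroups.put(name, this);
            // Reuse an existing scheduler if one is already cached under this name.
            schedulerCache.putIfAbsent(name, interpreterId);
        }

        // Assumption: closing the group also releases its cached scheduler.
        void close() {
            schedulerCache.remove(name);
        }
    }

    // Models unRegisterInterpreterProcess, with and without the proposed close() call.
    void unRegister(String name, boolean closeOnUnregister) {
        Group group = registeredGroups.remove(name);
        if (group != null && closeOnUnregister) {
            group.close();
        }
    }

    // Returns the id of the interpreter that the scheduler is bound to on rerun.
    static String interpreterUsedOnRerun(boolean closeOnUnregister) {
        CloseOnUnregisterDemo demo = new CloseOnUnregisterDemo();
        demo.new Group("spark", "first");
        demo.unRegister("spark", closeOnUnregister); // interpreter timeout
        demo.new Group("spark", "second");           // paragraph rerun
        return demo.schedulerCache.get("spark");
    }

    public static void main(String[] args) {
        System.out.println("without close: " + interpreterUsedOnRerun(false));
        System.out.println("with close:    " + interpreterUsedOnRerun(true));
    }
}
```

Without the close call the rerun is still bound to the first interpreter; with it, the rerun gets a scheduler bound to the second one.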
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)