Posted to dev@zeppelin.apache.org by "Jeff Tsang (Jira)" <ji...@apache.org> on 2020/08/07 04:29:00 UTC

[jira] [Created] (ZEPPELIN-4986) org.apache.zeppelin.server.ZeppelinServer thread won't be released

Jeff Tsang created ZEPPELIN-4986:
------------------------------------

             Summary: org.apache.zeppelin.server.ZeppelinServer thread won't be released
                 Key: ZEPPELIN-4986
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4986
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Jeff Tsang
         Attachments: image-2020-08-07-12-19-18-212.png

I created 50 notebooks, each containing 4 paragraphs, and have a batch job that calls the API every 10 minutes to run all paragraphs asynchronously. Zeppelin runs from the Docker image released at the end of July (digest: 58568bd6f10e, source commit: fe8fe9be7487791dc21094dd3cbef1d9190662cc).
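For anyone trying to reproduce the setup, the batch job can be approximated as below. This is a minimal sketch that assumes Zeppelin's documented "run all paragraphs" REST endpoint, `POST /api/notebook/job/{noteId}`. The server URL, the note IDs and the helper names (`build_run_all_request`, `trigger_all`, `run_every`) are hypothetical. Check the endpoint against the Zeppelin version you are running.

```python
import time
import urllib.request
from typing import Iterable, List

# Assumed Zeppelin REST endpoint that runs all paragraphs of a note
# asynchronously. Verify it against your Zeppelin version.
RUN_ALL_PATH = "/api/notebook/job/{note_id}"


def build_run_all_request(base_url: str, note_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs every paragraph of one note."""
    url = base_url.rstrip("/") + RUN_ALL_PATH.format(note_id=note_id)
    # Send an empty body and set the method explicitly to POST.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_all(base_url: str, note_ids: Iterable[str], timeout: float = 30.0) -> List[int]:
    """Send one run-all request per note and collect the HTTP status codes."""
    statuses = []
    for note_id in note_ids:
        req = build_run_all_request(base_url, note_id)
        # urlopen raises HTTPError for non-2xx responses.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            statuses.append(resp.status)
    return statuses


def run_every(base_url: str, note_ids: List[str], interval_s: float = 600.0) -> None:
    """Trigger all notes forever, sleeping interval_s seconds (10 minutes by default) between rounds."""
    while True:
        print(trigger_all(base_url, note_ids))
        time.sleep(interval_s)
```

For example, `run_every("http://localhost:8080", ["NOTE_A", "NOTE_B"])` would mimic the 10-minute cadence. The note IDs there are placeholders.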

 

One day the server stopped working entirely. The root cause was that too many live processes had exceeded the Linux PID limit. After recovering the server, I monitored process usage with the "ps -eLfl" command. Every time the batch job is triggered, Zeppelin creates 50+ threads to run the paragraphs. These threads then go to sleep but keep occupying PID numbers even after the jobs have finished.

Below is part of the ps output. All entries have the same parent PID but different LWP (thread ID) values, and all of them run the Java application org.apache.zeppelin.server.ZeppelinServer.
!image-2020-08-07-12-19-18-212.png|width=1270,height=480!
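The per-process thread count, which `ps -eLf` shows as one row per LWP, can also be read directly from `/proc`. That makes it easy to log the count over time. This is a minimal sketch that assumes a Linux host (or a shell inside the container), and the function names are hypothetical. On Linux, each thread uses an ID from the same space as process IDs, which is bounded by `/proc/sys/kernel/pid_max`. Inside a container, a cgroup limit such as `pids.max` may be reached first.

```python
import os


def count_threads(pid: int, proc_root: str = "/proc") -> int:
    """Count the threads (LWPs) of one process.

    On Linux every thread of a process appears as a sub-directory of
    /proc/<pid>/task, so this matches the number of rows `ps -eLf`
    prints for that PID.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "task")))


def read_pid_max(proc_root: str = "/proc") -> int:
    """Read the kernel's system-wide PID ceiling (thread IDs share this space)."""
    with open(os.path.join(proc_root, "sys", "kernel", "pid_max")) as f:
        return int(f.read().strip())


def usage_report(pid: int, proc_root: str = "/proc") -> str:
    """Summarize one process's thread count relative to pid_max."""
    threads = count_threads(pid, proc_root)
    limit = read_pid_max(proc_root)
    share = 100.0 * threads / limit
    return f"pid {pid}: {threads} threads ({share:.2f}% of pid_max={limit})"
```

You can find the ZeppelinServer PID with `pgrep -f org.apache.zeppelin.server.ZeppelinServer`. Logging `usage_report(<pid>)` after each batch run would show whether the thread count keeps growing between runs.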

Because these threads are removed when Zeppelin is restarted, my current workaround is to restart the Zeppelin container periodically so the number of PIDs never exceeds the maximum. I am still looking for a long-term solution, though. Alternatively, is there a way to remove these sleeping threads?
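Until the root cause is fixed, the periodic restart could be made conditional, so the container restarts only when the thread count approaches the limit rather than on a fixed schedule. This is a minimal sketch of that decision, assuming the `docker` CLI is available on the host. The 80% threshold, the container name and the helper names are hypothetical choices. It only mitigates the symptom and does not release the sleeping threads.

```python
import subprocess


def should_restart(thread_count: int, pid_limit: int, ratio: float = 0.8) -> bool:
    """Return True once thread usage reaches `ratio` of the PID limit."""
    if pid_limit <= 0:
        raise ValueError("pid_limit must be positive")
    return thread_count >= ratio * pid_limit


def restart_container(name: str) -> None:
    """Restart a Docker container by name via the docker CLI; raises on failure."""
    subprocess.run(["docker", "restart", name], check=True)


def maybe_restart(name: str, thread_count: int, pid_limit: int, ratio: float = 0.8) -> bool:
    """Restart the container only if the threshold is crossed; report whether it did."""
    if should_restart(thread_count, pid_limit, ratio):
        restart_container(name)
        return True
    return False
```

The thread count could come from a per-process count such as the entries in `/proc/<pid>/task`, and the limit from `/proc/sys/kernel/pid_max` or the container's own PID limit.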

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)