Posted to dev@zeppelin.apache.org by "Paul Brenner (Jira)" <ji...@apache.org> on 2020/02/07 17:24:00 UTC

[jira] [Created] (ZEPPELIN-4599) Zeppelin becomes unresponsive and can only be recovered by restart

Paul Brenner created ZEPPELIN-4599:
--------------------------------------

             Summary: Zeppelin becomes unresponsive and can only be recovered by restart
                 Key: ZEPPELIN-4599
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4599
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.8.2
            Reporter: Paul Brenner
         Attachments: zeppelin-yarn-zeppelin-210.sec.placeiq.net.log, zeppelin-yarn-zeppelin-210.sec.placeiq.net.out

We use Zeppelin with 10-20 users working primarily in Spark. Every few days, and sometimes multiple times per day, the Zeppelin web UI becomes unresponsive, and the only solution we have found is to restart Zeppelin. This is extremely disruptive.

"Unresponsive" usually takes the form of being unable to create new paragraphs, clicking run either doing nothing or leaving the paragraph stuck in pending forever, being unable to create new notebooks, or being unable to load existing notebooks.

We have added monitoring to the box Zeppelin runs on and see nothing out of the ordinary in GC rates, CPU utilization, memory usage, or heap utilization.

We also don't see anything unusual in the logs. Is there any other way we can diagnose this issue to help find the root cause? 0.9 is currently too broken for us to use (based on builds of the live code on 1/27/2020 and again on 2/3/2020).
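
One possible diagnostic, sketched here as an assumption and not drawn from the attached logs: capture a JVM thread dump of the Zeppelin server process while the UI is hung (for example with `jstack <pid>`) and tally the thread states. A large number of BLOCKED threads would point toward lock contention instead of resource exhaustion. The helper name `tally_thread_states` and the sample dump below are hypothetical and written by hand for illustration.

```python
import re
from collections import Counter

# Matches the state line that HotSpot prints under each thread header,
# e.g. "   java.lang.Thread.State: BLOCKED (on object monitor)".
STATE_LINE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+([A-Z_]+)", re.MULTILINE)


def tally_thread_states(dump_text):
    """Count threads per java.lang.Thread.State in a jstack-style dump."""
    return Counter(STATE_LINE.findall(dump_text))


# Hypothetical two-thread excerpt, written by hand for illustration only;
# it is NOT taken from the attached Zeppelin logs.
SAMPLE_DUMP = '''
"hypothetical-worker-1" #31 prio=5 os_prio=0 tid=0x0 nid=0x1 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"hypothetical-worker-2" #32 prio=5 os_prio=0 tid=0x0 nid=0x2 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
'''

if __name__ == "__main__":
    # Print states from most to least common.
    for state, count in tally_thread_states(SAMPLE_DUMP).most_common():
        print(state, count)
```

Comparing the tally from a dump taken during a hang against one taken while Zeppelin is healthy may show which thread pool is stuck, which could then be matched against the stack frames in the full dump.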

 

Attaching a copy of the logs just in case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)