Posted to dev@zeppelin.apache.org by "Axel Van Damme (Jira)" <ji...@apache.org> on 2020/02/13 07:35:00 UTC

[jira] [Created] (ZEPPELIN-4612) Optimize Notebooks loading

Axel Van Damme created ZEPPELIN-4612:
----------------------------------------

             Summary: Optimize Notebooks loading
                 Key: ZEPPELIN-4612
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4612
             Project: Zeppelin
          Issue Type: Improvement
          Components: NotebookRepo
    Affects Versions: 0.9.0
         Environment: Number of Notebooks in NotebookRepo:
{code:java}
root@zeppelin:/opt# find NotebookRepo/ -type f -not -path "NotebookRepo/.git/*" | wc -l
1524
{code}
Disk space used by the NotebookRepo:
{code:java}
root@zeppelin:/opt# du -skh NotebookRepo/
2.7G    NotebookRepo/
{code}
Environment variable:
{code:java}
root@zeppelin:/opt/zeppelin/logs# echo $ZEPPELIN_MEM
-Xmx8192m -XX:MaxPermSize=1024m
{code}
Memory used before first login in Zeppelin UI (298.7MiB):
{code:java}
CONTAINER ID        NAME                                        CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
38bf46661dc9        avan_zeppelin.1.ow2ckxn7pghvgk3osewrtr16j   0.09%               298.7MiB / 16GiB    1.82%               2.14kB / 820B       0B / 0B             65
{code}
Memory used after first login in Zeppelin UI (5.491GiB):
{code:java}
CONTAINER ID        NAME                                        CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
38bf46661dc9        avan_zeppelin.1.ow2ckxn7pghvgk3osewrtr16j   0.12%               5.491GiB / 16GiB    34.32%              11.9kB / 40.2kB     0B / 73.7kB 
{code}
Logs of the first login are attached (zeppelin--zeppelin.log); the login phase occurred between 2020-02-13 08:23:11 and 2020-02-13 08:23:53
            Reporter: Axel Van Damme
         Attachments: zeppelin--zeppelin.log

Our current Notebook base contains more than 1500 Notebooks.

In Zeppelin 0.8.3 the situation was not ideal, because all Notebooks were loaded into memory at Zeppelin startup. In Zeppelin 0.9 it is worse, because the Notebooks are loaded during the login phase. The end user has to wait a long time before getting into Zeppelin and usually assumes that Zeppelin is down.

Also, loading the entire Notebook base into memory does not scale: as new Notebooks are created, we have to keep increasing the ZEPPELIN_MEM environment variable.

At the moment, to be able to log in to Zeppelin with our 1500 Notebooks, we set:
{code:java}
ZEPPELIN_MEM=-Xmx8192m -XX:MaxPermSize=1024m
{code}
This is a lot of memory that cannot be used for actual code processing.

The first login takes 42 seconds (see the logs from 2020-02-13 08:23:11 to 2020-02-13 08:23:53).

Wouldn't it be possible to just walk through the directory structure of the NotebookRepo to display the Zeppelin welcome page with the tree structure?
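
As a rough illustration of this idea, here is a minimal sketch that builds the list of note paths from the file system alone, without opening or parsing any note. It is not Zeppelin's actual implementation: the class name NotebookTreeSketch and the method listNotePaths are hypothetical, and it assumes that notes are stored as files ending in .zpln and that the .git directory should be skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: list note files by walking the repo directory,
// so that no note content has to be loaded into memory.
public class NotebookTreeSketch {

    /**
     * Returns the paths of all note files under repoRoot, relative to
     * repoRoot, sorted, with '/' as separator. Files are never opened.
     */
    public static List<String> listNotePaths(Path repoRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(repoRoot)) {
            return paths
                .filter(Files::isRegularFile)
                .map(repoRoot::relativize)
                // Assumption: everything under .git is repository metadata.
                .filter(p -> !p.startsWith(".git"))
                // Assumption: note files use the .zpln extension.
                .filter(p -> p.getFileName().toString().endsWith(".zpln"))
                .map(p -> p.toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default directory name taken from the environment described above.
        Path root = Paths.get(args.length > 0 ? args[0] : "NotebookRepo");
        listNotePaths(root).forEach(System.out::println);
    }
}
```

With such a listing, the welcome page would only need the relative paths to render the folder tree, and the content of a note could be loaded when the user actually opens it.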

This would be a great improvement and would offer the possibility to use Zeppelin at scale.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)