Posted to dev@zeppelin.apache.org by "Oliver Drewes (JIRA)" <ji...@apache.org> on 2015/12/03 18:14:11 UTC

[jira] [Created] (ZEPPELIN-483) Cronjob: Infinity interpreting notes -> Infinity new files & inodes

Oliver Drewes created ZEPPELIN-483:
--------------------------------------

             Summary: Cronjob: Infinity interpreting notes -> Infinity new files & inodes
                 Key: ZEPPELIN-483
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-483
             Project: Zeppelin
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.0
         Environment: Build for Spark 1.4.1 / mapr5
            Reporter: Oliver Drewes
            Priority: Blocker


Let's start with the basics:
Zeppelin always writes to the tmp folder. Whatever you enter in the SparkInterpreter settings, Zeppelin keeps writing its compiled Spark source to /tmp/spark-{ID}.
No environment variable changes this behaviour.

This means every file it creates while interpreting the individual lines of code takes up an inode. This wouldn't matter if you ran a note only once. But if you run it regularly or in a cronjob, each line of your note is interpreted again and again. So 30 lines of code produce about 200 files. If you run the cronjob once a minute, it produces about 12,000 files an hour. Interpreting the code line by line without checking whether the compiled result already exists is a bad solution.

On a 1 GB filesystem, for example, you have about 65k inodes available. This means that if you run your note for some hours, it needs only about 100 MB of space but produces 65k files, and you run out of inodes.
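
To put rough numbers on this, here is a small back-of-the-envelope sketch using the figures quoted above (about 200 files per run, one run per minute, about 65k inodes). The function name is made up for illustration, and it assumes that every file costs exactly one inode and that nothing is ever cleaned up.

```python
def hours_until_inodes_exhausted(files_per_run, runs_per_hour, free_inodes):
    """Estimate how many hours pass before the filesystem runs out of inodes.

    Assumption: each file consumes one inode and no file is ever deleted.
    """
    files_per_hour = files_per_run * runs_per_hour
    return free_inodes / files_per_hour


# Figures from this report: ~200 files per run, cronjob once a minute, ~65k inodes.
hours = hours_until_inodes_exhausted(
    files_per_run=200, runs_per_hour=60, free_inodes=65_000
)
print(f"inodes exhausted after about {hours:.1f} hours")
```

With these inputs the estimate comes out at a little under five and a half hours, which matches the "some hours" described above.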

My idea for a solution would be to check whether the note has changed. If it has changed, delete the old class files and run it again.
If it is the same, reuse the existing classes.
If a class with the same hash already exists, reuse that class.
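
A minimal sketch of this idea, written in Python for brevity and not taken from Zeppelin's code. The names `note_hash` and `prepare_class_dir`, the directory layout (one directory per note, one subdirectory per content hash) and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def note_hash(note_text):
    """Return a stable SHA-256 digest of the note's source text."""
    return hashlib.sha256(note_text.encode("utf-8")).hexdigest()


def prepare_class_dir(cache_root, note_id, note_text):
    """Return (class_dir, reuse) for a note.

    reuse is True when output for exactly this note text already exists,
    so the caller can skip compiling. Otherwise any stale output for the
    note is deleted and a fresh, empty directory is created.
    """
    note_root = Path(cache_root) / note_id
    class_dir = note_root / note_hash(note_text)
    if class_dir.is_dir():
        # Unchanged note: reuse the existing classes, create no new files.
        return class_dir, True
    # Changed note (or first run): drop old class files to free their inodes.
    if note_root.exists():
        shutil.rmtree(note_root)
    class_dir.mkdir(parents=True)
    return class_dir, False
```

With a layout like this, a cronjob that reruns an unchanged note would hit the `reuse` branch every time and create no new files, while an edited note would replace its old output instead of piling up next to it.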



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)