Posted to dev@zeppelin.apache.org by "Oliver Drewes (JIRA)" <ji...@apache.org> on 2015/12/03 18:14:11 UTC
[jira] [Created] (ZEPPELIN-483) Cronjob: Infinity interpreting notes -> Infinity new files & inodes
Oliver Drewes created ZEPPELIN-483:
--------------------------------------
Summary: Cronjob: Infinity interpreting notes -> Infinity new files & inodes
Key: ZEPPELIN-483
URL: https://issues.apache.org/jira/browse/ZEPPELIN-483
Project: Zeppelin
Issue Type: Bug
Components: Core
Affects Versions: 0.6.0
Environment: Build for Spark 1.4.1 / mapr5
Reporter: Oliver Drewes
Priority: Blocker
Let's start with the basics:
Zeppelin always writes to the tmp folder. Whatever you enter in the SparkInterpreter settings, Zeppelin keeps writing its compiled Spark sources to /tmp/spark-{ID}.
No environment variable changes this behaviour.
This means every file it creates while interpreting the individual lines of code takes up an inode. This would not matter if you ran a note only once. But if you run it regularly or as a cronjob, each line of your note is interpreted again and again. So 30 lines of code produce about 200 files. If you run the cronjob once a minute, it produces about 12,000 files an hour. Interpreting the code line by line without checking whether the compiled output already exists is a bad solution.
For example, a 1 GB filesystem has about 65k inodes available. This means that if you run your code for some hours, it needs only 100 MB of space but produces 65k files, and you run out of inodes.
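
To make the numbers above concrete, here is a rough back-of-the-envelope sketch in Python. The constants are the approximate figures quoted in this report, not measurements. The helper names (files_per_hour, hours_until_exhausted, inode_usage) are made up for illustration. inode_usage relies on os.statvfs, so it only works on Unix-like systems.

```python
import os
from typing import Tuple

# Approximate figures quoted in this report (not measured values).
FILES_PER_RUN = 200        # files created by one run of a 30-line note
RUNS_PER_HOUR = 60         # cronjob fires once a minute
INODES_AVAILABLE = 65_000  # rough inode count of a 1 GB filesystem


def files_per_hour(files_per_run: int, runs_per_hour: int) -> int:
    """Number of new files (and therefore inodes) consumed per hour."""
    return files_per_run * runs_per_hour


def hours_until_exhausted(inodes: int, files_per_run: int, runs_per_hour: int) -> float:
    """Hours until all inodes are used, assuming nothing is ever cleaned up."""
    return inodes / files_per_hour(files_per_run, runs_per_hour)


def inode_usage(path: str) -> Tuple[int, int]:
    """Return (total_inodes, free_inodes) for the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


if __name__ == "__main__":
    print(files_per_hour(FILES_PER_RUN, RUNS_PER_HOUR))
    print(hours_until_exhausted(INODES_AVAILABLE, FILES_PER_RUN, RUNS_PER_HOUR))
    print(inode_usage("/tmp"))
```

With these figures the filesystem would run out of inodes after roughly five and a half hours, long before the disk space itself is used up. Running `df -i /tmp` shows the same inode counters from the shell.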
My idea for a solution would be to check whether the note has changed. If it has changed, delete the old class files and run it again.
If it is the same, reuse the existing classes.
If a class with the same hash already exists, reuse that class.
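
The proposal above could look roughly like the following Python sketch. This is only an illustration of the idea, not Zeppelin's code or API: classes_for, source_hash, cache_root and compile_fn are hypothetical names, and it assumes one cache directory per note with the compiled classes stored in a subdirectory named after the hash of the note's source.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Tuple


def source_hash(source: str) -> str:
    """Stable fingerprint of the note's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def classes_for(
    source: str,
    cache_root: Path,
    compile_fn: Callable[[str, Path], None],
) -> Tuple[Path, bool]:
    """Return (class_dir, reused) for the given note source.

    cache_root is a hypothetical per-note directory; compile_fn stands in
    for whatever actually compiles the note and writes class files.
    """
    class_dir = cache_root / source_hash(source)

    # Unchanged note: classes with the same hash exist, so reuse them.
    if class_dir.is_dir():
        return class_dir, True

    # Changed note: delete the old class files so they stop using inodes.
    if cache_root.is_dir():
        for stale in cache_root.iterdir():
            if stale.is_dir():
                shutil.rmtree(stale)

    # Compile into a temporary directory first, so that a failed compile
    # never leaves a half-filled directory that looks like a valid cache.
    tmp_dir = cache_root / (class_dir.name + ".tmp")
    tmp_dir.mkdir(parents=True)
    compile_fn(source, tmp_dir)
    tmp_dir.rename(class_dir)
    return class_dir, False
```

With this approach a cronjob that runs an unchanged note once a minute creates its class files only once, instead of about 200 new files on every run.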
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)