Posted to dev@predictionio.apache.org by Marcin Ziemiński <zi...@gmail.com> on 2016/09/28 17:29:10 UTC

Zeppelin integration

Hi,

Apache Zeppelin is a really useful tool and a very promising project. I imagine
that people would sometimes like to use it to analyze their event data or
modify it when necessary. I have been thinking about how to create a simple and
seamless integration between PredictionIO and Zeppelin.

I have been playing with Zeppelin a little and eventually came up with a
simple implementation of an interpreter glued to the already existing Spark
interpreter.
You can see some results here:
http://imgur.com/a/Efxu9

So instead of using %spark as the interpreter you just switch to
%spark.pio-spark, which uses %spark underneath, but also provides some
additional bindings (currently for PEvents and LEvents for presentation).
The new interpreter comes with some settings to be configured, just like in
a regular pio-env.sh file.

I realize that the functionality I added is quite limited and could be
provided in some other way than by plugging in a new interpreter. However,
it can be treated as a basis for building new analytical tools in
the PredictionIO ecosystem.
Some ideas I have:
* Adding handles to play with different engines, so that one could load,
train, evaluate and save models programmatically. This would enable
injecting some custom code for cross validation, parameter selection, etc.
and, above all, provide a means to prototype faster.
* Attaching an isolated environment to test things with Zeppelin, namely
providing pio <...> commands that could be run e.g. inside some Docker
image or session. I think that a user could have access to an engine
template repository presented as some kind of form, from which they could
select templates to be downloaded and installed for use right away in the
same note.
* I am also wondering whether Zeppelin could be used for creating
dashboards. It would be nice to be able to collect some statistics in real
time, e.g. engine performance represented as a nice graph within your
custom note.
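
As an illustration of the first idea above, the kind of custom
cross-validation code one might inject into a note could look like the
sketch below. It is plain Scala with no PredictionIO or Zeppelin calls;
KFoldSketch and kFold are hypothetical names, and training and scoring an
engine on each split is deliberately left out.

```scala
// Hypothetical helper for illustration only; not part of PredictionIO or Zeppelin.
object KFoldSketch {
  /** Splits `data` into k (training, validation) pairs.
    * The element at index i is placed in the validation set of fold i % k.
    */
  def kFold[A](data: Seq[A], k: Int): Seq[(Seq[A], Seq[A])] = {
    require(k >= 2 && k <= data.size, "k must be between 2 and data.size")
    val indexed = data.zipWithIndex
    (0 until k).map { fold =>
      // Elements whose index falls in this fold are held out for validation.
      val (validation, training) = indexed.partition { case (_, i) => i % k == fold }
      (training.map(_._1), validation.map(_._1))
    }
  }
}
```

Each (training, validation) pair could then be handed to whatever
train/evaluate handles the interpreter ends up exposing, for example by
iterating over KFoldSketch.kFold(events, 5).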

Maybe this idea is a little far-fetched, or there are better ways to improve
the user experience. I would like to know what others, especially active PIO
users, think about it. Is there something that you would like to have inside
a tool like Zeppelin to be used with PredictionIO? Can you think of some
other useful features?

Best regards,
Marcin