Posted to dev@zeppelin.apache.org by Alexander Bezzubov <bz...@apache.org> on 2015/03/10 11:46:24 UTC

[COMDEV-117] GSoC Project: create an ML/Deep Learning tutorial notebook

Let's start collecting student proposals on the wiki:
<https://cwiki.apache.org/confluence/display/ZEPPELIN/GoogleSummerOfCode#GoogleSummerOfCode-StudentProposals>

Below are my thoughts. I'd love to volunteer as a mentor for this project,
and feedback is very welcome.

This is deliberately an open-ended project, so we need to work together to
define a possible scope.


*Main Idea:*
Use an open dataset <http://www.kdnuggets.com/datasets/index.html> (any
dataset with a compatible licence) and existing ML tools to build a set of
Zeppelin notebooks that show how Zeppelin can help data scientists with
their day-to-day tasks (cleaning the data, building a model, using it).

An extra bonus would be to use modern deep learning techniques, e.g. to
classify images or to do any kind of NLP.

Good examples could be past Kaggle competitions, such as Titanic
<https://www.kaggle.com/c/titanic-gettingStarted/details/new-getting-started-with-r>,
and many others.
There are many different ways to approach this, which leaves room for
creative proposals.

Possible tools include Python, R, Mahout, MLlib, PredictionIO, H2O,
Sparkling Water, SINGA, etc.


Updates, directions, and suggestions that can help students write good
proposals are more than welcome!

--
Kind regards,
Alexander.