Posted to dev@systemml.apache.org by Niketan Pansare <np...@us.ibm.com> on 2016/04/08 03:04:23 UTC

Updating documentation for notebook


Hi all,

Here is a suggestion for lowering the barrier to entry for SystemML: "Provide
a detailed quickstart guide/video that uses a notebook on a free (or
trial-based) hosting solution such as IBM Bluemix or Data Scientist Workbench".

I have created a sample tutorial:
https://github.com/niketanpansare/systemml_tutorial

Items missing from the above tutorial:
1. Create a separate section for notebooks rather than having it hidden under
the MLContext Programming Guide
(http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html).
2. Add Python notebooks (this requires attaching both the jars and the Python
MLContext to the Zeppelin or Jupyter context).
3. Allow users to use jars from our nightly build (see my Jupyter example)
as well as the released version (see my Zeppelin example).
4. Add tutorials for all our algorithms using real-world datasets. Example:
https://www.ibm.com/support/knowledgecenter/SSPT3X_2.1.2/com.ibm.swg.im.infosphere.biginsights.tut.doc/doc/tut_Mod_BigR.html
5. Add a DML kernel for Zeppelin (see
https://issues.apache.org/jira/browse/SYSTEMML-542).
6. Support other hosting services, such as AzureML.
7. Add a tutorial that shows SystemML's integration with MLPipeline.

These missing items can be broken down into relatively small tasks with
detailed specifications that external contributors can work on. Any
thoughts?
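Items 2 and 3 both come down to getting a SystemML jar onto the notebook's Spark classpath. A minimal sketch, assuming a Python notebook that starts PySpark itself (for example, a plain Jupyter kernel) and a jar already downloaded to a local path: set PYSPARK_SUBMIT_ARGS to pass spark-submit's --jars option before the SparkContext is created. The jar paths and the helper names `choose_jar` and `pyspark_submit_args` are hypothetical placeholders, not official SystemML artifacts, and this does not cover attaching the Python MLContext wrapper itself. Hosted services and Zeppelin may have their own mechanism for adding jars.

```python
import os
import shlex
from typing import Sequence

# Hypothetical placeholder paths; point these at the jars actually
# downloaded from the nightly build or from a released distribution.
NIGHTLY_JAR = "/opt/systemml/nightly/SystemML.jar"
RELEASE_JAR = "/opt/systemml/release/SystemML.jar"


def choose_jar(use_nightly: bool) -> str:
    """Return the nightly or the released jar path (both are placeholders)."""
    return NIGHTLY_JAR if use_nightly else RELEASE_JAR


def pyspark_submit_args(jars: Sequence[str]) -> str:
    """Build a PYSPARK_SUBMIT_ARGS value that puts the given jars on the
    driver and executor classpaths via spark-submit's --jars option.

    PySpark splits this value with shlex and expects it to end with the
    'pyspark-shell' token, so each argument is shell-quoted here.
    """
    if not jars:
        raise ValueError("at least one jar is required")
    # --jars takes a comma-separated list of jar paths.
    args = ["--jars", ",".join(jars), "pyspark-shell"]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Must run before the notebook creates its SparkContext; once the JVM
    # has started, changing this variable has no effect.
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args([choose_jar(use_nightly=True)])
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Switching between the nightly and released jar is then a single flag, which keeps the notebook itself identical for both cases.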

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

Re: Updating documentation for notebook

Posted by Luciano Resende <lu...@gmail.com>.
On Thu, Apr 7, 2016 at 6:04 PM, Niketan Pansare <np...@us.ibm.com> wrote:

> I have created a sample tutorial:
> https://github.com/niketanpansare/systemml_tutorial
>
Great!!! Any reason these are not in the project's git repository (e.g.,
under samples)? And the tutorial as part of the documentation? I believe
having a central place to find these would be very useful for interested
users.


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Updating documentation for notebook

Posted by Niketan Pansare <np...@us.ibm.com>.
Thanks, Abhishek. I am glad it was helpful :)

Luciano: I agree with you about having a central place for documentation.
Before cleaning up the tutorial and putting it into our documentation, I
wanted to:
1. Discuss which setup we should use to introduce SystemML: standalone mode
from the command line, the Spark/PySpark REPL (YARN/standalone), Hadoop from
the command line, or a Scala/Python notebook (either an online notebook or
one that requires users to set up Jupyter/Zeppelin themselves).
2. Encourage other contributors to come up with intellectually stimulating
tutorials using real-world datasets and our existing DML algorithms. This
means creating JIRAs that people can work on. My repository is only a POC
to facilitate discussion and will be deleted after that.
3. If we do decide to go with an online-notebook-based tutorial, discuss
how to structure it:
- Supporting a variety of hosting sites (Bluemix / Data Scientist
Workbench / Databricks Cloud / AzureML / AWS / ...).
- Python or Scala as the primary language.
- Jupyter or Zeppelin as the primary notebook.
- DML kernel, MLContext-based, or JMLC-based examples.
- Any standard tutorial (or textbook) we should use as a model when
choosing the dataset.
- Whether the emphasis should be on learning DML or on building larger
data pipelines (for example, with our MLPipeline wrapper).

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Abhishek Srivastava <ab...@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	04/08/2016 08:55 AM
Subject:	Re: Updating documentation for notebook



Great job, Niketan. I had been searching for such a document of late.

Regards,
Abhishek Srivastava
Fellowship Scholar, IIM Ranchi
Skype : abhi.sri3

On Fri, Apr 8, 2016 at 6:34 AM, Niketan Pansare <np...@us.ibm.com> wrote:

> I have created a sample tutorial:
> https://github.com/niketanpansare/systemml_tutorial



Re: Updating documentation for notebook

Posted by Abhishek Srivastava <ab...@gmail.com>.
Great job, Niketan. I had been searching for such a document of late.

Regards,
Abhishek Srivastava
Fellowship Scholar, IIM Ranchi
Skype : abhi.sri3

On Fri, Apr 8, 2016 at 6:34 AM, Niketan Pansare <np...@us.ibm.com> wrote:

> I have created a sample tutorial:
> https://github.com/niketanpansare/systemml_tutorial
>