Posted to commits@systemml.apache.org by gw...@apache.org on 2017/02/20 18:39:26 UTC

incubator-systemml git commit: [SYSTEMML-1266] Replace README.txt In Release Package

Repository: incubator-systemml
Updated Branches:
  refs/heads/master d48121217 -> edb9e7786


[SYSTEMML-1266] Replace README.txt In Release Package

Closes #401.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/edb9e778
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/edb9e778
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/edb9e778

Branch: refs/heads/master
Commit: edb9e7786c6fe3a62dbc58a14f7e87aa9af8f67e
Parents: d481212
Author: Glenn Weidner <gw...@us.ibm.com>
Authored: Mon Feb 20 10:36:46 2017 -0800
Committer: Glenn Weidner <gw...@us.ibm.com>
Committed: Mon Feb 20 10:36:46 2017 -0800

----------------------------------------------------------------------
 src/main/standalone/README.txt | 169 ++++++++++++++++++++++++++----------
 1 file changed, 121 insertions(+), 48 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/edb9e778/src/main/standalone/README.txt
----------------------------------------------------------------------
diff --git a/src/main/standalone/README.txt b/src/main/standalone/README.txt
index af60940..024c679 100644
--- a/src/main/standalone/README.txt
+++ b/src/main/standalone/README.txt
@@ -1,65 +1,138 @@
--------------------------------------------------------------------------------
-Apache SystemML (incubating)
--------------------------------------------------------------------------------
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
 
-SystemML is now an Apache Incubator project! Please see the Apache SystemML
-(incubating) website at http://systemml.apache.org/ for more information. The
-latest project documentation can be found at the SystemML Documentation website
-on GitHub at http://apache.github.io/incubator-systemml/.
+http://www.apache.org/licenses/LICENSE-2.0
 
-SystemML is a flexible, scalable machine learning system. SystemML's
-distinguishing characteristics are:
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
 
-  1. Algorithm customizability via R-like and Python-like languages.
-  2. Multiple execution modes, including Standalone, Spark Batch, Spark
-     MLContext, Hadoop Batch, and JMLC.
-  3. Automatic optimization based on data and cluster characteristics to ensure
-     both efficiency and scalability.
 
+# Apache SystemML
 
--------------------------------------------------------------------------------
-SystemML in Standalone Mode
--------------------------------------------------------------------------------
+**Documentation:** [SystemML Documentation](http://apache.github.io/incubator-systemml/)
+**Mailing List:** [Dev Mailing List](mailto:dev@systemml.incubator.apache.org)
+**Build Status:** [![Build Status](https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/badge/icon)](https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest)
+**Issue Tracker:** [JIRA](https://issues.apache.org/jira/browse/SYSTEMML)
+**Download:** [Download SystemML](http://systemml.apache.org/download.html)
 
-Standalone mode can be run on a single machine in a non-Hadoop environment,
-allowing data scientists to develop algorithms locally without need of a
-distributed cluster. The Standalone release packages all required libraries
-into a single distribution. Standalone mode is not appropriate for large
-datasets.
+**SystemML** is now an **Apache Incubator** project! Please see the [**Apache SystemML (incubating)**](http://systemml.apache.org/)
+website for more information. The latest project documentation can be found at the
+[**SystemML Documentation**](http://apache.github.io/incubator-systemml/) website on GitHub.
 
-OS X and Linux users can use the runStandaloneSystemML.sh script to run in
-Standalone mode, while Windows users can use the runStandaloneSystemML.bat
-script.
+SystemML is a flexible, scalable machine learning system.
+SystemML's distinguishing characteristics are:
 
+  1. **Algorithm customizability via R-like and Python-like languages**.
+  2. **Multiple execution modes**, including Spark MLContext API, Spark Batch, Hadoop Batch, Standalone, and JMLC.
+  3. **Automatic optimization** based on data and cluster characteristics to ensure both efficiency and scalability.
 
--------------------------------------------------------------------------------
-Hello World Example
--------------------------------------------------------------------------------
 
-The following example will run a "hello world" DML script on SystemML in
-Standalone mode.
+## Algorithm Customizability
 
-$ echo 'print("hello world");' > helloworld.dml
-$ ./runStandaloneSystemML.sh helloworld.dml
+ML algorithms in SystemML are specified in a high-level, declarative machine learning (DML) language.
+Algorithms can be expressed in either an R-like syntax or a Python-like syntax. DML includes
+linear algebra primitives, statistical functions, and additional constructs.
 
+This high-level language significantly increases the productivity of
+data scientists as it provides (1) full flexibility in expressing custom
+analytics and (2) data independence from the underlying input formats and
+physical data representations.
 
--------------------------------------------------------------------------------
-Running SystemML Algorithms
--------------------------------------------------------------------------------
 
-Several existing algorithms can be found in the scripts directory in the
-Standalone distribution. In the following example, we first obtain Haberman's
-Survival Data Set. We create a metadata file for this data. We create a
-types.csv file that describes the type of each column along with a
-corresponding metadata file. We then run the Univariate Statistics algorithm
-on the data in Standalone mode. The results are output to the
-data/univarOut.mtx file.
+## Multiple Execution Modes
 
-$ wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
-$ echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
-$ echo '1,1,1,2' > data/types.csv
-$ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
-$ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx
+SystemML computations can be executed in a variety of different modes. To begin with, SystemML
+can be operated in Standalone mode on a single machine, allowing data scientists to develop
+algorithms locally without need of a distributed cluster. In order to scale up, algorithms can also be distributed
+across a cluster using Spark or Hadoop.
+This flexibility allows the utilization of an organization's existing resources and expertise.
+In addition, SystemML features a
+[Spark MLContext API](http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html)
+that allows for programmatic interaction via Scala, Python, and Java. SystemML also features an
+embedded API for scoring models.
 
-For more information, please see the online SystemML documentation.
 
+## Automatic Optimization
+
+Algorithms specified in DML are dynamically compiled and optimized based on data and cluster characteristics
+using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime
+execution plans ranging from in-memory, single-node execution, to distributed computations on Spark or Hadoop.
+This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune
+distributed runtime execution plans and system configurations.
+
+## ML Algorithms
+
+SystemML features a suite of production-level examples that can be grouped into six broad categories:
+Descriptive Statistics, Classification, Clustering, Regression, Matrix Factorization, and Survival Analysis.
+Detailed descriptions of these algorithms can be found in the
+[SystemML Algorithms Reference](http://apache.github.io/incubator-systemml/algorithms-reference.html).  These algorithms are intended as production-level examples that can be modified or used as inspiration for a new custom algorithm.
+
+## Download & Setup
+
+Before you get started on SystemML, make sure that your environment is set up and ready to go.
+
+  1. **If you’re on OS X, we recommend installing [Homebrew](http://brew.sh) if you haven’t already.  For Linux users, the [Linuxbrew project](http://linuxbrew.sh/) is equivalent.**
+
+  OS X:
+  ```
+  /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+  ```
+  Linux:
+  ```
+  ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"
+  ```
+
+  2. **Install Java (Java 8 is required).**
+  ```
+  brew tap caskroom/cask
+  brew install Caskroom/cask/java
+  ```
+
+  3. **Install Spark 2.1.**
+  ```
+  brew tap homebrew/versions
+  brew install apache-spark21
+  ```
+
+  4. **Download SystemML.**
+
+  Go to the [SystemML Downloads page](http://systemml.apache.org/download.html), download `systemml-0.13.0-incubating.zip` (it should be the second entry), and unzip it to a location of your choice.
+
+  *The next step is optional, but it will make your life a lot easier.*
+
+  5. **[OPTIONAL] Set `SYSTEMML_HOME` in your bash profile.**
+  Add the following to `~/.bash_profile`, replacing `path/to/` with the location of the download in step 4.
+  ```
+  export SYSTEMML_HOME=path/to/systemml-0.13.0-incubating
+  ```
+  *Open a new terminal tab so that the change takes effect.*
+
+  6. **[OPTIONAL] Install Python 2 or Python 3 (to follow along with our Jupyter notebook examples).**
+
+  Python 2:
+  ```
+  brew install python
+  pip install jupyter matplotlib numpy
+  ```
+
+  Python 3:
+  ```
+  brew install python3
+  pip3 install jupyter matplotlib numpy
+  ```
+
+**Congrats! You can now use SystemML!**
+
+## Next Steps!
+
+To get started, please consult the
+[SystemML Documentation](http://apache.github.io/incubator-systemml/) website on GitHub.  We
+recommend using the [Spark MLContext API](http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html)
+to run SystemML from Scala or Python using `spark-shell`, `pyspark`, or `spark-submit`.
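
Editorial sketch, not part of the commit above: the optional `SYSTEMML_HOME` step in the new README can be sanity-checked with a small shell function. The name `check_systemml_home` is hypothetical and does not ship with SystemML. It only tests that the variable is set and names an existing directory, and it assumes nothing about the layout of the unzipped release.

```shell
# Hypothetical helper, not provided by SystemML.
# Usage: check_systemml_home [DIR]
# With no argument it inspects the SYSTEMML_HOME environment variable.
check_systemml_home() {
  # Prefer an explicit argument; otherwise fall back to SYSTEMML_HOME.
  _sysml_dir="${1:-${SYSTEMML_HOME:-}}"

  if [ -z "$_sysml_dir" ]; then
    echo "SYSTEMML_HOME is not set"
    return 1
  fi

  if [ ! -d "$_sysml_dir" ]; then
    echo "SYSTEMML_HOME is not a directory: $_sysml_dir"
    return 1
  fi

  echo "SYSTEMML_HOME points at an existing directory: $_sysml_dir"
  return 0
}
```

After adding the `export` line to `~/.bash_profile` and opening a new terminal tab, defining this function and running `check_systemml_home` with no arguments should report the unzipped directory. A non-zero exit status suggests the profile change was not picked up.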