Posted to commits@systemml.apache.org by na...@apache.org on 2017/02/23 21:29:07 UTC

[1/2] incubator-systemml git commit: [SYSTEMML-1238] Updated the default parameters of mllearn to match those of scikit-learn.

Repository: incubator-systemml
Updated Branches:
  refs/heads/gh-pages bb97a4bc6 -> 5c4e27c70


[SYSTEMML-1238] Updated the default parameters of mllearn to match those of
scikit-learn.

- Also updated the test to compare our algorithm to scikit-learn.

Closes #398.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/0fb74b94
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/0fb74b94
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/0fb74b94

Branch: refs/heads/gh-pages
Commit: 0fb74b94af9e244b5695745ac7b3651b485b812f
Parents: bb97a4b
Author: Niketan Pansare <np...@us.ibm.com>
Authored: Fri Feb 17 14:54:23 2017 -0800
Committer: Niketan Pansare <np...@us.ibm.com>
Committed: Fri Feb 17 14:59:49 2017 -0800

----------------------------------------------------------------------
 algorithms-regression.md  | 8 ++++----
 beginners-guide-python.md | 2 +-
 python-reference.md       | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/0fb74b94/algorithms-regression.md
----------------------------------------------------------------------
diff --git a/algorithms-regression.md b/algorithms-regression.md
index 992862e..80b38a3 100644
--- a/algorithms-regression.md
+++ b/algorithms-regression.md
@@ -83,8 +83,8 @@ efficient when the number of features $m$ is relatively small
 <div data-lang="Python" markdown="1">
 {% highlight python %}
 from systemml.mllearn import LinearRegression
-# C = 1/reg
-lr = LinearRegression(sqlCtx, fit_intercept=True, C=1.0, solver='direct-solve')
+# C = 1/reg (to disable regularization, use float("inf"))
+lr = LinearRegression(sqlCtx, fit_intercept=True, normalize=False, C=float("inf"), solver='direct-solve')
 # X_train, y_train and X_test can be NumPy matrices or Pandas DataFrame or SciPy Sparse Matrix
 y_test = lr.fit(X_train, y_train)
 # df_train is DataFrame that contains two columns: "features" (of type Vector) and "label". df_test is a DataFrame that contains the column "features"
@@ -125,8 +125,8 @@ y_test = lr.fit(df_train)
 <div data-lang="Python" markdown="1">
 {% highlight python %}
 from systemml.mllearn import LinearRegression
-# C = 1/reg
-lr = LinearRegression(sqlCtx, fit_intercept=True, max_iter=100, tol=0.000001, C=1.0, solver='newton-cg')
+# C = 1/reg (to disable regularization, use float("inf"))
+lr = LinearRegression(sqlCtx, fit_intercept=True, normalize=False, max_iter=100, tol=0.000001, C=float("inf"), solver='newton-cg')
 # X_train, y_train and X_test can be NumPy matrices or Pandas DataFrames or SciPy Sparse matrices
 y_test = lr.fit(X_train, y_train)
 # df_train is DataFrame that contains two columns: "features" (of type Vector) and "label". df_test is a DataFrame that contains the column "features"
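The comment added in the patch above states that C = 1/reg and that C=float("inf") disables regularization. The relationship can be sketched in plain NumPy (this is an illustrative model of the 'direct-solve' path, not SystemML's actual implementation): solving the regularized normal equations with reg = 1/C reduces to ordinary least squares as C approaches infinity, which is what scikit-learn's LinearRegression computes.

```python
import numpy as np

# Illustrative sketch: a direct solve of the regularized normal
# equations, where reg = 1/C. With C = float("inf"), reg becomes 0
# and the fit reduces to unregularized ordinary least squares.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.01 * rng.standard_normal(50)

def ridge_direct_solve(X, y, C):
    reg = 0.0 if C == float("inf") else 1.0 / C
    n_features = X.shape[1]
    # solve (X'X + reg*I) w = X'y
    return np.linalg.solve(X.T @ X + reg * np.eye(n_features), X.T @ y)

w_unreg = ridge_direct_solve(X, y, float("inf"))   # reg = 0 -> plain OLS
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_unreg, w_ols)
```

With a finite C the solution is shrunk toward zero; the new default of float("inf") therefore matches scikit-learn's unregularized LinearRegression rather than a ridge fit.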

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/0fb74b94/beginners-guide-python.md
----------------------------------------------------------------------
diff --git a/beginners-guide-python.md b/beginners-guide-python.md
index 4d1b098..ffab09e 100644
--- a/beginners-guide-python.md
+++ b/beginners-guide-python.md
@@ -228,7 +228,7 @@ X_test = diabetes_X[-20:]
 y_train = diabetes.target[:-20]
 y_test = diabetes.target[-20:]
 # Create linear regression object
-regr = LinearRegression(sqlCtx, fit_intercept=True, C=1, solver='direct-solve')
+regr = LinearRegression(sqlCtx, fit_intercept=True, C=float("inf"), solver='direct-solve')
 # Train the model using the training sets
 regr.fit(X_train, y_train)
 y_predicted = regr.predict(X_test)

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/0fb74b94/python-reference.md
----------------------------------------------------------------------
diff --git a/python-reference.md b/python-reference.md
index 65dcb5c..8d38598 100644
--- a/python-reference.md
+++ b/python-reference.md
@@ -731,7 +731,7 @@ LogisticRegression score: 0.922222
 
 ### Reference documentation
 
- *class*`systemml.mllearn.estimators.LinearRegression`(*sqlCtx*, *fit\_intercept=True*, *max\_iter=100*, *tol=1e-06*, *C=1.0*, *solver='newton-cg'*, *transferUsingDF=False*)(#systemml.mllearn.estimators.LinearRegression "Permalink to this definition")
+ *class*`systemml.mllearn.estimators.LinearRegression`(*sqlCtx*, *fit\_intercept=True*, *normalize=False*, *max\_iter=100*, *tol=1e-06*, *C=float("inf")*, *solver='newton-cg'*, *transferUsingDF=False*)(#systemml.mllearn.estimators.LinearRegression "Permalink to this definition")
 :   Bases: `systemml.mllearn.estimators.BaseSystemMLRegressor`{.xref .py
     .py-class .docutils .literal}
 
@@ -760,7 +760,7 @@ LogisticRegression score: 0.922222
         >>> # The mean square error
         >>> print("Residual sum of squares: %.2f" % np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2))
 
- *class*`systemml.mllearn.estimators.LogisticRegression`(*sqlCtx*, *penalty='l2'*, *fit\_intercept=True*, *max\_iter=100*, *max\_inner\_iter=0*, *tol=1e-06*, *C=1.0*, *solver='newton-cg'*, *transferUsingDF=False*)(#systemml.mllearn.estimators.LogisticRegression "Permalink to this definition")
+ *class*`systemml.mllearn.estimators.LogisticRegression`(*sqlCtx*, *penalty='l2'*, *fit\_intercept=True*, *normalize=False*, *max\_iter=100*, *max\_inner\_iter=0*, *tol=1e-06*, *C=1.0*, *solver='newton-cg'*, *transferUsingDF=False*)(#systemml.mllearn.estimators.LogisticRegression "Permalink to this definition")
 :   Bases: `systemml.mllearn.estimators.BaseSystemMLClassifier`{.xref
     .py .py-class .docutils .literal}
 
@@ -817,7 +817,7 @@ LogisticRegression score: 0.922222
         >>> prediction = model.transform(test)
         >>> prediction.show()
 
- *class*`systemml.mllearn.estimators.SVM`(*sqlCtx*, *fit\_intercept=True*, *max\_iter=100*, *tol=1e-06*, *C=1.0*, *is\_multi\_class=False*, *transferUsingDF=False*)(#systemml.mllearn.estimators.SVM "Permalink to this definition")
+ *class*`systemml.mllearn.estimators.SVM`(*sqlCtx*, *fit\_intercept=True*, *normalize=False*, *max\_iter=100*, *tol=1e-06*, *C=1.0*, *is\_multi\_class=False*, *transferUsingDF=False*)(#systemml.mllearn.estimators.SVM "Permalink to this definition")
 :   Bases: `systemml.mllearn.estimators.BaseSystemMLClassifier`{.xref
     .py .py-class .docutils .literal}
 


[2/2] incubator-systemml git commit: Updated document to correspond to the currently released artifacts

Posted by na...@apache.org.
Updated document to correspond to the currently released artifacts

Closes #403


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/5c4e27c7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/5c4e27c7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/5c4e27c7

Branch: refs/heads/gh-pages
Commit: 5c4e27c701da1084d1e47d7ad049f9570033e7ae
Parents: 0fb74b9
Author: Nakul Jindal <na...@gmail.com>
Authored: Tue Feb 21 14:56:58 2017 -0800
Committer: Nakul Jindal <na...@gmail.com>
Committed: Thu Feb 23 13:20:27 2017 -0800

----------------------------------------------------------------------
 release-process.md | 146 ++++++++++++++++++++----------------------------
 1 file changed, 62 insertions(+), 84 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/5c4e27c7/release-process.md
----------------------------------------------------------------------
diff --git a/release-process.md b/release-process.md
index 1cc5c9f..a75a281 100644
--- a/release-process.md
+++ b/release-process.md
@@ -102,86 +102,64 @@ The build artifacts should be downloaded from [https://dist.apache.org/repos/dis
 this OS X example.
 
 	# download artifacts
-	wget -r -nH -nd -np -R index.html* https://dist.apache.org/repos/dist/dev/incubator/systemml/0.11.0-incubating-rc1/
+	wget -r -nH -nd -np -R 'index.html*' https://dist.apache.org/repos/dist/dev/incubator/systemml/0.13.0-incubating-rc1/
 
 	# verify standalone tgz works
-	tar -xvzf systemml-0.11.0-incubating-standalone.tgz
-	cd systemml-0.11.0-incubating-standalone
+	tar -xvzf systemml-0.13.0-incubating-bin.tgz
+	cd systemml-0.13.0-incubating-bin
 	echo "print('hello world');" > hello.dml
 	./runStandaloneSystemML.sh hello.dml
 	cd ..
 
-	# verify main jar works
-	mkdir lib
-	cp -R systemml-0.11.0-incubating-standalone/lib/* lib/
-	rm lib/systemml-0.11.0-incubating.jar
-	java -cp ./lib/*:systemml-0.11.0-incubating.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-
-	# verify src works
-	tar -xvzf systemml-0.11.0-incubating-src.tgz
-	cd systemml-0.11.0-incubating-src
-	mvn clean package -P distribution
-	cd target/
-	java -cp ./lib/*:systemml-0.11.0-incubating.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-	java -cp ./lib/*:SystemML.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-	cd ..
+	# verify standalone zip works
+	rm -rf systemml-0.13.0-incubating-bin
+	unzip systemml-0.13.0-incubating-bin.zip
+	cd systemml-0.13.0-incubating-bin
+	echo "print('hello world');" > hello.dml
+	./runStandaloneSystemML.sh hello.dml
 	cd ..
 
-	# verify distrib tgz works
-	tar -xvzf systemml-0.11.0-incubating.tgz
-	cd systemml-0.11.0-incubating
-	java -cp ../lib/*:SystemML.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-
-	# verify spark batch mode
-	export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
-	$SPARK_HOME/bin/spark-submit SystemML.jar -s "print('hello world');" -exec hybrid_spark
-
-	# verify hadoop batch mode
-	hadoop jar SystemML.jar -s "print('hello world');"
-
-
-Here is an example of doing a basic
-sanity check on OS X after building the artifacts manually.
-
-	# build distribution artifacts
-	mvn clean package -P distribution
-
-	cd target
-
-	# verify main jar works
-	java -cp ./lib/*:systemml-0.11.0-incubating.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-
-	# verify SystemML.jar works
-	java -cp ./lib/*:SystemML.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-
 	# verify src works
-	tar -xvzf systemml-0.11.0-incubating-src.tgz
-	cd systemml-0.11.0-incubating-src
+	tar -xvzf systemml-0.13.0-incubating-src.tgz
+	cd systemml-0.13.0-incubating-src
 	mvn clean package -P distribution
 	cd target/
-	java -cp ./lib/*:systemml-0.11.0-incubating.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-	java -cp ./lib/*:SystemML.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
-	cd ..
-	cd ..
-
-	# verify standalone tgz works
-	tar -xvzf systemml-0.11.0-incubating-standalone.tgz
-	cd systemml-0.11.0-incubating-standalone
-	echo "print('hello world');" > hello.dml
-	./runStandaloneSystemML.sh hello.dml
-	cd ..
-
-	# verify distrib tgz works
-	tar -xvzf systemml-0.11.0-incubating.tgz
-	cd systemml-0.11.0-incubating
-	java -cp ../lib/*:SystemML.jar org.apache.sysml.api.DMLScript -s "print('hello world');"
+	java -cp "./lib/*:systemml-0.13.0-incubating.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"
+	java -cp "./lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"
+	cd ../..
 
 	# verify spark batch mode
-	export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
-	$SPARK_HOME/bin/spark-submit SystemML.jar -s "print('hello world');" -exec hybrid_spark
+	export SPARK_HOME=~/spark-2.1.0-bin-hadoop2.7
+	cd systemml-0.13.0-incubating-bin/target/lib
+	$SPARK_HOME/bin/spark-submit systemml-0.13.0-incubating.jar -s "print('hello world');" -exec hybrid_spark
 
 	# verify hadoop batch mode
-	hadoop jar SystemML.jar -s "print('hello world');"
+	hadoop jar systemml-0.13.0-incubating.jar -s "print('hello world');"
+
+
+	# verify python artifact
+	# install numpy, pandas, scipy & set SPARK_HOME
+	pip install numpy
+	pip install pandas
+	pip install scipy
+	export SPARK_HOME=~/spark-2.1.0-bin-hadoop2.7
+	# get into the pyspark prompt
+	cd systemml-0.13.0
+	$SPARK_HOME/bin/pyspark --driver-class-path systemml-java/systemml-0.13.0-incubating.jar
+	# Use this program at the prompt:
+	import systemml as sml
+	import numpy as np
+	m1 = sml.matrix(np.ones((3,3)) + 2)
+	m2 = sml.matrix(np.ones((3,3)) + 3)
+	m2 = m1 * (m2 + m1)
+	m4 = 1.0 - m2
+	m4.sum(axis=1).toNumPy()
+
+	# This should be printed
+	# array([[-60.],
+	#       [-60.],
+	#       [-60.]])
+
 
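The expected output of the pyspark sanity program above can be cross-checked without Spark: `sml.matrix` mirrors NumPy's element-wise semantics for these operations, so the same arithmetic in plain NumPy should reproduce the -60 rows.

```python
import numpy as np

# Plain-NumPy mirror of the pyspark sanity-check program above.
m1 = np.ones((3, 3)) + 2           # all entries 3
m2 = np.ones((3, 3)) + 3           # all entries 4
m2 = m1 * (m2 + m1)                # element-wise: 3 * (4 + 3) = 21
m4 = 1.0 - m2                      # all entries -20
result = m4.sum(axis=1, keepdims=True)  # row sums: 3 * -20 = -60
print(result)
# [[-60.]
#  [-60.]
#  [-60.]]
```

If the pyspark session prints anything other than the three -60 rows, the Python artifact is not wired up correctly.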
 
 ## Python Tests
@@ -229,8 +207,8 @@ The project should be built using the `src` (tgz and zip) artifacts.
 In addition, the test suite should be run using an `src` artifact and
 the tests should pass.
 
-	tar -xvzf systemml-0.11.0-incubating-src.tgz
-	cd systemml-0.11.0-incubating-src
+	tar -xvzf systemml-0.13.0-incubating-src.tgz
+	cd systemml-0.13.0-incubating-src
 	mvn clean package -P distribution
 	mvn verify
 
@@ -246,13 +224,14 @@ standalone distributions.
 Here is an example based on the [Standalone Guide](http://apache.github.io/incubator-systemml/standalone-guide.html)
 demonstrating the execution of an algorithm (on OS X).
 
-	$ tar -xvzf systemml-0.11.0-incubating-standalone.tgz
-	$ cd systemml-0.11.0-incubating-standalone
-	$ wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
-	$ echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
-	$ echo '1,1,1,2' > data/types.csv
-	$ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
-	$ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx CONSOLE_OUTPUT=TRUE
+	tar -xvzf systemml-0.13.0-incubating-bin.tgz
+	cd systemml-0.13.0-incubating-bin
+	wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
+	echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
+	echo '1,1,1,2' > data/types.csv
+	echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
+	./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx CONSOLE_OUTPUT=TRUE
+	cd ..
 
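The `echo` commands in the standalone walkthrough above write SystemML `.mtd` metadata files, which are small JSON documents describing the shape and format of a data file. A script-form sketch of the same setup (file names and values taken from the commands above; the temporary directory is just for illustration):

```python
import json
import tempfile
from pathlib import Path

# Recreate the metadata files written by the echo commands above.
data = Path(tempfile.mkdtemp()) / "data"
data.mkdir()

# .mtd files are JSON sidecars describing the adjacent data file
(data / "haberman.data.mtd").write_text(
    json.dumps({"rows": 306, "cols": 4, "format": "csv"}))
(data / "types.csv").write_text("1,1,1,2\n")
(data / "types.csv.mtd").write_text(
    json.dumps({"rows": 1, "cols": 4, "format": "csv"}))

# sanity check: the metadata parses back as valid JSON
meta = json.loads((data / "haberman.data.mtd").read_text())
print(meta["rows"])  # 306
```

A malformed `.mtd` file (e.g. single quotes instead of JSON double quotes) is a common cause of read failures in the `Univar-Stats.dml` step.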
 
 ## Single-Node Spark
@@ -263,13 +242,13 @@ Verify that SystemML runs algorithms on Spark locally.
 
 Here is an example of running the `Univar-Stats.dml` algorithm on random generated data.
 
-	$ tar -xvzf systemml-0.11.0-incubating.tgz
-	$ cd systemml-0.11.0-incubating
-	$ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
-	$ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 1000000 100 10 1 2 3 4 uni.mtx
-	$ echo '1' > uni-types.csv
-	$ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
-	$ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
+	cd systemml-0.13.0-incubating-bin/lib
+	export SPARK_HOME=~/spark-2.1.0-bin-hadoop2.7
+	$SPARK_HOME/bin/spark-submit systemml-0.13.0-incubating.jar -f ../scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 1000000 100 10 1 2 3 4 uni.mtx
+	echo '1' > uni-types.csv
+	echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
+	$SPARK_HOME/bin/spark-submit systemml-0.13.0-incubating.jar -f ../scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
+	cd ..
 
 
 ## Single-Node Hadoop
@@ -280,7 +259,8 @@ Verify that SystemML runs algorithms on Hadoop locally.
 
 Based on the "Single-Node Spark" setup above, the `Univar-Stats.dml` algorithm could be run as follows:
 
-	$ hadoop jar SystemML.jar -f scripts/algorithms/Univar-Stats.dml -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
+	cd systemml-0.13.0-incubating-bin/lib
+	hadoop jar systemml-0.13.0-incubating.jar -f ../scripts/algorithms/Univar-Stats.dml -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
 
 
 ## Notebooks
@@ -313,5 +293,3 @@ has been approved.
 
 To be written. (What steps need to be done? How is the release deployed to the central maven repo? What updates need to
 happen to the main website, such as updating the Downloads page? Where do the release notes for the release go?)
-
-