Posted to commits@systemml.apache.org by du...@apache.org on 2016/01/26 02:24:17 UTC

[1/2] incubator-systemml git commit: [SYSTEMML-482] [SYSTEMML-480] Adding a Git attributes file to enforce Unix-style line endings, and normalizing all of the line endings.

Repository: incubator-systemml
Updated Branches:
  refs/heads/gh-pages 090fb9403 -> 3ce15871b


[SYSTEMML-482] [SYSTEMML-480] Adding a Git attributes file to enforce Unix-style line endings, and normalizing all of the line endings.

Closes #49.
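
For reference, a minimal .gitattributes entry that enforces Unix-style (LF) line
endings for all text files looks like the following. This is an illustrative
sketch only, not necessarily the exact contents of the committed file:

   # treat all files as text and normalize them to LF line endings
   * text eol=lf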


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/4ce58eb6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/4ce58eb6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/4ce58eb6

Branch: refs/heads/gh-pages
Commit: 4ce58eb6c7d42cbe84449473ca441043d683fb0a
Parents: 090fb94
Author: Mike Dusenberry <mw...@us.ibm.com>
Authored: Fri Jan 22 08:31:35 2016 -0800
Committer: Mike Dusenberry <mw...@us.ibm.com>
Committed: Fri Jan 22 08:31:35 2016 -0800

----------------------------------------------------------------------
 devdocs/MatrixMultiplicationOperators.txt | 256 ++++++++++++-------------
 1 file changed, 128 insertions(+), 128 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/4ce58eb6/devdocs/MatrixMultiplicationOperators.txt
----------------------------------------------------------------------
diff --git a/devdocs/MatrixMultiplicationOperators.txt b/devdocs/MatrixMultiplicationOperators.txt
index 7bc8a9c..962951c 100644
--- a/devdocs/MatrixMultiplicationOperators.txt
+++ b/devdocs/MatrixMultiplicationOperators.txt
@@ -1,128 +1,128 @@
-#####################################################################
-# TITLE: An Overview of Matrix Multiplication Operators in SystemML #
-# DATE MODIFIED: 11/21/2015                                         #
-#####################################################################
-
-In the following, we give an overview of backend-specific physical matrix multiplication operators in SystemML as well as their internally used matrix multiplication block operations.
-
-A) BASIC MATRIX MULT OPERATORS 
--------------------------------
-
-An AggBinaryOp hop can be compiled into the following physical operators.
-
-* 1) Physical Operators in CP (single node, control program)
-  - MM (basic matrix multiplication)                     --> mm
-  - MMChain (matrix multiplication chain)                --> mmchain
-  - TSMM (transpose-self matrix multiplication)          --> tsmm
-  - PMM (permutation matrix multiplication)              --> pmm
-
-* 2) Physical Operators in MR (distributed, mapreduce)
-  - MapMM (map-side matrix multiplication, w/|w/o agg)   --> mm
-  - MapMMChain (map-side matrix chain multiplication)    --> mmchain
-  - TSMM (map-side transpose-self matrix multiplication) --> tsmm
-  - PMM (map-side permutation matrix multiplication)     --> pmm
-  - CPMM (cross-product matrix multiplication, 2 jobs)   --> mm
-  - RMM (replication-based matrix multiplication, 1 job) --> mm
-
-* 3) Physical Operators in SPARK (distributed, spark)
-  - MapMM (see MR, flatmap/mappartitions/maptopair +     --> mm
-    reduce/reducebykey/no_aggregation)                   
-  - MapMMChain (see MR, mapvalues/maptopair + reduce)    --> mmchain
-  - TSMM (see MR, mapvalues + reduce)                    --> tsmm
-  - PMM (see MR, flatmaptopair + reducebykey)            --> pmm
-  - CPMM (see MR, 2 x maptopair + join + maptopair +     --> mm
-    reduce/reducebykey) 
-  - RMM (see MR, 2 x flatmap + join + maptopair +        --> mm
-    reducebykey) 
-  - ZIPMM (partitioning-preserving 1-1 zipping mm,       --> mm
-    join + mapvalues + reduce) 
-
-
-B) COMPLEX MATRIX MULT OPERATORS
--------------------------------  
-
-A QuaternaryOp hop can be compiled into the following physical operators. Note, however, that wsloss, wsigmoid, and wdivmm have different semantics. The main goal of these operators is to prevent the creation of dense "outer" products via selective computation over a sparse driver (sparse matrix and sparse-safe operation).
- 
-* 1) Physical Operators in CP (single node, control program)
-  - WSLoss (weighted squared loss)                       --> wsloss
-  - WSigmoid (weighted sigmoid)                          --> wsigmoid
-  - WDivMM (weighted divide matrix multiplication)       --> wdivmm
-  - WCeMM (weighted cross entropy matrix multiplication) --> wcemm
-  - WuMM (weighted unary op matrix multiplication)       --> wumm
-
-* 2) Physical Operators in MR (distributed, mapreduce)
-  - MapWSLoss (map-side weighted squared loss)           --> wsloss
-  - RedWSLoss (reduce-side weighted squared loss)        --> wsloss
-  - MapWSigmoid (map-side weighted sigmoid)              --> wsigmoid
-  - RedWSigmoid (reduce-side weighted sigmoid)           --> wsigmoid
-  - MapWDivMM (map-side weighted divide matrix mult)     --> wdivmm
-  - RedWDivMM (reduce-side weighted divide matrix mult)  --> wdivmm
-  - MapWCeMM (map-side weighted cross entr. matrix mult) --> wcemm
-  - RedWCeMM (reduce-side w. cross entr. matrix mult)    --> wcemm
-  - MapWuMM (map-side weighted unary op matrix mult)     --> wumm
-  - RedWuMM (reduce-side weighted unary op matrix mult)  --> wumm
-
-* 3) Physical Operators in SPARK (distributed, spark)
-  - MapWSLoss (see MR, mappartitions + reduce)           --> wsloss           
-  - RedWSLoss (see MR, 1/2x flatmaptopair + 1-3x join +  --> wsloss
-    maptopair + reduce)
-  - MapWSigmoid (see MR, mappartitions)                  --> wsigmoid
-  - RedWSigmoid (see MR, 1/2x flatmaptopair +            --> wsigmoid
-    1/2x join + maptopair)          
-  - MapWDivMM (see MR, mappartitions + reducebykey )     --> wdivmm
-  - RedWDivMM (see MR, 1/2x flatmaptopair + 1/2x join +  --> wdivmm 
-    maptopair + reducebykey)  
-  - MapWCeMM (see MR, mappartitions + reduce)            --> wcemm           
-  - RedWCeMM (see MR, 1/2x flatmaptopair + 1/2x join +   --> wcemm 
-    maptopair + reduce)  
-  - MapWuMM (see MR, mappartitions)                      --> wumm
-  - RedWuMM (see MR, 1/2x flatmaptopair +                --> wumm
-    1/2x join + maptopair)          
-  
-  
-C) CORE MATRIX MULT PRIMITIVES IN LibMatrixMult (incl. related script patterns)
--------------------------------  
-* 1) mm       (general A %*% B)
-  - sequential / multi-threaded (same block ops, par over rows in A)
-  - dense-dense, dense-sparse, sparse-dense, sparse-sparse, ultra-sparse*
-  - ~20 special cases for matrix-vector, vector-vector, etc
-  
-* 2) mmchain  ((a) t(X) %*% (X %*% v), (b) t(X) %*% (w * (X %*% v)))
-  - sequential / multi-threaded (same block ops, par over rows in X)
-  - dense / sparse x 2 patterns
-
-* 3) tsmm     ((a) t(X) %*% X, (b) X %*% t(X))
-  - sequential / multi-threaded (same block ops, par over rows in R, 2x tasks)
-  - dense / sparse x 2 patterns; special cases for dot products
-
-* 4) pmm      (removeEmpty(diag(v), "rows") %*% X)
-  - sequential / multi-threaded (same block ops, par over rows in X)
-  - sparse-sparse, dense-dense, sparse-dense
-
-* 5) wsloss   ((a) sum(W*(X-U%*%t(V))^2), (b) sum((X-W*(U%*%t(V)))^2),
-               (c) sum((X-(U%*%t(V)))^2), (d) sum(W*(U%*%t(V)-X)^2),
-               (e) sum((W*(U%*%t(V))-X)^2), (f) sum(((U%*%t(V))-X)^2))
-  - sequential / multi-threaded (same block ops, par over rows in W/X)                 
-  - all dense, sparse-dense factors, sparse/dense-* x 3 patterns      
-  - special patterns for (a) and (d) if W is X!=0      
-
-* 6) wsigmoid ((a) W*sigmoid(Y%*%t(X)), (b) W*sigmoid(-(Y%*%t(X))), 
-               (c) W*log(sigmoid(Y%*%t(X))), (d) W*log(sigmoid(-(Y%*%t(X))))) 
-  - sequential / multi-threaded (same block ops, par over rows in W)                 
-  - all dense, sparse-dense factors, sparse/dense-* x 4 patterns                   
-
-* 7) wdivmm   ((a) t(t(U)%*%(W/(U%*%t(V)))), (b) (W/(U%*%t(V)))%*%V,
-               (c) t(t(U)%*%(W*(U%*%t(V)))), (d) (W*(U%*%t(V)))%*%V, 
-               (e) W*(U%*%t(V)), (f) t(t(U)%*%((X!=0)*(U%*%t(V)-X))),
-               (g) ((X!=0)*(U%*%t(V)-X))%*%V)
-  - sequential / multi-threaded (same block ops, par over rows in X)                 
-  - all dense, sparse-dense factors, sparse/dense-* x 7 patterns
-
-* 8) wcemm    (sum(X*log(U%*%t(V))))  
-  - sequential / multi-threaded (same block ops, par over rows in X)                 
-  - all dense, sparse-dense factors, sparse/dense-*, 1 pattern
-
-* 9) wumm     ((a) X*uop(U%*%t(V)), (b) X/uop(U%*%t(V)))
-  - any unary operator, e.g., X*exp(U%*%t(V)) or X*(U%*%t(V))^2  
-  - sequential / multi-threaded (same block ops, par over rows in X)                 
-  - all dense, sparse-dense factors, sparse/dense-*, 2 patterns
+#####################################################################
+# TITLE: An Overview of Matrix Multiplication Operators in SystemML #
+# DATE MODIFIED: 11/21/2015                                         #
+#####################################################################
+
+In the following, we give an overview of backend-specific physical matrix multiplication operators in SystemML as well as their internally used matrix multiplication block operations.
+
+A) BASIC MATRIX MULT OPERATORS 
+-------------------------------
+
+An AggBinaryOp hop can be compiled into the following physical operators.
+
+* 1) Physical Operators in CP (single node, control program)
+  - MM (basic matrix multiplication)                     --> mm
+  - MMChain (matrix multiplication chain)                --> mmchain
+  - TSMM (transpose-self matrix multiplication)          --> tsmm
+  - PMM (permutation matrix multiplication)              --> pmm
+
+* 2) Physical Operators in MR (distributed, mapreduce)
+  - MapMM (map-side matrix multiplication, w/|w/o agg)   --> mm
+  - MapMMChain (map-side matrix chain multiplication)    --> mmchain
+  - TSMM (map-side transpose-self matrix multiplication) --> tsmm
+  - PMM (map-side permutation matrix multiplication)     --> pmm
+  - CPMM (cross-product matrix multiplication, 2 jobs)   --> mm
+  - RMM (replication-based matrix multiplication, 1 job) --> mm
+
+* 3) Physical Operators in SPARK (distributed, spark)
+  - MapMM (see MR, flatmap/mappartitions/maptopair +     --> mm
+    reduce/reducebykey/no_aggregation)                   
+  - MapMMChain (see MR, mapvalues/maptopair + reduce)    --> mmchain
+  - TSMM (see MR, mapvalues + reduce)                    --> tsmm
+  - PMM (see MR, flatmaptopair + reducebykey)            --> pmm
+  - CPMM (see MR, 2 x maptopair + join + maptopair +     --> mm
+    reduce/reducebykey) 
+  - RMM (see MR, 2 x flatmap + join + maptopair +        --> mm
+    reducebykey) 
+  - ZIPMM (partitioning-preserving 1-1 zipping mm,       --> mm
+    join + mapvalues + reduce) 
+
+
+B) COMPLEX MATRIX MULT OPERATORS
+-------------------------------  
+
+A QuaternaryOp hop can be compiled into the following physical operators. Note, however, that wsloss, wsigmoid, and wdivmm have different semantics. The main goal of these operators is to prevent the creation of dense "outer" products via selective computation over a sparse driver (sparse matrix and sparse-safe operation).
+ 
+* 1) Physical Operators in CP (single node, control program)
+  - WSLoss (weighted squared loss)                       --> wsloss
+  - WSigmoid (weighted sigmoid)                          --> wsigmoid
+  - WDivMM (weighted divide matrix multiplication)       --> wdivmm
+  - WCeMM (weighted cross entropy matrix multiplication) --> wcemm
+  - WuMM (weighted unary op matrix multiplication)       --> wumm
+
+* 2) Physical Operators in MR (distributed, mapreduce)
+  - MapWSLoss (map-side weighted squared loss)           --> wsloss
+  - RedWSLoss (reduce-side weighted squared loss)        --> wsloss
+  - MapWSigmoid (map-side weighted sigmoid)              --> wsigmoid
+  - RedWSigmoid (reduce-side weighted sigmoid)           --> wsigmoid
+  - MapWDivMM (map-side weighted divide matrix mult)     --> wdivmm
+  - RedWDivMM (reduce-side weighted divide matrix mult)  --> wdivmm
+  - MapWCeMM (map-side weighted cross entr. matrix mult) --> wcemm
+  - RedWCeMM (reduce-side w. cross entr. matrix mult)    --> wcemm
+  - MapWuMM (map-side weighted unary op matrix mult)     --> wumm
+  - RedWuMM (reduce-side weighted unary op matrix mult)  --> wumm
+
+* 3) Physical Operators in SPARK (distributed, spark)
+  - MapWSLoss (see MR, mappartitions + reduce)           --> wsloss           
+  - RedWSLoss (see MR, 1/2x flatmaptopair + 1-3x join +  --> wsloss
+    maptopair + reduce)
+  - MapWSigmoid (see MR, mappartitions)                  --> wsigmoid
+  - RedWSigmoid (see MR, 1/2x flatmaptopair +            --> wsigmoid
+    1/2x join + maptopair)          
+  - MapWDivMM (see MR, mappartitions + reducebykey )     --> wdivmm
+  - RedWDivMM (see MR, 1/2x flatmaptopair + 1/2x join +  --> wdivmm 
+    maptopair + reducebykey)  
+  - MapWCeMM (see MR, mappartitions + reduce)            --> wcemm           
+  - RedWCeMM (see MR, 1/2x flatmaptopair + 1/2x join +   --> wcemm 
+    maptopair + reduce)  
+  - MapWuMM (see MR, mappartitions)                      --> wumm
+  - RedWuMM (see MR, 1/2x flatmaptopair +                --> wumm
+    1/2x join + maptopair)          
+  
+  
+C) CORE MATRIX MULT PRIMITIVES IN LibMatrixMult (incl. related script patterns)
+-------------------------------  
+* 1) mm       (general A %*% B)
+  - sequential / multi-threaded (same block ops, par over rows in A)
+  - dense-dense, dense-sparse, sparse-dense, sparse-sparse, ultra-sparse*
+  - ~20 special cases for matrix-vector, vector-vector, etc
+  
+* 2) mmchain  ((a) t(X) %*% (X %*% v), (b) t(X) %*% (w * (X %*% v)))
+  - sequential / multi-threaded (same block ops, par over rows in X)
+  - dense / sparse x 2 patterns
+
+* 3) tsmm     ((a) t(X) %*% X, (b) X %*% t(X))
+  - sequential / multi-threaded (same block ops, par over rows in R, 2x tasks)
+  - dense / sparse x 2 patterns; special cases for dot products
+
+* 4) pmm      (removeEmpty(diag(v), "rows") %*% X)
+  - sequential / multi-threaded (same block ops, par over rows in X)
+  - sparse-sparse, dense-dense, sparse-dense
+
+* 5) wsloss   ((a) sum(W*(X-U%*%t(V))^2), (b) sum((X-W*(U%*%t(V)))^2),
+               (c) sum((X-(U%*%t(V)))^2), (d) sum(W*(U%*%t(V)-X)^2),
+               (e) sum((W*(U%*%t(V))-X)^2), (f) sum(((U%*%t(V))-X)^2))
+  - sequential / multi-threaded (same block ops, par over rows in W/X)                 
+  - all dense, sparse-dense factors, sparse/dense-* x 3 patterns      
+  - special patterns for (a) and (d) if W is X!=0      
+
+* 6) wsigmoid ((a) W*sigmoid(Y%*%t(X)), (b) W*sigmoid(-(Y%*%t(X))), 
+               (c) W*log(sigmoid(Y%*%t(X))), (d) W*log(sigmoid(-(Y%*%t(X))))) 
+  - sequential / multi-threaded (same block ops, par over rows in W)                 
+  - all dense, sparse-dense factors, sparse/dense-* x 4 patterns                   
+
+* 7) wdivmm   ((a) t(t(U)%*%(W/(U%*%t(V)))), (b) (W/(U%*%t(V)))%*%V,
+               (c) t(t(U)%*%(W*(U%*%t(V)))), (d) (W*(U%*%t(V)))%*%V, 
+               (e) W*(U%*%t(V)), (f) t(t(U)%*%((X!=0)*(U%*%t(V)-X))),
+               (g) ((X!=0)*(U%*%t(V)-X))%*%V)
+  - sequential / multi-threaded (same block ops, par over rows in X)                 
+  - all dense, sparse-dense factors, sparse/dense-* x 7 patterns
+
+* 8) wcemm    (sum(X*log(U%*%t(V))))  
+  - sequential / multi-threaded (same block ops, par over rows in X)                 
+  - all dense, sparse-dense factors, sparse/dense-*, 1 pattern
+
+* 9) wumm     ((a) X*uop(U%*%t(V)), (b) X/uop(U%*%t(V)))
+  - any unary operator, e.g., X*exp(U%*%t(V)) or X*(U%*%t(V))^2  
+  - sequential / multi-threaded (same block ops, par over rows in X)                 
+  - all dense, sparse-dense factors, sparse/dense-*, 2 patterns
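
The script-level patterns listed in section C of the devdoc above correspond directly
to DML expressions. As an illustrative sketch (hypothetical inputs and sizes, not
taken from any shipped script), the following DML fragment contains expressions of
the mmchain, tsmm, and wcemm forms that would be candidates for the corresponding
operators:

   # hypothetical inputs (sizes and sparsity chosen arbitrarily for illustration)
   X = rand(rows=10000, cols=1000, sparsity=0.1)
   v = rand(rows=1000, cols=1)
   w = rand(rows=10000, cols=1)
   U = rand(rows=10000, cols=10)
   V = rand(rows=1000, cols=10)

   # mmchain pattern (b): t(X) %*% (w * (X %*% v))
   q = t(X) %*% (w * (X %*% v))

   # tsmm pattern (a): t(X) %*% X
   G = t(X) %*% X

   # wcemm pattern: sum(X * log(U %*% t(V)))
   loss = sum(X * log(U %*% t(V)))

   print(sum(q) + sum(G) + loss)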


[2/2] incubator-systemml git commit: [SYSTEMML-480] [SYSTEMML-463] Fix Release Packaging in Prep for 0.9.0 Release.

Posted by du...@apache.org.
[SYSTEMML-480] [SYSTEMML-463] Fix Release Packaging in Prep for 0.9.0 Release.

This fix addresses additional issues with our release packaging that blocked our 0.9.0 release candidate.  Changes include cleaning up files, adding missing files, updating the naming from 'system-ml-*' to 'systemml-*', and fixing broken dependencies.  Additionally, this adds experimental support for a standalone JAR that we can use in the future.

Closes #54.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/3ce15871
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/3ce15871
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/3ce15871

Branch: refs/heads/gh-pages
Commit: 3ce15871bce316c00cf6cee5ef02671a51a2b5cd
Parents: 4ce58eb
Author: Mike Dusenberry <mw...@us.ibm.com>
Authored: Mon Jan 25 13:25:43 2016 -0800
Committer: Mike Dusenberry <mw...@us.ibm.com>
Committed: Mon Jan 25 13:28:37 2016 -0800

----------------------------------------------------------------------
 Language Reference/README.txt               | 87 ------------------------
 Language Reference/README_HADOOP_CONFIG.txt | 83 ++++++++++++++++++++++
 2 files changed, 83 insertions(+), 87 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3ce15871/Language Reference/README.txt
----------------------------------------------------------------------
diff --git a/Language Reference/README.txt b/Language Reference/README.txt
deleted file mode 100644
index 0f22aa6..0000000
--- a/Language Reference/README.txt	
+++ /dev/null
@@ -1,87 +0,0 @@
-Usage
------
-The machine learning algorithms described in 
-$BIGINSIGHTS_HOME/machine-learning/docs/SystemML_Algorithms_Reference.pdf can be invoked
-from the hadoop command line using the described algorithm-specific parameters. 
-
-Generic command-line arguments are described by the help command below.
-
-   hadoop jar SystemML.jar -? or -help 
-
-
-Recommended configurations
---------------------------
-1) JVM Heap Sizes: 
-We recommend an equal-sized JVM configuration for clients, mappers, and reducers. For the client
-process this can be done via 
-
-   export HADOOP_CLIENT_OPTS="-Xmx2048m -Xms2048m -Xmn256m" 
-   
-where Xmx specifies the maximum heap size, Xms the initial heap size, and Xmn the size of the young 
-generation. For Xmn values equal to or less than 15% of the max heap size, we guarantee the memory budget.
-
-The above option may also be set through BigR by setting the "ml.jvm" option, e.g.
-   bigr.set.server.option("jaql.fence.jvm.parameters", "-Xmx2g -Xms2g -Xmn256m")
-
-For mapper or reducer JVM configurations, the following properties can be specified in mapred-site.xml, 
-where 'child' refers to both mapper and reducer. If map and reduce are specified individually, they take 
-precedence over the generic property.
-
-  <property>
-    <name>mapreduce.child.java.opts</name> <!-- synonym: mapred.child.java.opts -->
-    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
-  </property>
-  <property>
-    <name>mapreduce.map.java.opts</name> <!-- synonym: mapred.map.java.opts -->
-    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
-  </property>
-  <property>
-    <name>mapreduce.reduce.java.opts</name> <!-- synonym: mapred.reduce.java.opts -->
-    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
-  </property>
- 
-
-2) CP Memory Limitation:
-There exist size limitations for in-memory matrices. Dense in-memory matrices are limited to 16GB 
-independent of their dimension. Sparse in-memory matrices are limited to 2G rows and 2G columns 
-but the overall matrix can be larger. These limitations apply only to in-memory matrices, NOT to 
-matrices stored in HDFS or involved in MR computations. Setting HADOOP_CLIENT_OPTS below those limitations 
-prevents runtime errors.
-
-3) Transparent Huge Pages (on Red Hat Enterprise Linux 6):
-Hadoop workloads might show very high System CPU utilization if THP is enabled. In case of such 
-behavior, we recommend disabling THP with
-   
-   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
-   
-4) JVM Reuse:
-Performance benefits from JVM reuse because data sets that fit into the mapper memory budget are 
-reused across tasks per slot. However, Hadoop 1.0.3 JVM Reuse is incompatible with security (when 
-using the LinuxTaskController). The workaround is to use the DefaultTaskController. SystemML provides 
-a configuration property in $BIGINSIGHTS_HOME/machine-learning/SystemML-config.xml to enable JVM reuse 
-on a per job level without changing the global cluster configuration. 
-   
-   <jvmreuse>false</jvmreuse> 
-   
-5) Number of Reducers:
-The number of reducers can have a significant impact on performance. SystemML provides a configuration
-property to set the default number of reducers per job without changing the global cluster configuration.
-In general, we recommend a setting of twice the number of nodes. Smaller numbers create fewer intermediate
-files; larger numbers increase the degree of parallelism for compute and parallel write. In 
-$BIGINSIGHTS_HOME/machine-learning/SystemML-config.xml, set:
-   
-   <!-- default number of reduce tasks per MR job, default: 2 x number of nodes -->
-   <numreducers>12</numreducers> 
-
-6) SystemML temporary directories:
-SystemML uses temporary directories in two different locations: (1) on the local file system for temporary files from 
-the client process, and (2) on HDFS for intermediate results between different MR jobs and between MR jobs 
-and in-memory operations. Locations of these directories can be configured in 
-$BIGINSIGHTS_HOME/machine-learning/SystemML-config.xml with the following properties:
-
-   <!-- local fs tmp working directory-->
-   <localtmpdir>/tmp/systemml</localtmpdir>
-
-   <!-- hdfs tmp working directory--> 
-   <scratch>scratch_space</scratch> 
- 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3ce15871/Language Reference/README_HADOOP_CONFIG.txt
----------------------------------------------------------------------
diff --git a/Language Reference/README_HADOOP_CONFIG.txt b/Language Reference/README_HADOOP_CONFIG.txt
new file mode 100644
index 0000000..e34d4f3
--- /dev/null
+++ b/Language Reference/README_HADOOP_CONFIG.txt	
@@ -0,0 +1,83 @@
+Usage
+-----
+The machine learning algorithms described in SystemML_Algorithms_Reference.pdf can be invoked
+from the hadoop command line using the described algorithm-specific parameters. 
+
+Generic command-line arguments are described by the help command below.
+
+   hadoop jar SystemML.jar -? or -help 
+
+
+Recommended configurations
+--------------------------
+1) JVM Heap Sizes: 
+We recommend an equal-sized JVM configuration for clients, mappers, and reducers. For the client
+process this can be done via
+
+   export HADOOP_CLIENT_OPTS="-Xmx2048m -Xms2048m -Xmn256m" 
+   
+where Xmx specifies the maximum heap size, Xms the initial heap size, and Xmn the size of the young 
+generation. For Xmn values equal to or less than 15% of the max heap size, we guarantee the memory budget.
+
+For mapper or reducer JVM configurations, the following properties can be specified in mapred-site.xml,
+where 'child' refers to both mapper and reducer. If map and reduce are specified individually, they take 
+precedence over the generic property.
+
+  <property>
+    <name>mapreduce.child.java.opts</name> <!-- synonym: mapred.child.java.opts -->
+    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
+  </property>
+  <property>
+    <name>mapreduce.map.java.opts</name> <!-- synonym: mapred.map.java.opts -->
+    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
+  </property>
+  <property>
+    <name>mapreduce.reduce.java.opts</name> <!-- synonym: mapred.reduce.java.opts -->
+    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
+  </property>
+ 
+
+2) CP Memory Limitation:
+There exist size limitations for in-memory matrices. Dense in-memory matrices are limited to 16GB 
+independent of their dimension. Sparse in-memory matrices are limited to 2G rows and 2G columns 
+but the overall matrix can be larger. These limitations apply only to in-memory matrices, NOT to 
+matrices stored in HDFS or involved in MR computations. Setting HADOOP_CLIENT_OPTS below those limitations 
+prevents runtime errors.
+
+3) Transparent Huge Pages (on Red Hat Enterprise Linux 6):
+Hadoop workloads might show very high System CPU utilization if THP is enabled. In case of such 
+behavior, we recommend disabling THP with
+   
+   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
+   
+4) JVM Reuse:
+Performance benefits from JVM reuse because data sets that fit into the mapper memory budget are 
+reused across tasks per slot. However, Hadoop 1.0.3 JVM Reuse is incompatible with security (when 
+using the LinuxTaskController). The workaround is to use the DefaultTaskController. SystemML provides 
+a configuration property in SystemML-config.xml to enable JVM reuse on a per job level without
+changing the global cluster configuration.
+   
+   <jvmreuse>false</jvmreuse> 
+   
+5) Number of Reducers:
+The number of reducers can have a significant impact on performance. SystemML provides a configuration
+property to set the default number of reducers per job without changing the global cluster configuration.
+In general, we recommend a setting of twice the number of nodes. Smaller numbers create fewer intermediate
+files; larger numbers increase the degree of parallelism for compute and parallel write. In
+SystemML-config.xml, set:
+   
+   <!-- default number of reduce tasks per MR job, default: 2 x number of nodes -->
+   <numreducers>12</numreducers> 
+
+6) SystemML temporary directories:
+SystemML uses temporary directories in two different locations: (1) on the local file system for temporary files from 
+the client process, and (2) on HDFS for intermediate results between different MR jobs and between MR jobs 
+and in-memory operations. Locations of these directories can be configured in SystemML-config.xml with the
+following properties:
+
+   <!-- local fs tmp working directory-->
+   <localtmpdir>/tmp/systemml</localtmpdir>
+
+   <!-- hdfs tmp working directory--> 
+   <scratch>scratch_space</scratch> 
+ 
\ No newline at end of file
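
As a usage sketch that ties the above together: an algorithm script can be run
against a custom SystemML-config.xml from the hadoop command line roughly as
follows. The script name and -nvargs values are placeholders, and the exact flag
spellings should be confirmed via the -help output shown under Usage:

   # illustrative invocation (placeholder script, paths, and arguments)
   hadoop jar SystemML.jar -f LinearRegDS.dml -config=SystemML-config.xml \
       -nvargs X=hdfs:/user/alice/X.csv Y=hdfs:/user/alice/y.csv B=hdfs:/user/alice/beta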