Posted to commits@systemml.apache.org by du...@apache.org on 2016/01/26 02:24:18 UTC

[2/2] incubator-systemml git commit: [SYSTEMML-480] [SYSTEMML-463] Fix Release Packaging in Prep for 0.9.0 Release.

[SYSTEMML-480] [SYSTEMML-463] Fix Release Packaging in Prep for 0.9.0 Release.

This fix addresses additional issues with our release packaging that blocked our 0.9.0 release candidate.  Changes include cleaning up files, adding missing files, updating the naming from 'system-ml-*' to 'systemml-*', and fixing broken dependencies.  Additionally, this adds experimental support for a standalone JAR that we can use in the future.

Closes #54.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/3ce15871
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/3ce15871
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/3ce15871

Branch: refs/heads/gh-pages
Commit: 3ce15871bce316c00cf6cee5ef02671a51a2b5cd
Parents: 4ce58eb
Author: Mike Dusenberry <mw...@us.ibm.com>
Authored: Mon Jan 25 13:25:43 2016 -0800
Committer: Mike Dusenberry <mw...@us.ibm.com>
Committed: Mon Jan 25 13:28:37 2016 -0800

----------------------------------------------------------------------
 Language Reference/README.txt               | 87 ------------------------
 Language Reference/README_HADOOP_CONFIG.txt | 83 ++++++++++++++++++++++
 2 files changed, 83 insertions(+), 87 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3ce15871/Language Reference/README.txt
----------------------------------------------------------------------
diff --git a/Language Reference/README.txt b/Language Reference/README.txt
deleted file mode 100644
index 0f22aa6..0000000
--- a/Language Reference/README.txt	
+++ /dev/null
@@ -1,87 +0,0 @@
-Usage
------
-The machine learning algorithms described in 
-$BIGINSIGHTS_HOME/machine-learning/docs/SystemML_Algorithms_Reference.pdf can be invoked
-from the Hadoop command line using the algorithm-specific parameters described there.
-
-Generic command-line arguments are listed by the help command below.
-
-   hadoop jar SystemML.jar -? or -help 
-
-
-Recommended configurations
---------------------------
-1) JVM Heap Sizes: 
-We recommend an equal-sized JVM configuration for clients, mappers, and reducers. For the client
-process this can be done via 
-
-   export HADOOP_CLIENT_OPTS="-Xmx2048m -Xms2048m -Xmn256m" 
-   
-where Xmx specifies the maximum heap size, Xms the initial heap size, and Xmn the size of the young 
-generation. For Xmn values equal to or less than 15% of the maximum heap size, we guarantee the memory budget.
-
-The above option may also be set through BigR by setting the "ml.jvm" option, e.g.
-   bigr.set.server.option("jaql.fence.jvm.parameters", "-Xmx2g -Xms2g -Xmn256m")
-
-For mapper or reducer JVM configurations, the following properties can be specified in mapred-site.xml, 
-where 'child' refers to both mapper and reducer. If map and reduce are specified individually, they take 
-precedence over the generic property.
-
-  <property>
-    <name>mapreduce.child.java.opts</name> <!-- synonym: mapred.child.java.opts -->
-    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
-  </property>
-  <property>
-    <name>mapreduce.map.java.opts</name> <!-- synonym: mapred.map.java.opts -->
-    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
-  </property>
-  <property>
-    <name>mapreduce.reduce.java.opts</name> <!-- synonym: mapred.reduce.java.opts -->
-    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
-  </property>
- 
-
-2) CP Memory Limitation:
-There are size limitations for in-memory matrices. Dense in-memory matrices are limited to 16GB, 
-independent of their dimensions. Sparse in-memory matrices are limited to 2G rows and 2G columns, 
-but the overall matrix can be larger. These limitations apply only to in-memory matrices, NOT to 
-matrices stored in HDFS or involved in MR computations. Setting HADOOP_CLIENT_OPTS below those 
-limitations prevents runtime errors.
-
-3) Transparent Huge Pages (on Red Hat Enterprise Linux 6):
-Hadoop workloads might show very high system CPU utilization if THP is enabled. In case of such 
-behavior, we recommend disabling THP with
-   
-   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
-   
-4) JVM Reuse:
-Performance benefits from JVM reuse because data sets that fit into the mapper memory budget are 
-reused across tasks per slot. However, in Hadoop 1.0.3, JVM reuse is incompatible with security (when 
-using the LinuxTaskController); the workaround is to use the DefaultTaskController. SystemML provides 
-a configuration property in $BIGINSIGHTS_HOME/machine-learning/SystemML-config.xml to enable JVM reuse 
-on a per-job level without changing the global cluster configuration. 
-   
-   <jvmreuse>false</jvmreuse> 
-   
-5) Number of Reducers:
-The number of reducers can have a significant impact on performance. SystemML provides a configuration
-property to set the default number of reducers per job without changing the global cluster configuration.
-In general, we recommend a setting of twice the number of nodes. Smaller numbers create fewer intermediate
-files; larger numbers increase the degree of parallelism for compute and parallel writes. In 
-$BIGINSIGHTS_HOME/machine-learning/SystemML-config.xml, set:
-   
-   <!-- default number of reduce tasks per MR job, default: 2 x number of nodes -->
-   <numreducers>12</numreducers> 
-
-6) SystemML temporary directories:
-SystemML uses temporary directories in two different locations: (1) on the local file system, for temporary 
-files created by the client process, and (2) on HDFS, for intermediate results between different MR jobs and 
-between MR jobs and in-memory operations. The locations of these directories can be configured in 
-$BIGINSIGHTS_HOME/machine-learning/SystemML-config.xml with the following properties:
-
-   <!-- local fs tmp working directory-->
-   <localtmpdir>/tmp/systemml</localtmpdir>
-
-   <!-- hdfs tmp working directory--> 
-   <scratch>scratch_space</scratch> 
- 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3ce15871/Language Reference/README_HADOOP_CONFIG.txt
----------------------------------------------------------------------
diff --git a/Language Reference/README_HADOOP_CONFIG.txt b/Language Reference/README_HADOOP_CONFIG.txt
new file mode 100644
index 0000000..e34d4f3
--- /dev/null
+++ b/Language Reference/README_HADOOP_CONFIG.txt	
@@ -0,0 +1,83 @@
+Usage
+-----
+The machine learning algorithms described in SystemML_Algorithms_Reference.pdf can be invoked
+from the Hadoop command line using the algorithm-specific parameters described there.
+
+Generic command-line arguments are listed by the help command below.
+
+   hadoop jar SystemML.jar -? or -help 
+
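+For example, a specific algorithm script could be run as shown below (the script name and the
+named arguments are purely illustrative; see the algorithms reference for the exact parameters
+of each script):
+
+   hadoop jar SystemML.jar -f LinearRegDS.dml -nvargs X=X.mtx Y=y.mtx B=betas.mtx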
+
+Recommended configurations
+--------------------------
+1) JVM Heap Sizes: 
+We recommend an equal-sized JVM configuration for clients, mappers, and reducers. For the client
+process this can be done via
+
+   export HADOOP_CLIENT_OPTS="-Xmx2048m -Xms2048m -Xmn256m" 
+   
+where Xmx specifies the maximum heap size, Xms the initial heap size, and Xmn the size of the young 
+generation. For Xmn values equal to or less than 15% of the maximum heap size, we guarantee the memory budget.
+
+For mapper or reducer JVM configurations, the following properties can be specified in mapred-site.xml,
+where 'child' refers to both mapper and reducer. If map and reduce are specified individually, they take 
+precedence over the generic property.
+
+  <property>
+    <name>mapreduce.child.java.opts</name> <!-- synonym: mapred.child.java.opts -->
+    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
+  </property>
+  <property>
+    <name>mapreduce.map.java.opts</name> <!-- synonym: mapred.map.java.opts -->
+    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
+  </property>
+  <property>
+    <name>mapreduce.reduce.java.opts</name> <!-- synonym: mapred.reduce.java.opts -->
+    <value>-Xmx2048m -Xms2048m -Xmn256m</value>
+  </property>
+ 
+
+2) CP Memory Limitation:
+There are size limitations for in-memory matrices. Dense in-memory matrices are limited to 16GB, 
+independent of their dimensions. Sparse in-memory matrices are limited to 2G rows and 2G columns, 
+but the overall matrix can be larger. These limitations apply only to in-memory matrices, NOT to 
+matrices stored in HDFS or involved in MR computations. Setting HADOOP_CLIENT_OPTS below those 
+limitations prevents runtime errors.
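+
+As a rough, illustrative estimate: a dense 40,000 x 40,000 matrix of doubles requires about
+40,000 * 40,000 * 8 bytes = 12.8 GB and therefore still fits in memory, whereas a dense
+50,000 x 50,000 matrix requires about 20 GB and exceeds the 16GB limit.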
+
+3) Transparent Huge Pages (on Red Hat Enterprise Linux 6):
+Hadoop workloads might show very high system CPU utilization if THP is enabled. In case of such 
+behavior, we recommend disabling THP with
+   
+   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
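+
+The current THP setting can be inspected beforehand with (same RHEL 6 specific path as above):
+
+   cat /sys/kernel/mm/redhat_transparent_hugepage/enabled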
+   
+4) JVM Reuse:
+Performance benefits from JVM reuse because data sets that fit into the mapper memory budget are 
+reused across tasks per slot. However, in Hadoop 1.0.3, JVM reuse is incompatible with security (when 
+using the LinuxTaskController); the workaround is to use the DefaultTaskController. SystemML provides 
+a configuration property in SystemML-config.xml to enable JVM reuse on a per-job level without
+changing the global cluster configuration.
+   
+   <jvmreuse>false</jvmreuse> 
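+
+To turn JVM reuse on for SystemML jobs, the same property would be set to true (shown here
+only as an illustration; false, as above, is the default):
+
+   <jvmreuse>true</jvmreuse>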
+   
+5) Number of Reducers:
+The number of reducers can have a significant impact on performance. SystemML provides a configuration
+property to set the default number of reducers per job without changing the global cluster configuration.
+In general, we recommend a setting of twice the number of nodes. Smaller numbers create fewer intermediate
+files; larger numbers increase the degree of parallelism for compute and parallel writes. In
+SystemML-config.xml, set:
+   
+   <!-- default number of reduce tasks per MR job, default: 2 x number of nodes -->
+   <numreducers>12</numreducers> 
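+
+(The value 12 above would correspond, for example, to a cluster with 6 worker nodes: 2 x 6 = 12.)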
+
+6) SystemML temporary directories:
+SystemML uses temporary directories in two different locations: (1) on the local file system, for temporary 
+files created by the client process, and (2) on HDFS, for intermediate results between different MR jobs and 
+between MR jobs and in-memory operations. The locations of these directories can be configured in 
+SystemML-config.xml with the following properties:
+
+   <!-- local fs tmp working directory-->
+   <localtmpdir>/tmp/systemml</localtmpdir>
+
+   <!-- hdfs tmp working directory--> 
+   <scratch>scratch_space</scratch> 
+ 
\ No newline at end of file