Posted to commits@systemml.apache.org by mb...@apache.org on 2018/08/05 21:58:05 UTC

[2/4] systemml git commit: [MINOR] Profile memory use in JMLC execution

[MINOR] Profile memory use in JMLC execution

This PR adds utilities to profile memory use during execution in JMLC. Specifically, the following changes were made:

1. Added the options setStatistics() and gatherMemStats() to api.jmlc.Connection, which control whether statistics are gathered and, if so, whether memory use is profiled. Both options are false by default. Also added a method to api.jmlc.PreparedScript to report the resulting statistics. The following points apply only when running in JMLC mode with memory statistics enabled.
2. Modified utils.Statistics to track the memory used by distinct CacheBlock objects. At the conclusion of the script, the maximum memory use is reported. Memory use is computed by calling each object's getInMemorySize() method, which will generally slightly overestimate the memory actually used by the object.
3. If FINEGRAINED_STATISTICS is enabled, Statistics will also track the memory used by each named variable in a DML script and report it in a table, as is done for heavy hitter instructions. The goal is to detect unexpectedly large intermediate matrices (e.g., resulting from an outer product X %*% t(X)).
4. If FINEGRAINED_STATISTICS is enabled, Statistics will also attempt to measure memory use more accurately by checking whether an object has been garbage collected. This is done by maintaining a soft reference to the object and periodically checking whether it has become null. This is enabled only with fine-grained statistics because scanning the list of live objects introduces potentially non-trivial overhead. Note that simply removing a variable from the live set when rmvar is executed results in a substantial underestimate of the memory used by the program, so that approach is not used. When fine-grained statistics are not enabled, the resulting statistics will be an overestimate.
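The live-object scan described in items 2 and 4 can be sketched as follows. `MemoryTracker` is a hypothetical, simplified class, not the actual `utils.Statistics` code: sizes are passed in by the caller (standing in for `getInMemorySize()`), distinctness is checked with a linear identity scan, and the `sweepEnabled` flag stands in for fine-grained statistics. Without sweeping, the running total never decreases, so the reported peak is the sum over all distinct objects, which is the overestimate mentioned above.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not SystemML's utils.Statistics.
final class MemoryTracker {
    // One tracked object: a soft reference to it plus its estimated size.
    private static final class Entry {
        final SoftReference<Object> ref;
        final long sizeBytes;

        Entry(Object obj, long sizeBytes) {
            this.ref = new SoftReference<>(obj);
            this.sizeBytes = sizeBytes;
        }
    }

    private final List<Entry> live = new ArrayList<>();
    private final boolean sweepEnabled; // stands in for fine-grained statistics
    private long currentBytes = 0;
    private long peakBytes = 0;

    MemoryTracker(boolean sweepEnabled) {
        this.sweepEnabled = sweepEnabled;
    }

    // Records an object the first time it is seen; re-registering the same object is a no-op.
    void register(Object obj, long sizeBytes) {
        Objects.requireNonNull(obj);
        if (sweepEnabled)
            sweep(); // subtract sizes of collected objects before updating the peak
        for (Entry e : live)
            if (e.ref.get() == obj)
                return; // identity match: already counted
        live.add(new Entry(obj, sizeBytes));
        currentBytes += sizeBytes;
        peakBytes = Math.max(peakBytes, currentBytes);
    }

    // Scans the live list, dropping entries whose referent has been cleared.
    void sweep() {
        Iterator<Entry> it = live.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.ref.get() == null) {
                currentBytes -= e.sizeBytes;
                it.remove();
            }
        }
    }

    long getCurrentBytes() { return currentBytes; }

    long getPeakBytes() { return peakBytes; }
}
```

A `SoftReference` is cleared only when the JVM decides it needs the memory, so a sweep may lag well behind the moment an object actually became unreachable. This is one reason the sweep-based figure is still an approximation rather than an exact release time.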

Potential impacts to performance: when fine-grained statistics are enabled, maintaining the set of live variables causes some performance degradation.

Potential improvements: related to the above, it would be nice to find a way to accurately track when an object is actually released without resorting to checking whether a soft reference has become null. It might also be useful to report the line number where a "heavy hitting" object was created, to make debugging easier.

Closes #794.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/a2d3a721
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/a2d3a721
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/a2d3a721

Branch: refs/heads/gh-pages
Commit: a2d3a721d05851ed03cb3fa5320075d7872b7ed0
Parents: af4cf76
Author: Anthony Thomas <ah...@eng.ucsd.edu>
Authored: Fri Jul 6 11:10:17 2018 -0700
Committer: Niketan Pansare <np...@us.ibm.com>
Committed: Fri Jul 6 11:23:36 2018 -0700

----------------------------------------------------------------------
 jmlc.md | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/a2d3a721/jmlc.md
----------------------------------------------------------------------
diff --git a/jmlc.md b/jmlc.md
index 2183700..a703d01 100644
--- a/jmlc.md
+++ b/jmlc.md
@@ -49,6 +49,18 @@ of SystemML's distributed modes, such as Spark batch mode or Hadoop batch mode,
 distributed computing capabilities. JMLC offers embeddability at the cost of performance, so its use is
 dependent on the nature of the business use case being addressed.
 
+## Statistics
+
+JMLC can be configured to gather runtime statistics, as in the MLContext API, by calling Connection's `setStatistics()`
+method with a value of `true`. JMLC can also be configured to gather statistics on the memory used by matrices and
+frames in the DML script. To enable collection of memory statistics, call Connection's `gatherMemStats()` method
+with a value of `true`. When fine-grained statistics are enabled in `SystemML.conf`, JMLC will also report the variables
+in the DML script which used the most memory. By default, the memory use reported will be an overestimate of the actual
+memory required to run the program. When fine-grained statistics are enabled, JMLC will gather more accurate statistics
+by keeping track of garbage collection events and reducing the memory estimate accordingly. The most accurate way to
+determine the memory required by a script is to run the script in a single thread and enable fine-grained statistics.
+
+An example showing how to enable statistics in JMLC is presented in the section below.
 
 ---
 
@@ -114,11 +126,19 @@ the resulting `"predicted_y"` matrix. We repeat this process. When done, we clos
  
         // obtain connection to SystemML
         Connection conn = new Connection();
+
+        // turn on gathering of runtime statistics and memory use
+        conn.setStatistics(true);
+        conn.gatherMemStats(true);
  
         // read in and precompile DML script, registering inputs and outputs
         String dml = conn.readScript("scoring-example.dml");
         PreparedScript script = conn.prepareScript(dml, new String[] { "W", "X" }, new String[] { "predicted_y" }, false);
- 
+
+        // obtain the runtime plan generated by SystemML
+        String plan = script.explain();
+        System.out.println(plan);
+
         double[][] mtx = matrix(4, 3, new double[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
         double[][] result = null;
  
@@ -127,6 +147,10 @@ the resulting `"predicted_y"` matrix. We repeat this process. When done, we clos
         script.setMatrix("X", randomMatrix(3, 3, -1, 1, 0.7));
         result = script.executeScript().getMatrix("predicted_y");
         displayMatrix(result);
+
+        // print the resulting runtime statistics
+        String stats = script.statistics();
+        System.out.println(stats);
  
         script.setMatrix("W", mtx);
         script.setMatrix("X", randomMatrix(3, 3, -1, 1, 0.7));
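The per-variable report described in item 3 of the commit message, and in the Statistics section added to `jmlc.md`, can be illustrated with a similarly hypothetical sketch. `VarMemoryReport` and its methods are not part of SystemML; the class simply keeps the largest estimated size seen for each variable name and prints the top-k names as a table, the kind of output that would flag an unexpectedly large intermediate such as the result of `X %*% t(X)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-variable memory "heavy hitter" report.
final class VarMemoryReport {
    private final Map<String, Long> maxBytesByVar = new HashMap<>();

    // Keeps the maximum size seen for a variable across its reassignments.
    void record(String varName, long sizeBytes) {
        maxBytesByVar.merge(varName, sizeBytes, Long::max);
    }

    // Returns up to k variable names ordered by descending peak size.
    List<String> topK(int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(maxBytesByVar.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> names = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++)
            names.add(entries.get(i).getKey());
        return names;
    }

    // Formats the top-k variables as a simple ranked table.
    String format(int k) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-3s %-20s %15s%n", "#", "Variable", "Max bytes"));
        List<String> names = topK(k);
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            sb.append(String.format("%-3d %-20s %15d%n", i + 1, name, maxBytesByVar.get(name)));
        }
        return sb.toString();
    }
}
```

Keeping the maximum per name, rather than the latest value, means a variable that is briefly bound to a large intermediate and later reassigned to a small one is still reported.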