Posted to commits@systemml.apache.org by de...@apache.org on 2017/08/03 21:09:48 UTC

[2/2] systemml git commit: [MINOR] Link to Perf Testing from Release Process

[MINOR] Link to Perf Testing from Release Process

Add link to Release Process that points to Perf Testing doc.
Add title and license to Perf Testing doc.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/bf0245c6
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/bf0245c6
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/bf0245c6

Branch: refs/heads/gh-pages
Commit: bf0245c698734aed572ff91ac644312a9cb77bdd
Parents: 83b9a22
Author: Deron Eriksson <de...@apache.org>
Authored: Thu Aug 3 14:04:30 2017 -0700
Committer: Deron Eriksson <de...@apache.org>
Committed: Thu Aug 3 14:04:30 2017 -0700

----------------------------------------------------------------------
 python-performance-test.md | 63 ++++++++++++++++++++++++++++++++++-------
 release-process.md         |  5 +++-
 2 files changed, 56 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/bf0245c6/python-performance-test.md
----------------------------------------------------------------------
diff --git a/python-performance-test.md b/python-performance-test.md
index 02d3e34..3d29f01 100644
--- a/python-performance-test.md
+++ b/python-performance-test.md
@@ -1,15 +1,46 @@
+---
+layout: global
+title: SystemML Performance Testing
+description: Description of SystemML performance testing.
+displayTitle: SystemML Performance Testing
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+
 # Performance Testing Algorithms User Manual
 
 This user manual contains details on how to conduct automated performance tests. The work was mostly done in this [PR](https://github.com/apache/systemml/pull/537) and is part of [SYSTEMML-1451](https://issues.apache.org/jira/browse/SYSTEMML-1451). Our aim was to move from the existing `bash` based performance tests to automated `python` based performance tests.
 
-### Architecture
-Our performance tests suit contains `7` families namely `binomial`, `multinomial`, `stats1`, `stats2`, `regression1`, `regression2`, `clustering`. Within these families we have algorithms grouped under it. Typically a family is a set of algorithms that require the same data generation script. 
+
+## Architecture
+
+Our performance test suite contains `7` families, namely `binomial`, `multinomial`, `stats1`, `stats2`, `regression1`, `regression2`, and `clustering`. Each family groups several algorithms under it. Typically, a family is a set of algorithms that require the same data generation script.
 
 - Exceptions: `regression1`, `regression2` and `binomial`. We decided to keep these algorithms in separate families to keep the architecture simple.
 
 ![System ML Architecture](img/performance-test/perf_test_arch.png)
 
-On a very high level use construct a string with arguments required to run each operation. Once this string is constructed we use the subprocess module to execute this string and extract time from the standard out. 
+At a very high level, we construct a string with the arguments required to run each operation. Once this string is constructed, we use the subprocess module to execute it and extract the time from the standard output.
 
 We also use the `json` module to write our configurations to a json file. This ensures that our operations are easy to debug.
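
As an illustration only (the helper name, the configuration layout, and the stdout pattern below are assumptions, not the project's actual code), the flow described in the two paragraphs above might look roughly like this:

```python
import json
import re
import subprocess

def run_and_time(cmd, config, config_path):
    """Hypothetical sketch: persist the configuration as json, execute the
    constructed command string via subprocess, and extract the elapsed time
    from standard output."""
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)  # keeping the config on disk makes runs easy to debug

    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

    # Assumes the time is reported on stdout in a form like "... 6.956 sec".
    match = re.search(r"(\d+\.\d+)\s*sec", result.stdout)
    return float(match.group(1)) if match else None
```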
 
@@ -32,8 +63,10 @@ In `train.py` script we have functions required to generate training output. We
 The file `predict.py` contains functions for all algorithms in the performance test that have a predict script. This script returns the required configuration packet, whose keys are the algorithms to run and whose values are the locations to read the predict json files from.
 
 In the `utils_*.py` file(s) we have all the helper functions required by our performance test. These functions perform operations like writing `json` files, extracting time from the standard output, etc.
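
Purely as an illustration of the shape described above (the algorithm names and paths below are made up for this example), such a configuration packet could look like:

```python
# Hypothetical packet: keys are the algorithms to run, values are the
# locations from which the corresponding predict json files are read.
predict_config_packet = {
    "MultiLogReg": "temp/MultiLogReg.dense.10k_100/predict.json",
    "Kmeans": "temp/Kmeans.dense.10k_100/predict.json",
}
```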
- 
-### Adding New Algorithms
+
+
+## Adding New Algorithms
+
 While adding a new algorithm, we need to know whether it belongs to any pre-existing family. If the algorithm depends on a new data generation script, we need to create a new family. The steps to add a new algorithm are listed below.
 
 Following changes to `run_perftest.py`:
@@ -72,7 +105,9 @@ Following changes to `predict.py`:
 - Check for possible errors if these folders/files do not exist. (Please see the troubleshooting section).
 - Note: `predict.py` will not be executed if the algorithm currently being executed does not have a predict script.
 
-### Current Default Settings
+
+## Current Default Settings
+
 The default settings for our performance test are listed below:
 
 - Matrix size set to 10,000 rows and 100 columns.
@@ -80,7 +115,9 @@ Default setting for our performance test below:
 - Operation modes `data-gen`, `train` and `predict` in sequence.
 - Matrix type set to `all`, which will generate `dense` and `sparse` matrices for all relevant algorithms.
 
-### Examples
+
+## Examples
+
 Some examples of the SystemML performance test with arguments are shown below:
 
 `./scripts/perftest/python/run_perftest.py --family binomial clustering multinomial regression1 regression2 stats1 stats2
@@ -110,7 +147,9 @@ Run performance test for all algorithms under the family `regression2` and log w
 `./scripts/perftest/python/run_perftest.py --family binomial clustering multinomial regression1 regression2 stats1 stats2 --config-dir /Users/krishna/open-source/systemml/scripts/perftest/temp3 --temp-dir hdfs://localhost:9000/temp3`
 Run performance test for all algorithms using HDFS.
 
-### Operational Notes
+
+## Operational Notes
+
 All performance tests depend mainly on two scripts for execution: `systemml-standalone.py` and `systemml-spark-submit.py`. In case we need to change standalone or spark parameters, we need to manually change them in their respective scripts.
 
 Constants like `DATA_FORMAT` (currently set to `csv`) and `MATRIX_TYPE_DICT` (with `density` set to `0.9` and `sparsity` set to `0.01`) are hardcoded in the performance test scripts. They can be changed easily, as they are defined at the top of their respective operational scripts.
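
As a minimal sketch (the exact layout of the dictionary is an assumption; only the names and values come from the text above), these hardcoded defaults might look like:

```python
# Illustrative only: constants as they might appear at the top of an operational script.
DATA_FORMAT = "csv"
MATRIX_TYPE_DICT = {
    "dense": 0.9,    # density used when generating dense matrices
    "sparse": 0.01,  # sparsity used when generating sparse matrices
}
```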
@@ -118,7 +157,7 @@ Constants like `DATA_FORMAT` currently set to `csv` and `MATRIX_TYPE_DICT` with
 The logs contain the following information, formatted as shown below.
 
 algorithm | run_type | intercept | matrix_type | data_shape | time_sec
---- | --- | --- | --- | --- | --- | 
+--- | --- | --- | --- | --- | --- |
 multinomial|data-gen|0|dense|10k_100| 0.33
 MultiLogReg|train|0|dense|10k_100|6.956
 MultiLogReg|predict|0|dense|10k_100|4.780
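
To make that layout concrete, here is a small, purely illustrative way of splitting one such log line back into its fields (the field names come from the header above; the parsing helper itself is an assumption, not part of the test scripts):

```python
FIELDS = ["algorithm", "run_type", "intercept", "matrix_type", "data_shape", "time_sec"]

def parse_log_line(line):
    """Split one pipe-separated log line into a dict keyed by the columns above."""
    values = [v.strip() for v in line.strip().strip("|").split("|")]
    return dict(zip(FIELDS, values))

print(parse_log_line("MultiLogReg|train|0|dense|10k_100|6.956"))
# {'algorithm': 'MultiLogReg', 'run_type': 'train', 'intercept': '0',
#  'matrix_type': 'dense', 'data_shape': '10k_100', 'time_sec': '6.956'}
```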
@@ -140,8 +179,10 @@ Currently we only support time difference between algorithms in different versio
 
 Note: Please `pip install gspread` (https://github.com/burnash/gspread) to use the google docs client.
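
As a rough, hypothetical sketch only (the spreadsheet name, worksheet layout, and authentication setup below are assumptions, and the exact auth flow depends on the installed gspread version), appending one result row to a Google sheet could look like:

```python
import gspread

# Assumes a service-account credentials file; see the gspread docs for setup.
gc = gspread.service_account(filename="service_account.json")

sheet = gc.open("SystemML Perf Results").sheet1
sheet.append_row(["MultiLogReg", "train", 0, "dense", "10k_100", 6.956])
```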
 
-### Troubleshooting
-We can debug the performance test by making changes in the following locations based on 
+
+## Troubleshooting
+
+We can debug the performance test by making changes in the following locations:
 
 - Please see `utils_exec.py` function `subprocess_exec`.
 - Please see `run_perftest.py`. Changing the verbosity level to `0` allows us to log more information while the script runs.

http://git-wip-us.apache.org/repos/asf/systemml/blob/bf0245c6/release-process.md
----------------------------------------------------------------------
diff --git a/release-process.md b/release-process.md
index 987ab30..4a31f8b 100644
--- a/release-process.md
+++ b/release-process.md
@@ -366,9 +366,12 @@ For examples, see the [Spark MLContext Programming Guide](http://apache.github.i
 
 <a href="#release-candidate-checklist">Up to Checklist</a>
 
-Verify that the performance suite located at scripts/perftest/ executes on Spark and Hadoop. Testing should
+Verify that the performance suite executes on Spark and Hadoop. Testing should
 include 80MB, 800MB, 8GB, and 80GB data sizes.
 
+For more information, please see [SystemML Performance Testing](python-performance-test.html).
+
+
 # Run NN Unit Tests for GPU
 
 <a href="#release-candidate-checklist">Up to Checklist</a>