Posted to commits@systemml.apache.org by de...@apache.org on 2017/08/28 17:37:08 UTC

[1/4] systemml git commit: [SYSTEMML-1828, 1832] New rewrite for merging statement block sequences

Repository: systemml
Updated Branches:
  refs/heads/gh-pages bf0245c69 -> 4e22b91ea


[SYSTEMML-1828,1832] New rewrite for merging statement block sequences

This patch introduces a new statement block rewrite for merging DAGs of
subsequent last-level statement blocks. After constant folding and the
removal of unnecessary branches, we often end up with such sequences of
statement blocks. Since many rewrites and operator fusion work on the
granularity of individual DAGs, these unnecessary DAG cuts cause missed
optimization opportunities, especially in the context of operator fusion
(i.e., codegen). We now merge such sequences in awareness of rewrites
that explicitly split DAGs (to create recompilation points).
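
As a rough illustration of the idea (not the actual SystemML rewrite,
which is implemented in Java), the merge can be sketched over a toy model
of a program. The `Block` class, its `split_point` flag, and the function
name are hypothetical and exist only for this sketch:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Toy statement block; not a SystemML class."""
    kind: str                       # "basic" (last-level) or "control" (if/while/for)
    stmts: List[str] = field(default_factory=list)
    split_point: bool = False       # hypothetical flag: cut on purpose for recompilation


def merge_basic_blocks(blocks: List[Block]) -> List[Block]:
    """Merge runs of adjacent basic blocks, keeping deliberate cuts intact."""
    merged: List[Block] = []
    for blk in blocks:
        prev = merged[-1] if merged else None
        can_merge = (
            prev is not None
            and prev.kind == "basic"
            and blk.kind == "basic"
            and not prev.split_point
            and not blk.split_point
        )
        if can_merge:
            # Append the statements so both former blocks form one DAG.
            prev.stmts.extend(blk.stmts)
        else:
            # Copy the block so the input list is left unmodified.
            merged.append(Block(blk.kind, list(blk.stmts), blk.split_point))
    return merged


program = [
    Block("basic", ["a = 1"]),
    Block("basic", ["b = a + 1"]),
    Block("control", ["while(FALSE){}"]),
    Block("basic", ["c = b * 2"]),
    Block("basic", ["d = c + 1"], split_point=True),
]
result = merge_basic_blocks(program)
```

In this sketch the first two blocks collapse into one, while the control
block and the block marked as a deliberate split stay separate.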

Apart from the new merge rewrite, this patch also fixes the IPA rewrite
pass that applies static rewrites per IPA round. The repeated
application of the statement block rewrite for injecting spark
checkpoints for variables used read-only in loops introduced redundant
statement blocks and checkpoints. The IPA rewrite pass now explicitly
excludes this rewrite.

Additionally, this patch also modifies the related tests to use
'while(FALSE){}' instead of 'if(1==1){}' as a DAG cut, and fixes some
minor compilation issues that showed up due to the increased
optimization scope.

Overall, there are many scripts and patterns that benefit from these
changes. For example, on 1 epoch of lenet w/ codegen, this patch
improved end-to-end performance from 328s to 297s due to increased
fusion opportunities and fewer compiled spark instructions (70 vs 82).


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/16f72cff
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/16f72cff
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/16f72cff

Branch: refs/heads/gh-pages
Commit: 16f72cffb8d2a776bcde78d90df43622ee19d2eb
Parents: bf0245c
Author: Matthias Boehm <mb...@gmail.com>
Authored: Mon Aug 7 22:21:32 2017 -0700
Committer: Matthias Boehm <mb...@gmail.com>
Committed: Wed Aug 9 13:52:51 2017 -0700

----------------------------------------------------------------------
 dml-language-reference.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/16f72cff/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/dml-language-reference.md b/dml-language-reference.md
index d5e200d..bd66a42 100644
--- a/dml-language-reference.md
+++ b/dml-language-reference.md
@@ -1782,7 +1782,7 @@ The following DML utilizes the `transformencode()` function.
     jspec = read("/user/ml/homes.tfspec_recode2.json", data_type="scalar", value_type="string");
     [X, M] = transformencode(target=F1, spec=jspec);
     print(toString(X));
-    if(1==1){}
+    while(FALSE){}
     print(toString(M));
 
 The transformed matrix X and output M are as follows.


[4/4] systemml git commit: [MINOR][DOC] Perf Test Google sheets API

Posted by de...@apache.org.
[MINOR][DOC] Perf Test Google sheets API

Instructions on how to configure the google client API for performance tests.

Closes #642.
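
The step "Find the client_email inside client_secret.json" from the
instructions added by this commit could be scripted along the following
lines. This is a minimal sketch using only the Python standard library;
the helper name `read_client_email` is made up for illustration and is
not part of the perftest scripts.

```python
import json


def read_client_email(path="client_secret.json"):
    """Return the client_email field of a downloaded credentials JSON file."""
    with open(path, encoding="utf-8") as fh:
        creds = json.load(fh)
    # Raises KeyError if the file does not contain a client_email entry.
    return creds["client_email"]
```

The returned address is the one to paste into the Share dialog of the
spreadsheet.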


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/4e22b91e
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/4e22b91e
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/4e22b91e

Branch: refs/heads/gh-pages
Commit: 4e22b91ea6d486998fc38748d3178dd1b8f739da
Parents: fdc2be2
Author: krishnakalyan3 <kr...@gmail.com>
Authored: Mon Aug 28 10:32:55 2017 -0700
Committer: Deron Eriksson <de...@apache.org>
Committed: Mon Aug 28 10:32:55 2017 -0700

----------------------------------------------------------------------
 python-performance-test.md | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/4e22b91e/python-performance-test.md
----------------------------------------------------------------------
diff --git a/python-performance-test.md b/python-performance-test.md
index 25e1f35..b47b7c9 100644
--- a/python-performance-test.md
+++ b/python-performance-test.md
@@ -148,6 +148,27 @@ Run performance test for all algorithms under the family `regression2` and log w
 Run performance test for all algorithms using HDFS.
 
 
+## Google sheets API
+
+Follow the steps below to configure the Google client API:
+
+- Navigate to [Google APIs Console](https://console.developers.google.com/apis/).
+- Create a new project.
+- Click Enable API. Search for and enable the Google Drive API.
+- Create credentials for a Web Server to access Application Data.
+- Name the service account and grant it a Project Role of Editor.
+- Download the JSON file.
+- Copy the JSON file to your code directory and rename it to `client_secret.json`.
+
+Follow the steps below to configure Google Sheets:
+
+- Create a new spreadsheet with Google Sheets.
+- Create separate sheets for `singlenode` and `hybrid_spark`.
+- Find the `client_email` inside `client_secret.json` and save it.
+- Back in your spreadsheet, click the Share button in the top right, and paste the client email into the People field to give it edit rights for each sheet.
+- Click Send.
+
+
 ## Result Consolidation and Plotting
 We have two scripts: `stats.py` for pulling results from Google Docs and `update.py` for updating results in Google Docs or on the local file system.
 
@@ -159,6 +180,7 @@ Example of `stats.py` below
 `  ./stats.py --auth ../key/client_json.json --exec-type singlenode --plot stats1_data-gen_none_dense_10k_100`
 The `--plot` argument takes the name of the composite key that you would like to compare results over. If this argument is not specified, the results are grouped by keys.
 
+
 ## Operational Notes
 
 All performance tests depend mainly on two scripts for execution: `systemml-standalone.py` and `systemml-spark-submit.py`. In case we need to change standalone or Spark parameters, we need to change them manually in their respective scripts.
@@ -198,7 +220,6 @@ Matrix Shape | Approximate Data Size
 10M_1k|80GB
 100M_1k|800GB
 
-
 For example, the command below runs the performance test for all data sizes described above:
 `run_perftest.py --family binomial clustering multinomial regression1 regression2 stats1 stats2 --mat-shape 10k_1k 100k_1k 1M_1k 10M_1k 100M_1k --master yarn-client  --temp-dir hdfs://localhost:9000/user/systemml`
 


[3/4] systemml git commit: [SYSTEMML-1451][Phase3] phase 3 work

Posted by de...@apache.org.
[SYSTEMML-1451][Phase3] phase 3 work

- Offline CSV support
- Family bug fix
- Plots
- Doc Update
- Stats update
- Bug train, predict append family name

Closes #604
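
The `--plot` example in the documentation added by this commit uses the
composite key `stats1_data-gen_none_dense_10k_100`. Assuming (this is an
inference from that one example, not something the commit states) that
the key is the underscore-joined algorithm, run type, intercept, matrix
type, and data shape, it could be built as in the sketch below. The
function name is hypothetical.

```python
def composite_key(algorithm, run_type, intercept, matrix_type, data_shape):
    """Join the log fields into one key; the field order is an assumption."""
    return "_".join([algorithm, run_type, str(intercept), matrix_type, data_shape])


key = composite_key("stats1", "data-gen", "none", "dense", "10k_100")
```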


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/fdc2be22
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/fdc2be22
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/fdc2be22

Branch: refs/heads/gh-pages
Commit: fdc2be22b66151f798a6e2b5be439bd616d24494
Parents: 07bd40a
Author: krishnakalyan3 <kr...@gmail.com>
Authored: Sat Aug 26 11:52:59 2017 -0700
Committer: Nakul Jindal <na...@gmail.com>
Committed: Sat Aug 26 11:52:59 2017 -0700

----------------------------------------------------------------------
 python-performance-test.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/fdc2be22/python-performance-test.md
----------------------------------------------------------------------
diff --git a/python-performance-test.md b/python-performance-test.md
index ce36c2d..25e1f35 100644
--- a/python-performance-test.md
+++ b/python-performance-test.md
@@ -148,6 +148,17 @@ Run performance test for all algorithms under the family `regression2` and log w
 Run performance test for all algorithms using HDFS.
 
 
+## Result Consolidation and Plotting
+We have two scripts: `stats.py` for pulling results from Google Docs and `update.py` for updating results in Google Docs or on the local file system.
+
+An example of `update.py` is shown below:
+`./scripts/perftest/python/google_docs/update.py --file  ../../temp/perf_test_singlenode.out --exec-type singlenode --tag 2 --append test.csv`
+The arguments are: `--file`, the path of the perf-test output; `--exec-type`, the execution mode used to generate the perf-test output; `--tag`, the release version or a unique name; and `--append`, an optional argument that appends the results to a local CSV file. If `--auth` is used instead of `--append`, it needs the location of the Google API key file.
+
+Example of `stats.py` below 
+`  ./stats.py --auth ../key/client_json.json --exec-type singlenode --plot stats1_data-gen_none_dense_10k_100`
+The `--plot` argument takes the name of the composite key that you would like to compare results over. If this argument is not specified, the results are grouped by keys.
+
 ## Operational Notes
 
 All performance tests depend mainly on two scripts for execution: `systemml-standalone.py` and `systemml-spark-submit.py`. In case we need to change standalone or Spark parameters, we need to change them manually in their respective scripts.
@@ -158,7 +169,7 @@ The logs contain the following information below comma separated.
 
 algorithm | run_type | intercept | matrix_type | data_shape | time_sec
 --- | --- | --- | --- | --- | --- |
-multinomial|data-gen|0|dense|10k_100| 0.33
+multinomial|data-gen|0|10k_100|dense| 0.33
 MultiLogReg|train|0|10k_100|dense|6.956
 MultiLogReg|predict|0|10k_100|dense|4.780
 
@@ -187,9 +198,12 @@ Matrix Shape | Approximate Data Size
 10M_1k|80GB
 100M_1k|800GB
 
+
 For example, the command below runs the performance test for all data sizes described above:
 `run_perftest.py --family binomial clustering multinomial regression1 regression2 stats1 stats2 --mat-shape 10k_1k 100k_1k 1M_1k 10M_1k 100M_1k --master yarn-client  --temp-dir hdfs://localhost:9000/user/systemml`
 
+By default, data generated in `hybrid_spark` execution mode is stored in the current user's `hdfs` home directory.
+
 Note: Please run `pip3 install -r requirements.txt` before using the perftest scripts.
 
 


[2/4] systemml git commit: [DOC][HOTFIX] updates to the performance test scripts

Posted by de...@apache.org.
[DOC][HOTFIX] updates to the performance test scripts

Closes #616
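
The approximate data sizes in the table added by this commit are
consistent with dense matrices of 8-byte double values, for example
10k x 1k x 8 bytes = 80MB. The sketch below reproduces that arithmetic;
the helper names, the 8-byte cell size, and the decimal units are
assumptions made for illustration and are not taken from the perftest
scripts.

```python
SUFFIX = {"k": 10**3, "M": 10**6}


def parse_dim(token):
    """Turn a dimension token such as '10k' or '100M' into an integer."""
    if token[-1] in SUFFIX:
        return int(token[:-1]) * SUFFIX[token[-1]]
    return int(token)


def approx_dense_bytes(shape, bytes_per_cell=8):
    """Approximate in-memory size of a dense matrix given as 'rows_cols'."""
    rows, cols = (parse_dim(t) for t in shape.split("_"))
    return rows * cols * bytes_per_cell


size_small = approx_dense_bytes("10k_1k")    # 80 * 10**6 bytes, i.e. 80MB
size_large = approx_dense_bytes("100M_1k")   # 800 * 10**9 bytes, i.e. 800GB
```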


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/07bd40a4
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/07bd40a4
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/07bd40a4

Branch: refs/heads/gh-pages
Commit: 07bd40a43bf1bbed2ac9e2ec95fcf7949cc35801
Parents: 16f72cf
Author: krishnakalyan3 <kr...@gmail.com>
Authored: Mon Aug 14 15:18:50 2017 -0700
Committer: Nakul Jindal <na...@gmail.com>
Committed: Mon Aug 14 15:18:50 2017 -0700

----------------------------------------------------------------------
 python-performance-test.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/07bd40a4/python-performance-test.md
----------------------------------------------------------------------
diff --git a/python-performance-test.md b/python-performance-test.md
index 3d29f01..ce36c2d 100644
--- a/python-performance-test.md
+++ b/python-performance-test.md
@@ -177,7 +177,20 @@ In the example above `--tag` can be a major/minor systemml version and `--auth`
 Currently we only support time difference between algorithms in different versions. This can be obtained by running the script below
 `./stats.py --auth client_json.json --exec-mode singlenode --tags 1.0 2.0`
 
-Note: Please pip install `https://github.com/burnash/gspread` to use google docs client.
+We pass different matrix shapes using the `--mat-shape` argument.
+
+Matrix Shape | Approximate Data Size
+--- | ---
+10k_1k|80MB
+100k_1k|800MB
+1M_1k|8GB
+10M_1k|80GB
+100M_1k|800GB
+
+For example, the command below runs the performance test for all data sizes described above:
+`run_perftest.py --family binomial clustering multinomial regression1 regression2 stats1 stats2 --mat-shape 10k_1k 100k_1k 1M_1k 10M_1k 100M_1k --master yarn-client  --temp-dir hdfs://localhost:9000/user/systemml`
+
+Note: Please run `pip3 install -r requirements.txt` before using the perftest scripts.
 
 
 ## Troubleshooting