Posted to commits@systemml.apache.org by gw...@apache.org on 2016/07/19 22:49:46 UTC
incubator-systemml git commit: [SYSTEMML-764] Add Univar-Stats.dml labeled console output
Repository: incubator-systemml
Updated Branches:
refs/heads/master b584aecf6 -> 1035699c3
[SYSTEMML-764] Add Univar-Stats.dml labeled console output
Added console output. Removed the existing table listing number and
name of univariate statistics to avoid redundancy.
Closes #192.
Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/1035699c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/1035699c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/1035699c
Branch: refs/heads/master
Commit: 1035699c3b23bef916eab4738a9d2d64e98d9d6e
Parents: b584aec
Author: Sandeep Narayanaswami <sa...@capitalone.com>
Authored: Tue Jul 19 15:36:28 2016 -0700
Committer: Glenn Weidner <gw...@us.ibm.com>
Committed: Tue Jul 19 15:36:28 2016 -0700
----------------------------------------------------------------------
docs/quick-start-guide.md | 103 ++++++++++++++++++++++++++---------------
1 file changed, 66 insertions(+), 37 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1035699c/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index ed6bd3a..f05db25 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -145,25 +145,78 @@ for each feature column using the algorithm `Univar-Stats.dml` which requires 3
* `X`: location of the input data file to analyze
* `TYPES`: location of the file that contains the feature column types encoded by integer numbers: `1` = scale, `2` = nominal, `3` = ordinal
-* `STATS`: location of the output matrix of computed statistics will be stored
+* `STATS`: location where the output matrix of computed statistics is to be stored
We need to create a file `types.csv` that describes the type of each column in
-the data along with it's metadata file `types.csv.mtd`.
+the data along with its metadata file `types.csv.mtd`.
$ echo '1,1,1,2' > data/types.csv
$ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
-To run the `Univar-Stats.dml` algorithm, issue the following command:
-
- $ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx
-
-The resulting matrix has one row per each univariate statistic and one column
-per input feature. The output file `univarOut.mtx` describes that
-matrix. The elements of the first column denote the number of the statistic,
-the elements of the second column refer to the number of the feature column in
-the input data, and the elements of the third column show the value of the
-univariate statistic.
+To run the `Univar-Stats.dml` algorithm, issue the following command (we set the optional argument `CONSOLE_OUTPUT` to `TRUE` to print the statistics to the console):
+
+ $ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx CONSOLE_OUTPUT=TRUE
+
+ [...]
+ -------------------------------------------------
+ Feature [1]: Scale
+ (01) Minimum | 30.0
+ (02) Maximum | 83.0
+ (03) Range | 53.0
+ (04) Mean | 52.45751633986928
+ (05) Variance | 116.71458266366658
+ (06) Std deviation | 10.803452349303281
+ (07) Std err of mean | 0.6175922641866753
+ (08) Coeff of variation | 0.20594669940735139
+ (09) Skewness | 0.1450718616532357
+ (10) Kurtosis | -0.6150152487211726
+ (11) Std err of skewness | 0.13934809593495995
+ (12) Std err of kurtosis | 0.277810485320835
+ (13) Median | 52.0
+ (14) Interquartile mean | 52.16013071895425
+ -------------------------------------------------
+ Feature [2]: Scale
+ (01) Minimum | 58.0
+ (02) Maximum | 69.0
+ (03) Range | 11.0
+ (04) Mean | 62.85294117647059
+ (05) Variance | 10.558630665380907
+ (06) Std deviation | 3.2494046632238507
+ (07) Std err of mean | 0.18575610076612029
+ (08) Coeff of variation | 0.051698529971741194
+ (09) Skewness | 0.07798443581479181
+ (10) Kurtosis | -1.1324380182967442
+ (11) Std err of skewness | 0.13934809593495995
+ (12) Std err of kurtosis | 0.277810485320835
+ (13) Median | 63.0
+ (14) Interquartile mean | 62.80392156862745
+ -------------------------------------------------
+ Feature [3]: Scale
+ (01) Minimum | 0.0
+ (02) Maximum | 52.0
+ (03) Range | 52.0
+ (04) Mean | 4.026143790849673
+ (05) Variance | 51.691117539912135
+ (06) Std deviation | 7.189653506248555
+ (07) Std err of mean | 0.41100513466216837
+ (08) Coeff of variation | 1.7857418611299172
+ (09) Skewness | 2.954633471088322
+ (10) Kurtosis | 11.425776549251449
+ (11) Std err of skewness | 0.13934809593495995
+ (12) Std err of kurtosis | 0.277810485320835
+ (13) Median | 1.0
+ (14) Interquartile mean | 1.2483660130718954
+ -------------------------------------------------
+ Feature [4]: Categorical (Nominal)
+ (15) Num of categories | 2
+ (16) Mode | 1
+ (17) Num of modes | 1
+
+
+The `Univar-Stats.dml` script writes the computed statistics to the `univarOut.mtx` file. The matrix has one row per univariate statistic and one column per input feature. The first column gives the number of the statistic
+(see above table), the second column gives the number of the feature column in
+the input data, and the third column gives the value of the univariate statistic.
1 1 30.0
1 2 58.0
@@ -210,31 +263,6 @@ univariate statistic.
16 4 1.0
17 4 1.0
-The following table lists the number and name of each univariate statistic. The row
-numbers below correspond to the elements of the first column in the output
-matrix above. The signs "+" show applicability to scale or/and to categorical
-features.
-
- | Row | Name of Statistic | Scale | Categ. |
- | :-: |:-------------------------- |:-----:| :-----:|
- | 1 | Minimum | + | |
- | 2 | Maximum | + | |
- | 3 | Range | + | |
- | 4 | Mean | + | |
- | 5 | Variance | + | |
- | 6 | Standard deviation | + | |
- | 7 | Standard error of mean | + | |
- | 8 | Coefficient of variation | + | |
- | 9 | Skewness | + | |
- | 10 | Kurtosis | + | |
- | 11 | Standard error of skewness | + | |
- | 12 | Standard error of kurtosis | + | |
- | 13 | Median | + | |
- | 14 | Inter quartile mean | + | |
- | 15 | Number of categories | | + |
- | 16 | Mode | | + |
- | 17 | Number of modes | | + |
-
<br/>
<br/>
@@ -368,3 +396,4 @@ the memory available to the JVM, i.e:
<br/>
+`this is code`
\ No newline at end of file
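
The diff above describes `univarOut.mtx` as rows of (statistic number, feature number, value). With the lookup table removed from the guide, those rows can be relabeled using the statistic names from the console output. The following is a minimal Python sketch, assuming the file contains whitespace-separated triples exactly as in the excerpt shown in the diff. `STAT_NAMES` and `parse_stats` are hypothetical names used for illustration and are not part of SystemML.

```python
# Statistic numbers and names, copied from the labeled console output above.
STAT_NAMES = {
    1: "Minimum", 2: "Maximum", 3: "Range", 4: "Mean", 5: "Variance",
    6: "Std deviation", 7: "Std err of mean", 8: "Coeff of variation",
    9: "Skewness", 10: "Kurtosis", 11: "Std err of skewness",
    12: "Std err of kurtosis", 13: "Median", 14: "Interquartile mean",
    15: "Num of categories", 16: "Mode", 17: "Num of modes",
}


def parse_stats(lines):
    """Group (statistic, feature, value) triples by feature number.

    Assumes each line looks like "1 2 58.0"; other lines are skipped.
    """
    by_feature = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        stat, feature, value = int(parts[0]), int(parts[1]), float(parts[2])
        name = STAT_NAMES.get(stat, f"Statistic {stat}")
        by_feature.setdefault(feature, {})[name] = value
    return by_feature


# Sample rows taken from the output excerpt in the diff.
sample = ["1 1 30.0", "1 2 58.0", "16 4 1.0", "17 4 1.0"]
stats = parse_stats(sample)

for feature in sorted(stats):
    print(f"Feature [{feature}]")
    for name, value in stats[feature].items():
        print(f"  {name}: {value}")
```

To apply this to a real run, the lines of `data/univarOut.mtx` could be passed in place of `sample`, for example via `open("data/univarOut.mtx")`, provided the file uses the same three-column layout.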