Posted to commits@systemds.apache.org by ja...@apache.org on 2021/05/04 07:06:47 UTC
[systemds] branch master updated: [DOC] Group by aggregate using linear algebra usage (#1251)
This is an automated email from the ASF dual-hosted git repository.
janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds.git
The following commit(s) were added to refs/heads/master by this push:
new 31d9faf [DOC] Group by aggregate using linear algebra usage (#1251)
31d9faf is described below
commit 31d9faf708eb158a5a4d6603a3494b245d0481ae
Author: j143 <j1...@protonmail.com>
AuthorDate: Tue May 4 12:36:36 2021 +0530
[DOC] Group by aggregate using linear algebra usage (#1251)
* Reusing code from the SystemML tutorial KDD 2017
---
docs/site/dml-vs-r-guide.md | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/docs/site/dml-vs-r-guide.md b/docs/site/dml-vs-r-guide.md
index 8b44ee8..1c1fdd7 100644
--- a/docs/site/dml-vs-r-guide.md
+++ b/docs/site/dml-vs-r-guide.md
@@ -165,3 +165,47 @@ J = matrix ("10 20 25 26 28 31 50 67 79", rows = 1, cols = 9)
res = X + table (matrix (1, rows = 1, cols = ncol (J)), J, 10)
```
+#### Group by aggregate using Linear Algebra
+
+Given a matrix `PCV` with columns (Position, Category, Value), sort `PCV` by category and,
+within each category, by value in descending order. Then:
+
+- create an indicator vector that marks category changes,
+- extract the distinct categories, and
+- aggregate per category with linear algebra operations.
+
+```dml
+# category data
+C = matrix ('50 40 20 10 30 20 40 20 30', rows = 9, cols = 1)
+# value data
+V = matrix ('20 11 49 33 94 29 48 74 57', rows = 9, cols = 1)
+
+# 1. PCV representation
+PCV = cbind (cbind (seq (1, nrow (C), 1), C), V)
+PCV = order (target = PCV, by = 3, decreasing = TRUE, index.return = FALSE)
+PCV = order (target = PCV, by = 2, decreasing = FALSE, index.return = FALSE)
+
+# 2. Find all rows of PCV where the category has a new value, in comparison to
+# the previous row
+
+is_new_C = matrix (1, rows = 1, cols = 1);
+if (nrow (C) > 1) {
+  is_new_C = rbind (is_new_C, (PCV [1:(nrow(C) - 1), 2] < PCV [2:nrow(C), 2]));
+}
+
+# 3. Associate each row with the index of its category
+
+index_C = cumsum (is_new_C); # running count of category changes
+
+# 4. Compute:
+# - the list of distinct categories
+# - the maximum value for each category (the first row per category after sorting)
+# - a 0-1 aggregation matrix that adds records of the same category
+
+distinct_C = removeEmpty (target = PCV [, 2], margin = "rows", select = is_new_C);
+max_V_per_C = removeEmpty (target = PCV [, 3], margin = "rows", select = is_new_C);
+C_indicator = table (index_C, PCV [, 1], max (index_C), nrow (C)); # rows: category index, cols: original position
+
+# 5. Perform the aggregation; here, sum the values per category
+sum_V_per_C = C_indicator %*% V
+```
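
To sanity-check what the DML snippet in this diff should produce, the same five steps can be traced in plain Python. This is only an illustrative sketch, not SystemDS code: the lowercase variable names are stand-ins for the DML ones, and nested lists stand in for matrices. It assumes the two `order` calls behave like a stable sort, which does not affect this sample because all values are distinct.

```python
# Plain-Python cross-check of the DML steps (illustrative, not SystemDS code).
C = [50, 40, 20, 10, 30, 20, 40, 20, 30]  # category data
V = [20, 11, 49, 33, 94, 29, 48, 74, 57]  # value data

# 1. PCV rows as (position, category, value). Sort by value descending, then
#    by category ascending; Python's sort is stable, so the value order is
#    preserved within each category.
pcv = [(p + 1, c, v) for p, (c, v) in enumerate(zip(C, V))]
pcv.sort(key=lambda row: row[2], reverse=True)
pcv.sort(key=lambda row: row[1])

# 2. Mark rows whose category is larger than that of the previous row.
is_new_c = [1] + [int(pcv[i - 1][1] < pcv[i][1]) for i in range(1, len(pcv))]

# 3. A running sum of the indicator gives each row its 1-based category index.
index_c = []
running = 0
for flag in is_new_c:
    running += flag
    index_c.append(running)

# 4. Distinct categories, per-category maximum (first row of each category),
#    and a 0-1 matrix with one row per category and one column per position.
distinct_c = [row[1] for row, flag in zip(pcv, is_new_c) if flag]
max_v_per_c = [row[2] for row, flag in zip(pcv, is_new_c) if flag]
c_indicator = [[0] * len(C) for _ in range(max(index_c))]
for idx, row in zip(index_c, pcv):
    c_indicator[idx - 1][row[0] - 1] = 1

# 5. Matrix-vector product: sum the values per category.
sum_v_per_c = [sum(a * b for a, b in zip(r, V)) for r in c_indicator]

print(distinct_c, max_v_per_c, sum_v_per_c)
```

For the sample data, categories 10, 20, 30, 40 and 50 have maxima 33, 74, 94, 48 and 20, and sums 33, 152, 151, 59 and 20. The indicator matrix is indexed by original position, which is why it can be multiplied with the unsorted `V`.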