Posted to commits@systemds.apache.org by ja...@apache.org on 2021/05/04 06:39:32 UTC

[systemds] branch master updated: [DOC] DML recipes for operations over matrices (#1250)

This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/master by this push:
     new 3ff9d2c  [DOC] DML recipes for operations over matrices (#1250)
3ff9d2c is described below

commit 3ff9d2c1c50a25108ef7adc5cfa506f55e0630ba
Author: j143 <j1...@protonmail.com>
AuthorDate: Tue May 4 12:09:23 2021 +0530

    [DOC] DML recipes for operations over matrices (#1250)
    
    * Removing duplicates (sorted or unsorted)
    * Sparse matrix representation
    * Indexing operations
---
 docs/site/dml-vs-r-guide.md | 65 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/docs/site/dml-vs-r-guide.md b/docs/site/dml-vs-r-guide.md
index 0a5aa37..8b44ee8 100644
--- a/docs/site/dml-vs-r-guide.md
+++ b/docs/site/dml-vs-r-guide.md
@@ -100,3 +100,68 @@ write(evalues, "evalues", format=ofmt)
 # Here args[2] will be a string denoting output directory
 writeMM(as(evalues, "CsparseMatrix"), paste(args[2],"evalues", sep=""));
 ```
+
+### Recipes
+
+#### Construct a matrix (sparse)
+
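+Build a sparse matrix from (rowIndex, colIndex, value) triplets with `table()`; the output dimensions default to the largest row and column index.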
+
+```dml
+I = matrix ("1 3 3", rows = 3, cols = 1)
+J = matrix ("2 3 4", rows = 3, cols = 1)
+V = matrix ("10 20 30", rows = 3, cols = 1)
+
+M = table (I, J, V)
+```
+
+#### Find and remove duplicates in columns or rows
+
+##### Assuming values are sorted
+
+```dml
+X = matrix ("1 2 2 3 5 6 7 8 9 9", rows = 10, cols = 1)
+
+# Flag each row that differs from the previous row; the first row is always kept
+I = rbind (matrix (1,1,1), (X[1:nrow (X)-1,] != X[2:nrow (X),]))
+# Select only the unique items
+res = removeEmpty (target = X, margin = "rows", select = I)
+```
+
+##### Assuming values may be unsorted
+
+Method 1:
+
+```dml
+X = matrix ("1 8 2 3 9 7 6 5 2 9", rows = 10, cols = 1)
+
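+# Count occurrences of each value (assumes X holds positive integers, used as group ids)
+I = aggregate (target = X, groups = X[,1], fn = "count")
+# Keep every candidate value 1..max(X) whose count is non-zero
+res = removeEmpty (target = seq (1, max (X[,1])), margin = "rows", select = (I != 0))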
+```
+
+Method 2:
+
+Sort the values first, then remove adjacent duplicates as in the sorted case.
+
+```dml
+X = matrix ("3 2 1 3 3 4 5 10", rows = 8, cols = 1)
+
+X = order (target = X, by = 1)
+I = rbind (matrix (1,1,1), (X[1:nrow (X)-1,] != X[2:nrow (X),]))
+res = removeEmpty (target = X, margin = "rows", select = I)
+```
+
+#### Set based indexing
+
+Given a matrix X and an index matrix J holding column indices into X, use J to apply an operation
+to the selected cells of X. For example, add 10 to the cells of X indexed by J.
+
+```dml
+X = matrix (1, rows = 1, cols = 100)
+J = matrix ("10 20 25 26 28 31 50 67 79", rows = 1, cols = 9)
+
+res = X + table (matrix (1, rows = 1, cols = ncol (J)), J, 10, 1, ncol (X))  # size the table output to match X
+```
+