Posted to commits@systemds.apache.org by mb...@apache.org on 2021/09/05 17:03:13 UTC
[systemds] branch master updated: [SYSTEMDS-3120] Fix vectorized correctTypos (permutation matmult)
This is an automated email from the ASF dual-hosted git repository.
mboehm7 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds.git
The following commit(s) were added to refs/heads/master by this push:
new 09199ee [SYSTEMDS-3120] Fix vectorized correctTypos (permutation matmult)
09199ee is described below
commit 09199ee7e1d026a1045361a37678c9bc8d776d1c
Author: Matthias Boehm <mb...@gmail.com>
AuthorDate: Sun Sep 5 19:02:43 2021 +0200
[SYSTEMDS-3120] Fix vectorized correctTypos (permutation matmult)
This patch fixes an oversight in the new vectorized correctTypos:
instead of using a permutation matrix multiply (as intended) to shuffle
frequencies into the right positions, it used an element-wise modulo.
The fix is a single character; it restores correctness and makes the
correctTypos built-in even faster.
---
scripts/builtin/correctTypos.dml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/builtin/correctTypos.dml b/scripts/builtin/correctTypos.dml
index 55b7246..26f051f 100644
--- a/scripts/builtin/correctTypos.dml
+++ b/scripts/builtin/correctTypos.dml
@@ -179,7 +179,7 @@ buildDictionary = function(Frame[String] S)
[ID,M] = transformencode(target=S, spec="{ids:true,recode:[1]}");
dstr = map(M[,1], "s -> UtilFunctions.splitRecodeEntry(s)[0]");
dcodes = map(M[,1], "s -> UtilFunctions.splitRecodeEntry(s)[1]");
- frequencies = table(seq(1,nrow(dstr)),as.matrix(dcodes)) %% table(ID, 1);
+ frequencies = table(seq(1,nrow(dstr)),as.matrix(dcodes)) %*% table(ID, 1);
dict = cbind(dstr, as.frame(frequencies));
}
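
The effect of the corrected line can be illustrated with a small sketch in plain Python (not DML, and not part of the commit). It assumes that table(seq(1,n), dcodes) yields an n x k 0/1 selection matrix with a single 1 per row, that table(ID, 1) yields a k x 1 column of per-code counts, and that the dictionary rows are not necessarily ordered by code. The function name build_frequencies and the sample data are hypothetical.

```python
def build_frequencies(ids, dict_codes):
    """Sketch of: table(seq(1,n), dcodes) %*% table(ID, 1).

    ids        -- 1-based recode IDs, one per input token (assumed data)
    dict_codes -- 1-based recode ID of each dictionary row (assumed data)
    """
    n = len(dict_codes)
    k = max(dict_codes)

    # table(ID, 1): column vector where counts[c-1] is how often code c occurs.
    counts = [0] * k
    for c in ids:
        counts[c - 1] += 1

    # table(seq(1,n), dcodes): row i has a single 1 in column dict_codes[i].
    perm = [[1 if dict_codes[i] - 1 == j else 0 for j in range(k)]
            for i in range(n)]

    # Matrix-vector product: row i picks out the count of its own code.
    return [sum(perm[i][j] * counts[j] for j in range(k)) for i in range(n)]


# Codes 1, 2, 3 occur 3, 2, and 1 times; dictionary rows hold codes 2, 3, 1.
print(build_frequencies([1, 2, 1, 3, 1, 2], [2, 3, 1]))  # [2, 1, 3]
```

Under these assumptions, the matrix product moves each count to the dictionary row that carries the matching code, which is the "shuffle" the commit message describes; an element-wise modulo of the same two operands does not perform that lookup.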