Posted to commits@systemds.apache.org by mb...@apache.org on 2021/09/05 17:03:13 UTC

[systemds] branch master updated: [SYSTEMDS-3120] Fix vectorized correctTypos (permutation matmult)

This is an automated email from the ASF dual-hosted git repository.

mboehm7 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/master by this push:
     new 09199ee  [SYSTEMDS-3120] Fix vectorized correctTypos (permutation matmult)
09199ee is described below

commit 09199ee7e1d026a1045361a37678c9bc8d776d1c
Author: Matthias Boehm <mb...@gmail.com>
AuthorDate: Sun Sep 5 19:02:43 2021 +0200

    [SYSTEMDS-3120] Fix vectorized correctTypos (permutation matmult)
    
    This patch fixes an oversight in the new vectorized correctTypos:
    instead of using a permutation matrix multiply (as intended) to shuffle
    frequencies into the right positions, it used an element-wise modulo.
    The fix is a single character: it restores correctness and makes the
    correctTypos built-in even faster.
---
 scripts/builtin/correctTypos.dml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/builtin/correctTypos.dml b/scripts/builtin/correctTypos.dml
index 55b7246..26f051f 100644
--- a/scripts/builtin/correctTypos.dml
+++ b/scripts/builtin/correctTypos.dml
@@ -179,7 +179,7 @@ buildDictionary = function(Frame[String] S)
   [ID,M] = transformencode(target=S, spec="{ids:true,recode:[1]}");
   dstr = map(M[,1], "s -> UtilFunctions.splitRecodeEntry(s)[0]");
   dcodes = map(M[,1], "s -> UtilFunctions.splitRecodeEntry(s)[1]");
-  frequencies = table(seq(1,nrow(dstr)),as.matrix(dcodes)) %% table(ID, 1);
+  frequencies = table(seq(1,nrow(dstr)),as.matrix(dcodes)) %*% table(ID, 1);
   dict = cbind(dstr, as.frame(frequencies));
 }
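
To illustrate why the one-character change matters, here is a minimal sketch in plain Python (not DML) of what the two operators compute. The toy values in dcodes and ids, and the helper names matmul and elementwise_mod, are hypothetical and not taken from the SystemDS sources. The sketch also assumes that the element-wise modulo broadcasts the count column across the columns of the permutation matrix.

```python
# Hypothetical toy data: three dictionary rows whose recode IDs are not in row order.
dcodes = [3, 1, 2]            # recode ID of the word stored in each dictionary row
ids = [1, 1, 2, 3, 3, 3]      # recode ID of every input token (the ID column)
n = len(dcodes)

# Analogue of table(seq(1,n), dcodes): n x n permutation matrix, P[i][dcodes[i]-1] = 1.
P = [[1 if dcodes[i] == j + 1 else 0 for j in range(n)] for i in range(n)]

# Analogue of table(ID, 1): n x 1 column of token counts, indexed by recode ID.
counts = [[ids.count(code)] for code in range(1, n + 1)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def elementwise_mod(A, v):
    """Cell-wise modulo, broadcasting the column vector v across the columns of A."""
    return [[a % v[i][0] for a in row] for i, row in enumerate(A)]


# Fixed version (%*%): row i receives the count of the word stored in row i.
frequencies = matmul(P, counts)

# Buggy version (%%): the result stays n x n and holds no counts at all.
wrong = elementwise_mod(P, counts)

print(frequencies)
print(wrong)
```

In this sketch the matrix product yields one frequency per dictionary row, which is the shape the subsequent cbind with dstr needs, whereas the modulo variant keeps the full n x n shape of the permutation matrix.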