Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/05/19 05:52:39 UTC

[GitHub] [systemml] deutschmn opened a new pull request #916: Add to_one_hot builtin function

deutschmn opened a new pull request #916:
URL: https://github.com/apache/systemml/pull/916


   Adds a builtin function `to_one_hot` that transforms a vector of integers into a one-hot-encoded matrix.
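For reference, here is a minimal Java sketch of what the builtin computes. It mirrors the DML body `table(seq(1, nrow(X)), X, nrow(X), num_classes)` used in this PR: row i of the result holds a 1 in column X[i] and zeros elsewhere. It assumes every entry of X is an integer in [1, num_classes]; the class and method names are hypothetical.

```java
// Hypothetical reference encoder mirroring table(seq(1, nrow(X)), X, nrow(X), num_classes).
// Assumes every entry of x is an integer in [1, numClasses]; no validation is done here.
final class OneHotSketch {
    private OneHotSketch() {}

    static double[][] toOneHot(double[] x, int numClasses) {
        double[][] y = new double[x.length][numClasses]; // Java zero-initializes the matrix
        for (int i = 0; i < x.length; i++) {
            int label = (int) x[i];  // 1-based class label, as in DML
            y[i][label - 1] = 1.0;   // shift to a 0-based Java column index
        }
        return y;
    }
}
```

For example, `toOneHot(new double[]{2, 1, 3}, 3)` yields the rows `{0,1,0}`, `{1,0,0}` and `{0,0,1}`.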


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] Baunsgaard commented on pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on pull request #916:
URL: https://github.com/apache/systemml/pull/916#issuecomment-632581106


   LGTM





[GitHub] [systemml] deutschmn commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
deutschmn commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427911363



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+
+import static org.junit.Assert.fail;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));
+    }
+
+    @Test
+    public void runSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, false);
+    }
+
+    @Test
+    public void runFailingSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, true);
+    }
+
+    private void runToOneHotTest(boolean scalar, boolean sparse,
+                                 LopProperties.ExecType instType, boolean shouldFail) {
+        Types.ExecMode platformOld = setExecMode(instType);
+
+        try
+        {
+            loadTestConfiguration(getTestConfiguration(TEST_NAME));
+
+            //generate actual dataset
+            double[][] doubles = getRandomMatrix(rows, cols, 1, numClasses, 1, 7);
+
+            // round them
+            double[][] A = new double[rows][cols];
+            for(int i = 0; i < rows; i++) {
+                for(int j = 0; j < cols; j++) {
+                    A[i][j] = Math.round(doubles[i][j]);
+                }
+            }
+
+            int max = -1;
+
+            for(int i = 0; i < rows; i++) {
+                if(A[i][0] > max) {
+                    max = (int) A[i][0];
+                }
+            }
+
+            // script fails if numClasses provided is smaller than maximum value in A
+            int numClassesPassed = shouldFail ? max - 1 : max;
+
+            String HOME = SCRIPT_DIR + TEST_DIR;
+            fullDMLScriptName = HOME + TEST_NAME + ".dml";
+            programArgs = new String[]{"-explain", "-args", input("A"), String.format("%d", numClassesPassed),
+                    output("B") };
+
+
+            writeInputMatrixWithMTD("A", A, false);
+
+            runTest(true, false, null, -1);

Review comment:
       I did actually try that, but `stop` doesn't seem to throw an exception. Therefore, I detect the failure below by checking whether the output file was not written. Any suggestions on how else I could do it?







[GitHub] [systemml] Baunsgaard commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r429069619



##########
File path: src/main/java/org/apache/sysds/common/Builtins.java
##########
@@ -173,6 +173,7 @@
 	TAN("tan", false),
 	TANH("tanh", false),
 	TRACE("trace", false),
+	TO_ONE_HOT("to_one_hot", true),

Review comment:
       After discussing this with Matthias, we think that we should try to stick to camelCase. The same goes for the parameters. For the internal enum, you can keep it as TO_ONE_HOT.
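The convention agreed on here separates two names: the enum constant used inside the Java code and the function name that DML scripts call. A hypothetical, trimmed-down enum makes the split concrete. It is not the real `Builtins` enum, and reading the boolean as a "script-based builtin" flag is an assumption, based on `to_one_hot` being implemented in a `.dml` file while `trace` is not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum (not SystemDS' Builtins): constant names stay UPPER_SNAKE_CASE,
// while the names visible to DML scripts are camelCase.
enum BuiltinSketch {
    TRACE("trace", false),
    TO_ONE_HOT("toOneHot", true);

    // Lookup table from DML-facing name to constant, built once when the enum loads.
    private static final Map<String, BuiltinSketch> BY_DML_NAME = new HashMap<>();
    static {
        for (BuiltinSketch b : values())
            BY_DML_NAME.put(b.dmlName, b);
    }

    private final String dmlName;  // name as written in DML scripts
    private final boolean script;  // assumed meaning: implemented as a DML script

    BuiltinSketch(String dmlName, boolean script) {
        this.dmlName = dmlName;
        this.script = script;
    }

    String getDmlName() { return dmlName; }

    boolean isScript() { return script; }

    // Resolves a function name used in a DML script; returns null if unknown.
    static BuiltinSketch fromDmlName(String name) { return BY_DML_NAME.get(name); }
}
```

With this split, renaming the user-facing function only touches the string, and Java call sites that refer to `TO_ONE_HOT` stay unchanged.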







[GitHub] [systemml] deutschmn commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
deutschmn commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427883613



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,77 @@
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.HashMap;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));

Review comment:
       I added another test case that checks the stop condition when an incorrect number of classes is passed. The simple test case already runs on random input and verifies the function's output against a Java implementation in the test. However, feel free to add more if you deem it necessary.







[GitHub] [systemml] deutschmn commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
deutschmn commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427878926



##########
File path: src/main/java/org/apache/sysds/common/Builtins.java
##########
@@ -173,6 +173,7 @@
 	TAN("tan", false),
 	TANH("tanh", false),
 	TRACE("trace", false),
+	TO_ONE_HOT("to_one_hot", true),

Review comment:
       I have no particular opinion on this, but I found a function called `one_hot` in an R package: https://www.rdocumentation.org/packages/mltools/versions/0.3.5/topics/one_hot
   
   Just let me know your final decision and I will change it accordingly. 







[GitHub] [systemml] Baunsgaard commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427895560



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,77 @@
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.HashMap;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));
+    }
+
+    @Test
+    public void runSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP);
+    }
+
+    private void runToOneHotTest(boolean scalar, boolean sparse, LopProperties.ExecType instType) {
+        Types.ExecMode platformOld = setExecMode(instType);
+
+        try
+        {
+            loadTestConfiguration(getTestConfiguration(TEST_NAME));
+
+            String HOME = SCRIPT_DIR + TEST_DIR;
+            fullDMLScriptName = HOME + TEST_NAME + ".dml";
+            programArgs = new String[]{"-explain", "-args", input("A"), String.format("%d", numClasses),
+                    output("B") };
+
+            //generate actual dataset
+            double[][] doubles = getRandomMatrix(rows, cols, 1, numClasses, 1, 7);
+
+            // round them
+            double[][] A = new double[rows][cols];
+            for(int i = 0; i < rows; i++) {
+                for(int j = 0; j < cols; j++) {
+                    A[i][j] = Math.round(doubles[i][j]);
+                }
+            }

Review comment:
       We have a utility function, `TestUtils.round()`, that rounds all values, so you don't need this double for loop:
   
   `TestUtils.round(getRandomMatrix(rows, cols, 1, numClasses, 1, 7));`
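To show what such a helper replaces, here is a hypothetical stand-in written in plain Java. The real `TestUtils.round` signature and whether it rounds in place are not shown in this thread; this sketch rounds in place and returns the same array so that the call can wrap the generator, as in the one-liner above.

```java
// Hypothetical stand-in for a round-all-cells test utility (not SystemDS' TestUtils).
final class RoundSketch {
    private RoundSketch() {}

    // Rounds every cell to the nearest integer in place and returns the same array.
    static double[][] round(double[][] data) {
        for (double[] row : data)
            for (int j = 0; j < row.length; j++)
                row[j] = Math.round(row[j]); // long result widened back to double
        return data;
    }
}
```

Returning the argument keeps call sites to a single expression, e.g. `double[][] A = RoundSketch.round(doubles);`.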

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+
+import static org.junit.Assert.fail;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));
+    }
+
+    @Test
+    public void runSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, false);
+    }
+
+    @Test
+    public void runFailingSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, true);
+    }
+
+    private void runToOneHotTest(boolean scalar, boolean sparse,
+                                 LopProperties.ExecType instType, boolean shouldFail) {
+        Types.ExecMode platformOld = setExecMode(instType);
+
+        try
+        {
+            loadTestConfiguration(getTestConfiguration(TEST_NAME));
+
+            //generate actual dataset
+            double[][] doubles = getRandomMatrix(rows, cols, 1, numClasses, 1, 7);
+
+            // round them
+            double[][] A = new double[rows][cols];
+            for(int i = 0; i < rows; i++) {
+                for(int j = 0; j < cols; j++) {
+                    A[i][j] = Math.round(doubles[i][j]);
+                }
+            }
+
+            int max = -1;
+
+            for(int i = 0; i < rows; i++) {
+                if(A[i][0] > max) {
+                    max = (int) A[i][0];
+                }
+            }
+
+            // script fails if numClasses provided is smaller than maximum value in A
+            int numClassesPassed = shouldFail ? max - 1 : max;
+
+            String HOME = SCRIPT_DIR + TEST_DIR;
+            fullDMLScriptName = HOME + TEST_NAME + ".dml";
+            programArgs = new String[]{"-explain", "-args", input("A"), String.format("%d", numClassesPassed),
+                    output("B") };
+
+
+            writeInputMatrixWithMTD("A", A, false);
+
+            runTest(true, false, null, -1);
+
+            if(!shouldFail) {
+                HashMap<MatrixValue.CellIndex, Double> expected = new HashMap<MatrixValue.CellIndex, Double>();
+                for(int i = 0; i < A.length; i++) {
+                    for(int j = 0; j < A[i].length; j++) {
+                        // indices start with 1 here
+                        expected.put(new MatrixValue.CellIndex(i + 1, (int) A[i][j]), 1.0);
+                    }
+                }

Review comment:
       Maybe make this into a helper function that constructs the expected result?
   (the double for loop + the `expected` construction)
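One possible shape for that helper, sketched as a static method. The name `computeExpectedResult` matches the one mentioned at merge time, but the body is an illustration, not the merged code. A small hypothetical `Cell` class stands in for SystemDS' `MatrixValue.CellIndex` so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class ExpectedResultSketch {
    private ExpectedResultSketch() {}

    // Hypothetical stand-in for MatrixValue.CellIndex: a 1-based (row, col) key.
    static final class Cell {
        final int row, col;
        Cell(int row, int col) { this.row = row; this.col = col; }
        @Override public boolean equals(Object o) {
            return o instanceof Cell && ((Cell) o).row == row && ((Cell) o).col == col;
        }
        @Override public int hashCode() { return Objects.hash(row, col); }
    }

    // Builds the sparse expected one-hot result: cell (i+1, A[i][j]) maps to 1.0.
    static Map<Cell, Double> computeExpectedResult(double[][] a) {
        Map<Cell, Double> expected = new HashMap<>();
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                // rows are shifted to 1-based; the label is already a 1-based column
                expected.put(new Cell(i + 1, (int) a[i][j]), 1.0);
        return expected;
    }
}
```

Making the helper static and side-effect free keeps the test method focused on running the script and comparing results.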

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+
+import static org.junit.Assert.fail;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));
+    }
+
+    @Test
+    public void runSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, false);
+    }
+
+    @Test
+    public void runFailingSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, true);
+    }
+
+    private void runToOneHotTest(boolean scalar, boolean sparse,
+                                 LopProperties.ExecType instType, boolean shouldFail) {
+        Types.ExecMode platformOld = setExecMode(instType);
+
+        try
+        {
+            loadTestConfiguration(getTestConfiguration(TEST_NAME));
+
+            //generate actual dataset
+            double[][] doubles = getRandomMatrix(rows, cols, 1, numClasses, 1, 7);
+
+            // round them
+            double[][] A = new double[rows][cols];
+            for(int i = 0; i < rows; i++) {
+                for(int j = 0; j < cols; j++) {
+                    A[i][j] = Math.round(doubles[i][j]);
+                }
+            }
+
+            int max = -1;
+
+            for(int i = 0; i < rows; i++) {
+                if(A[i][0] > max) {
+                    max = (int) A[i][0];
+                }
+            }
+
+            // script fails if numClasses provided is smaller than maximum value in A
+            int numClassesPassed = shouldFail ? max - 1 : max;
+
+            String HOME = SCRIPT_DIR + TEST_DIR;
+            fullDMLScriptName = HOME + TEST_NAME + ".dml";
+            programArgs = new String[]{"-explain", "-args", input("A"), String.format("%d", numClassesPassed),
+                    output("B") };
+
+
+            writeInputMatrixWithMTD("A", A, false);
+
+            runTest(true, false, null, -1);

Review comment:
       Our `runTest` has an argument to specify that the execution should fail:
   
   `runTest(true, false, DMLScriptException.class, -1);`
   
   This will eliminate your if/else below.
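The idea behind passing an exception class is a general "expected exception" pattern. Below is a generic, hypothetical sketch of it in plain Java. It is not SystemDS' `AutomatedTestBase.runTest`, whose internals are not shown in this thread. Passing `null` here means no exception is expected, presumably mirroring the `null` in the original `runTest(true, false, null, -1)` call.

```java
// Hypothetical helper illustrating the expected-exception pattern (not SystemDS code).
final class ExpectFailureSketch {
    private ExpectFailureSketch() {}

    // Runs body; passes only if it throws an instance of expected,
    // or if expected is null and body completes normally.
    static void runExpecting(Runnable body, Class<? extends Throwable> expected) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected != null && expected.isInstance(t))
                return; // failed in exactly the way the test wanted
            throw new AssertionError("Unexpected exception: " + t, t);
        }
        if (expected != null)
            throw new AssertionError("Expected " + expected.getSimpleName() + " but nothing was thrown");
    }
}
```

Centralizing the check this way removes the per-test if/else on a "should fail" flag: the flag just selects which exception class, if any, to pass.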







[GitHub] [systemml] asfgit closed pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #916:
URL: https://github.com/apache/systemml/pull/916


   





[GitHub] [systemml] mboehm7 commented on pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #916:
URL: https://github.com/apache/systemml/pull/916#issuecomment-633139626


   LGTM - thanks @deutschmn for the patch and @Baunsgaard for the review. During the merge I just fixed minor issues in the test (tabs over spaces in Java code, static `computeExpectedResult`, and less verbose data generation).
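The exact change to data generation made during the merge is not shown here. As one alternative sketch, integer labels can be drawn directly instead of generating doubles and rounding them. The seeded `java.util.Random` and the class name are illustrative assumptions.

```java
import java.util.Random;

// Hypothetical generator of a single-column vector of integer class labels.
final class LabelGenSketch {
    private LabelGenSketch() {}

    static double[][] randomLabels(int rows, int numClasses, long seed) {
        Random rnd = new Random(seed);              // seeded for reproducible tests
        double[][] a = new double[rows][1];
        for (int i = 0; i < rows; i++)
            a[i][0] = 1 + rnd.nextInt(numClasses);  // uniform over 1..numClasses
        return a;
    }

    // Largest label in the vector, e.g. to derive an invalid numClasses (max - 1).
    static int max(double[][] a) {
        int max = Integer.MIN_VALUE;
        for (double[] row : a)
            max = Math.max(max, (int) row[0]);
        return max;
    }
}
```

A side benefit: if `getRandomMatrix` draws uniformly from [1, numClasses], rounding makes labels 1 and numClasses only half as likely as interior labels, because each endpoint keeps only a half-width interval. Drawing integers directly avoids that skew.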





[GitHub] [systemml] Baunsgaard commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427115317



##########
File path: scripts/builtin/to_one_hot.dml
##########
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# One-hot encodes a vector
+
+# INPUT PARAMETERS:
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE    DEFAULT   MEANING
+# --------------------------------------------------------------------------------------------
+# X             matrix  ---       vector with N integer entries between 1 and num_classes
+# num_classes   int     ---       number of columns, must be >= largest value in X
+
+# Output: 
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE     MEANING
+# -------------------------------------------------------------------------------------------
+# Y             matrix   one-hot-encoded matrix with shape (N, num_classes)
+# -------------------------------------------------------------------------------------------
+
+m_to_one_hot = function(matrix[double] X, integer num_classes)
+        return (matrix[double] Y) {
+    assert(num_classes >= max(X));
+    Y = table(seq(1, nrow(X)), X, nrow(X), num_classes);
+}

Review comment:
       A newline at the end of the file would be nice.

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,77 @@
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.HashMap;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));

Review comment:
       I would say that a single test case with "nice" input is insufficient. Since I just added a builtin function myself, I suggest something along the lines of this test (up for debate):
   https://github.com/apache/systemml/blob/e80145b344068d68e53094e437524108d5f0f00a/src/test/java/org/apache/sysds/test/functions/builtin/BuiltinConfusionMatrixTest.java
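As an illustration of going beyond one "nice" input, here is a table-driven sketch of edge cases checked against a hypothetical Java reference encoder. The encoder, the invariant check, and the chosen cases are assumptions for illustration, not the tests that were merged.

```java
// Hypothetical table-driven edge cases for a one-hot encoder (not the merged tests).
final class OneHotEdgeCases {
    private OneHotEdgeCases() {}

    // Reference encoder; assumes labels are integers in [1, k].
    static double[][] encode(double[] x, int k) {
        double[][] y = new double[x.length][k];
        for (int i = 0; i < x.length; i++)
            y[i][(int) x[i] - 1] = 1.0;
        return y;
    }

    // Invariant: correct shape, and each row holds exactly one 1, at column x[i].
    static boolean isValidOneHot(double[][] y, double[] x, int k) {
        if (y.length != x.length) return false;
        for (int i = 0; i < y.length; i++) {
            if (y[i].length != k) return false;
            double sum = 0;
            for (double v : y[i]) sum += v;
            if (sum != 1.0 || y[i][(int) x[i] - 1] != 1.0) return false;
        }
        return true;
    }

    static final double[][] INPUTS = {
        {1},        // single row, smallest label
        {4, 4, 4},  // every row in the same class
        {1, 5, 3},  // labels hit both 1 and numClasses
        {2, 2},     // numClasses larger than max label: trailing all-zero columns
    };
    static final int[] NUM_CLASSES = {1, 4, 5, 6};
}
```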
   

##########
File path: src/main/java/org/apache/sysds/common/Builtins.java
##########
@@ -173,6 +173,7 @@
 	TAN("tan", false),
 	TANH("tanh", false),
 	TRACE("trace", false),
+	TO_ONE_HOT("to_one_hot", true),

Review comment:
       This is a very Pythonic way of naming the function. I just looked at the other function definitions, and the naming is not very consistent, but the few fundamental functions taken from R are named using camelCase. I would suggest using that.
   Any opinions, @Shafaq-Siddiqi? Is there an official decision?

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,77 @@
+package org.apache.sysds.test.functions.builtin;

Review comment:
       Missing license header.

##########
File path: scripts/builtin/to_one_hot.dml
##########
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# One-hot encodes a vector
+
+# INPUT PARAMETERS:
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE    DEFAULT   MEANING
+# --------------------------------------------------------------------------------------------
+# X             matrix  ---       vector with N integer entries between 1 and num_classes
+# num_classes   int     ---       number of columns, must be >= largest value in X
+
+# Output: 
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE     MEANING
+# -------------------------------------------------------------------------------------------
+# Y             matrix   one-hot-encoded matrix with shape (N, num_classes)
+# -------------------------------------------------------------------------------------------
+
+m_to_one_hot = function(matrix[double] X, integer num_classes)
+        return (matrix[double] Y) {
+    assert(num_classes >= max(X));

Review comment:
       It would be nice if this were changed to:
   ```
   if(condition)
     stop("Good Error Message")
   ```
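The same guard, sketched as a Java analog: validate `num_classes` up front and fail with a message that names both values, instead of a bare assert. The class name, the exception type, and the message wording are illustrative choices, not the PR's final DML.

```java
// Hypothetical Java analog of the suggested DML guard (if(...) stop("...")).
final class OneHotValidation {
    private OneHotValidation() {}

    // Throws with a descriptive message if any label exceeds numClasses.
    static void checkNumClasses(double[] x, int numClasses) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x)
            max = Math.max(max, v);
        if (max > numClasses)
            throw new IllegalArgumentException("to_one_hot: numClasses (" + numClasses
                + ") must be >= the largest value in X (" + (long) max + ")");
    }
}
```

Reporting both numbers tells the caller immediately whether the labels or the class count is wrong.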







[GitHub] [systemml] Baunsgaard commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427926194



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+
+import static org.junit.Assert.fail;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));
+    }
+
+    @Test
+    public void runSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, false);
+    }
+
+    @Test
+    public void runFailingSimpleTest() {
+        runToOneHotTest(false, false, LopProperties.ExecType.CP, true);
+    }
+
+    private void runToOneHotTest(boolean scalar, boolean sparse,
+                                 LopProperties.ExecType instType, boolean shouldFail) {
+        Types.ExecMode platformOld = setExecMode(instType);
+
+        try
+        {
+            loadTestConfiguration(getTestConfiguration(TEST_NAME));
+
+            //generate actual dataset
+            double[][] doubles = getRandomMatrix(rows, cols, 1, numClasses, 1, 7);
+
+            // round them
+            double[][] A = new double[rows][cols];
+            for(int i = 0; i < rows; i++) {
+                for(int j = 0; j < cols; j++) {
+                    A[i][j] = Math.round(doubles[i][j]);
+                }
+            }
+
+            int max = -1;
+
+            for(int i = 0; i < rows; i++) {
+                if(A[i][0] > max) {
+                    max = (int) A[i][0];
+                }
+            }
+
+            // script fails if numClasses provided is smaller than maximum value in A
+            int numClassesPassed = shouldFail ? max - 1 : max;
+
+            String HOME = SCRIPT_DIR + TEST_DIR;
+            fullDMLScriptName = HOME + TEST_NAME + ".dml";
+            programArgs = new String[]{"-explain", "-args", input("A"), String.format("%d", numClassesPassed),
+                    output("B") };
+
+
+            writeInputMatrixWithMTD("A", A, false);
+
+            runTest(true, false, null, -1);

Review comment:
       Oh, I did not notice that; thanks for pointing it out. :+1: I have to fix this in my own tests as well.
   
   The second argument has to be true if an exception is expected (which is odd and really should be changed). But there is no exception here since, as you say, a stop condition does not throw one.
   
   So the conclusion is that, in SystemDS tests, there is no good way to detect whether a stop was called other than checking whether the script produced the output file? @mboehm7?
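A minimal Java sketch of the output-file workaround discussed here, assuming a test can delete and then check the expected output path. `OutputFileStopCheck` and `stoppedEarly` are hypothetical names, not SystemDS APIs, and the script run is modeled as a plain `Runnable` instead of the real `runTest` call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of SystemDS): infers whether a script
 * ended early, e.g. via stop(), by checking if it wrote its output file.
 */
final class OutputFileStopCheck {
    private OutputFileStopCheck() {}

    /** Returns true if running the script did NOT produce the expected output file. */
    static boolean stoppedEarly(Runnable script, Path expectedOutput) throws IOException {
        // Remove a stale file from a previous run; otherwise its presence
        // would hide an early stop in the current run.
        Files.deleteIfExists(expectedOutput);
        script.run();
        return !Files.exists(expectedOutput);
    }
}
```

The caveat is that a missing file only shows the script did not finish writing its output; it cannot tell a stop apart from any other early failure.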
   
   







[GitHub] [systemml] Baunsgaard commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r429074721



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,126 @@
+            runTest(true, false, null, -1);

Review comment:
       Update: I have looked for options that would let us detect a stop without either checking stdout or checking whether a file has been created. Unfortunately, none is available at the moment, so we will have to build one.
   
   One current idea is that the exception thrown by a stop should not be caught and merely printed, but propagated all the way out to the caller, with the twist that a normal execution should still print the same message as before rather than a stack trace.
   
   Since this is a separate feature, I suggest not including it in this PR.
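The idea above could look roughly like the following Java sketch. `ScriptStopException`, `execute` and `runFromCommandLine` are hypothetical names chosen for illustration, not existing SystemDS classes or methods: the core execution path throws on a stop, tests call it directly and can catch the exception, and only the command-line entry point catches it to print the plain message instead of a stack trace.

```java
/** Hypothetical exception raised when a script executes stop(message). */
class ScriptStopException extends RuntimeException {
    ScriptStopException(String message) {
        super(message);
    }
}

final class StopPropagationSketch {
    private StopPropagationSketch() {}

    /** Hypothetical interpreter core: throws on stop instead of catching and printing. */
    static void execute(boolean hitStop) {
        if (hitStop) {
            throw new ScriptStopException("stop() called");
        }
        // ... normal execution would continue here ...
    }

    /**
     * Hypothetical command-line entry point: catches the exception only at the
     * very top, so end users still see just the message, not a stack trace.
     * Returns a process-style exit code.
     */
    static int runFromCommandLine(boolean hitStop) {
        try {
            execute(hitStop);
            return 0;
        } catch (ScriptStopException e) {
            System.err.println(e.getMessage());
            return 1;
        }
    }
}
```

A test would call `execute` and expect `ScriptStopException`, which is exactly what the existing "exception expected" flag could then check.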







[GitHub] [systemml] deutschmn commented on a change in pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
deutschmn commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r429124337



##########
File path: src/main/java/org/apache/sysds/common/Builtins.java
##########
@@ -173,6 +173,7 @@
 	TAN("tan", false),
 	TANH("tanh", false),
 	TRACE("trace", false),
+	TO_ONE_HOT("to_one_hot", true),

Review comment:
       Alright 👍 I just updated the naming in the most recent commit.
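For reference, a minimal Java sketch of the transformation the `to_one_hot` builtin is meant to perform, as exercised by the quoted test: each integer label in the input vector selects one column of a 0/1 matrix, and the call fails when a label exceeds `numClasses`. This is not the DML implementation from the PR; `OneHotSketch` is a hypothetical name, and 1-based labels (matching the test data, which lies in [1, numClasses]) are an assumption.

```java
/** Hypothetical illustration of one-hot encoding; not the SystemDS builtin. */
final class OneHotSketch {
    private OneHotSketch() {}

    /** Encodes labels in 1..numClasses as a labels.length x numClasses 0/1 matrix. */
    static double[][] toOneHot(double[] labels, int numClasses) {
        if (numClasses < 1) {
            throw new IllegalArgumentException("numClasses must be >= 1");
        }
        double[][] out = new double[labels.length][numClasses];
        for (int i = 0; i < labels.length; i++) {
            int label = (int) labels[i];
            // Reject non-integers and labels outside [1, numClasses].
            if (label != labels[i] || label < 1 || label > numClasses) {
                throw new IllegalArgumentException("label " + labels[i] + " at row " + i
                        + " is not an integer in [1, " + numClasses + "]");
            }
            out[i][label - 1] = 1.0; // 1-based label -> 0-based column
        }
        return out;
    }
}
```

Passing `max - 1` as `numClasses`, as the failing test case does, makes the row holding the maximum label fall outside the valid range.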







[GitHub] [systemml] Baunsgaard commented on pull request #916: Add to_one_hot builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on pull request #916:
URL: https://github.com/apache/systemml/pull/916#issuecomment-630669419


   Also, it seems the commit is not attributed to your GitHub account, which means you won't accumulate credit for it. You might want to fix that.

