Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/05/19 08:27:53 UTC

[GitHub] [systemml] Baunsgaard commented on a change in pull request #916: Add to_one_hot builtin function

Baunsgaard commented on a change in pull request #916:
URL: https://github.com/apache/systemml/pull/916#discussion_r427115317



##########
File path: scripts/builtin/to_one_hot.dml
##########
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# One-hot encodes a vector
+
+# INPUT PARAMETERS:
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE    DEFAULT   MEANING
+# --------------------------------------------------------------------------------------------
+# X             matrix  ---       vector with N integer entries between 1 and num_classes
+# num_classes   int     ---       number of columns, must be >= largest value in X
+
+# Output: 
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE     MEANING
+# -------------------------------------------------------------------------------------------
+# Y             matrix   one-hot-encoded matrix with shape (N, num_classes)
+# -------------------------------------------------------------------------------------------
+
+m_to_one_hot = function(matrix[double] X, integer num_classes)
+        return (matrix[double] Y) {
+    assert(num_classes >= max(X));
+    Y = table(seq(1, nrow(X)), X, nrow(X), num_classes);
+}

Review comment:
       A newline at the end of the file would be nice.

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,77 @@
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.HashMap;
+
+public class BuiltinToOneHotTest extends AutomatedTestBase {
+    private final static String TEST_NAME = "to_one_hot";
+    private final static String TEST_DIR = "functions/builtin/";
+    private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinToOneHotTest.class.getSimpleName() + "/";
+
+    private final static double eps = 0;
+    private final static int rows = 10;
+    private final static int cols = 1;
+    private final static int numClasses = 10;
+
+    @Override
+    public void setUp() {
+        addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"B"}));

Review comment:
       I would say that a single test case with a "nice" input is insufficient. If you want an example, I just added a builtin function myself, and I suggest something along the lines of this test (up for debate):
   https://github.com/apache/systemml/blob/e80145b344068d68e53094e437524108d5f0f00a/src/test/java/org/apache/sysds/test/functions/builtin/BuiltinConfusionMatrixTest.java
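
To make "more than one nice input" concrete, here is a minimal sketch of a plain-Java reference encoder together with a few input shapes a test could cover. It assumes the contract from the script header (integer labels between 1 and num_classes). The class `OneHotReference` and its method are hypothetical helpers for illustration only; they are not part of SystemDS or its test framework.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of SystemDS.
class OneHotReference {

    // Encodes 1-based integer labels into an (N x numClasses) 0/1 matrix.
    static double[][] toOneHot(int[] labels, int numClasses) {
        double[][] y = new double[labels.length][numClasses];
        for (int i = 0; i < labels.length; i++) {
            int label = labels[i];
            // Fail with a descriptive message instead of a bare assertion.
            if (label < 1 || label > numClasses) {
                throw new IllegalArgumentException(
                    "label " + label + " at row " + (i + 1)
                    + " is outside [1, " + numClasses + "]");
            }
            y[i][label - 1] = 1.0;
        }
        return y;
    }

    public static void main(String[] args) {
        int[][] inputs = {
            {1, 2, 3},    // the "nice" case: every class occurs exactly once
            {2, 2, 2},    // repeated labels, classes 1 and 3 never occur
            {3},          // single row
            {1, 1, 2, 1}, // more rows than classes in use
        };
        for (int[] labels : inputs) {
            System.out.println(Arrays.toString(labels) + " -> "
                + Arrays.deepToString(toOneHot(labels, 3)));
        }
    }
}
```

A reference like this could produce the expected matrix for each input in a test, with a separate case asserting that an out-of-range label is rejected.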
   

##########
File path: src/main/java/org/apache/sysds/common/Builtins.java
##########
@@ -173,6 +173,7 @@
 	TAN("tan", false),
 	TANH("tanh", false),
 	TRACE("trace", false),
+	TO_ONE_HOT("to_one_hot", true),

Review comment:
       This is a very Pythonic way of naming the function. I just looked at the other function definitions, and the naming is not very consistent, but the few fundamental functions taken from R are named using camelCase. I would suggest using that.
   Any opinions, @Shafaq-Siddiqi? Is there an official decision?

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinToOneHotTest.java
##########
@@ -0,0 +1,77 @@
+package org.apache.sysds.test.functions.builtin;

Review comment:
       Missing license header.

##########
File path: scripts/builtin/to_one_hot.dml
##########
@@ -0,0 +1,42 @@
+
+# One-hot encodes a vector
+
+# INPUT PARAMETERS:
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE    DEFAULT   MEANING
+# --------------------------------------------------------------------------------------------
+# X             matrix  ---       vector with N integer entries between 1 and num_classes
+# num_classes   int     ---       number of columns, must be >= largest value in X
+
+# Output: 
+# --------------------------------------------------------------------------------------------
+# NAME          TYPE     MEANING
+# -------------------------------------------------------------------------------------------
+# Y             matrix   one-hot-encoded matrix with shape (N, num_classes)
+# -------------------------------------------------------------------------------------------
+
+m_to_one_hot = function(matrix[double] X, integer num_classes)
+        return (matrix[double] Y) {
+    assert(num_classes >= max(X));

Review comment:
       It would be nice if this were changed to:
   ```
   if(num_classes < max(X))
     stop("to_one_hot: num_classes must be >= the largest value in X")
   ```



