You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/04/17 21:21:50 UTC

[GitHub] [systemds] mboehm7 commented on a change in pull request #1217: Add Naive Bayes Predict built-in

mboehm7 commented on a change in pull request #1217:
URL: https://github.com/apache/systemds/pull/1217#discussion_r615304992



##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)
+ return (Matrix[Double] YRaw,Matrix[Double] Y)
+{
+	numRows = nrow(D);
+	[prior, conditionals] = naivebayes(D=D, C=C, laplace=probabilities, verbose=verbose);

Review comment:
       Please do not call naive bayes training during the prediction, but adapt the function signature accordingly.

##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)
+ return (Matrix[Double] YRaw,Matrix[Double] Y)
+{
+	numRows = nrow(D);
+	[prior, conditionals] = naivebayes(D=D, C=C, laplace=probabilities, verbose=verbose);
+
+	ones = matrix(1, rows=numRows, cols=1)
+	D_w_ones = cbind(D, ones)
+	model = cbind(conditionals, prior)
+	YRaw = D_w_ones %*% t(log(model))

Review comment:
       or better see the predict script

##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)
+ return (Matrix[Double] YRaw,Matrix[Double] Y)
+{

Review comment:
       The function signature does not match the naive bayes predict algorithm.

##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)
+ return (Matrix[Double] YRaw,Matrix[Double] Y)
+{
+	numRows = nrow(D);
+	[prior, conditionals] = naivebayes(D=D, C=C, laplace=probabilities, verbose=verbose);
+
+	ones = matrix(1, rows=numRows, cols=1)
+	D_w_ones = cbind(D, ones)
+	model = cbind(conditionals, prior)
+	YRaw = D_w_ones %*% t(log(model))

Review comment:
       ```
   dimensions = as.scalar(prior[nrow(prior),1])
   prior = prior[1:(nrow(prior)-1),]
   
   conditionals = read($conditionals)
   
   numRows = nrow(D)
   
   ones = matrix(1, rows=numRows, cols=1)
   D_w_ones = cbind(D, ones)
   model = cbind(conditionals, prior)
   
   log_probs = D_w_ones %*% t(log(model))
   
   
   mx = rowMaxs(log_probs)
   ones = matrix(1, rows=1, cols=nrow(prior))
   probs = log_probs - mx %*% ones
   probs = exp(probs)/(rowSums(exp(probs)) %*% ones)
   
   ```

##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)
+ return (Matrix[Double] YRaw,Matrix[Double] Y)
+{
+	numRows = nrow(D);
+	[prior, conditionals] = naivebayes(D=D, C=C, laplace=probabilities, verbose=verbose);
+
+	ones = matrix(1, rows=numRows, cols=1)
+	D_w_ones = cbind(D, ones)
+	model = cbind(conditionals, prior)
+	YRaw = D_w_ones %*% t(log(model))
+
+	if(min(C) < 1)
+	  stop("Stopping due to invalid argument: Label vector (Y) must be recoded")
+
+	Y = rowIndexMax(YRaw)
+			
+	if(verbose){
+      num_classes = nrow(prior)
+	  num_classes_ground_truth = max(C)
+
+	  if(num_classes < num_classes_ground_truth)
+	    num_classes = num_classes_ground_truth
+	  confusion_mat = table(Y, C, num_classes, num_classes)
+	  write(confusion_mat, verbose, format="csv")

Review comment:
       The builtin functions should be side-effect free, which means there should not be a write, but instead return such intermediates so the function caller can write it out if necessary.

##########
File path: src/main/java/org/apache/sysds/common/Builtins.java
##########
@@ -182,6 +182,7 @@
 	NORMALIZE("normalize", true),
 	NROW("nrow", false),
 	NAIVEBAYES("naivebayes", true, false),
+	NAIVEBAYESPREDICT("naivebayesPredict", true, false),

Review comment:
       For the enum type, you might want to use underscores for making it more readable, but please do it consistently for NAIVEBAYES as well.

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinNaiveBayesPredictTest.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.runtime.matrix.data.MatrixValue.CellIndex;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class BuiltinNaiveBayesPredictTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "NaiveBayesPredict";
+	private final static String TEST_DIR = "functions/builtin/";
+	private final static String TEST_CLASS_DIR = TEST_DIR + BuiltinNaiveBayesPredictTest.class.getSimpleName() + "/";
+	private final static int numClasses = 10;
+
+	public double eps = 0.0000001;

Review comment:
       1e-7 for better readability.

##########
File path: src/test/scripts/functions/builtin/NaiveBayesPredict.R
##########
@@ -0,0 +1,74 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+args <- commandArgs(TRUE)
+
+library("Matrix")
+
+D = as.matrix(readMM(paste(args[1], "D.mtx", sep="")))
+C = as.matrix(readMM(paste(args[1], "C.mtx", sep="")))

Review comment:
       This seams again to replicate the naiveBayes training instead of scoring. Could we maybe compare against the existing R builtin function.

##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)
+ return (Matrix[Double] YRaw,Matrix[Double] Y)
+{
+	numRows = nrow(D);

Review comment:
       Globally fix the indentation to two-spaces for dml script instead of tabs. Note that we use tabs only for Java source code.

##########
File path: src/test/scripts/functions/builtin/NaiveBayesPredict.dml
##########
@@ -0,0 +1,27 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+D = read($1);
+C = read($2);
+[YRaw,Y] = naivebayesPredict(D=D, C=C, probabilities= $4);

Review comment:
       please use spaces before/after assignments consistently (see assignment of $4).

##########
File path: scripts/builtin/naivebayesPredict.dml
##########
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+m_naivebayesPredict = function(Matrix[Double] D, Matrix[Double] C, Double probabilities = 1, Boolean verbose = TRUE)

Review comment:
       Please rename to naiveBayesPredict (as the corresponding algorithm in R is called naiveBayes). Along this line please also change the name of the training builtin function to naiveBayes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org