You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/02/22 21:00:04 UTC

[GitHub] [systemds] vanessavieira opened a new pull request #1189: [WIP-DIA] Km and Cox functions

vanessavieira opened a new pull request #1189:
URL: https://github.com/apache/systemds/pull/1189


   * Added DML script for Km and Cox based on existing ones (improving script, removing read/write statements, etc)
   * Added functions to Builtin.java
   * Added set-up for test class for both Km and Cox (still missing tests)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-803561497


   > ok thanks @vanessavieira - I now did some basic cleanups of the scripts, and removed the empty comparison files. Could you please associate your github account with the email address you used for the commits. Thanks.
   > 
   > I'll do the output comparison in the tests and related fixes in the builtin functions as a follow up project.
   
   All right, thank you! From what I see in the settings my github is already associated with my primary email address (vieira.vanessasoares@gmail.com).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Shafaq-Siddiqi commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
Shafaq-Siddiqi commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-790642122


   Hi,
   Thank you for your contribution. The PR looks in a good shape. Just one minor task if you could please add the R scripts and compare the functions with equivalent R functions. You can have a look into other built-in i.e., winsorize, to see how we are comparing them against R functions. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r585022263



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinCoxTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.junit.Test;
+
+public class BuiltinCoxTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "cox";
+	private final static String TEST_DIR = "functions/builtin/";
+	private final static String TEST_CLASS_DIR = TEST_DIR + BuiltinCoxTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"B"}));
+	}
+
+	@Test
+	public void testFunction() {
+		runCoxTest(50, 2.0, 1.5, 0.8, 100, 0.05, 1.0,0.000001, 100, 0);
+	}
+	
+	public void runCoxTest(int numRecords, double scaleWeibull, double shapeWeibull, double prob,
+						   int numFeatures, double sparsity, double alpha, double tol, int moi, int mii) {
+		Types.ExecMode platformOld = setExecMode(Types.ExecMode.SINGLE_NODE);
+		loadTestConfiguration(getTestConfiguration(TEST_NAME));
+		String HOME = SCRIPT_DIR + TEST_DIR;
+		fullDMLScriptName = HOME + TEST_NAME + ".dml";
+
+		programArgs = new String[]{
+				"-nvargs", "M=" + output("M"), "S=" + output("S"), "T=" + output("T"), "COV=" + output("COV"),
+				"RT=" + output("RT"), "XO=" + output("XO"), "n=" + numRecords, "l=" + scaleWeibull,
+				"v=" + shapeWeibull, "p=" + prob, "m=" + numFeatures, "sp=" + sparsity,
+				"alpha=" + alpha, "tol=" + tol, "moi=" + moi, "mii=" + mii};
+
+		runTest(true, false, null, -1);

Review comment:
       I see and I agree I should be testing the output. The problem I faced with testing is how to know the output I am expecting. First I needed to know how the input should look like and for that, I used parts of the data generation for survival analysis script (https://github.com/apache/systemds/blob/master/scripts/datagen/genRandData4SurvAnalysis.dml) but with that I don't really have a way of knowing what should be the output. Do you think I should create an R script that accepts the same input and then checks if the output returned is the same? I'm not sure how much trouble that would be or if there's already both algorithms implemented in R 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r585023614



##########
File path: src/test/scripts/functions/builtin/cox.dml
##########
@@ -0,0 +1,69 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+num_records = $n;
+lambda = $l;
+p_event = $p;
+# parameters related to the cox model
+num_features = $m;
+sparsity = $sp;
+p_censor = 1 - p_event; # prob. that record is censored
+
+v = $v;
+# generate feature matrix
+X_t = rand (rows = num_records, cols = num_features, min = 1, max = 5, pdf = "uniform", sparsity = sparsity);

Review comment:
       I tried that in the beginning but I was a bit unsure if what I coded in Java was really reflecting the data generation of this script (https://github.com/apache/systemds/blob/master/scripts/datagen/genRandData4SurvAnalysis.dml). I could try that again though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-790056109


   * Fixed identation issues for cox and km
   * Added a seed to the random data generator for running the script
   * For cox.dml I still face the SingleMatrixException. I did fix the return for S and some of T and for some of M. The parts of the script that are commented with # indicates the line where when it's run, I have the exception. I still could not figure it out why.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-795503012


   > > > Hi,
   > > > Thank you for your contribution. The PR looks in a good shape. Just one minor task if you could please add the R scripts and compare the functions with equivalent R functions. You can have a look into other built-in i.e., winsorize, to see how we are comparing them against R functions.
   > > 
   > > 
   > > Could you help me figure it out how I should have the input in the R script? You gave me the idea to generate the input data within the test/*.dml script and so I am not really sure how I should do with the R script
   > 
   > You need to search for the Km() and cox() functions in the R library then create a cox.R file and call the R implementation of cox() on the same dataset that you are using in cox.dml file and write the results. In the BuiltinCox.java file run the R script and read both R and DML output and compare them. Do the same for km. You can refer to winsorize.R script to see how the inputs are read and written.
   
   Yes, I understand that. I know what needs to be done. The issue is that I am generating the input inside the test dml script. I have tried storing this generated input and use it as an output in the java test (to use it as an input for the R script) but that did not work. I will close the pull request since I feel stuck for quite a while and will probably not meet the requirements for the  project to be finished.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r585024111



##########
File path: scripts/builtin/km.dml
##########
@@ -0,0 +1,656 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+#  
+# Builtin function that implements the analysis of survival data with KAPLAN-MEIER estimates
+#
+# INPUT   PARAMETERS:
+# ---------------------------------------------------------------------------------------------
+# NAME    TYPE     DEFAULT      MEANING
+# ---------------------------------------------------------------------------------------------
+# X       String   ---          Location to read the input matrix X containing the survival data: 
+#								timestamps, whether event occurred (1) or data is censored (0), and a number of factors (categorical features) 
+#								for grouping and/or stratifying 

Review comment:
       Oh, I must have missed that information. Will add space instead of tabs. Thanks :) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r585447508



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinCoxTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.junit.Test;
+
+public class BuiltinCoxTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "cox";
+	private final static String TEST_DIR = "functions/builtin/";
+	private final static String TEST_CLASS_DIR = TEST_DIR + BuiltinCoxTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"B"}));
+	}
+
+	@Test
+	public void testFunction() {
+		runCoxTest(50, 2.0, 1.5, 0.8, 100, 0.05, 1.0,0.000001, 100, 0);
+	}
+	
+	public void runCoxTest(int numRecords, double scaleWeibull, double shapeWeibull, double prob,
+						   int numFeatures, double sparsity, double alpha, double tol, int moi, int mii) {
+		Types.ExecMode platformOld = setExecMode(Types.ExecMode.SINGLE_NODE);
+		loadTestConfiguration(getTestConfiguration(TEST_NAME));
+		String HOME = SCRIPT_DIR + TEST_DIR;
+		fullDMLScriptName = HOME + TEST_NAME + ".dml";
+
+		programArgs = new String[]{
+				"-nvargs", "M=" + output("M"), "S=" + output("S"), "T=" + output("T"), "COV=" + output("COV"),
+				"RT=" + output("RT"), "XO=" + output("XO"), "n=" + numRecords, "l=" + scaleWeibull,
+				"v=" + shapeWeibull, "p=" + prob, "m=" + numFeatures, "sp=" + sparsity,
+				"alpha=" + alpha, "tol=" + tol, "moi=" + moi, "mii=" + mii};
+
+		runTest(true, false, null, -1);

Review comment:
       baunsgaard@tugraz.at




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r585082788



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinCoxTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.junit.Test;
+
+public class BuiltinCoxTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "cox";
+	private final static String TEST_DIR = "functions/builtin/";
+	private final static String TEST_CLASS_DIR = TEST_DIR + BuiltinCoxTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"B"}));
+	}
+
+	@Test
+	public void testFunction() {
+		runCoxTest(50, 2.0, 1.5, 0.8, 100, 0.05, 1.0,0.000001, 100, 0);
+	}
+	
+	public void runCoxTest(int numRecords, double scaleWeibull, double shapeWeibull, double prob,
+						   int numFeatures, double sparsity, double alpha, double tol, int moi, int mii) {
+		Types.ExecMode platformOld = setExecMode(Types.ExecMode.SINGLE_NODE);
+		loadTestConfiguration(getTestConfiguration(TEST_NAME));
+		String HOME = SCRIPT_DIR + TEST_DIR;
+		fullDMLScriptName = HOME + TEST_NAME + ".dml";
+
+		programArgs = new String[]{
+				"-nvargs", "M=" + output("M"), "S=" + output("S"), "T=" + output("T"), "COV=" + output("COV"),
+				"RT=" + output("RT"), "XO=" + output("XO"), "n=" + numRecords, "l=" + scaleWeibull,
+				"v=" + shapeWeibull, "p=" + prob, "m=" + numFeatures, "sp=" + sparsity,
+				"alpha=" + alpha, "tol=" + tol, "moi=" + moi, "mii=" + mii};
+
+		runTest(true, false, null, -1);

Review comment:
       it is a challenge to test the algorithms, if there is an equivalent algorithm in R then that would be fine to compare with. But best would be if we could test it ourselves. My suggestion is to keep it simple, and try to think of the algorithm as a black box. If you know what it does, then try to deconstruct what behavior is expected based on as simple an input as possible. and then expand from there.
   
   I don't expect you to test in all edge cases, just that we know it works on commonly expected inputs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-803386237


   ok thanks @vanessavieira - I now did some basic cleanups of the scripts, and removed the empty comparison files. Could you please associate your github account with the email address you used for the commits. Thanks.
   
   I'll do the output comparison in the tests and related fixes in the builtin functions as a follow up project. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] asfgit closed pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1189:
URL: https://github.com/apache/systemds/pull/1189


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-795698216


   thanks for your effort @vanessavieira - I'll take a look into the remaining issues and merge this next week.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Shafaq-Siddiqi commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
Shafaq-Siddiqi commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-794317442


   > > Hi,
   > > Thank you for your contribution. The PR looks in a good shape. Just one minor task if you could please add the R scripts and compare the functions with equivalent R functions. You can have a look into other built-in i.e., winsorize, to see how we are comparing them against R functions.
   > 
   > Could you help me figure it out how I should have the input in the R script? You gave me the idea to generate the input data within the test/*.dml script and so I am not really sure how I should do with the R script
   
   You need to search for the Km() and cox() functions in the R library then create a cox.R file and call the R implementation of cox() on the same dataset that you are using in cox.dml file and write the results. In the BuiltinCox.java file run the R script and read both R and DML output and compare them. Do the same for km. You can refer to winsorize.R script to see how the inputs are read and written. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-803614808


   Aaa, I see. I also had another institutional email from my bachelor's university (vsv@ic.ufal.br) so I removed that one and added my tugraz email (vanessa.soaresvieira@student.tugraz.at) to my github account. Hope it's all good now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-803630982


   awesome - thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r585404749



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinCoxTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.junit.Test;
+
+public class BuiltinCoxTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "cox";
+	private final static String TEST_DIR = "functions/builtin/";
+	private final static String TEST_CLASS_DIR = TEST_DIR + BuiltinCoxTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"B"}));
+	}
+
+	@Test
+	public void testFunction() {
+		runCoxTest(50, 2.0, 1.5, 0.8, 100, 0.05, 1.0,0.000001, 100, 0);
+	}
+	
+	public void runCoxTest(int numRecords, double scaleWeibull, double shapeWeibull, double prob,
+						   int numFeatures, double sparsity, double alpha, double tol, int moi, int mii) {
+		Types.ExecMode platformOld = setExecMode(Types.ExecMode.SINGLE_NODE);
+		loadTestConfiguration(getTestConfiguration(TEST_NAME));
+		String HOME = SCRIPT_DIR + TEST_DIR;
+		fullDMLScriptName = HOME + TEST_NAME + ".dml";
+
+		programArgs = new String[]{
+				"-nvargs", "M=" + output("M"), "S=" + output("S"), "T=" + output("T"), "COV=" + output("COV"),
+				"RT=" + output("RT"), "XO=" + output("XO"), "n=" + numRecords, "l=" + scaleWeibull,
+				"v=" + shapeWeibull, "p=" + prob, "m=" + numFeatures, "sp=" + sparsity,
+				"alpha=" + alpha, "tol=" + tol, "moi=" + moi, "mii=" + mii};
+
+		runTest(true, false, null, -1);

Review comment:
       @Baunsgaard could you send me your email so I can maybe describe better some questions I still have regarding this? Unfortunately both algorithms are not simple enough so I know exactly the output in those simple cases. I feel extremely lost. Even in simple cases I am not sure how the formatting should be for the timestamp generation, for example. And by using the script for data generation I have no way of knowing exactly what the input is to know what the output should be as well. If you could send me your email we could maybe talk better to what I am expected to do here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira closed pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira closed pull request #1189:
URL: https://github.com/apache/systemds/pull/1189


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#discussion_r584967658



##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinCoxTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.junit.Test;
+
+public class BuiltinCoxTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "cox";
+	private final static String TEST_DIR = "functions/builtin/";
+	private final static String TEST_CLASS_DIR = TEST_DIR + BuiltinCoxTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"B"}));
+	}
+
+	@Test
+	public void testFunction() {
+		runCoxTest(50, 2.0, 1.5, 0.8, 100, 0.05, 1.0,0.000001, 100, 0);
+	}
+	
+	public void runCoxTest(int numRecords, double scaleWeibull, double shapeWeibull, double prob,
+						   int numFeatures, double sparsity, double alpha, double tol, int moi, int mii) {
+		Types.ExecMode platformOld = setExecMode(Types.ExecMode.SINGLE_NODE);
+		loadTestConfiguration(getTestConfiguration(TEST_NAME));
+		String HOME = SCRIPT_DIR + TEST_DIR;
+		fullDMLScriptName = HOME + TEST_NAME + ".dml";
+
+		programArgs = new String[]{
+				"-nvargs", "M=" + output("M"), "S=" + output("S"), "T=" + output("T"), "COV=" + output("COV"),
+				"RT=" + output("RT"), "XO=" + output("XO"), "n=" + numRecords, "l=" + scaleWeibull,
+				"v=" + shapeWeibull, "p=" + prob, "m=" + numFeatures, "sp=" + sparsity,
+				"alpha=" + alpha, "tol=" + tol, "moi=" + moi, "mii=" + mii};
+
+		runTest(true, false, null, -1);

Review comment:
       This is not sufficient testing, since this only runs the script. but does not verify that the output is correct.
   I would suggest making as simple inputs as possible where you know a certain output is produced, and then verify these.

##########
File path: src/test/java/org/apache/sysds/test/functions/builtin/BuiltinKmTest.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.lops.LopProperties;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+public class BuiltinKmTest extends AutomatedTestBase
+{
+	private final static String TEST_NAME = "km";
+	private final static String TEST_DIR = "functions/builtin/";
+	private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinKmTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		TestUtils.clearAssertionInformation();
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"C"}));
+	}
+
+	@Test
+	public void testKmDefaultConfiguration() {
+		runKmTest(50, 2.0, 1.5, 0.8, 2,
+				1, 10, 0.05,"greenwood", "log", "none");
+	}
+	@Test
+	public void testKmErrTypePeto() {
+		runKmTest(50, 2.0, 1.5, 0.8, 2,
+				1, 10, 0.05,"peto", "log", "none");
+	}
+	@Test
+	public void testKmConfTypePlain() {
+		runKmTest(50, 2.0, 1.5, 0.8, 2,
+				1, 10, 0.05,"greenwood", "plain", "none");
+	}
+	@Test
+	public void testKmConfTypeLogLog() {
+		runKmTest(50, 2.0, 1.5, 0.8, 2,
+				1, 10, 0.05,"greenwood", "log-log", "none");
+	}
+
+	private void runKmTest(int numRecords, double scaleWeibull, double shapeWeibull, double prob,
+						   int numCatFeaturesGroup, int numCatFeaturesStrat, int maxNumLevels, double alpha, String err_type,
+						   String conf_type, String test_type) {
+		Types.ExecMode platformOld = setExecMode(LopProperties.ExecType.SPARK);
+		loadTestConfiguration(getTestConfiguration(TEST_NAME));
+		String HOME = SCRIPT_DIR + TEST_DIR;
+		fullDMLScriptName = HOME + TEST_NAME + ".dml";
+
+		programArgs = new String[]{
+				"-nvargs", "O=" + output("O"), "M=" + output("M"), "T=" + output("T"),
+				"T_GROUPS_OE=" + output("T_GROUPS_OE"), "n=" + numRecords, "l=" + scaleWeibull,
+				"v=" + shapeWeibull, "p=" + prob, "g=" + numCatFeaturesGroup, "s=" + numCatFeaturesStrat,
+				"f=" + maxNumLevels, "alpha=" + alpha, "err_type=" + err_type,
+				"conf_type=" + conf_type, "test_type=" + test_type};
+
+		runTest(true, false, null, -1);

Review comment:
       same testing problem here.

##########
File path: src/test/scripts/functions/builtin/cox.dml
##########
@@ -0,0 +1,69 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+num_records = $n;
+lambda = $l;
+p_event = $p;
+# parameters related to the cox model
+num_features = $m;
+sparsity = $sp;
+p_censor = 1 - p_event; # prob. that record is censored
+
+v = $v;
+# generate feature matrix
+X_t = rand (rows = num_records, cols = num_features, min = 1, max = 5, pdf = "uniform", sparsity = sparsity);

Review comment:
       give it a seed to make it not random, or generate a known matrix inside your tests and then read this from disk.
   Some of the other builtin functions test could inspire.

##########
File path: scripts/builtin/km.dml
##########
@@ -0,0 +1,656 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+#  
+# Builtin function that implements the analysis of survival data with KAPLAN-MEIER estimates
+#
+# INPUT   PARAMETERS:
+# ---------------------------------------------------------------------------------------------
+# NAME    TYPE     DEFAULT      MEANING
+# ---------------------------------------------------------------------------------------------
+# X       String   ---          Location to read the input matrix X containing the survival data: 
+#								timestamps, whether event occurred (1) or data is censored (0), and a number of factors (categorical features) 
+#								for grouping and/or stratifying 

Review comment:
       our coding conventions try to use spaces instead of tabs in dml scripts.
   this is why it looks strange on GitHub, because it is not consistent.
   Try to replace all tabs with spaces. In the documentation up here, make the new lines spaces allign with the above line.

##########
File path: scripts/builtin/km.dml
##########
@@ -0,0 +1,656 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+#  
+# Builtin function that implements the analysis of survival data with KAPLAN-MEIER estimates
+#
+# INPUT   PARAMETERS:
+# ---------------------------------------------------------------------------------------------
+# NAME    TYPE     DEFAULT      MEANING
+# ---------------------------------------------------------------------------------------------
+# X       String   ---          Location to read the input matrix X containing the survival data: 
+#								timestamps, whether event occurred (1) or data is censored (0), and a number of factors (categorical features) 
+#								for grouping and/or stratifying 
+# TE	  String   ---          Column indices of X which contain timestamps (first entry) and event information (second entry) 
+# GI	  String   ---          Column indices of X corresponding to the factors to be used for grouping
+# SI	  String   ---          Column indices of X corresponding to the factors to be used for stratifying				
+# O       String   ---          Location to write the matrix containing the results of the Kaplan-Meier analysis; see below for the description
+# M       String   ---          Location to write Matrix M containing the following statistic: total number of events, median and its confidence intervals; 
+#								if survival data for multiple groups and strata are provided each row of M contains the above statistics per group and stratum
+# T 	  String   " "			If survival data from multiple groups available and ttype=log-rank or wilcoxon, 
+#								location to write the matrix containing result of the (stratified) test for comparing multiple groups
+# alpha   Double   0.05         Parameter to compute 100*(1-alpha)% confidence intervals for the survivor function and its median 
+# err_type   String   "greenwood"  Parameter to specify the error type according to "greenwood" (the default) or "peto"
+# conf_type   String   "log"        Parameter to modify the confidence interval; "plain" keeps the lower and upper bound of
+#								the confidence interval unmodified,	"log" (the default) corresponds to logistic transformation and 
+#								"log-log" corresponds to the complementary log-log transformation 
+# test_type   String   "none"   	If survival data for multiple groups is available specifies which test to perform for comparing
+#								survival data across multiple groups: "none" (the default) "log-rank" or "wilcoxon" test   
+# fmtO     String   "text"       The output format of results of the Kaplan-Meier analysis, such as "text" or "csv"
+# ---------------------------------------------------------------------------------------------
+# OUTPUT: 
+# 1- Matrix KM whose dimension depends on the number of groups (denoted by g) and strata (denoted by s) in the data: 
+#	each collection of 7 consecutive columns in KM corresponds to a unique combination of groups and strata in the data with the following schema
+# 	1. col: timestamp
+# 	2. col: no. at risk
+# 	3. col: no. of events
+# 	4. col: Kaplan-Meier estimate of survivor function surv
+# 	5. col: standard error of surv
+# 	6. col: lower 100*(1-alpha)% confidence interval for surv
+# 	7. col: upper 100*(1-alpha)% confidence interval for surv
+# 2- Matrix M whose dimension depends on the number of groups (g) and strata (s) in the data (k denotes the number of factors used for grouping 
+#	,i.e., ncol(GI) and l denotes the number of factors used for stratifying, i.e., ncol(SI))
+#	M[,1:k]: unique combination of values in the k factors used for grouping 
+#	M[,(k+1):(k+l)]: unique combination of values in the l factors used for stratifying
+#	M[,k+l+1]: total number of records
+#	M[,k+l+2]: total number of events
+#	M[,k+l+3]: median of surv
+#	M[,k+l+4]: lower 100*(1-alpha)% confidence interval of the median of surv 
+#	M[,k+l+5]: upper 100*(1-alpha)% confidence interval of the median of surv
+#	If the number of groups and strata is equal to 1, M will have 4 columns with 
+#	M[,1]: total number of events
+#	M[,2]: median of surv
+#	M[,3]: lower 100*(1-alpha)% confidence interval of the median of surv 
+#	M[,4]: upper 100*(1-alpha)% confidence interval of the median of surv
+# 3- If survival data from multiple groups available and ttype=log-rank or wilcoxon, a 1 x 4 matrix T and an g x 5 matrix T_GROUPS_OE with
+#	T_GROUPS_OE[,1] = no. of events	
+#	T_GROUPS_OE[,2] = observed value (O)
+#	T_GROUPS_OE[,3] = expected value (E)
+#	T_GROUPS_OE[,4] = (O-E)^2/E
+#	T_GROUPS_OE[,5] = (O-E)^2/V 	 
+#	T[1,1] = no. of groups
+#	T[1,2] = degree of freedom for Chi-squared distributed test statistic
+#	T[1,3] = test statistic 
+#	T[1,4] = P-value
+# -------------------------------------------------------------------------------------------
+
+m_km = function(Matrix[Double] X, Matrix[Double] TE, Matrix[Double] GI, Matrix[Double] SI,
+    Double alpha = 0.05, String err_type = "greenwood",
+    String conf_type = "log", String test_type = "none")
+  return (Matrix[Double] O, Matrix[Double] M,
+  Matrix[Double] T, Matrix[Double] T_GROUPS_OE) {
+
+
+if (ncol(GI) != 0 & nrow(GI) != 0)  {

Review comment:
       indent the content of the function.

##########
File path: src/test/scripts/functions/builtin/cox.dml
##########
@@ -0,0 +1,69 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+num_records = $n;
+lambda = $l;
+p_event = $p;
+# parameters related to the cox model
+num_features = $m;
+sparsity = $sp;
+p_censor = 1 - p_event; # prob. that record is censored
+
+v = $v;
+# generate feature matrix
+X_t = rand (rows = num_records, cols = num_features, min = 1, max = 5, pdf = "uniform", sparsity = sparsity);
+# generate coefficients
+B = rand (rows = num_features, cols = 1, min = -1.0, max = 1.0, pdf = "uniform", sparsity = 1.0);
+
+# generate timestamps
+U = rand (rows = num_records, cols = 1, min = 0.000000001, max = 1);
+T = (-log (U) / (lambda * exp (X_t %*% B)) ) ^ (1/v);
+
+Y = matrix (0, rows = num_records, cols = 2);
+event = floor (rand (rows = num_records, cols = 1, min = (1 - p_censor), max = (1 + p_event)));
+n_time = sum (event);
+Y[,2] = event;
+
+# binning of event times
+min_T = min (T);
+max_T = max (T);
+# T = T - min_T;
+len = max_T - min_T;
+num_bins = len / n_time;
+T = ceil (T / num_bins);
+
+# print ("min(T) " + min(T) + " max(T) " + max(T));
+Y[,1] = T;
+
+X = cbind (Y, X_t);
+
+TE = matrix ("1 2", rows = 2, cols = 1);
+F = seq (1, num_features);
+R = matrix (0, rows = 1, cols = 1);
+
+[M, S, T, COV, RT, XO] = cox(X, TE, F, R, $alpha, $tol, $moi, $mii);
+
+write(M, $M);
+write(S, $S);
+write(T, $T);
+write(COV, $COV);
+write(RT, $RT);
+write(XO, $XO);

Review comment:
       good that you write out the results, now use them to verify.

##########
File path: scripts/builtin/cox.dml
##########
@@ -0,0 +1,478 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+#  
+# THIS SCRIPT FITS A COX PROPORTIONAL HAZARD REGRESSION MODEL.
+# The Breslow method is used for handling ties and the regression parameters 
+# are computed using trust region newton method with conjugate gradient 
+# 
+# INPUT   PARAMETERS:
+# ---------------------------------------------------------------------------------------------
+# NAME    TYPE     DEFAULT      MEANING
+# ---------------------------------------------------------------------------------------------
+# X       String   ---          Location to read the input matrix X containing the survival data containing the following information
+# 								 - 1: timestamps 
+#								 - 2: whether an event occurred (1) or data is censored (0)
+#								 - 3: feature vectors 	
+# TE	  String   ---          Column indices of X as a column vector which contain timestamp (first row) and event information (second row)
+# F 	  String   " "			Column indices of X as a column vector which are to be used for fitting the Cox model
+# R   	  String   " "	        If factors (categorical variables) are available in the input matrix X, location to read matrix R containing 
+#								the start and end indices of the factors in X
+#								 - R[,1]: start indices
+#								 - R[,2]: end indices	
+#								Alternatively, user can specify the indices of the baseline level of each factor which needs to be removed from X; 
+#								in this case the start and end indices corresponding to the baseline level need to be the same;
+#								if R is not provided by default all variables are considered to be continuous 
+# M       String   ---          Location to store the results of Cox regression analysis including estimated regression parameters of the fitted 
+#								Cox model (the betas), their standard errors, confidence intervals, and P-values 
+# S       String   " "          Location to store a summary of some statistics of the fitted cox proportional hazard model including			
+#								no. of records, no. of events, log-likelihood, AIC, Rsquare (Cox & Snell), and max possible Rsquare; 
+#								by default is standard output  
+# T       String   " "          Location to store the results of Likelihood ratio test, Wald test, and Score (log-rank) test of the fitted model;
+#								by default is standard output 
+# COV	  String   ---			Location to store the variance-covariance matrix of the betas
+# RT      String   ---			Location to store matrix RT containing the order-preserving recoded timestamps from X 
+# XO      String   ---			Location to store sorted input matrix by the timestamps 
+# MF      String   ---          Location to store column indices of X excluding the baseline factors if available
+# alpha   Double   0.05         Parameter to compute a 100*(1-alpha)% confidence interval for the betas  
+# tol     Double   0.000001     Tolerance ("epsilon")
+# moi     Int      100     		Max. number of outer (Newton) iterations
+# mii     Int      0      		Max. number of inner (conjugate gradient) iterations, 0 = no max   
+# fmt     String   "text"       Matrix output format, usually "text" or "csv" (for matrices only)
+# ---------------------------------------------------------------------------------------------
+# OUTPUT: 
+# 1- A D x 7 matrix M, where D denotes the number of covariates, with the following schema:
+#	M[,1]: betas	
+#	M[,2]: exp(betas)
+#	M[,3]: standard error of betas
+#	M[,4]: Z 
+#	M[,5]: P-value
+#	M[,6]: lower 100*(1-alpha)% confidence interval of betas
+#	M[,7]: upper 100*(1-alpha)% confidence interval of betas
+#
+# Two matrices containing a summary of some statistics of the fitted model:
+# 1- File S with the following format 
+#	- row 1: no. of observations
+#	- row 2: no. of events
+#   - row 3: log-likelihood 
+#	- row 4: AIC
+#	- row 5: Rsquare (Cox & Snell)
+#	- row 6: max possible Rsquare
+# 2- File T with the following format
+#	- row 1: Likelihood ratio test statistic, degree of freedom, P-value
+#	- row 2: Wald test statistic, degree of freedom, P-value
+#	- row 3: Score (log-rank) test statistic, degree of freedom, P-value
+# 
+# Additionally, the following matrices are stored (needed for prediction)
+# 1- A column matrix RT that contains the order-preserving recoded timestamps from X 
+# 2- Matrix XO which is matrix X with sorted timestamps  
+# 3- Variance-covariance matrix of the betas COV
+# 4- A column matrix MF that contains the column indices of X with the baseline factors removed (if available)
+# -------------------------------------------------------------------------------------------
+m_cox = function(Matrix[Double] X, Matrix[Double] TE, Matrix[Double] F, Matrix[Double] R,
+        Double alpha = 0.05, Double tol = 0.000001, Int moi = 100, Int mii = 0)
+        return (Matrix[Double] M, Matrix[Double] S, Matrix[Double] T,
+                Matrix[Double] COV, Matrix[Double] RT, Matrix[Double] XO) {
+
+X_orig = X;

Review comment:
       here also indent the content of the function




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-786854374


   Finished implementation + testing. Unfortunately I have some issues with returning the values from the cox script, I keep having a SingularMatrixException but I do not know how to fix it. I have tried many things but it did not work so far. Any help from the reviewers is much appreciated. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-803583929


   well, your commits used your tugraz email address and thus github is unable to link the commit to your handle - you can associate multiple emails addresses with your github account.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] vanessavieira commented on pull request #1189: [DIA] Km and Cox functions

Posted by GitBox <gi...@apache.org>.
vanessavieira commented on pull request #1189:
URL: https://github.com/apache/systemds/pull/1189#issuecomment-791388900


   > Hi,
   > Thank you for your contribution. The PR looks in a good shape. Just one minor task if you could please add the R scripts and compare the functions with equivalent R functions. You can have a look into other built-in i.e., winsorize, to see how we are comparing them against R functions.
   
   Could you help me figure it out how I should have the input in the R script? You gave me the idea to generate the input data within the test/*.dml script and so I am not really sure how I should do with the R script


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org