You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/04/20 00:05:25 UTC

[GitHub] [systemds] fathollahzadeh opened a new pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

fathollahzadeh opened a new pull request #1232:
URL: https://github.com/apache/systemds/pull/1232


   1. Allows mapping RDD partitions of LIBSVM rows into a set of partial binary blocks
   2. Add file format properties to support custom delimiters between data instances and index value
   3. Update Read/Write CP instructions to support the file format properties
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] phaniarnab commented on pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
phaniarnab commented on pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#issuecomment-829993289


   Thanks, @fathollahzadeh for fixing the space issues. I had a look into the changes and it already looks quite good. However, it would be great if you can fix the automatic application of code style to all the files. Currently, it is very hard to spot the actual changes done by your commits. In addition to that, the code style introduced some weird formatting all over the common files. Other than that, the file VariableCPInstruction.java still has spaces instead of tabs.
   Meanwhile, I will try to have a detailed look into the changes, run some tests, rebase, resolve the conflicts and make a commit to run the tests in github.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] fathollahzadeh commented on pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
fathollahzadeh commented on pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#issuecomment-831155486


   > Thanks, @fathollahzadeh for fixing the space issues. I had a look into the changes and it already looks quite good. However, it would be great if you can fix the automatic application of code style to all the files. Currently, it is very hard to spot the actual changes done by your commits. In addition to that, the code style introduced some weird formatting all over the common files. Other than that, the file VariableCPInstruction.java still has spaces instead of tabs.
   > Meanwhile, I will try to have a detailed look into the changes, run some tests, rebase, resolve the conflicts and make a commit to run the tests in github.
   
   Thanks, @phaniarnab  for the review. I remove all unnecessary code styles in changed files. Now, the code style applied just for the task updates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] fathollahzadeh commented on a change in pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
fathollahzadeh commented on a change in pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#discussion_r624988922



##########
File path: src/test/java/org/apache/sysds/test/functions/io/libsvm/WriteLIBSVMTest.java
##########
@@ -27,70 +27,70 @@
 
 public abstract class WriteLIBSVMTest extends WriteLIBSVMTestBase {
 
-  protected abstract int getId();
+	protected abstract int getId();
 
-  protected String getInputLIBSVMFileName() {
-    return "transfusion_W" + getId() + ".libsvm";
-  }
+	protected String getInputLIBSVMFileName() {
+		return "transfusion_W" + getId() + ".libsvm";
+	}
 
-  @Test public void testlibsvm1_Seq_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", false);
-  }
+	@Test public void testlibsvm1_Seq_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", false);
+	}
 
-  @Test public void testlibsvm2_Seq_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", true);
-  }
+	@Test public void testlibsvm2_Seq_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", true);
+	}
 
-  @Test public void testlibsvm1_Pllel_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", true);
-  }
+	@Test public void testlibsvm1_Pllel_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", true);
+	}
 
-  @Test public void testlibsvm2_Pllel_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", false);
-  }
+	@Test public void testlibsvm2_Pllel_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", false);
+	}
 
-  @Test public void testlibsvm1_SP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", true);
-  }
+	@Test public void testlibsvm1_SP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", true);
+	}
 
-  @Test public void testlibsvm2_SP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", false);
-  }
+	@Test public void testlibsvm2_SP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", false);
+	}
 
-  protected void runWriteLIBSVMTest(int testNumber, ExecMode platform, boolean parallel, String sep, String indSep,
-    boolean sparse) {
+	protected void runWriteLIBSVMTest(int testNumber, ExecMode platform, boolean parallel, String sep, String indSep,
+		boolean sparse) {
 
-    ExecMode oldPlatform = rtplatform;
-    rtplatform = platform;
+		ExecMode oldPlatform = rtplatform;
+		rtplatform = platform;
 
-    boolean sparkConfigOld = DMLScript.USE_LOCAL_SPARK_CONFIG;
-    if(rtplatform == ExecMode.SPARK)
-      DMLScript.USE_LOCAL_SPARK_CONFIG = true;
+		boolean sparkConfigOld = DMLScript.USE_LOCAL_SPARK_CONFIG;
+		if(rtplatform == ExecMode.SPARK)
+			DMLScript.USE_LOCAL_SPARK_CONFIG = true;
 
-    boolean oldpar = CompilerConfig.FLAG_PARREADWRITE_TEXT;
+		boolean oldpar = CompilerConfig.FLAG_PARREADWRITE_TEXT;
 
-    try {
+		try {
 
-      CompilerConfig.FLAG_PARREADWRITE_TEXT = parallel;
+			CompilerConfig.FLAG_PARREADWRITE_TEXT = parallel;
 
-      TestConfiguration config = getTestConfiguration(getTestName());
-      loadTestConfiguration(config);
+			TestConfiguration config = getTestConfiguration(getTestName());
+			loadTestConfiguration(config);
 
-      String HOME = SCRIPT_DIR + TEST_DIR;
-      String inputMatrixName = HOME + INPUT_DIR + getInputLIBSVMFileName();
-      String dmlOutput = output("dml.scalar");
-      String libsvmOutputName = output("libsvm_write" + testNumber + ".data");
+			String HOME = SCRIPT_DIR + TEST_DIR;
+			String inputMatrixName = HOME + INPUT_DIR + getInputLIBSVMFileName();
+			String dmlOutput = output("dml.scalar");
+			String libsvmOutputName = output("libsvm_write" + testNumber + ".data");
 
-      fullDMLScriptName = HOME + getTestName() + "_" + testNumber + ".dml";
-      programArgs = new String[] {"-args", inputMatrixName, dmlOutput, libsvmOutputName, sep, indSep,
-        Boolean.toString(sparse)};
+			fullDMLScriptName = HOME + getTestName() + "_" + testNumber + ".dml";
+			programArgs = new String[] {"-args", inputMatrixName, dmlOutput, libsvmOutputName, sep, indSep,
+				Boolean.toString(sparse)};
 
-      runTest(true, false, null, -1);
-    }
-    finally {
-      rtplatform = oldPlatform;
-      CompilerConfig.FLAG_PARREADWRITE_TEXT = oldpar;
-      DMLScript.USE_LOCAL_SPARK_CONFIG = sparkConfigOld;
-    }
-  }
+			runTest(true, false, null, -1);
+		}

Review comment:
       Tests added to verify read/write. Thank!.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] phaniarnab commented on pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
phaniarnab commented on pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#issuecomment-826888345


   Looks like all tabs are replaced by spaces in all the files your commits touched. Your IDE probably has automatically applied these changes. Can you please have a look @fathollahzadeh? In addition to that, indentation is broken in the new classes. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] phaniarnab commented on a change in pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
phaniarnab commented on a change in pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#discussion_r623782088



##########
File path: src/test/java/org/apache/sysds/test/functions/io/libsvm/WriteLIBSVMTest.java
##########
@@ -27,70 +27,70 @@
 
 public abstract class WriteLIBSVMTest extends WriteLIBSVMTestBase {
 
-  protected abstract int getId();
+	protected abstract int getId();
 
-  protected String getInputLIBSVMFileName() {
-    return "transfusion_W" + getId() + ".libsvm";
-  }
+	protected String getInputLIBSVMFileName() {
+		return "transfusion_W" + getId() + ".libsvm";
+	}
 
-  @Test public void testlibsvm1_Seq_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", false);
-  }
+	@Test public void testlibsvm1_Seq_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", false);
+	}
 
-  @Test public void testlibsvm2_Seq_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", true);
-  }
+	@Test public void testlibsvm2_Seq_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, false, " ", ":", true);
+	}
 
-  @Test public void testlibsvm1_Pllel_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", true);
-  }
+	@Test public void testlibsvm1_Pllel_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", true);
+	}
 
-  @Test public void testlibsvm2_Pllel_CP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", false);
-  }
+	@Test public void testlibsvm2_Pllel_CP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SINGLE_NODE, true, " ", ":", false);
+	}
 
-  @Test public void testlibsvm1_SP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", true);
-  }
+	@Test public void testlibsvm1_SP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", true);
+	}
 
-  @Test public void testlibsvm2_SP() {
-    runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", false);
-  }
+	@Test public void testlibsvm2_SP() {
+		runWriteLIBSVMTest(getId(), ExecMode.SPARK, false, " ", ":", false);
+	}
 
-  protected void runWriteLIBSVMTest(int testNumber, ExecMode platform, boolean parallel, String sep, String indSep,
-    boolean sparse) {
+	protected void runWriteLIBSVMTest(int testNumber, ExecMode platform, boolean parallel, String sep, String indSep,
+		boolean sparse) {
 
-    ExecMode oldPlatform = rtplatform;
-    rtplatform = platform;
+		ExecMode oldPlatform = rtplatform;
+		rtplatform = platform;
 
-    boolean sparkConfigOld = DMLScript.USE_LOCAL_SPARK_CONFIG;
-    if(rtplatform == ExecMode.SPARK)
-      DMLScript.USE_LOCAL_SPARK_CONFIG = true;
+		boolean sparkConfigOld = DMLScript.USE_LOCAL_SPARK_CONFIG;
+		if(rtplatform == ExecMode.SPARK)
+			DMLScript.USE_LOCAL_SPARK_CONFIG = true;
 
-    boolean oldpar = CompilerConfig.FLAG_PARREADWRITE_TEXT;
+		boolean oldpar = CompilerConfig.FLAG_PARREADWRITE_TEXT;
 
-    try {
+		try {
 
-      CompilerConfig.FLAG_PARREADWRITE_TEXT = parallel;
+			CompilerConfig.FLAG_PARREADWRITE_TEXT = parallel;
 
-      TestConfiguration config = getTestConfiguration(getTestName());
-      loadTestConfiguration(config);
+			TestConfiguration config = getTestConfiguration(getTestName());
+			loadTestConfiguration(config);
 
-      String HOME = SCRIPT_DIR + TEST_DIR;
-      String inputMatrixName = HOME + INPUT_DIR + getInputLIBSVMFileName();
-      String dmlOutput = output("dml.scalar");
-      String libsvmOutputName = output("libsvm_write" + testNumber + ".data");
+			String HOME = SCRIPT_DIR + TEST_DIR;
+			String inputMatrixName = HOME + INPUT_DIR + getInputLIBSVMFileName();
+			String dmlOutput = output("dml.scalar");
+			String libsvmOutputName = output("libsvm_write" + testNumber + ".data");
 
-      fullDMLScriptName = HOME + getTestName() + "_" + testNumber + ".dml";
-      programArgs = new String[] {"-args", inputMatrixName, dmlOutput, libsvmOutputName, sep, indSep,
-        Boolean.toString(sparse)};
+			fullDMLScriptName = HOME + getTestName() + "_" + testNumber + ".dml";
+			programArgs = new String[] {"-args", inputMatrixName, dmlOutput, libsvmOutputName, sep, indSep,
+				Boolean.toString(sparse)};
 
-      runTest(true, false, null, -1);
-    }
-    finally {
-      rtplatform = oldPlatform;
-      CompilerConfig.FLAG_PARREADWRITE_TEXT = oldpar;
-      DMLScript.USE_LOCAL_SPARK_CONFIG = sparkConfigOld;
-    }
-  }
+			runTest(true, false, null, -1);
+		}

Review comment:
       Can you please add a way to verify if the written data is correct? You can read the output file in a R script and compare the sums. You can follow the example of WriteCSVTest.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] phaniarnab commented on pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
phaniarnab commented on pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#issuecomment-833877840


   LGTM. Thanks, @fathollahzadeh for the patch. While merging I fixed a few formating issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] fathollahzadeh commented on pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
fathollahzadeh commented on pull request #1232:
URL: https://github.com/apache/systemds/pull/1232#issuecomment-826900325


   > Looks like all tabs are replaced by spaces in all the files your commits touched. Your IDE probably has automatically applied these changes. Can you please have a look @fathollahzadeh? In addition to that, indentation is broken in the new classes.
   
   @phaniarnab Thanks for reviewing the code. I'll fix the changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] phaniarnab closed pull request #1232: [DISLIBSVM] Add Distributed Read/Write LIBSVM on Spark

Posted by GitBox <gi...@apache.org>.
phaniarnab closed pull request #1232:
URL: https://github.com/apache/systemds/pull/1232


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org