You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/04/20 12:43:56 UTC

[GitHub] [systemml] kev-inn opened a new pull request #893: Federated Frames

kev-inn opened a new pull request #893:
URL: https://github.com/apache/systemml/pull/893


   This PR adds federated frames. It also uses the `federated` builtin, which now has an additional parameter `type`. `type` can either be set to `"matrix"` or `"frame"` (case-insensitive).
   SPARK execution mode is not supported (as with many other federated commands), once we have a better integration of federation during instruction building (always knowing if hop/lop is federated) both CP and SPARK should be supported.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r416600131



##########
File path: src/test/java/org/apache/sysds/test/functions/federated/FederatedConstructionTest.java
##########
@@ -62,28 +70,59 @@ public void setUp() {
 	}
 
 	@Test
-	public void federatedConstructionCP() {
-		federatedConstruction(Types.ExecMode.SINGLE_NODE);
+	public void federatedMatrixConstructionCP() {
+		federatedMatrixConstruction(Types.ExecMode.SINGLE_NODE);
+	}
+
+	@Test
+	public void federatedMatrixConstructionSP() {
+		federatedMatrixConstruction(Types.ExecMode.SPARK);
 	}
 
+	public void federatedMatrixConstruction(Types.ExecMode execMode) {
+		getAndLoadTestConfiguration(TEST_NAME);
+		// write input matrix
+		double[][] A = getRandomMatrix(rows, cols, -1, 1, 1, 1234);
+		writeInputMatrixWithMTD("A", A, false, new MatrixCharacteristics(rows, cols, blocksize, rows * cols));
+		federatedConstruction(execMode, MATRIX_TEST_FILE_NAME, "A", null);
+	}
+	
 	@Test
-	public void federatedConstructionSP() {
-		federatedConstruction(Types.ExecMode.SPARK);
+	public void federatedFrameConstructionCP() throws IOException {
+		federatedFrameConstruction(Types.ExecMode.SINGLE_NODE);
+	}
+	
+	/* like other federated functionality, SPARK execution mode is not yet working (waiting for better integration

Review comment:
       This is a rather large change and not specific to federated construction, it is clear that a better integration is necessary in the future so I am not sure if a TODO is helpful or confusing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r416601940



##########
File path: src/main/java/org/apache/sysds/runtime/util/UtilFunctions.java
##########
@@ -41,6 +36,18 @@
 import org.apache.sysds.runtime.matrix.data.Pair;
 import org.apache.sysds.runtime.meta.TensorCharacteristics;
 
+import java.util.ArrayList;

Review comment:
       I agree, sadly Intellj sometimes decides to reorder imports and reverting that gives me some trouble most of the time. I don't see benefit for any import order so I can try to revert it or the person merging it in can decide to revert it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] j143 commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
j143 commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r417472523



##########
File path: src/test/java/org/apache/sysds/test/TestUtils.java
##########
@@ -268,78 +268,171 @@ else if (Integer.parseInt(expRcn[2]) != Integer.parseInt(rcn[2])) {
 			fail("unable to read file: " + e.getMessage());
 		}
 	}
-
+	
 	/**
 	 * <p>
-	 * Compares the expected values calculated in Java by testcase and which are
-	 * in the normal filesystem, with those calculated by SystemDS located in
-	 * HDFS
+	 * Read the cell values of the expected file and actual files. Schema is used for correct parsing if the file is a
+	 * frame and if it is null FP64 will be used for all values (useful for Matrices).
 	 * </p>
-	 * 
-	 * @param expectedFile
-	 *            file with expected values, which is located in OS filesystem
-	 * @param actualDir
-	 *            file with actual values, which is located in HDFS
-	 * @param epsilon
-	 *            tolerance for value comparison
+	 *
+	 * @param schema         the schema of the frame, can be null (for FP64)
+	 * @param expectedFile   the file with expected values
+	 * @param actualDir      the directory where the actual values were written
+	 * @param expectedValues the HashMap where the expected values will be written to
+	 * @param actualValues   the HashMap where the actual values will be written to
 	 */
-	@SuppressWarnings("resource")
-	public static void compareDMLMatrixWithJavaMatrix(String expectedFile, String actualDir, double epsilon) {
+	private static void readActualAndExpectedFile(ValueType[] schema, String expectedFile, String actualDir,
+		HashMap<CellIndex, Object> expectedValues, HashMap<CellIndex, Object> actualValues) {
 		try {
 			Path outDirectory = new Path(actualDir);
 			Path compareFile = new Path(expectedFile);
 			FileSystem fs = IOUtilFunctions.getFileSystem(outDirectory, conf);
 			FSDataInputStream fsin = fs.open(compareFile);
-			
-			HashMap<CellIndex, Double> expectedValues = new HashMap<>();
-			
-			try( BufferedReader compareIn = new BufferedReader(new InputStreamReader(fsin)) ) {
+
+			try(BufferedReader compareIn = new BufferedReader(new InputStreamReader(fsin))) {

Review comment:
       Can this be rebased with master.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r416604138



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/caching/FrameObject.java
##########
@@ -155,15 +182,46 @@ public long getNumColumns() {
 		return dc.getCols();
 	}
 	
+	private FrameBlock readFederated() {
+		FrameBlock result = new FrameBlock(_schema);
+		// provide long support?

Review comment:
       `readFederated` is a kind of temporary way of working with federated frames and still provide most of the functionality already implemented for frames. This requires the size to be int size. In the future we probably want to prohibit getting the whole frame from the federated workers anyway so I am not sure if we should spare much effort on this, or even do a TODO.
   Please let me know your thoughts considering those aspects, I would opt for TODO if you believe that to be necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn commented on issue #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn commented on issue #893:
URL: https://github.com/apache/systemml/pull/893#issuecomment-616591517


   The tests are a bit to simplistic atm so I might add some more


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] j143 commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
j143 commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r415398205



##########
File path: src/main/java/org/apache/sysds/runtime/util/UtilFunctions.java
##########
@@ -41,6 +36,18 @@
 import org.apache.sysds.runtime.matrix.data.Pair;
 import org.apache.sysds.runtime.meta.TensorCharacteristics;
 
+import java.util.ArrayList;

Review comment:
       Changes such as 
   1. Formatting (whitespace addition/removal)
   2. Rearragement of imports
   
   These kind of changes, we cannot be decisive. Now, the reviewing person needs to be able give OK for the functionality and formatting changes.
   **TL;DR**: If there is a need for change of formatting opening a separate PR would be more helpful.
   

##########
File path: src/test/java/org/apache/sysds/test/functions/federated/FederatedConstructionTest.java
##########
@@ -62,28 +70,59 @@ public void setUp() {
 	}
 
 	@Test
-	public void federatedConstructionCP() {
-		federatedConstruction(Types.ExecMode.SINGLE_NODE);
+	public void federatedMatrixConstructionCP() {
+		federatedMatrixConstruction(Types.ExecMode.SINGLE_NODE);
+	}
+
+	@Test
+	public void federatedMatrixConstructionSP() {
+		federatedMatrixConstruction(Types.ExecMode.SPARK);
 	}
 
+	public void federatedMatrixConstruction(Types.ExecMode execMode) {
+		getAndLoadTestConfiguration(TEST_NAME);
+		// write input matrix
+		double[][] A = getRandomMatrix(rows, cols, -1, 1, 1, 1234);
+		writeInputMatrixWithMTD("A", A, false, new MatrixCharacteristics(rows, cols, blocksize, rows * cols));
+		federatedConstruction(execMode, MATRIX_TEST_FILE_NAME, "A", null);
+	}
+	
 	@Test
-	public void federatedConstructionSP() {
-		federatedConstruction(Types.ExecMode.SPARK);
+	public void federatedFrameConstructionCP() throws IOException {
+		federatedFrameConstruction(Types.ExecMode.SINGLE_NODE);
+	}
+	
+	/* like other federated functionality, SPARK execution mode is not yet working (waiting for better integration

Review comment:
       Should we mark this a TODO.

##########
File path: src/test/java/org/apache/sysds/test/TestUtils.java
##########
@@ -268,78 +268,171 @@ else if (Integer.parseInt(expRcn[2]) != Integer.parseInt(rcn[2])) {
 			fail("unable to read file: " + e.getMessage());
 		}
 	}
-
+	
 	/**
 	 * <p>
-	 * Compares the expected values calculated in Java by testcase and which are
-	 * in the normal filesystem, with those calculated by SystemDS located in
-	 * HDFS
+	 * Read the cell values of the expected file and actual files. Schema is used for correct parsing if the file is a
+	 * frame and if it is null FP64 will be used for all values (useful for Matrices).
 	 * </p>
-	 * 
-	 * @param expectedFile
-	 *            file with expected values, which is located in OS filesystem
-	 * @param actualDir
-	 *            file with actual values, which is located in HDFS
-	 * @param epsilon
-	 *            tolerance for value comparison
+	 *
+	 * @param schema         the schema of the frame, can be null (for FP64)
+	 * @param expectedFile   the file with expected values
+	 * @param actualDir      the directory where the actual values were written
+	 * @param expectedValues the HashMap where the expected values will be written to
+	 * @param actualValues   the HashMap where the actual values will be written to
 	 */
-	@SuppressWarnings("resource")
-	public static void compareDMLMatrixWithJavaMatrix(String expectedFile, String actualDir, double epsilon) {
+	private static void readActualAndExpectedFile(ValueType[] schema, String expectedFile, String actualDir,
+		HashMap<CellIndex, Object> expectedValues, HashMap<CellIndex, Object> actualValues) {
 		try {
 			Path outDirectory = new Path(actualDir);
 			Path compareFile = new Path(expectedFile);
 			FileSystem fs = IOUtilFunctions.getFileSystem(outDirectory, conf);
 			FSDataInputStream fsin = fs.open(compareFile);
-			
-			HashMap<CellIndex, Double> expectedValues = new HashMap<>();
-			
-			try( BufferedReader compareIn = new BufferedReader(new InputStreamReader(fsin)) ) {
+
+			try(BufferedReader compareIn = new BufferedReader(new InputStreamReader(fsin))) {

Review comment:
       The test configuration has been modified, would you mind rebasing the latest.

##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/caching/FrameObject.java
##########
@@ -155,15 +182,46 @@ public long getNumColumns() {
 		return dc.getCols();
 	}
 	
+	private FrameBlock readFederated() {
+		FrameBlock result = new FrameBlock(_schema);
+		// provide long support?

Review comment:
       `int` may not be sufficient. Either this be marked a TODO or rationally assume to that cases would fall within range.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] mboehm7 commented on pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #893:
URL: https://github.com/apache/systemml/pull/893#issuecomment-623011729


   LGTM - thanks @kev-inn this is a great generalization. 
   
   During the merge I slightly changed the `FrameObject`/`MatrixObject` integration by moving the `fedMapping` meta data and related methods up to `CacheableData` as it's the same for matrices and frames. Similarly, I brought the read from federated frames into the same form as federated matrices. Finally, there were some minor issues such as code duplication (parsing of strings to value-type-specific objects), unnecessary imports, and missing static modifiers of methods which don't access class members. But again, all minor issues - overall this is great. Now we can move to federated data preparation (transform).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r417660073



##########
File path: src/test/java/org/apache/sysds/test/TestUtils.java
##########
@@ -268,78 +268,171 @@ else if (Integer.parseInt(expRcn[2]) != Integer.parseInt(rcn[2])) {
 			fail("unable to read file: " + e.getMessage());
 		}
 	}
-
+	
 	/**
 	 * <p>
-	 * Compares the expected values calculated in Java by testcase and which are
-	 * in the normal filesystem, with those calculated by SystemDS located in
-	 * HDFS
+	 * Read the cell values of the expected file and actual files. Schema is used for correct parsing if the file is a
+	 * frame and if it is null FP64 will be used for all values (useful for Matrices).
 	 * </p>
-	 * 
-	 * @param expectedFile
-	 *            file with expected values, which is located in OS filesystem
-	 * @param actualDir
-	 *            file with actual values, which is located in HDFS
-	 * @param epsilon
-	 *            tolerance for value comparison
+	 *
+	 * @param schema         the schema of the frame, can be null (for FP64)
+	 * @param expectedFile   the file with expected values
+	 * @param actualDir      the directory where the actual values were written
+	 * @param expectedValues the HashMap where the expected values will be written to
+	 * @param actualValues   the HashMap where the actual values will be written to
 	 */
-	@SuppressWarnings("resource")
-	public static void compareDMLMatrixWithJavaMatrix(String expectedFile, String actualDir, double epsilon) {
+	private static void readActualAndExpectedFile(ValueType[] schema, String expectedFile, String actualDir,
+		HashMap<CellIndex, Object> expectedValues, HashMap<CellIndex, Object> actualValues) {
 		try {
 			Path outDirectory = new Path(actualDir);
 			Path compareFile = new Path(expectedFile);
 			FileSystem fs = IOUtilFunctions.getFileSystem(outDirectory, conf);
 			FSDataInputStream fsin = fs.open(compareFile);
-			
-			HashMap<CellIndex, Double> expectedValues = new HashMap<>();
-			
-			try( BufferedReader compareIn = new BufferedReader(new InputStreamReader(fsin)) ) {
+
+			try(BufferedReader compareIn = new BufferedReader(new InputStreamReader(fsin))) {

Review comment:
       sure, will do as soon as I have time to fix up everything




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] j143 commented on pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
j143 commented on pull request #893:
URL: https://github.com/apache/systemml/pull/893#issuecomment-619402366


   Hi @kev-inn , If this is a more work. This can be broken down into 2 to 3 PR s which makes it very easier for review and merging. 
   
   Anyway, should I review this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn commented on pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn commented on pull request #893:
URL: https://github.com/apache/systemml/pull/893#issuecomment-619435229


   > 
   > 
   > Hi @kev-inn , If this is a more work. This can be broken down into 2 to 3 PR s which makes it very easier for review and merging.
   > 
   > Anyway, should I review this?
   
   Well splitting this up is quite hard, because the additions need each other to work.
   Yes, please do, although like I said some more in depth tests will be added soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] j143 commented on a change in pull request #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
j143 commented on a change in pull request #893:
URL: https://github.com/apache/systemml/pull/893#discussion_r417470308



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/caching/FrameObject.java
##########
@@ -155,15 +182,46 @@ public long getNumColumns() {
 		return dc.getCols();
 	}
 	
+	private FrameBlock readFederated() {
+		FrameBlock result = new FrameBlock(_schema);
+		// provide long support?

Review comment:
       If this is the case, kindly write this info as a comment referencing the issues such as SYSTEMDS-##. We can keep it as descriptive as possible and can even include numeric examples to be explanatory.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemml] kev-inn edited a comment on issue #893: Federated Frames

Posted by GitBox <gi...@apache.org>.
kev-inn edited a comment on issue #893:
URL: https://github.com/apache/systemml/pull/893#issuecomment-616591517


   The tests are a bit too simplistic atm so I might add some more


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org