You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/04/23 14:18:56 UTC

[GitHub] [systemds] sebwrede opened a new pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

sebwrede opened a new pull request #1237:
URL: https://github.com/apache/systemds/pull/1237


   This PR adds federated output flag propagation to AggregateBinaryFEDInstruction. This requires partial aggregation of federated data, which means that several new processing cases have been added to the instruction. Additionally, a new field called `_overlapNum` has been added to the FederatedRange class, which ensures that ranges of identical dimensions can be added to the federation map, and a new federation type called `FType.PART` has been added to represent partially aggregated federated data. The `readBlobFromFederated` method in MatrixObject.java has been adapted to handle partially aggregated data by aggregating the data when the method is called. 
   The reviewer of this PR should especially evaluate the changes made to FederationMap and FederatedRange to support the partially aggregated data and whether this partially aggregated data is properly processed.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625893054



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       well in compression the overlap state only happens on a right matrix multiplication, but in federated i can already imagine it happening many times, if you want to convince me on another name it is fine, but i think having the same name would be nice.
   
   I guess you chose PART because it is a "PARTial aggregate value". I chose overlap because it is matrices that could be imagined to be on top of each other and added together to produce the result. I think that the name should reflect what it is, and if we have the same concept more places it should be called the same, that said i don't know what other ppl call it when they have distributed parts that could be reduced to the actual value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625849884



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -94,12 +105,19 @@ public int compareTo(FederatedRange o) {
 			if ( _beginDims[i] > o._beginDims[i])
 				return 1;
 		}
+		if (_overlapNum < o._overlapNum)
+			return -1;
+		if (_overlapNum > o._overlapNum)
+			return 1;

Review comment:
       See other comment about putting into map. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625867368



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       That would not be a good solution since it would affect all other federated ranges as well. The solution I implemented here will only affect the behavior of the overlapping ranges and will not change how the existing federated ranges behave. 
   I did not really like this either when I added the _overlapNum, but it was a solution that would work for the case I added without negatively affecting anything else. 
   Another option would be to generate a random ID instead of setting an overlap number, since I do not need this specific number anyway. The problem is that I also don't need a randomly generated ID, so it would be overkill to make the implementation when all I need is any integer. 
   Another solution would be to add overlap as a boolean and then check if this boolean is true in compareTo, toString, equals, and hashCode. If it is true then I should include the object pointer in some way in the computation. But then the question is again if this is too much compared to just adding a unique integer from the beginning.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625849142



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       This is to ensure that I can put similar ranges in a map without replacing any values. The range needs to be unique to achieve this. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 edited a comment on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
mboehm7 edited a comment on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-841874157


   Thanks for the patch and the detailed discussion @sebwrede and @Baunsgaard. I started merging it but need to finalize this tomorrow (so please no new commits on this PR).
   
   As a high-level summary:
   * The modification of the FederatedRange is actually unnecessary as we predominately iterate over the entry set - now we simply convert this map to a list of pairs and avoid any workarounds for comparison.
   * In anticipation of @OlgaOvcharenko PR (currently WIP) for handling broadcasts and sliced broadcasts as federated data as well, we should clearly separate the partitioning type (ROW, COL, NONE, MIXED) and the replication type (NONE, FULL, PARTIAL) where PARTIAL might be OVERLAP. We can then define Federated Types like ROW(partitioning=ROW, replication=NONE)
   * Currently the federatedOutput flags are conditionally compiled into the instructions. We should add a three-valued logic here (true, false, none) where true/false forced the instruction to act a certain way, while none preserves the current heuristic runtime semantics.
   * We need to fix (i.e., remove) all the stateful string concatenation  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-832783435


   I am not completely convinced that the latest commit is an improvement. 
   What do you think, @Baunsgaard ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625876026



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       I don't agree,
   
   Actually now that i think about it, you would not need a boolean nor a overlapping index, since you are using the "PART" (soon to be "OVERLAP") to mark if a given federated matrix is overlapping, therefore you should not need any of this logic to handle any incoming logic, just switch on the federated type.
   
   again the compare part is easily solved via object compare since it contains a "random" ID in form of its object pointer that is unique for each object.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-841874157


   Thanks for the patch and the detailed discussion @sebwrede and @sebwrede. I started merging it but need to finalize this tomorrow (no please no new commits on this PR).
   
   As a high-level summary:
   * The modification of the FederatedRange is actually unnecessary as we predominately iterate over the entry set - now we simply convert this map to a list of pairs and avoid any workarounds for comparison.
   * In anticipation of @OlgaOvcharenko PR (currently WIP) for handling broadcasts and sliced broadcasts as federated data as well, we should clearly separate the partitioning type (ROW, COL, NONE, MIXED) and the replication type (NONE, FULL, PARTIAL) where PARTIAL might be OVERLAP.
   * Currently the federatedOutput flags are conditionally compiled into the instructions. We should add a three-valued logic here (true, false, none) where true/false forced the instruction to act a certain way, while none preserves the current heuristic runtime semantics.
   * We need to fix (i.e., remove) all the stateful string concatenation  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-833301963


   > Why not allow the two federated ranges to be equal if the arrays are the same and they are both overlapping?
   > then you don't have to care about the hash of this element, making the implementation cleaner.
   > In general why does the ranges need to implement comparable, if it is just for sorting then allowing them to be equal is fine.
   
   The hashCode, equals, and compareTo are used when putting the range into a federation map. Depending on the data structure chosen for the federation map, which in most cases is a TreeMap, one or more of the methods will need to separate between the overlapping ranges. This is the point of adding some kind of uniqueness to the ranges. 
   If I change one of the three methods, then I should change all of them to include this unique value. Otherwise, it will be really confusing later on if there is a difference between putting into a HashMap vs a TreeMap. 
   Additionally, I add the uniqueness to the toString method so that we can see the difference in the print. 
   
   > it would be cool if you enabled some of the tests in the python API, since i added overlapping tests there sometime back and never got to use them
   
   Where do I find this? 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] asfgit closed pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1237:
URL: https://github.com/apache/systemds/pull/1237


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625831261



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/caching/MatrixObject.java
##########
@@ -548,27 +550,54 @@ protected MatrixBlock readBlobFromFederated(FederationMap fedMap, long[] dims)
 		throws IOException
 	{
 		// TODO sparse optimization
-		MatrixBlock ret = new MatrixBlock((int) dims[0], (int) dims[1], false);
 		List<Pair<FederatedRange, Future<FederatedResponse>>> readResponses = fedMap.requestFederatedData();
 		try {
-			for (Pair<FederatedRange, Future<FederatedResponse>> readResponse : readResponses) {
-				FederatedRange range = readResponse.getLeft();
-				FederatedResponse response = readResponse.getRight().get();
-				// add result
-				int[] beginDimsInt = range.getBeginDimsInt();
-				int[] endDimsInt = range.getEndDimsInt();
-				MatrixBlock multRes = (MatrixBlock) response.getData()[0];
-				ret.copy(beginDimsInt[0], endDimsInt[0] - 1,
-					beginDimsInt[1], endDimsInt[1] - 1, multRes, false);
-				ret.setNonZeros(ret.getNonZeros() + multRes.getNonZeros());
+			if ( fedMap.getType() == FederationMap.FType.PART )
+				return aggregateResponses(readResponses);
+			else {
+				return bindResponses(readResponses, dims);
 			}
 		}
-		catch (Exception e) {
+		catch(Exception e) {
 			throw new DMLRuntimeException("Federated matrix read failed.", e);
 		}
-		
+	}
+
+	/**
+	 * Bind data from federated workers based on non-overlapping federated ranges.
+	 * @param readResponses responses from federated workers containing the federated ranges and data
+	 * @param dims dimensions of output MatrixBlock
+	 * @return MatrixBlock of consolidated data
+	 * @throws Exception in case of problems with getting data from responses
+	 */
+	private MatrixBlock bindResponses(List<Pair<FederatedRange, Future<FederatedResponse>>> readResponses, long[] dims)
+	throws Exception {
+		MatrixBlock ret = new MatrixBlock((int) dims[0], (int) dims[1], false);
+		for(Pair<FederatedRange, Future<FederatedResponse>> readResponse : readResponses) {
+			FederatedRange range = readResponse.getLeft();
+			FederatedResponse response = readResponse.getRight().get();
+			// add result
+			int[] beginDimsInt = range.getBeginDimsInt();
+			int[] endDimsInt = range.getEndDimsInt();
+			MatrixBlock multRes = (MatrixBlock) response.getData()[0];
+			ret.copy(beginDimsInt[0], endDimsInt[0] - 1, beginDimsInt[1], endDimsInt[1] - 1, multRes, false);
+			ret.setNonZeros(ret.getNonZeros() + multRes.getNonZeros());
+		}
 		return ret;
 	}
+
+	/**
+	 * Aggregate partially aggregated data from federated workers
+	 * by adding values with the same index in different federated locations.
+	 * @param readResponses responses from federated workers containing the federated data
+	 * @return MatrixBlock of consolidated, aggregated data
+	 */
+	private MatrixBlock aggregateResponses(List<Pair<FederatedRange, Future<FederatedResponse>>> readResponses) {
+		List<Future<FederatedResponse>> dataParts = new ArrayList<>();
+		for ( Pair<FederatedRange, Future<FederatedResponse>> readResponse : readResponses )
+			dataParts.add(readResponse.getValue());
+		return FederationUtils.aggAdd(dataParts.toArray(new Future[0]));

Review comment:
       I Like it, it is basically the same as I'm doing with the Overlapping in compression.

##########
File path: src/main/java/org/apache/sysds/lops/MatMultCP.java
##########
@@ -73,15 +73,15 @@ public String toString() {
 	@Override
 	public String getInstructions(String input1, String input2, String output) {
 		if(!useTranspose) {
-			return InstructionUtils.concatOperands(getExecType().name(),
+			InstructionUtils.concatBaseOperands(getExecType().name(),

Review comment:
       I don't like this change because it hides where the variable is.
   also i'm not sure if it works, since strings behave differently in java than other variables (immutable),

##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       In my experience it does not matter how many overlaps you have, just have a boolean specifying if there is overlap or not.

##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -94,12 +105,19 @@ public int compareTo(FederatedRange o) {
 			if ( _beginDims[i] > o._beginDims[i])
 				return 1;
 		}
+		if (_overlapNum < o._overlapNum)
+			return -1;
+		if (_overlapNum > o._overlapNum)
+			return 1;

Review comment:
       how is it relevant which index it is?
   if you are trying to keep the overlaps in a certain order, it is really not a needed feature. 
   But then again if it really is necessary to sort the federated ranges i guess this is an okay way.
   
   if you change _overlapNum to be a boolean, you could sort on Object pointers instead in last case.

##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationMap.java
##########
@@ -50,6 +50,7 @@
 		ROW, // row partitioned, groups of rows
 		COL, // column partitioned, groups of columns
 		FULL, // Meaning both Row and Column indicating a single federated location and a full matrix
+		PART, // Partial aggregates in several federated locations with addition as the aggregation method

Review comment:
       Call it Overlap, to not confuse (at least me) and this overlap is a concept i have in compression as well, where i call it ... overlap.

##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationMap.java
##########
@@ -189,9 +190,14 @@ public FederatedRequest broadcast(ScalarObject scalar) {
 		return ret;
 	}
 
+	/**
+	 * Determines if the two federation maps are aligned row/column partitions
+	 * at the same federated sites (which allows for purely federated operation)
+	 * @param that FederationMap to check alignment with
+	 * @param transposed true if that FederationMap should be transposed before checking alignment
+	 * @return true if this and that FederationMap are aligned

Review comment:
       Thanks for the documentations !!!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625883913



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       I would like to use the PART type as the condition for adding the object pointer to the aforementioned methods, but this is not available in FederatedRange. 
   
   Are you sure about the name "OVERLAP"? Isn't it too long compared to the others? Additionally, it is not any kind of overlap. It is specific to the partial aggregates. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 edited a comment on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
mboehm7 edited a comment on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-841874157


   Thanks for the patch and the detailed discussion @sebwrede and @sebwrede. I started merging it but need to finalize this tomorrow (so please no new commits on this PR).
   
   As a high-level summary:
   * The modification of the FederatedRange is actually unnecessary as we predominately iterate over the entry set - now we simply convert this map to a list of pairs and avoid any workarounds for comparison.
   * In anticipation of @OlgaOvcharenko PR (currently WIP) for handling broadcasts and sliced broadcasts as federated data as well, we should clearly separate the partitioning type (ROW, COL, NONE, MIXED) and the replication type (NONE, FULL, PARTIAL) where PARTIAL might be OVERLAP. We can then define Federated Types like ROW(partitioning=ROW, replication=NONE)
   * Currently the federatedOutput flags are conditionally compiled into the instructions. We should add a three-valued logic here (true, false, none) where true/false forced the instruction to act a certain way, while none preserves the current heuristic runtime semantics.
   * We need to fix (i.e., remove) all the stateful string concatenation  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625852537



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedRange.java
##########
@@ -81,6 +88,10 @@ public long getSize() {
 			size *= getSize(i);
 		return size;
 	}
+
+	public int getOverlapNum(){
+		return _overlapNum;

Review comment:
       i don't get your argument, if you allocate different objects for each range, then there should not be any problems other than you have to change the compare method to also in case of same values look at object pointers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 edited a comment on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
mboehm7 edited a comment on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-841874157






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-833371776


   > > Why not allow the two federated ranges to be equal if the arrays are the same and they are both overlapping?
   > > then you don't have to care about the hash of this element, making the implementation cleaner.
   > > In general why does the ranges need to implement comparable, if it is just for sorting then allowing them to be equal is fine.
   > 
   > The hashCode, equals, and compareTo are used when putting the range into a federation map. Depending on the data structure chosen for the federation map, which in most cases is a TreeMap, one or more of the methods will need to separate between the overlapping ranges. This is the point of adding some kind of uniqueness to the ranges.
   > If I change one of the three methods, then I should change all of them to include this unique value. Otherwise, it will be really confusing later on if there is a difference between putting into a HashMap vs a TreeMap.
   > Additionally, I add the uniqueness to the toString method so that we can see the difference in the print.
   
   Design wise would it not then make sense then to move the addresses into these ranges?
   This would result in the uniqueness coming from the combination of range and address(inside same object), this would also allow us to based on the equality of objects know that we have a federated location with two intermediates that perfectly overlap on the same machine, and therefore merge these remotely.
   Maybe @mboehm7  have a comment on this?
   
   > > it would be cool if you enabled some of the tests in the python API, since i added overlapping tests there sometime back and never got to use them
   > 
   > Where do I find this?
   
   In src/main/python/tests/federated/test_federated_aggregations.py and src/main/python/tests/federated/test_federated_basic.py


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-841874157


   Thanks for the patch and the detailed discussion @sebwrede and @sebwrede. I started merging it but need to finalize this tomorrow (no please no new commits on this PR).
   
   As a high-level summary:
   * The modification of the FederatedRange is actually unnecessary as we predominately iterate over the entry set - now we simply convert this map to a list of pairs and avoid any workarounds for comparison.
   * In anticipation of @OlgaOvcharenko PR (currently WIP) for handling broadcasts and sliced broadcasts as federated data as well, we should clearly separate the partitioning type (ROW, COL, NONE, MIXED) and the replication type (NONE, FULL, PARTIAL) where PARTIAL might be OVERLAP.
   * Currently the federatedOutput flags are conditionally compiled into the instructions. We should add a three-valued logic here (true, false, none) where true/false forced the instruction to act a certain way, while none preserves the current heuristic runtime semantics.
   * We need to fix (i.e., remove) all the stateful string concatenation  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625852306



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationMap.java
##########
@@ -50,6 +50,7 @@
 		ROW, // row partitioned, groups of rows
 		COL, // column partitioned, groups of columns
 		FULL, // Meaning both Row and Column indicating a single federated location and a full matrix
+		PART, // Partial aggregates in several federated locations with addition as the aggregation method

Review comment:
       I chose this name because it matches the other names: it is short and in contrast to "FULL". 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 edited a comment on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
mboehm7 edited a comment on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-841874157


   Thanks for the patch and the detailed discussion @sebwrede and @sebwrede. I started merging it but need to finalize this tomorrow (so please no new commits on this PR).
   
   As a high-level summary:
   * The modification of the FederatedRange is actually unnecessary as we predominately iterate over the entry set - now we simply convert this map to a list of pairs and avoid any workarounds for comparison.
   * In anticipation of @OlgaOvcharenko PR (currently WIP) for handling broadcasts and sliced broadcasts as federated data as well, we should clearly separate the partitioning type (ROW, COL, NONE, MIXED) and the replication type (NONE, FULL, PARTIAL) where PARTIAL might be OVERLAP.
   * Currently the federatedOutput flags are conditionally compiled into the instructions. We should add a three-valued logic here (true, false, none) where true/false forced the instruction to act a certain way, while none preserves the current heuristic runtime semantics.
   * We need to fix (i.e., remove) all the stateful string concatenation  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#issuecomment-832800326


   > I am not completely convinced that the latest commit is an improvement.
   > What do you think, @Baunsgaard ?
   
   i don't like it either.
   
   Why not allow the two federated ranges to be equal if the arrays are the same and they are both overlapping?
   then you don't have to care about the hash of this element, making the implementation cleaner.
   In general why does the ranges need to implement comparable, if it is just for sorting then allowing them to be equal is fine.
   
   also you added more of the PART, rather than rewording it to overlap.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on a change in pull request #1237: [SYSTEMDS-2604] Federated Output for AggregateBinaryFEDInstruction

Posted by GitBox <gi...@apache.org>.
sebwrede commented on a change in pull request #1237:
URL: https://github.com/apache/systemds/pull/1237#discussion_r625848126



##########
File path: src/main/java/org/apache/sysds/lops/MatMultCP.java
##########
@@ -73,15 +73,15 @@ public String toString() {
 	@Override
 	public String getInstructions(String input1, String input2, String output) {
 		if(!useTranspose) {
-			return InstructionUtils.concatOperands(getExecType().name(),
+			InstructionUtils.concatBaseOperands(getExecType().name(),

Review comment:
       I have done this in other methods and I have not yet experienced any problems with it. The issue with the hidden variable is that it does not make any sense to return it from the concatBaseOperands method because of the internals of InstructionUtils. I think it is working with a thread-local variable and this means that for subsequent calls from this getInstructions method, it will always access the same variable. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org