You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/01/13 21:42:42 UTC

[GitHub] [systemds] tobiasrieger opened a new pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

tobiasrieger opened a new pull request #1154:
URL: https://github.com/apache/systemds/pull/1154


   This PR contains a lot of valuable features for the parameter server
   
   ### Validation
   There is now a second createPS function in the ParamservBuiltinCPInstruction. If all the additional arguments are specified the parameter server is able to validate after each epoch. It will do so if LOG.info is enabled.
   This feature is implemented for the federated parameter server ONLY, but is easily implemented for the other cases too.
   
   ### Federated Parameter Server Statistics
   The federated parameter server now uses the same statistics as the regular one, where possible. Also a number of new ones were introduced:
   * Aggregated Validation Time
   * Aggregated Fed Communication Time
   * Federated Data Partitioning Time
   * Aggregated Fed Batch Weighing Time
   * Fed Worker Computation Time (This includes gradient calculation, local updates and batch slicing. The granularity can be improved if needed.)
   
   ### Other Changes
   * scaleAndPushGradients was refactored to weighAndPushGradients
   * Logging cleanup
   * Not implemented exceptions
   * Validation Functions for the Federated Parameter Server Test DML files
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] tobiasrieger commented on pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

Posted by GitBox <gi...@apache.org>.
tobiasrieger commented on pull request #1154:
URL: https://github.com/apache/systemds/pull/1154#issuecomment-769656081


   Thank you @Baunsgaard and @mboehm7! I have addressed the issues you raised.
   Changes:
   Fixed Logic Errors
   Only guarded the actually logging and changed the if structure a bit
   Shortened Statistics output


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1154:
URL: https://github.com/apache/systemds/pull/1154#issuecomment-770288532


   LGTM - thanks for the extension @tobiasrieger. I only made minor changes during the merge: (1) reverted the cumulative time measurement in Timing and changed the setup measurement accordingly, (2) added the data partitioning timing (time was measured but not collected), (3) reverted the changed test logging level to INFO, and (4) some minor formatting changes to avoid huge indentation.   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1154:
URL: https://github.com/apache/systemds/pull/1154#discussion_r558057320



##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/paramserv/ParamServer.java
##########
@@ -119,7 +175,7 @@ public ListObject getResult() {
 		// so we could return directly the result model
 		return _model;
 	}
-	
+
 	protected synchronized void updateGlobalModel(int workerID, ListObject gradients) {
 		try {
 			if (LOG.isDebugEnabled()) {

Review comment:
       everything is surrounded by a is debug enabled witch is a lower level than info.

##########
File path: src/main/java/org/apache/sysds/runtime/controlprogram/paramserv/ParamServer.java
##########
@@ -154,6 +222,20 @@ protected synchronized void updateGlobalModel(int workerID, ListObject gradients
 				}
 				case ASP: {
 					updateGlobalModel(gradients);
+					if(LOG.isInfoEnabled()) {
+						// This if works similarly to the one for BSP, but divides the sync couter through the number of workers,
+						// creating "Pseudo Epochs"
+						if (LOG.isInfoEnabled() && _validationPossible &&

Review comment:
       outerif checks the same as first part of inner if.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] asfgit closed pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1154:
URL: https://github.com/apache/systemds/pull/1154


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on pull request #1154:
URL: https://github.com/apache/systemds/pull/1154#issuecomment-760756975


    
   > There is now a second createPS function in the ParamservBuiltinCPInstruction. If all the additional arguments are specified the parameter server is able to validate after each epoch. It will do so if LOG.info is enabled.
   > This feature is implemented for the federated parameter server ONLY, but is easily implemented for the other cases too.
   
   I can understand why this is done with the logging. But i think this is a slight misuse of the logging in the system, since ideally we (or at least me) want no difference in execution (or as close to) when we set different logging levels. This violates it since the validation in the parameter server can take many seconds of execution time.
   
   I would suggest adding a parameter some other way to enable / disable the validation. While it is still possible that we output the validation scores through LOG.info when that other "unrelated to logging level" setting is set.
   
   But maybe I'm the only one with this opinion? @mboehm7 ?
   
   
   
   furthermore designwise, you might want to consider doing this validation in parallel with the distribution of model parameters. since we know that the controller is doing nothing while the workers are training, you could do the validation on a separate thread at no cost to execution time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1154: [SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1154:
URL: https://github.com/apache/systemds/pull/1154#issuecomment-760809439


   yes, I agree with @Baunsgaard regarding logging as it might hide bugs if there is additional state updated. Just guard the actual info logging (and string concatenation). Regarding parallelization please put a TODO in there and focus on the main parts of the validation first.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org