You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/10/04 08:01:55 UTC

[GitHub] [systemds] ywcb00 opened a new pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

ywcb00 opened a new pull request #1409:
URL: https://github.com/apache/systemds/pull/1409


   Hi,
   This PR adds the first federated performance test, ALS-CG.
   The script generates the data, creates a specific amount (in our case 5) of federated worker processes on localhost, splits the data into partitions, creates a federated object from the different partitions and workers, runs the ALS-CG algorithm with the federated object as data input, and kills the woker processes again.
   I tried to make the shell scripts for starting and killing the federated workers, and also the dml script for splitting the data and creating a federated object, as general as possible so that the same scripts can be used the further federated performance tests.
   
   Thanks for review :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] asfgit closed pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1409:
URL: https://github.com/apache/systemds/pull/1409


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
sebwrede commented on pull request #1409:
URL: https://github.com/apache/systemds/pull/1409#issuecomment-937863036


   > Hi, This PR adds the first federated performance test, ALS-CG. The script generates the data, creates a specific amount (in our case 5) of federated worker processes on localhost, splits the data into partitions, creates a federated object from the different partitions and workers, runs the ALS-CG algorithm with the federated object as data input, and kills the woker processes again. I tried to make the shell scripts for starting and killing the federated workers, and also the dml script for splitting the data and creating a federated object, as general as possible so that the same scripts can be used the further federated performance tests.
   > 
   > Thanks for review :)
   
   Thanks for this PR!
   
   I ran the performance test and have two comments: 
   
   1. I get the following message continuously during the execution: `TRACE nio.NioEventLoop: instrumented a special java.util.Set into: sun.nio.ch.EPollSelectorImpl@`. Do you have any idea why this happens and how to fix it? 
   2. The script prints `Max iteration achieved but not converged!`. Could we increase the maximum number of iterations or change something else to make sure that it reaches its target? Otherwise it is difficult to determine from the prints whether the algorithm actually succeeded or if it was even close to succeeding. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] ywcb00 commented on pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
ywcb00 commented on pull request #1409:
URL: https://github.com/apache/systemds/pull/1409#issuecomment-939859345


   Thank you for the comments @sebwrede :)
   
   I set the log level of io.netty to INFO to avoid the trace logs.
   I also set the maximum number of iterations to 100 and the threshold to 0.001 so that the algorithm converges.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
sebwrede commented on pull request #1409:
URL: https://github.com/apache/systemds/pull/1409#issuecomment-937863036


   > Hi, This PR adds the first federated performance test, ALS-CG. The script generates the data, creates a specific amount (in our case 5) of federated worker processes on localhost, splits the data into partitions, creates a federated object from the different partitions and workers, runs the ALS-CG algorithm with the federated object as data input, and kills the woker processes again. I tried to make the shell scripts for starting and killing the federated workers, and also the dml script for splitting the data and creating a federated object, as general as possible so that the same scripts can be used the further federated performance tests.
   > 
   > Thanks for review :)
   
   Thanks for this PR!
   
   I ran the performance test and have two comments: 
   
   1. I get the following message continuously during the execution: `TRACE nio.NioEventLoop: instrumented a special java.util.Set into: sun.nio.ch.EPollSelectorImpl@`. Do you have any idea why this happens and how to fix it? 
   2. The script prints `Max iteration achieved but not converged!`. Could we increase the maximum number of iterations or change something else to make sure that it reaches its target? Otherwise it is difficult to determine from the prints whether the algorithm actually succeeded or if it was even close to succeeding. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] ywcb00 commented on a change in pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
ywcb00 commented on a change in pull request #1409:
URL: https://github.com/apache/systemds/pull/1409#discussion_r721361104



##########
File path: scripts/perftest/fed/runALSFed.sh
##########
@@ -0,0 +1,58 @@
+#!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+# 
+#   http://www.apache.org/licenses/LICENSE-2.0
+# 
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+CMD=${1:-"systemds"}
+DATADIR=${2:-"temp"}/als
+NUMFED=${3:-4}
+MAXITR=${4:-20}
+
+FILENAME=$0
+err_report() {
+  echo "Error in $FILENAME on line $1"
+}
+trap 'err_report $LINENO' ERR
+
+# Set properties
+export SYSDS_QUIET=1
+
+BASEPATH=$(dirname "$0")
+
+${BASEPATH}/../genALSData.sh systemds $DATADIR; # generate the data
+
+# start the federated workers on localhost
+${BASEPATH}/utils/startFedWorkers.sh systemds $DATADIR $NUMFED "localhost";
+
+for d in "10k_1k_dense" "10k_1k_sparse"
+do
+  # split the generated data into paritions and create a federated object
+  ${CMD} -f ${BASEPATH}/data/splitAndMakeFederated.dml \
+    --config ${BASEPATH}/../conf/SystemDS-config.xml \
+    --nvargs data=${DATADIR}/X${d} nSplit=$NUMFED transposed=FALSE \
+      target=${DATADIR}/X${d}_fed.json hosts=${DATADIR}/workers/hosts fmt="csv"
+
+  echo "-- Running ALS-CG with federated data ("$d") on "$NUMFED" federated workers" >> results/times.txt
+
+  # run the als algorithm on the federated object
+  ${BASEPATH}/../runALS.sh ${DATADIR}/X${d}_fed.json $MAXITR $DATADIR systemds;

Review comment:
       What do you think about the decision of calling the main algorithm script from the outside and passing the federated object as input?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on a change in pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #1409:
URL: https://github.com/apache/systemds/pull/1409#discussion_r721438830



##########
File path: scripts/perftest/fed/runALSFed.sh
##########
@@ -0,0 +1,58 @@
+#!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+# 
+#   http://www.apache.org/licenses/LICENSE-2.0
+# 
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+CMD=${1:-"systemds"}
+DATADIR=${2:-"temp"}/als
+NUMFED=${3:-4}
+MAXITR=${4:-20}
+
+FILENAME=$0
+err_report() {
+  echo "Error in $FILENAME on line $1"
+}
+trap 'err_report $LINENO' ERR
+
+# Set properties
+export SYSDS_QUIET=1
+
+BASEPATH=$(dirname "$0")
+
+${BASEPATH}/../genALSData.sh systemds $DATADIR; # generate the data
+
+# start the federated workers on localhost
+${BASEPATH}/utils/startFedWorkers.sh systemds $DATADIR $NUMFED "localhost";
+
+for d in "10k_1k_dense" "10k_1k_sparse"
+do
+  # split the generated data into paritions and create a federated object
+  ${CMD} -f ${BASEPATH}/data/splitAndMakeFederated.dml \
+    --config ${BASEPATH}/../conf/SystemDS-config.xml \
+    --nvargs data=${DATADIR}/X${d} nSplit=$NUMFED transposed=FALSE \
+      target=${DATADIR}/X${d}_fed.json hosts=${DATADIR}/workers/hosts fmt="csv"
+
+  echo "-- Running ALS-CG with federated data ("$d") on "$NUMFED" federated workers" >> results/times.txt
+
+  # run the als algorithm on the federated object
+  ${BASEPATH}/../runALS.sh ${DATADIR}/X${d}_fed.json $MAXITR $DATADIR systemds;

Review comment:
       in general i like it the best if it is from the outside, since then the script is the same if you use federated input or normal local input.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] sebwrede commented on pull request #1409: [SYSTEMDS-3148] Federated ALS Performance Tests

Posted by GitBox <gi...@apache.org>.
sebwrede commented on pull request #1409:
URL: https://github.com/apache/systemds/pull/1409#issuecomment-943234725


   > Thank you for the comments @sebwrede :)
   > 
   > I set the log level of io.netty to INFO to avoid the trace logs. I also set the maximum number of iterations to 100 and the threshold to 0.001 so that the algorithm converges.
   
   LGTM. I think this is ready to merge, but I don't know if we should wait until after the next release. What do you think, @phaniarnab?
   
   Another point: I noticed that when I run `runAll.sh` then I get the error:
   `Error in ./runAllMultinomial.sh on line 53`
   `Error in ./runAllRegression.sh on line 51`
   This is not related to this PR. It also happens on master. I don't know if it is something with my setup, but it does not happen with the call to `runAllFed.sh`, so that is good :smiley:. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org