You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/05/04 17:41:08 UTC

[GitHub] [systemds] clarapueyoballarin opened a new pull request #1254: [WIP] [AMLS] shortest path

clarapueyoballarin opened a new pull request #1254:
URL: https://github.com/apache/systemds/pull/1254


   added new builtin function
   TO DO tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] asfgit closed pull request #1254: [WIP] [AMLS] shortest path

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1254:
URL: https://github.com/apache/systemds/pull/1254


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on pull request #1254: [WIP] [AMLS] shortest path

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1254:
URL: https://github.com/apache/systemds/pull/1254#issuecomment-873626149


   LGTM - during the merge I fixed the remaining warnings, formatting issues (e.g., input/output documentation), and resolved the merge conflicts due to recent name changes of other tests (capitalization). The git author of the commits in this PR does not seem to match though - @clarapueyoballarin please add the used email to your github handle to link the final commit to your account.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] mboehm7 commented on a change in pull request #1254: [WIP] [AMLS] shortest path

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on a change in pull request #1254:
URL: https://github.com/apache/systemds/pull/1254#discussion_r628931391



##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski

Review comment:
       Please use a similar reference style like the slicefinder() builtin function.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	

Review comment:
       Also you might want to specify if the graph is a 0/1 representation or holds the vertex distances.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	

Review comment:
       Could you please reformat that to a compact input/output documentation like the other scripts?

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 

Review comment:
       should take the graph, vertex distances, and vertex id  of the single source id for comparing the shortest paths to all other nodes.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");
+
+	if(min(G) < 0){
+		stop("All values in G must be positive")
+	}
+	
+	if(ncol(G) != nrow(G)){
+		stop("Not correct matrix dimensions")
+	}
+	
+	matrixSize = nrow(G)
+	infValue = sum(rowSums(G)) + 1 # value representing infinity, i.e. the nodes are not connected
+	
+	# initialize the matrix of minimum distances with "infinity" values:
+	minDistMatrix = matrix(infValue,rows=matrixSize,cols=matrixSize)
+	
+	for (sourceNode in 1:matrixSize){ 
+		
+		# update minimum distance from the sourceNode to itself to 0:
+		minDistMatrix[sourceNode,sourceNode] = 0
+		
+		# initialize the matrix neighboursList and neighboursListTmp:
+		neighboursList  = matrix(0,matrixSize,3)
+		neighboursListTmp = matrix(0,matrixSize*matrixSize,3)
+		
+		# find the neighbours of the sourceNode and fill in the neighboursList:
+		nodeIdx = 1
+		for(ineighbour in 1:matrixSize){
+			if (as.integer(as.scalar(G[sourceNode,ineighbour]))>0 & (ineighbour!=sourceNode)){ 
+				neighboursList[nodeIdx,1] = sourceNode
+				neighboursList[nodeIdx,2] = ineighbour
+				nodeIdx = nodeIdx+1
+			}
+		}
+		
+		
+		while( as.integer(as.scalar(neighboursList[1,1])) > 0 ){ # loop of supersteps (see documentation)
+			rawIdx = 1
+			nodeIdx=1		
+			
+			while( as.integer(as.scalar(neighboursList[rawIdx,1])) > 0 ){ # loop over the raws of neighboursList, i.e. updates of the previous superstep
+				
+				# define the currentNode, previousNode and cumulativeMinimumDistance from the corresponding raw of neighboursList:
+				currentNode = as.integer(as.scalar(neighboursList[rawIdx,2])) 
+				previousNode = as.integer(as.scalar(neighboursList[rawIdx,1])) 
+				cumulativeMinimumDistance = as.integer(as.scalar(neighboursList[rawIdx,3]))
+				
+				# consider if the minimum distance from the source node to the current node can be updated:
+				potentialMinimumDistance = cumulativeMinimumDistance + as.integer(as.scalar(G[previousNode,currentNode]))
+				if (potentialMinimumDistance < as.integer(as.scalar(minDistMatrix[sourceNode,currentNode]))){
+					minDistMatrix[sourceNode,currentNode] = potentialMinimumDistance
+	
+					# find the neighbours of currentNode to send them the updated information:
+					for(ineighbour in 1:matrixSize){
+						if (as.integer(as.scalar(G[currentNode,ineighbour]))>0 & (ineighbour!=sourceNode)){ 
+							neighboursListTmp[nodeIdx,1] = currentNode
+							neighboursListTmp[nodeIdx,2] = ineighbour
+							neighboursListTmp[nodeIdx,3] = potentialMinimumDistance
+							nodeIdx = nodeIdx+1
+						}
+					}
+				}
+				rawIdx=rawIdx+1
+			}
+			
+			# update neighboursList and neighboursListTmp for the next superstep:
+			neighboursList = neighboursListTmp
+			neighboursListTmp = matrix(0,matrixSize*matrixSize,3)
+		}	
+	}
+	
+	# define the distance between the not connected nodes as -1:
+	
+	for (irow in 1:matrixSize){

Review comment:
       can be dropped - simply initialize the vector of min distances as inf and if a node is not reachable this entry will never be updated.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");

Review comment:
       Maybe add a verbose flag to guard such prints.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");
+
+	if(min(G) < 0){
+		stop("All values in G must be positive")
+	}
+	
+	if(ncol(G) != nrow(G)){
+		stop("Not correct matrix dimensions")
+	}
+	
+	matrixSize = nrow(G)
+	infValue = sum(rowSums(G)) + 1 # value representing infinity, i.e. the nodes are not connected

Review comment:
       why not use a real Inf (available as constant)?

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");
+
+	if(min(G) < 0){
+		stop("All values in G must be positive")
+	}
+	
+	if(ncol(G) != nrow(G)){
+		stop("Not correct matrix dimensions")
+	}
+	
+	matrixSize = nrow(G)
+	infValue = sum(rowSums(G)) + 1 # value representing infinity, i.e. the nodes are not connected
+	
+	# initialize the matrix of minimum distances with "infinity" values:
+	minDistMatrix = matrix(infValue,rows=matrixSize,cols=matrixSize)
+	
+	for (sourceNode in 1:matrixSize){ 
+		
+		# update minimum distance from the sourceNode to itself to 0:
+		minDistMatrix[sourceNode,sourceNode] = 0
+		
+		# initialize the matrix neighboursList and neighboursListTmp:
+		neighboursList  = matrix(0,matrixSize,3)
+		neighboursListTmp = matrix(0,matrixSize*matrixSize,3)
+		
+		# find the neighbours of the sourceNode and fill in the neighboursList:
+		nodeIdx = 1
+		for(ineighbour in 1:matrixSize){

Review comment:
       You don't need to materialize these neighbors because the graph already holds this information.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");
+
+	if(min(G) < 0){
+		stop("All values in G must be positive")
+	}
+	
+	if(ncol(G) != nrow(G)){
+		stop("Not correct matrix dimensions")
+	}
+	
+	matrixSize = nrow(G)
+	infValue = sum(rowSums(G)) + 1 # value representing infinity, i.e. the nodes are not connected
+	
+	# initialize the matrix of minimum distances with "infinity" values:
+	minDistMatrix = matrix(infValue,rows=matrixSize,cols=matrixSize)
+	
+	for (sourceNode in 1:matrixSize){ 

Review comment:
       The single source node should be given as a parameter.

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");
+
+	if(min(G) < 0){
+		stop("All values in G must be positive")
+	}
+	
+	if(ncol(G) != nrow(G)){
+		stop("Not correct matrix dimensions")
+	}
+	
+	matrixSize = nrow(G)
+	infValue = sum(rowSums(G)) + 1 # value representing infinity, i.e. the nodes are not connected
+	
+	# initialize the matrix of minimum distances with "infinity" values:
+	minDistMatrix = matrix(infValue,rows=matrixSize,cols=matrixSize)
+	
+	for (sourceNode in 1:matrixSize){ 
+		
+		# update minimum distance from the sourceNode to itself to 0:
+		minDistMatrix[sourceNode,sourceNode] = 0
+		
+		# initialize the matrix neighboursList and neighboursListTmp:
+		neighboursList  = matrix(0,matrixSize,3)
+		neighboursListTmp = matrix(0,matrixSize*matrixSize,3)
+		
+		# find the neighbours of the sourceNode and fill in the neighboursList:
+		nodeIdx = 1
+		for(ineighbour in 1:matrixSize){
+			if (as.integer(as.scalar(G[sourceNode,ineighbour]))>0 & (ineighbour!=sourceNode)){ 
+				neighboursList[nodeIdx,1] = sourceNode
+				neighboursList[nodeIdx,2] = ineighbour
+				nodeIdx = nodeIdx+1
+			}
+		}
+		
+		
+		while( as.integer(as.scalar(neighboursList[1,1])) > 0 ){ # loop of supersteps (see documentation)
+			rawIdx = 1
+			nodeIdx=1		
+			
+			while( as.integer(as.scalar(neighboursList[rawIdx,1])) > 0 ){ # loop over the raws of neighboursList, i.e. updates of the previous superstep
+				
+				# define the currentNode, previousNode and cumulativeMinimumDistance from the corresponding raw of neighboursList:
+				currentNode = as.integer(as.scalar(neighboursList[rawIdx,2])) 
+				previousNode = as.integer(as.scalar(neighboursList[rawIdx,1])) 
+				cumulativeMinimumDistance = as.integer(as.scalar(neighboursList[rawIdx,3]))
+				
+				# consider if the minimum distance from the source node to the current node can be updated:
+				potentialMinimumDistance = cumulativeMinimumDistance + as.integer(as.scalar(G[previousNode,currentNode]))
+				if (potentialMinimumDistance < as.integer(as.scalar(minDistMatrix[sourceNode,currentNode]))){
+					minDistMatrix[sourceNode,currentNode] = potentialMinimumDistance
+	
+					# find the neighbours of currentNode to send them the updated information:
+					for(ineighbour in 1:matrixSize){
+						if (as.integer(as.scalar(G[currentNode,ineighbour]))>0 & (ineighbour!=sourceNode)){ 
+							neighboursListTmp[nodeIdx,1] = currentNode
+							neighboursListTmp[nodeIdx,2] = ineighbour
+							neighboursListTmp[nodeIdx,3] = potentialMinimumDistance
+							nodeIdx = nodeIdx+1
+						}
+					}
+				}
+				rawIdx=rawIdx+1
+			}
+			
+			# update neighboursList and neighboursListTmp for the next superstep:
+			neighboursList = neighboursListTmp
+			neighboursListTmp = matrix(0,matrixSize*matrixSize,3)
+		}	
+	}
+	
+	# define the distance between the not connected nodes as -1:
+	
+	for (irow in 1:matrixSize){
+		for (icol in 1:matrixSize){
+			if (as.integer(as.scalar(minDistMatrix[irow,icol]))==infValue){
+				minDistMatrix[irow,icol] = -1
+			}
+		}
+	}
+	
+	C=minDistMatrix
+	print("SHORTEST PATH CALCULATION FINISHED, CHECK OUTPUT MATRIX OF MINIMUM DISTANCES");

Review comment:
       yes, we should add the respective unit test - this can be copied from components() too and encode a small graph with node distances and expected shortest paths. In general, always try to start with the test, which also then allows you consider and design the appropriate function API (arguments)

##########
File path: scripts/builtin/shortestPath.dml
##########
@@ -0,0 +1,157 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+#
+# Computes the minimum distances (shortest-path) between a single 
+# source vertex and every other vertex in the graph.
+# 
+# Documentation; "Pregel: A System for Large-Scale Graph Processing"
+# Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bilk, 
+# James C. Dehnert, Ikkan Horn, Naty Leiser and Grzegorz Czajkowski
+#
+#
+#
+#----------------------------------
+# *******INPUT PARAMETERS*********
+#----------------------------------
+#
+#	- Adjacency matrix of the labeled graph 
+#	(also considered directed labeled graphs)
+# 	
+#----------------------------------
+# *******OUTPUT PARAMETERS*********
+#----------------------------------
+#	- Matrix of minimum distances (shortest-path) between vertices
+#	(the value of the ith row and the jth column of the output matrix is 
+#	the minimum distance shortest-path from vertex i to vertex j)
+#	- When the value of the minimum distance is -1, the two nodes are not
+#	connected.
+#
+#----------------------------------
+# *******INTERMEDIATE ELEMENTS*********
+#----------------------------------
+#
+#	- neighboursList: matrix (double) of nx3 dimensions, when the minimum 
+#	distance from the source node to a given node is updated,the following
+#	information will be contained in this matrix. Each row represents one 
+#	different update.
+#		- First column: predecesor node which information about the minimum 
+#		distance from the source node has been updated.
+#		- Second column: neighbour that will receive the information in the 
+#		current superstep.
+#		- Third column: updated minimum distance from the source node to the
+#		 node in the first column.
+#	- neighboursListTmp: matrix (double) of nx3 dimensions, during each superstep,
+#	the information about the new updates (not needed until the following superstep)
+#	will be saved in this matrix.
+#
+#
+#
+
+
+s_shortestPath = function(Matrix[Double] G) 
+  return (Matrix[Double] C) 
+{
+	
+	print("SHORTEST PATH CALCULATION");
+
+	if(min(G) < 0){
+		stop("All values in G must be positive")
+	}
+	
+	if(ncol(G) != nrow(G)){
+		stop("Not correct matrix dimensions")
+	}
+	
+	matrixSize = nrow(G)
+	infValue = sum(rowSums(G)) + 1 # value representing infinity, i.e. the nodes are not connected
+	
+	# initialize the matrix of minimum distances with "infinity" values:
+	minDistMatrix = matrix(infValue,rows=matrixSize,cols=matrixSize)
+	
+	for (sourceNode in 1:matrixSize){ 
+		
+		# update minimum distance from the sourceNode to itself to 0:
+		minDistMatrix[sourceNode,sourceNode] = 0
+		
+		# initialize the matrix neighboursList and neighboursListTmp:
+		neighboursList  = matrix(0,matrixSize,3)
+		neighboursListTmp = matrix(0,matrixSize*matrixSize,3)
+		
+		# find the neighbours of the sourceNode and fill in the neighboursList:
+		nodeIdx = 1
+		for(ineighbour in 1:matrixSize){
+			if (as.integer(as.scalar(G[sourceNode,ineighbour]))>0 & (ineighbour!=sourceNode)){ 
+				neighboursList[nodeIdx,1] = sourceNode
+				neighboursList[nodeIdx,2] = ineighbour
+				nodeIdx = nodeIdx+1
+			}
+		}
+		
+		
+		while( as.integer(as.scalar(neighboursList[1,1])) > 0 ){ # loop of supersteps (see documentation)

Review comment:
       try to replace this complex loop with the loop from components() and modify this algorithm accordingly. Instead of propagating the max id in connected components we simply can take the current min distance of neighbors plus the distance of each node. This way the entire algorithms because a small few line vectorized loop.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] Baunsgaard commented on pull request #1254: [WIP] [AMLS] shortest path

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on pull request #1254:
URL: https://github.com/apache/systemds/pull/1254#issuecomment-860617026


   Hi, @clarapueyoballarin 
   
   I can see the tests are working, do you think this is ready for review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] clarapueyoballarin commented on pull request #1254: [WIP] [AMLS] shortest path

Posted by GitBox <gi...@apache.org>.
clarapueyoballarin commented on pull request #1254:
URL: https://github.com/apache/systemds/pull/1254#issuecomment-861783708


   > Hi, @clarapueyoballarin
   > 
   > I can see the tests are working, do you think this is ready for review?
   
   @Baunsgaard, Hello, yes, it is ready for review, but I am still open to suggestions that you may have. Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org