You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by orhankislal <gi...@git.apache.org> on 2017/02/23 18:53:50 UTC

[GitHub] incubator-madlib pull request #105: Graph:

GitHub user orhankislal opened a pull request:

    https://github.com/apache/incubator-madlib/pull/105

    Graph:

    - Create generic graph validation and help message to standardize
    future graph algorithm development.
    - Expand the design document with more detail on the graph
    representation as well as the SSSP implementation.
    
    Closes #105

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/orhankislal/incubator-madlib graph/fw_take1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #105
    
----
commit 90ce8f159cd210ec87822710662ac6c3d0244946
Author: Orhan Kislal <ok...@pivotal.io>
Date:   2017-02-21T18:26:45Z

    Graph:
    
    - Create generic graph validation and help message to standardize
    future graph algorithm development.
    - Expand the design document with more detail on the graph
    representation as well as the SSSP implementation.
    
    Closes #105

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/105


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib issue #105: Graph:

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on the issue:

    https://github.com/apache/incubator-madlib/pull/105
  
    Thanks for the comments @iyerr3. I tried to reorganize the algorithm explanation a bit, please let me know what you think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/105#discussion_r104521502
  
    --- Diff: src/ports/postgres/modules/graph/graph_utils.py_in ---
    @@ -0,0 +1,102 @@
    +# coding=utf-8
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +# Graph Methods
    +
    +# Please refer to the graph.sql_in file for the documentation
    +
    +"""
    +@file graph.py_in
    +
    +@namespace graph
    +"""
    +
    +import plpy
    +from utilities.control import MinWarning
    +from utilities.utilities import _assert
    +from utilities.utilities import extract_keyvalue_params
    +from utilities.utilities import unique_string
    +from utilities.validate_args import get_cols
    +from utilities.validate_args import unquote_ident
    +from utilities.validate_args import table_exists
    +from utilities.validate_args import columns_exist_in_table
    +from utilities.validate_args import table_is_empty
    +
    +
    +def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
    +	out_table, **kwargs):
    +	"""
    +	Validates graph tables (vertex and edge) as well as the output table.
    +	"""
    +	_assert(out_table and out_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid output table name!")
    +	_assert(not table_exists(out_table),
    +		"Graph SSSP: Output table already exists!")
    +
    +	_assert(vertex_table and vertex_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid vertex table name!")
    +	_assert(table_exists(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is missing!".format(vertex_table))
    +	_assert(not table_is_empty(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is empty!".format(vertex_table))
    +
    +	_assert(edge_table and edge_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid edge table name!")
    +	_assert(table_exists(edge_table),
    +		"Graph SSSP: Edge table ({0}) is missing!".format(edge_table))
    +	_assert(not table_is_empty(edge_table),
    +		"Graph SSSP: Edge table ({0}) is empty!".format(edge_table))
    +
    +	existing_cols = set(unquote_ident(i) for i in get_cols(vertex_table))
    +	_assert(vertex_id in existing_cols,
    +		"""Graph SSSP: The vertex column {vertex_id} is not present in vertex
    +		table ({vertex_table}) """.format(**locals()))
    +	_assert(columns_exist_in_table(edge_table, edge_params.values()),
    +		"Graph SSSP: Not all columns from {0} present in edge table ({1})".
    +		format(edge_params.values(), edge_table))
    +
    +	return None
    +
    +def get_graph_usage(schema_madlib, func_name, other_text):
    +
    +	usage = """
    +----------------------------------------------------------------------------
    +                            USAGE
    +----------------------------------------------------------------------------
    + SELECT {schema_madlib}.{func_name}(
    +    vertex_table  TEXT, -- Name of the table that contains the vertex data.
    +    vertex_id     TEXT, -- Name of the column containing the vertex ids.
    +    edge_table    TEXT, -- Name of the table that contains the edge data.
    +    edge_args     TEXT, -- A comma-delimited string containing multiple
    +    			-- named arguments of the form "name=value".
    +    {other_text}
    +    out_table     TEXT  -- Name of the table to store the result of SSSP.
    +);
    --- End diff --
    
    I was trying to avoid changing the SSSP notation but I guess it is inevitable. Do you think separating `other_text` into two (`mandatory_params` and `optional_params`) could work for future graph algorithms?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/105#discussion_r103325238
  
    --- Diff: doc/design/modules/graph.tex ---
    @@ -33,9 +34,53 @@
     
     This module implements various graph algorithms that are used in a number of applications such as social networks, telecommunications and road networks.
     
    -% \section{Graph Representation} \label{sec:graph:rep}
    +\section{Graph Framework} \label{sec:graph:fw}
    +
    +MADlib graph representation depends on two structures, a \emph{vertex} table and an \emph{edge} table. The vertex table has to have a column of vertex ids. The edge table has to have 2 columns: source vertex id, destination vertex id. For most algorithms an edge weight column is required as well. The representation assumes a directed graph, an edge from $x$ to $y$ does \emph{not} guarantee the existence of an edge from $y$ to $x$. Both of the tables may have additional columns as required. Multi-edges (multiple edges from a vertex to the same destination) and loops (edge from a vertex to itself) are allowed. For ideal performance, vertex and edge tables should be distributed on vertex id and source id respectively. This representation does not impose any ordering of vertices or edges. An example graph is given in Figure~\ref{sssp:example} and its representative tables are given in Table~\ref{sssp:rep}.
    --- End diff --
    
    Using an example considerably helps in understanding the algorithms. 
    
    1. Please reformat to width of 80 characters. 
    2. The note about performance is alogrithm-specific and would not necessarily generalize for all graphs. I suggest not talking about it here. Maybe make it a note within each algorithm description. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/105#discussion_r103327658
  
    --- Diff: doc/design/modules/graph.tex ---
    @@ -86,3 +131,49 @@ \section{Single Source Shortest Path} \label{sec:graph:sssp}
     
     This is not a 1-to-1 pseudocode for the implementation since we don't compare the `toupdate` table records one by one but calculate the overall minimum. In addition, the comparison with `cur` values take place earlier to reduce the number of tuples in the `toupdate` table.
     
    +\subsection{Implementation Details}
    +
    +In this section, we discuss the MADlib implementation of the SSSP algorithm in depth.
    +
    +\begin{algorithm}[SSSP$(V,E,start)$] \label{alg:sssp:high}
    +\begin{algorithmic}[1]
    +	\Repeat
    +		\State Find Updates
    +		\State Apply updates to the output table
    +	\Until {There are no updates}
    +\end{algorithmic}
    +\end{algorithm}
    +
    +The implementation consists of two SQL blocks that are called sequentially inside a loop. We will follow the example graph at Figure~\ref{sssp:example} with the starting point as $v_0$. The very first update on the output table is the source vertex. Its weight is $0$ and its parent is itself ($v_0$). After this initialization step, the loop starts with Find Updates (the individual updates will be represented with <dest,value,parent> format). Looking at the example, it is clear that the updates should be <1,1,0>, <2,1,0> and <4,10,0>. We will assume this iteration is already completed and look how the next iteration of the algorithm works to explain the implementation details.
    +
    +\begin{algorithm}[Find Updates$(E,old\_update,out\_table)$]
    +\label{alg:sssp:findu}
    +\begin{lstlisting}
    +INSERT INTO new_update
    +	SELECT DISTINCT ON (y.id) y.id AS id,
    +		y.val AS val,
    +		y.parent AS parent
    +	FROM out_table INNER JOIN (
    +			SELECT edge_table.dest AS id, x.val AS val, old_update.id AS parent
    +			FROM old_update
    +				INNER JOIN edge_table
    +				ON (edge_table.src = old_update.id)
    +				INNER JOIN (
    +					SELECT edge_table.dest AS id,
    +						min(old_update.val + edge_table.weight) AS val
    +					FROM old_update INNER JOIN
    +						edge_table AS edge_table ON
    +						(edge_table.src=old_update.id)
    +					GROUP BY edge_table.dest
    +				) x
    +				ON (edge_table.dest = x.id)
    +			WHERE ABS(old_update.val + edge_table.weight - x.val) < EPSILON
    +		) AS y ON (y.id = out_table.vertex_id)
    +	WHERE y.val<out_table.weight
    +\end{lstlisting}
    +\end{algorithm}
    +
    +We begin our analysis of Find Updates function from its innermost subquery. This subquery (lines 11-16) takes a set of vertices (in the table $old\_update$) and finds the reachable vertices. In case a vertex is reachable by multiple vertices, only the path that has the minimum cost is considered. This means the input vertices need the value of their path as well. In our example, both $v_1$ and $v_2$ can reach $v_3$. In this case, we would have to use $v_2$ -> $v_3$ edge since that gives the lowest possible path value. Please note that we are aggregating the rows using the $min$ operator for each destination vertex and we are unable to return the source vertex at the same time. This means we know the value of $v_3$ should be $2$ but we cannot know its parent ($v_2$) at the same time. To solve this limitation, we combine the result with $edge$ and $old\_update$ tables (lines 7-10) and get the rows that has the same minimum value. At this point, we would have to tackle the problem 
 of tie-breaking. Vertex $v_5$ has two paths leading into: <5,2,1> and <5,2,2>. The inner subquery will return <5,2> and it will match both of these edges. However, it is redundant to keep both of them in the update list as that would require updating the same vertex multiple times in a given iteration. By using the $DISTINCT$ clause at line 2, we allow the underlying system to accept only a single one of them. Finally, we want to make sure these updates are actually leading us to shortest paths. Line 21 ensures that the values stored in the $out\_table$ does not increase and the solution does not regress throughout the iterations.
    --- End diff --
    
    Couple of suggestions on simplifying: 
    1. Use meaningful names for the query aliases and refer to those in the text explanation. 
    2. Make the explanation a list that gives (shorter) explanation of each set of lines. 
    
    Again, hard-wrapping to 80 characters would help. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/105#discussion_r104510023
  
    --- Diff: src/ports/postgres/modules/graph/graph_utils.py_in ---
    @@ -0,0 +1,102 @@
    +# coding=utf-8
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +# Graph Methods
    +
    +# Please refer to the graph.sql_in file for the documentation
    +
    +"""
    +@file graph.py_in
    +
    +@namespace graph
    +"""
    +
    +import plpy
    +from utilities.control import MinWarning
    +from utilities.utilities import _assert
    +from utilities.utilities import extract_keyvalue_params
    +from utilities.utilities import unique_string
    +from utilities.validate_args import get_cols
    +from utilities.validate_args import unquote_ident
    +from utilities.validate_args import table_exists
    +from utilities.validate_args import columns_exist_in_table
    +from utilities.validate_args import table_is_empty
    +
    +
    +def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
    +	out_table, **kwargs):
    +	"""
    +	Validates graph tables (vertex and edge) as well as the output table.
    +	"""
    +	_assert(out_table and out_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid output table name!")
    +	_assert(not table_exists(out_table),
    +		"Graph SSSP: Output table already exists!")
    +
    +	_assert(vertex_table and vertex_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid vertex table name!")
    +	_assert(table_exists(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is missing!".format(vertex_table))
    +	_assert(not table_is_empty(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is empty!".format(vertex_table))
    +
    +	_assert(edge_table and edge_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid edge table name!")
    +	_assert(table_exists(edge_table),
    +		"Graph SSSP: Edge table ({0}) is missing!".format(edge_table))
    +	_assert(not table_is_empty(edge_table),
    +		"Graph SSSP: Edge table ({0}) is empty!".format(edge_table))
    +
    +	existing_cols = set(unquote_ident(i) for i in get_cols(vertex_table))
    +	_assert(vertex_id in existing_cols,
    +		"""Graph SSSP: The vertex column {vertex_id} is not present in vertex
    +		table ({vertex_table}) """.format(**locals()))
    +	_assert(columns_exist_in_table(edge_table, edge_params.values()),
    +		"Graph SSSP: Not all columns from {0} present in edge table ({1})".
    +		format(edge_params.values(), edge_table))
    +
    +	return None
    +
    +def get_graph_usage(schema_madlib, func_name, other_text):
    +
    +	usage = """
    +----------------------------------------------------------------------------
    +                            USAGE
    +----------------------------------------------------------------------------
    + SELECT {schema_madlib}.{func_name}(
    +    vertex_table  TEXT, -- Name of the table that contains the vertex data.
    +    vertex_id     TEXT, -- Name of the column containing the vertex ids.
    +    edge_table    TEXT, -- Name of the table that contains the edge data.
    +    edge_args     TEXT, -- A comma-delimited string containing multiple
    +    			-- named arguments of the form "name=value".
    +    {other_text}
    +    out_table     TEXT  -- Name of the table to store the result of SSSP.
    +);
    --- End diff --
    
    Having a mandatory param such as `output_table` after `{other_text}` might not work well if we have optional params. It seems fine in sssp since other_text is essentially the starting vertex id (a mandatory param), but that might not be true for other modules. I suggest we move `other_text` to after `out_table`.
    
    An example of a graph-based algorithm that uses optional params is PageRank. We will have optional params such as `max_iter` and `threshold` that will be listed after `out_table`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/105#discussion_r104527390
  
    --- Diff: src/ports/postgres/modules/graph/graph_utils.py_in ---
    @@ -0,0 +1,102 @@
    +# coding=utf-8
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +# Graph Methods
    +
    +# Please refer to the graph.sql_in file for the documentation
    +
    +"""
    +@file graph.py_in
    +
    +@namespace graph
    +"""
    +
    +import plpy
    +from utilities.control import MinWarning
    +from utilities.utilities import _assert
    +from utilities.utilities import extract_keyvalue_params
    +from utilities.utilities import unique_string
    +from utilities.validate_args import get_cols
    +from utilities.validate_args import unquote_ident
    +from utilities.validate_args import table_exists
    +from utilities.validate_args import columns_exist_in_table
    +from utilities.validate_args import table_is_empty
    +
    +
    +def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
    +	out_table, **kwargs):
    +	"""
    +	Validates graph tables (vertex and edge) as well as the output table.
    +	"""
    +	_assert(out_table and out_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid output table name!")
    +	_assert(not table_exists(out_table),
    +		"Graph SSSP: Output table already exists!")
    +
    +	_assert(vertex_table and vertex_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid vertex table name!")
    +	_assert(table_exists(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is missing!".format(vertex_table))
    +	_assert(not table_is_empty(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is empty!".format(vertex_table))
    +
    +	_assert(edge_table and edge_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid edge table name!")
    +	_assert(table_exists(edge_table),
    +		"Graph SSSP: Edge table ({0}) is missing!".format(edge_table))
    +	_assert(not table_is_empty(edge_table),
    +		"Graph SSSP: Edge table ({0}) is empty!".format(edge_table))
    +
    +	existing_cols = set(unquote_ident(i) for i in get_cols(vertex_table))
    +	_assert(vertex_id in existing_cols,
    +		"""Graph SSSP: The vertex column {vertex_id} is not present in vertex
    +		table ({vertex_table}) """.format(**locals()))
    +	_assert(columns_exist_in_table(edge_table, edge_params.values()),
    +		"Graph SSSP: Not all columns from {0} present in edge table ({1})".
    +		format(edge_params.values(), edge_table))
    +
    +	return None
    +
    +def get_graph_usage(schema_madlib, func_name, other_text):
    +
    +	usage = """
    +----------------------------------------------------------------------------
    +                            USAGE
    +----------------------------------------------------------------------------
    + SELECT {schema_madlib}.{func_name}(
    +    vertex_table  TEXT, -- Name of the table that contains the vertex data.
    +    vertex_id     TEXT, -- Name of the column containing the vertex ids.
    +    edge_table    TEXT, -- Name of the table that contains the edge data.
    +    edge_args     TEXT, -- A comma-delimited string containing multiple
    +    			-- named arguments of the form "name=value".
    +    {other_text}
    +    out_table     TEXT  -- Name of the table to store the result of SSSP.
    +);
    --- End diff --
    
    I think we might want to move `out_table` to the other parameters as well. For some functions like graph diameter, we don't have to create an output table. That will allow the pagerank to place its optional parameters after the `out_table`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #105: Graph:

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/105#discussion_r104523519
  
    --- Diff: src/ports/postgres/modules/graph/graph_utils.py_in ---
    @@ -0,0 +1,102 @@
    +# coding=utf-8
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +# Graph Methods
    +
    +# Please refer to the graph.sql_in file for the documentation
    +
    +"""
    +@file graph.py_in
    +
    +@namespace graph
    +"""
    +
    +import plpy
    +from utilities.control import MinWarning
    +from utilities.utilities import _assert
    +from utilities.utilities import extract_keyvalue_params
    +from utilities.utilities import unique_string
    +from utilities.validate_args import get_cols
    +from utilities.validate_args import unquote_ident
    +from utilities.validate_args import table_exists
    +from utilities.validate_args import columns_exist_in_table
    +from utilities.validate_args import table_is_empty
    +
    +
    +def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
    +	out_table, **kwargs):
    +	"""
    +	Validates graph tables (vertex and edge) as well as the output table.
    +	"""
    +	_assert(out_table and out_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid output table name!")
    +	_assert(not table_exists(out_table),
    +		"Graph SSSP: Output table already exists!")
    +
    +	_assert(vertex_table and vertex_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid vertex table name!")
    +	_assert(table_exists(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is missing!".format(vertex_table))
    +	_assert(not table_is_empty(vertex_table),
    +		"Graph SSSP: Vertex table ({0}) is empty!".format(vertex_table))
    +
    +	_assert(edge_table and edge_table.strip().lower() not in ('null', ''),
    +		"Graph SSSP: Invalid edge table name!")
    +	_assert(table_exists(edge_table),
    +		"Graph SSSP: Edge table ({0}) is missing!".format(edge_table))
    +	_assert(not table_is_empty(edge_table),
    +		"Graph SSSP: Edge table ({0}) is empty!".format(edge_table))
    +
    +	existing_cols = set(unquote_ident(i) for i in get_cols(vertex_table))
    +	_assert(vertex_id in existing_cols,
    +		"""Graph SSSP: The vertex column {vertex_id} is not present in vertex
    +		table ({vertex_table}) """.format(**locals()))
    +	_assert(columns_exist_in_table(edge_table, edge_params.values()),
    +		"Graph SSSP: Not all columns from {0} present in edge table ({1})".
    +		format(edge_params.values(), edge_table))
    +
    +	return None
    +
    +def get_graph_usage(schema_madlib, func_name, other_text):
    +
    +	usage = """
    +----------------------------------------------------------------------------
    +                            USAGE
    +----------------------------------------------------------------------------
    + SELECT {schema_madlib}.{func_name}(
    +    vertex_table  TEXT, -- Name of the table that contains the vertex data.
    +    vertex_id     TEXT, -- Name of the column containing the vertex ids.
    +    edge_table    TEXT, -- Name of the table that contains the edge data.
    +    edge_args     TEXT, -- A comma-delimited string containing multiple
    +    			-- named arguments of the form "name=value".
    +    {other_text}
    +    out_table     TEXT  -- Name of the table to store the result of SSSP.
    +);
    --- End diff --
    
    Yes, that might work better. We can have `other_madatory_params` and `optional_params`, before and after `out_table` respectively. We may have to follow this rule for other graph modules: `out_table` must be our last mandatory param, to maintain some consistency. 
    
    But this might not be in line with our existing modules. For example, I checked elastic_net and the output table is one of the mandatory params specified early on. There are several algorithm specific mandatory params following the output table name.
    We should also put a comment in the code specifying the reason, else it will look confusing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---